When not to use compare(x,y)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

When not to use compare(x,y)

Post by PhilHibbs »

I am coming to the conclusion that using =, <>, < or > is mostly harmful. There are occasions when it is OK, such as when one side of the comparison is a literal string containing non-numeric values (e.g. flag="Y"), or other circumstances when you know for certain that the values will never be entirely numeric, but in the majority of cases all comparisons should be done with the compare() function. Therefore the default comparison mechanism should be compare() and any use of the operators must be justified with clear and unimpeachable logic.

I am now faced with a large codebase containing many hundreds of uses of the comparison operators, the majority of which need to be replaced with the compare() function in order to avoid "001" and "01" or "1234567890123456" and "1234567890123457" being treated as equivalent, although I can fix the latter by changing EXACTNUMERIC in uvconfig.

Update: The title is intended to be a question - is it ever safe to use the operators, or should I always use compare()?

Phil Hibbs.
Last edited by PhilHibbs on Wed Feb 22, 2006 5:25 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India
Contact:

Re: When not to use compare(x,y)

Post by parag.s.27 »

PhilHibbs wrote:I am coming to the conclusion that using =, <>, < or > is mostly harmful. There are occasions when it is OK, such as when one side of the comparison is a literal string containing non-numeric values (e.g. flag="Y"), or other circumstances when you know for certain that the values will never be entirely numeric, but in the majority of cases all comparisons should be done with the compare() function. Therefore the default comparison mechanism should be compare() and any use of the operators must be justified with clear and unimpeachable logic.

I am now faced with a large codebase containing many hundreds of uses of the comparison operators, the majority of which need to be replaced with the compare() function in order to avoid "001" and "01" or "1234567890123456" and "1234567890123457" being treated as equivalent, although I can fix the latter by changing EXACTNUMERIC in uvconfig.

Phil Hibbs.



Hi,

Can't we use the word "Matches" for this purpose

i.e. If(Expression A Matches Expression) then .......
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Phil,

I think you need to look at your statement and rephrase it so that it doesn't seem as if a compare in datastage won't work.

If you have a numeric comparison of digits longer than the EXACTNUMERIC setting there might be some internal loss of precision.

When does a normal comparison of "001" and "01" using the operators you mentioned not work as you expect? If either the left or right-hand side of a condition are internally stored as a string then a string comparison is performed (in which case "001" and "01" are unlike) on equality and you can always ensure string comparisons by prepending a 1-character string to any variables in a condition.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

ArndW wrote:If you have a numeric comparison of digits longer than the EXACTNUMERIC setting there might be some internal loss of precision.
When does a normal comparison of "001" and "01" using the operators you mentioned not work as you expect? If either the left or right-hand side of a condition are internally stored as a string then a string comparison is performed (in which case "001" and "01" are unlike) on equality and you can always ensure string comparisons by prepending a 1-character string to any variables in a condition.
The problem is that I don't know what the rules are that determine whether a string or a numeric comparison is performed are. I am faced with a large amount of code that was written in ignorance of this issue. Going forward, I have the choice between two guidelines:

1. If one or other value will always contain non-numerics and is always less than 16 characters, or is going to be represented internally as a string and therefore compared as a string, then it is safe to use = and <>, otherwise use compare(). Alternatively, you could prepend "X" to each value in order to force string comparison and then use = or <>.

2. Just use compare() all the time.

I am leaning towards 2 for reasons that should be obvious. Looking back at the code that we already have, the choice is similar - identify all uses of =, <>, < or > and replace them with compare(). It's a huge task, which is what makes me wish I had made the decision to use 2 a lot earlier. It could be made a lot easier by taking last night's full .dsx export, changing the code in the file, and importing it all. I don't have a text editor that can handle files with lines that are 349695 characters long, but I have a pair of scripts that break the long lines and then put them back together again after it's been edited.

Phil Hibbs.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Phil,

I've been using DataStage since version 0 of the product, and have been using the underlying UniVerse engine (actually doing product support for the manufacturers & consulting) since 1983. In all these years I have only come up with one or two cases where I was forced to use "compare()". So I am still quite skeptical about your assertion that a normal condition doesn't work and that you need to change huge volumes of program code to replace normal condition comparisons.

You still haven't stated exactly where you've had this problem. Could you (just for discussion's sake) give me an example of a normal condition that doesn't work correctly so that we have a basis to work with?
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

ArndW wrote:...In all these years I have only come up with one or two cases where I was forced to use "compare()"...

You still haven't stated exactly where you've had this problem. Could you (just for discussion's sake) give me an example of a normal condition that doesn't work correctly so that we have a basis to work with?
I could well be over-stating the problem. To be honest, we have only had this problem in practice with strings that are longer than 15 characters, and we can fix that with the uvconfig parameter. I was sure that I had encountered the "001"="01" situation, and thought that this would always return true until it was pointed out on this thread that this is not always the case. I am prepared to accept that I may have over-reacted.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Over-reaction is not a problem - we all do that occasionally.

But even with strings that contain only numeric characters the normal conditional operators will only cause an incorrect result if a conversion to numeric is forced; and that would only happen if the other operator is already internally represented as a number. I think even a simple concatenation of an empty string ('') would force an implicit string conversion and even that condition will be correct. But as you've stated, the length of correctly converted numbers can be changed in the configuration.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

OK I now have an example where it goes badly wrong. I have an input file with the following key values:

Code: Select all

"01   "
"01"
"1"
"1    "
When I feed them through a transformer with key change logic:

Code: Select all

svKey = DSLink3.key
svKeyChanged = svKey <> svOldKey
svOldKey = svKey
and a constraint of "svKeyChanged" on the output, I only get one row on the output. If I change it to "compare(svKey, svOldKey)<>0" I get all 4 rows.

In answer to your request that I clarify my statement, it's more of a question "when is it ever safe to use comparison operators instead of compare()".
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Phil,

odd, I don't have a test env right now, but what happens if you do a

Code: Select all

svKey = DSLink3.key:'' 
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

Still just the one row of output.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and DSLink3.key is defined as what datatype in your job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

chulett wrote:... and DSLink3.key is defined as what datatype in your job?
SQL type Char, Length 10
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I am going to have to find some time to look at the actual code generated when you do this; since it seems as if DataStage is doing an implicit conversion to numeric for both sides of the conditional equation; which surprises me.
Post Reply