When not to use compare(x,y)
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
When not to use compare(x,y)
I am coming to the conclusion that using =, <>, < or > is mostly harmful. There are occasions when it is OK, such as when one side of the comparison is a literal string containing non-numeric values (e.g. flag="Y"), or other circumstances when you know for certain that the values will never be entirely numeric, but in the majority of cases all comparisons should be done with the compare() function. Therefore the default comparison mechanism should be compare() and any use of the operators must be justified with clear and unimpeachable logic.
I am now faced with a large codebase containing many hundreds of uses of the comparison operators, the majority of which need to be replaced with the compare() function in order to avoid "001" and "01" or "1234567890123456" and "1234567890123457" being treated as equivalent, although I can fix the latter by changing EXACTNUMERIC in uvconfig.
Update: The title is intended to be a question - is it ever safe to use the operators, or should I always use compare()?
Phil Hibbs.
I am now faced with a large codebase containing many hundreds of uses of the comparison operators, the majority of which need to be replaced with the compare() function in order to avoid "001" and "01" or "1234567890123456" and "1234567890123457" being treated as equivalent, although I can fix the latter by changing EXACTNUMERIC in uvconfig.
Update: The title is intended to be a question - is it ever safe to use the operators, or should I always use compare()?
Phil Hibbs.
Last edited by PhilHibbs on Wed Feb 22, 2006 5:25 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
Technical Consultant
-
- Participant
- Posts: 221
- Joined: Fri Feb 17, 2006 3:38 am
- Location: India
- Contact:
Re: When not to use compare(x,y)
PhilHibbs wrote:I am coming to the conclusion that using =, <>, < or > is mostly harmful. There are occasions when it is OK, such as when one side of the comparison is a literal string containing non-numeric values (e.g. flag="Y"), or other circumstances when you know for certain that the values will never be entirely numeric, but in the majority of cases all comparisons should be done with the compare() function. Therefore the default comparison mechanism should be compare() and any use of the operators must be justified with clear and unimpeachable logic.
I am now faced with a large codebase containing many hundreds of uses of the comparison operators, the majority of which need to be replaced with the compare() function in order to avoid "001" and "01" or "1234567890123456" and "1234567890123457" being treated as equivalent, although I can fix the latter by changing EXACTNUMERIC in uvconfig.
Phil Hibbs.
Hi,
Can't we use the word "Matches" for this purpose
i.e. If(Expression A Matches Expression) then .......
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
Parag Saundattikar
Certified for Infosphere DataStage v8.0
Phil,
I think you need to look at your statement and rephrase it so that it doesn't seem as if a compare in datastage won't work.
If you have a numeric comparison of digits longer than the EXACTNUMERIC setting there might be some internal loss of precision.
When does a normal comparison of "001" and "01" using the operators you mentioned not work as you expect? If either the left or right-hand side of a condition are internally stored as a string then a string comparison is performed (in which case "001" and "01" are unlike) on equality and you can always ensure string comparisons by prepending a 1-character string to any variables in a condition.
I think you need to look at your statement and rephrase it so that it doesn't seem as if a compare in datastage won't work.
If you have a numeric comparison of digits longer than the EXACTNUMERIC setting there might be some internal loss of precision.
When does a normal comparison of "001" and "01" using the operators you mentioned not work as you expect? If either the left or right-hand side of a condition are internally stored as a string then a string comparison is performed (in which case "001" and "01" are unlike) on equality and you can always ensure string comparisons by prepending a 1-character string to any variables in a condition.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
ArndW wrote:If you have a numeric comparison of digits longer than the EXACTNUMERIC setting there might be some internal loss of precision.
The problem is that I don't know what the rules are that determine whether a string or a numeric comparison is performed are. I am faced with a large amount of code that was written in ignorance of this issue. Going forward, I have the choice between two guidelines:When does a normal comparison of "001" and "01" using the operators you mentioned not work as you expect? If either the left or right-hand side of a condition are internally stored as a string then a string comparison is performed (in which case "001" and "01" are unlike) on equality and you can always ensure string comparisons by prepending a 1-character string to any variables in a condition.
1. If one or other value will always contain non-numerics and is always less than 16 characters, or is going to be represented internally as a string and therefore compared as a string, then it is safe to use = and <>, otherwise use compare(). Alternatively, you could prepend "X" to each value in order to force string comparison and then use = or <>.
2. Just use compare() all the time.
I am leaning towards 2 for reasons that should be obvious. Looking back at the code that we already have, the choice is similar - identify all uses of =, <>, < or > and replace them with compare(). It's a huge task, which is what makes me wish I had made the decision to use 2 a lot earlier. It could be made a lot easier by taking last night's full .dsx export, changing the code in the file, and importing it all. I don't have a text editor that can handle files with lines that are 349695 characters long, but I have a pair of scripts that break the long lines and then put them back together again after it's been edited.
Phil Hibbs.
Phil Hibbs | Capgemini
Technical Consultant
Technical Consultant
Phil,
I've been using DataStage since version 0 of the product, and have been using the underlying UniVerse engine (actually doing product support for the manufacturers & consulting) since 1983. In all these years I have only come up with one or two cases where I was forced to use "compare()". So I am still quite skeptical about your assertion that a normal condition doesn't work and that you need to change huge volumes of program code to replace normal condition comparisons.
You still haven't stated exactly where you've had this problem. Could you (just for discussion's sake) give me an example of a normal condition that doesn't work correctly so that we have a basis to work with?
I've been using DataStage since version 0 of the product, and have been using the underlying UniVerse engine (actually doing product support for the manufacturers & consulting) since 1983. In all these years I have only come up with one or two cases where I was forced to use "compare()". So I am still quite skeptical about your assertion that a normal condition doesn't work and that you need to change huge volumes of program code to replace normal condition comparisons.
You still haven't stated exactly where you've had this problem. Could you (just for discussion's sake) give me an example of a normal condition that doesn't work correctly so that we have a basis to work with?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
I could well be over-stating the problem. To be honest, we have only had this problem in practice with strings that are longer than 15 characters, and we can fix that with the uvconfig parameter. I was sure that I had encountered the "001"="01" situation, and thought that this would always return true until it was pointed out on this thread that this is not always the case. I am prepared to accept that I may have over-reacted.ArndW wrote:...In all these years I have only come up with one or two cases where I was forced to use "compare()"...
You still haven't stated exactly where you've had this problem. Could you (just for discussion's sake) give me an example of a normal condition that doesn't work correctly so that we have a basis to work with?
Phil Hibbs | Capgemini
Technical Consultant
Technical Consultant
Over-reaction is not a problem - we all do that occasionally.
But even with strings that contain only numeric characters the normal conditional operators will only cause an incorrect result if a conversion to numeric is forced; and that would only happen if the other operator is already internally represented as a number. I think even a simple concatenation of an empty string ('') would force an implicit string conversion and even that condition will be correct. But as you've stated, the length of correctly converted numbers can be changed in the configuration.
But even with strings that contain only numeric characters the normal conditional operators will only cause an incorrect result if a conversion to numeric is forced; and that would only happen if the other operator is already internally represented as a number. I think even a simple concatenation of an empty string ('') would force an implicit string conversion and even that condition will be correct. But as you've stated, the length of correctly converted numbers can be changed in the configuration.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
OK I now have an example where it goes badly wrong. I have an input file with the following key values:
When I feed them through a transformer with key change logic:
and a constraint of "svKeyChanged" on the output, I only get one row on the output. If I change it to "compare(svKey, svOldKey)<>0" I get all 4 rows.
In answer to your request that I clarify my statement, it's more of a question "when is it ever safe to use comparison operators instead of compare()".
Code: Select all
"01 "
"01"
"1"
"1 "
Code: Select all
svKey = DSLink3.key
svKeyChanged = svKey <> svOldKey
svOldKey = svKey
In answer to your request that I clarify my statement, it's more of a question "when is it ever safe to use comparison operators instead of compare()".
Phil Hibbs | Capgemini
Technical Consultant
Technical Consultant
Phil,
odd, I don't have a test env right now, but what happens if you do a
odd, I don't have a test env right now, but what happens if you do a
Code: Select all
svKey = DSLink3.key:''
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
I am going to have to find some time to look at the actual code generated when you do this; since it seems as if DataStage is doing an implicit conversion to numeric for both sides of the conditional equation; which surprises me.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>