Page 1 of 1

DataStage internal/implicit conversion of string to number

Posted: Wed Jul 24, 2013 9:46 am
by sportclimber
DataStage will perform an implicit conversion from a string to a number and vice-versa.

We have database fields that are defined as varchar and may contain only numbers (for example, a US zip code). In these cases, it's important that the data be treated as varchar and not as numbers when doing field value comparisons.

For example, when we source the data in a DS Job, and then compare the zipcode column to a column in a hashed file lookup (both typed as Varchar), we want to make sure that the value "01234" is evaluated as different from "1234".

Unfortunately, due to DataStage's implicit conversion, our default result of this kind of comparison yields "01324" = "1234". This is troublesome.

While there are ways to force this issue on a specific comparison (eg, we could concatenate a character to each side of the comparison), that only works if we know in advance that a specific varchar field may contain only digits.

Is there any kind of internal DS setting that can force Varchar field types to be treated as Varchars when doing comparisons in Server Jobs?

Posted: Wed Jul 24, 2013 10:09 am
by ArndW
There is no internal setting to change this behaviour.

DataStage won't perform any data type conversions that it doesn't have to do, i.e. if both operators are strings then a string comparison will be performed.

In your case you are doing a conversion from string "01234" to a number "1234" and then back to a string "1234" somewhere and this is where you need to correct the processing. Have you determined where this is occurring?

Re: DataStage internal/implicit conversion of string to numb

Posted: Wed Jul 24, 2013 10:10 am
by chulett
sportclimber wrote:DataStage will perform an implicit conversion from a string to a number and vice-versa.
I don't believe this to be true in a Server job, that's more a feature of Parallel when moving data from one datatype to another. From what I recall, Server will happily move characters into a number field without conversion and do math on them, it's just that the results are unpredictable and cause issues once you actually move the data to a target that cares, like a database. And we usually see the opposite situation where people expect '1234' and '01234' to compare as equal, which they won't do as strings.

Bottom line is I'm surprised you are having this issue, seeing this behaviour. I'm wondering what else might be going on. Just to clarify things, you're saying the key column in the hashed file is a varchar and the incoming data used for the key lookup in the reference link is a varchar with the values you've shown and the lookup succeeds?

Posted: Wed Jul 24, 2013 1:10 pm
by sportclimber
Thanks to both of you for your input.

I have proved out that DS is definitely doing an implicit conversion of the varchar fields when it is comparing two values that only contain numbers (because it is saying zipcode 01234 = 1234, which would be true if these were 2 number fields, but NOT for varchar fields).

Unfortunately, we need to prevent this issue from occurring in any of our jobs. And since we don't always know which fields that have been defined as varchar *may* contain only numbers on occasion, we're in a bit of a bind (since we have thousands of jobs and many millions of data rows from nearly 100 data sources).

In short, I'm looking for a way to globally fix this issue, without touching every job.

Thanks in advance!

Posted: Wed Jul 24, 2013 4:38 pm
by ray.wurlod
There is no global solution. DataStage server jobs inherently do not have data types. Context determines how comparisons are done (numeric or string, right- or left-justified). If both operands are numeric, a numeric comparison is performed.

There IS a string comparison function (Compare(string1,string2,justification) - for example Compare("01234", "1234", "L") will return 1), but to use that would require touching every job where you need to make the change (or going back in a time machine and using the function from the outset!).

Posted: Wed Jul 24, 2013 10:36 pm
by chulett
ray.wurlod wrote:There is no global solution. DataStage server jobs inherently do not have data types. Context determines how comparisons are done (numeric or string, right- or left-justified). If both operands are numeric, a numeric comparison is performed.
Interesting... I don't recall running afoul of that little nuance but it has been quite some time. Seems to explain the issue being seen - we had some offline conversations when their account was inactivated for some reason until Dennis fixed it. For example:

Code: Select all

svDiffOrig = NullToEmpty(lkpProviderAddress.POSTAL_CODE) <> NullToEmpty(getProvAddData.POSTAL_CODE)

svDiffProof = 'A':NullToEmpty(lkpProviderAddress.POSTAL_CODE) <> 'A':NullToEmpty(getProvAddData.POSTAL_CODE)

Results output to file:

svDiffOrig = False

svDiffProof = True

lkpProviderAddress.POSTAL_CODE = '02189'

getProvAddData.POSTAL_CODE = '2189'
So using the string comparison function Compare() seems to be the only solution?

Posted: Wed Jul 24, 2013 10:46 pm
by ray.wurlod
Well, no, prepending a non-numeric to each string will force a string comparison to occur, as per your example.
But I think the Compare() function is better self-documenting.

Posted: Thu Jul 25, 2013 12:28 pm
by sportclimber
Yay, thanks to Dennis I now have my account reactivated so I can finally read all of the content posted in your replies.

Unfortunately it does appear that we'll have to touch every job where we do a comparison between varchar fields. I agree, the compare is much better self-documenting. The append of a dummy character was the only other way I could think of.

Now for the fun task. Evaluating all of our thousands of jobs. Boy, I'd love that time machine right about now!

Thanks to you both for your insights and information.