DataStage internal/implicit conversion of string to number

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sportclimber
Charter Member
Charter Member
Posts: 15
Joined: Mon Jan 31, 2005 1:53 pm
Location: CO, USA

DataStage internal/implicit conversion of string to number

Post by sportclimber »

DataStage will perform an implicit conversion from a string to a number and vice-versa.

We have database fields that are defined as varchar and may contain only numbers (for example, a US zip code). In these cases, it's important that the data be treated as varchar and not as numbers when doing field value comparisons.

For example, when we source the data in a DS Job, and then compare the zipcode column to a column in a hashed file lookup (both typed as Varchar), we want to make sure that the value "01234" is evaluated as different from "1234".

Unfortunately, due to DataStage's implicit conversion, our default result of this kind of comparison yields "01324" = "1234". This is troublesome.

While there are ways to force this issue on a specific comparison (eg, we could concatenate a character to each side of the comparison), that only works if we know in advance that a specific varchar field may contain only digits.

Is there any kind of internal DS setting that can force Varchar field types to be treated as Varchars when doing comparisons in Server Jobs?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

There is no internal setting to change this behaviour.

DataStage won't perform any data type conversions that it doesn't have to do, i.e. if both operators are strings then a string comparison will be performed.

In your case you are doing a conversion from string "01234" to a number "1234" and then back to a string "1234" somewhere and this is where you need to correct the processing. Have you determined where this is occurring?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: DataStage internal/implicit conversion of string to numb

Post by chulett »

sportclimber wrote:DataStage will perform an implicit conversion from a string to a number and vice-versa.
I don't believe this to be true in a Server job, that's more a feature of Parallel when moving data from one datatype to another. From what I recall, Server will happily move characters into a number field without conversion and do math on them, it's just that the results are unpredictable and cause issues once you actually move the data to a target that cares, like a database. And we usually see the opposite situation where people expect '1234' and '01234' to compare as equal, which they won't do as strings.

Bottom line is I'm surprised you are having this issue, seeing this behaviour. I'm wondering what else might be going on. Just to clarify things, you're saying the key column in the hashed file is a varchar and the incoming data used for the key lookup in the reference link is a varchar with the values you've shown and the lookup succeeds?
Last edited by chulett on Wed Jul 24, 2013 1:22 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sportclimber
Charter Member
Charter Member
Posts: 15
Joined: Mon Jan 31, 2005 1:53 pm
Location: CO, USA

Post by sportclimber »

Thanks to both of you for your input.

I have proved out that DS is definitely doing an implicit conversion of the varchar fields when it is comparing two values that only contain numbers (because it is saying zipcode 01234 = 1234, which would be true if these were 2 number fields, but NOT for varchar fields).

Unfortunately, we need to prevent this issue from occurring in any of our jobs. And since we don't always know which fields that have been defined as varchar *may* contain only numbers on occasion, we're in a bit of a bind (since we have thousands of jobs and many millions of data rows from nearly 100 data sources).

In short, I'm looking for a way to globally fix this issue, without touching every job.

Thanks in advance!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is no global solution. DataStage server jobs inherently do not have data types. Context determines how comparisons are done (numeric or string, right- or left-justified). If both operands are numeric, a numeric comparison is performed.

There IS a string comparison function (Compare(string1,string2,justification) - for example Compare("01234", "1234", "L") will return 1), but to use that would require touching every job where you need to make the change (or going back in a time machine and using the function from the outset!).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:There is no global solution. DataStage server jobs inherently do not have data types. Context determines how comparisons are done (numeric or string, right- or left-justified). If both operands are numeric, a numeric comparison is performed.
Interesting... I don't recall running afoul of that little nuance but it has been quite some time. Seems to explain the issue being seen - we had some offline conversations when their account was inactivated for some reason until Dennis fixed it. For example:

Code: Select all

svDiffOrig = NullToEmpty(lkpProviderAddress.POSTAL_CODE) <> NullToEmpty(getProvAddData.POSTAL_CODE)

svDiffProof = 'A':NullToEmpty(lkpProviderAddress.POSTAL_CODE) <> 'A':NullToEmpty(getProvAddData.POSTAL_CODE)

Results output to file:

svDiffOrig = False

svDiffProof = True

lkpProviderAddress.POSTAL_CODE = '02189'

getProvAddData.POSTAL_CODE = '2189'
So using the string comparison function Compare() seems to be the only solution?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Well, no, prepending a non-numeric to each string will force a string comparison to occur, as per your example.
But I think the Compare() function is better self-documenting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sportclimber
Charter Member
Charter Member
Posts: 15
Joined: Mon Jan 31, 2005 1:53 pm
Location: CO, USA

Post by sportclimber »

Yay, thanks to Dennis I now have my account reactivated so I can finally read all of the content posted in your replies.

Unfortunately it does appear that we'll have to touch every job where we do a comparison between varchar fields. I agree, the compare is much better self-documenting. The append of a dummy character was the only other way I could think of.

Now for the fun task. Evaluating all of our thousands of jobs. Boy, I'd love that time machine right about now!

Thanks to you both for your insights and information.
Post Reply