Stage Variable Size Limit

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Stage Variable Size Limit

Post by jjrbikes »

Quick and Basic question: what is the maximum number of characters a stage variable can hold? Is there a limit?

Thanks!

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

According to page 2-1 of the Universe Basic manual, character strings are only limited by the amount of available memory.

http://www-306.ibm.com/software/data/u2 ... /Basic.pdf
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Post by jjrbikes »

Thanks Chuck - but I get that part. My question is not about how long/big a character string can be, but is there some internal limitation to the length of that character string within the actual Stage Variable (within a transformer). I'm not sure how else to state this....ummm, kind of like if you have a character string that is 3GB in size, you can (should) only put 2GB or less of that string into a hash file. How much can I put in a Stage Variable?

Thanks Again!

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

If you are trying to key in a constant character string into the stage variable, there are compiler limitations relating to the length of a Basic statement.

If you are concatenating data and building a long string row by row, the documentation implies we will have no problem. Our limits would be physical or those enforced by the operating system (e.g. unix ulimit).

Still, once we build this big string, what do we do with it?
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Post by jjrbikes »

Ahhh - the plot thickens! Here's what's going on: I am reading in a file and want to know if a certain field changes from record to record. To track that I set up the following stage variables:

NewSysId = If Input.SysId <> SavedSysId Then @True Else @False
SavedSysId = Input.sys_id

This was not generating the expected results and a closer look suggested only part of the Input.SysId was being stored in the StageVariable. So, I changed the variables to the following:

NewSysIdStr1 = If string(Input.SysId, 1, 11) <> SavedSysIdStr1 Then @True Else @False
NewSysIdStr2 = If string(Input.SysId, 12, 11 ) <> SavedSysIdStr2 Then @True Else @False
SavedSysIdStr1 = string(Input.sys_id, 1, 11)
SavedSysIdStr2 = string(Input.sys_id, 12, 11)

This set-up works PERFECTLY! But WHY???

The field being checked/saved is defined as numeric(22) with "physical" contents being 19 characters long.

Any ideas???

Thanks again

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Tell me about the string() function. I guess it is returning a substring. Anything else?

Can you give me an example of unexpected results?

I know I have expected string comparison when DataStage / Universe performed its default numeric comparison. So, "00" matched "0000".

Sometimes, I have resorted to adding zero to values before compating then. :?:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Keep in mind the Compare() function. You can force string comparison.

There is no practical limit to the length of a character string value in DataStage, and that includes as the value of a stage variable.

Finally, you don't need If expression Then @TRUE Else @FALSE; any expression that uses a comparison operator returns true/false so the expression alone will suffice. (Indeed, since there are no data types, in a Boolean context any expression will return true/false; 0 or "" is false, any other non-null value is true).

Code: Select all

svNewSysID     Compare(Input.SysID, svSavedSysId, "L") <> 0 And @INROWNUM <> 1
svSavedSysID   Input.SysID
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Post by jjrbikes »

Yep string(mybigdog, 3, 6) will return bigdog - the substring beginning at position 3 for a length of 6.

Unexpected results: the stage variable was not storing the entire input value each time. It seemed to work fine for maybe the first 1/3 to 1/2 of the records processed but then would drop maybe the last digit or two and therefore no more records "matched". At first I thought about the memory thing - but whether this data comes into one variable or two - it's going to use the same memory. So my next thought was maybe there is some limit to some temp buffer used by each Stage Variable??? Now, if it was somehow doing something funky where there were leading or trailing zeroes I'm not sure, but why would it not do that "funkiness" just because I split it into two parts?

Confused yet???

Thanks

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Constantly confused.

One idea is that the string function may be forcing a string comparison, and is working because the problem was related to numeric comparisons.

Another could be that these large decimal numbers are being converted to floating point values and we are having some precision issues.

As Ray suggested, I would try the Compare() function on the entire string and see what we get.

Interesting problem! :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can you create an extra output link to a text file and capture the values of Input.SysID and the values of the stage variables?

If what you say is true it's a bug, with potentially catastrophic implications.

There is no technical reason (apart from running out of memory) for losing information from the ends of strings.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dakjr64
Participant
Posts: 2
Joined: Mon Nov 22, 2004 2:10 pm

Post by dakjr64 »

Jennifer was nice enough to post this for me...but now that I am here I can type for myself.

After noticing the problem with our "real" data, I did the following test to confirm.

Created a seq file with 1 column of 19 digits as a Numeric 22. ex:
1111111111111111111
1111111111111111111
1111111111111111112
1111111111111111112
1111111111111111113
1111111111111111113
1111111111111111114

I have a boolean stage variable that is set based on whether the input value equals the saved value and i have a stage variable to hold the saved value. I then have a constraint referencing the boolean to only output when changed.

I have an output file with 2 columns. A direct mapping of the input value and a mapping for the saved value stage variable. All 19 digits show up in the file for both values. When the value change is in pos 19, only row one shows up (no break determined). If the value change is in pos 18 the results are intermittant and if the value change is in pos 17 it appears to determine the change in value ok. Can anyone else prove this to be true in their instance of DataStage 7.0.1?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

This is not a DS issue, rather it is a configuration issue within UV. Please ensure that the configuration parameter EXACTNUMERIC is equal to or larger than 22. The default is 15 and the max is 57. You change this value in the uvconfig file and then you must regen the UV shell. (stop the services, regen and then restart).

You could try all day long to get the numbers to compare (as 1 = 1) and it will not happen until you increase this value.

The original logic you posted will work once this change is made.

NOTE - I tested this (using your input file) in both a DS job and a BASIC program created in a TCL session with EXACTNUMERIC set to 15 and 57.

Regards,
dakjr64
Participant
Posts: 2
Joined: Mon Nov 22, 2004 2:10 pm

Post by dakjr64 »

Thanks Michael! That was it. Had the system config people change it from 15 to 20 and works just fine!

DK
rwierdsm
Premium Member
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada
Contact:

Post by rwierdsm »

Just had an issue similar to this and used this thread to find the answer.

Using standard technique described elsewhere in this forum to compare keys values between rows, i.e.

Stage variables
ThisRowValue var1 : var2 : var3
IsDuplicate if ThisRowValue = PrevRowValue then 1 else 0
PrevRowValue = ThisRowValue

Var1, Var2 and Var3 are declared as Varchar, but hold numeric values.

All rows were coming back as duplicates.

After spending 3.5 (@^#%@! :evil: ) hours debugging, tried using compare(ThisRowValue, PrevRowValue, 'R'). Now it works.

Not the difference: incoming and outgoing ports were declared as varchar, but they held numerics only. Seemed to pick up the first 8-10 characters of ThisRowValue and PrevRowValue only.

Rob W.
Rob Wierdsma
Toronto, Canada
bartonbishop.com
Post Reply