Page 1 of 1

Stage Variable Size Limit

Posted: Thu Nov 18, 2004 10:34 am
by jjrbikes
Quick and Basic question: what is the maximum number of characters a stage variable can hold? Is there a limit?

Thanks!

Jennifer

Posted: Thu Nov 18, 2004 11:30 am
by chucksmith
According to page 2-1 of the Universe Basic manual, character strings are only limited by the amount of available memory.

http://www-306.ibm.com/software/data/u2 ... /Basic.pdf

Posted: Thu Nov 18, 2004 12:02 pm
by jjrbikes
Thanks Chuck - but I get that part. My question is not about how long/big a character string can be, but is there some internal limitation to the length of that character string within the actual Stage Variable (within a transformer). I'm not sure how else to state this....ummm, kind of like if you have a character string that is 3GB in size, you can (should) only put 2GB or less of that string into a hash file. How much can I put in a Stage Variable?

Thanks Again!

Jennifer

Posted: Thu Nov 18, 2004 12:13 pm
by chucksmith
If you are trying to key in a constant character string into the stage variable, there are compiler limitations relating to the length of a Basic statement.

If you are concatenating data and building a long string row by row, the documentation implies we will have no problem. Our limits would be physical or those enforced by the operating system (e.g. unix ulimit).

Still, once we build this big string, what do we do with it?

Posted: Thu Nov 18, 2004 1:53 pm
by jjrbikes
Ahhh - the plot thickens! Here's what's going on: I am reading in a file and want to know if a certain field changes from record to record. To track that I set up the following stage variables:

NewSysId = If Input.SysId <> SavedSysId Then @True Else @False
SavedSysId = Input.sys_id

This was not generating the expected results and a closer look suggested only part of the Input.SysId was being stored in the StageVariable. So, I changed the variables to the following:

NewSysIdStr1 = If string(Input.SysId, 1, 11) <> SavedSysIdStr1 Then @True Else @False
NewSysIdStr2 = If string(Input.SysId, 12, 11 ) <> SavedSysIdStr2 Then @True Else @False
SavedSysIdStr1 = string(Input.sys_id, 1, 11)
SavedSysIdStr2 = string(Input.sys_id, 12, 11)

This set-up works PERFECTLY! But WHY???

The field being checked/saved is defined as numeric(22) with "physical" contents being 19 characters long.

Any ideas???

Thanks again

Jennifer

Posted: Thu Nov 18, 2004 2:14 pm
by chucksmith
Tell me about the string() function. I guess it is returning a substring. Anything else?

Can you give me an example of unexpected results?

I know I have expected string comparison when DataStage / Universe performed its default numeric comparison. So, "00" matched "0000".

Sometimes, I have resorted to adding zero to values before compating then. :?:

Posted: Thu Nov 18, 2004 2:50 pm
by ray.wurlod
Keep in mind the Compare() function. You can force string comparison.

There is no practical limit to the length of a character string value in DataStage, and that includes as the value of a stage variable.

Finally, you don't need If expression Then @TRUE Else @FALSE; any expression that uses a comparison operator returns true/false so the expression alone will suffice. (Indeed, since there are no data types, in a Boolean context any expression will return true/false; 0 or "" is false, any other non-null value is true).

Code: Select all

svNewSysID     Compare(Input.SysID, svSavedSysId, "L") <> 0 And @INROWNUM <> 1
svSavedSysID   Input.SysID

Posted: Thu Nov 18, 2004 2:58 pm
by jjrbikes
Yep string(mybigdog, 3, 6) will return bigdog - the substring beginning at position 3 for a length of 6.

Unexpected results: the stage variable was not storing the entire input value each time. It seemed to work fine for maybe the first 1/3 to 1/2 of the records processed but then would drop maybe the last digit or two and therefore no more records "matched". At first I thought about the memory thing - but whether this data comes into one variable or two - it's going to use the same memory. So my next thought was maybe there is some limit to some temp buffer used by each Stage Variable??? Now, if it was somehow doing something funky where there were leading or trailing zeroes I'm not sure, but why would it not do that "funkiness" just because I split it into two parts?

Confused yet???

Thanks

Jennifer

Posted: Thu Nov 18, 2004 3:06 pm
by chucksmith
Constantly confused.

One idea is that the string function may be forcing a string comparison, and is working because the problem was related to numeric comparisons.

Another could be that these large decimal numbers are being converted to floating point values and we are having some precision issues.

As Ray suggested, I would try the Compare() function on the entire string and see what we get.

Interesting problem! :wink:

Posted: Thu Nov 18, 2004 3:09 pm
by ray.wurlod
Can you create an extra output link to a text file and capture the values of Input.SysID and the values of the stage variables?

If what you say is true it's a bug, with potentially catastrophic implications.

There is no technical reason (apart from running out of memory) for losing information from the ends of strings.

Posted: Mon Nov 22, 2004 5:04 pm
by dakjr64
Jennifer was nice enough to post this for me...but now that I am here I can type for myself.

After noticing the problem with our "real" data, I did the following test to confirm.

Created a seq file with 1 column of 19 digits as a Numeric 22. ex:
1111111111111111111
1111111111111111111
1111111111111111112
1111111111111111112
1111111111111111113
1111111111111111113
1111111111111111114

I have a boolean stage variable that is set based on whether the input value equals the saved value and i have a stage variable to hold the saved value. I then have a constraint referencing the boolean to only output when changed.

I have an output file with 2 columns. A direct mapping of the input value and a mapping for the saved value stage variable. All 19 digits show up in the file for both values. When the value change is in pos 19, only row one shows up (no break determined). If the value change is in pos 18 the results are intermittant and if the value change is in pos 17 it appears to determine the change in value ok. Can anyone else prove this to be true in their instance of DataStage 7.0.1?

Posted: Mon Nov 22, 2004 6:53 pm
by mhester
This is not a DS issue, rather it is a configuration issue within UV. Please ensure that the configuration parameter EXACTNUMERIC is equal to or larger than 22. The default is 15 and the max is 57. You change this value in the uvconfig file and then you must regen the UV shell. (stop the services, regen and then restart).

You could try all day long to get the numbers to compare (as 1 = 1) and it will not happen until you increase this value.

The original logic you posted will work once this change is made.

NOTE - I tested this (using your input file) in both a DS job and a BASIC program created in a TCL session with EXACTNUMERIC set to 15 and 57.

Regards,

Posted: Tue Nov 23, 2004 8:21 am
by dakjr64
Thanks Michael! That was it. Had the system config people change it from 15 to 20 and works just fine!

DK

Posted: Mon Feb 06, 2006 12:29 pm
by rwierdsm
Just had an issue similar to this and used this thread to find the answer.

Using standard technique described elsewhere in this forum to compare keys values between rows, i.e.

Stage variables
ThisRowValue var1 : var2 : var3
IsDuplicate if ThisRowValue = PrevRowValue then 1 else 0
PrevRowValue = ThisRowValue

Var1, Var2 and Var3 are declared as Varchar, but hold numeric values.

All rows were coming back as duplicates.

After spending 3.5 (@^#%@! :evil: ) hours debugging, tried using compare(ThisRowValue, PrevRowValue, 'R'). Now it works.

Not the difference: incoming and outgoing ports were declared as varchar, but they held numerics only. Seemed to pick up the first 8-10 characters of ThisRowValue and PrevRowValue only.

Rob W.