Float/Real/Numeric Values (SQL)

PhilipDuPlessis · Post by **PhilipDuPlessis** » Tue Nov 01, 2005 8:25 am

As per most databases designed on SQL, it is always amazing to see exactly how the tables are created when it comes to number datatypes. The problem is, when for example creating a hash file of database entries, it is a well known fact that you are inevitably bound to spending hours and hours on "massaging" the data and actually getting it to work in datastage. This is especially true with float numbers, and out of personal experience, real numbers are not far off (as experienced today where more than 4 hours was spent on taking - get this - a float, a numeric, a real as well as a decimal and performing a mathematical calculation)

So my question is this:
What is the best possible approach to use when you have such a mixed basket of datatypes that you have to use somewhere in DataStage?

kcbland · Post by **kcbland** » Tue Nov 01, 2005 8:32 am

They don't matter. DS Server stores all data in hash files as strings (see the recent newsletter for a little discussion about hash files).

DS Server is amazingly flexible with datatypes, since everthing is a string. Dates are strings, numbers are strings, etc. You don't have to cast a string to a number to do math, it does it silently for you. So, if:

Code: Select all

x="FRED"
y=x+1

DS will show a warning message that a non-numeric value was used in a math expression and zero was substituted, but it won't blow up. It's up to you to use valid values in any of your expressions.

Enjoy!

ray.wurlod · Post by **ray.wurlod** » Tue Nov 01, 2005 2:19 pm

DataStage (server) recognizes four data classes.

Numeric - you can do arithmetic and Num(x) returns true

String - any data including numeric but not opaque

Null - all operations except certain dynamic array operations on these return Null, IsNull() returns true, will abort jobs if used in fixed arguments in functions, such as second argument of Oconv()

Opaque - things like file handles, subroutine handles, connections, etc.

PhilipDuPlessis · Post by **PhilipDuPlessis** » Tue Nov 01, 2005 10:13 pm

Thanks so much.

Note that on float values, if you do a direct compare on the data types, there may be problems. Not quite sure why, but even the datastage consultants here strongly advise against the use of float values within datastage.

Any specific technical reason behind this?

ray.wurlod · Post by **ray.wurlod** » Wed Nov 02, 2005 3:08 am

Yep.

Floating point numbers (such as pi) can not, by definition, be stored with infinite precision in a finite number of bits. Therefore comparisons between them may not be exact.

To get around this, DataStage uses a "close enough" algorithm (called "wide zero"). It is set up in uvconfig as a mask on the IEEE representation of a floating point number as the WIDEZERO parameter. By default it represents a wide zero of approximately plus or minus 2^-19. That is, if the two floating point numbers are within 2^-19 of each other, they will be regarded as equal ("close enough") in comparisons.

There is a white paper on the IBM U2 web site that discusses the WIDEZERO parameter in excruciating detail.

PhilipDuPlessis · Post by **PhilipDuPlessis** » Wed Nov 02, 2005 3:20 am

thanks man