Join Key column data type decision

SBSA_DW2 · Post by **SBSA_DW2** » Fri Oct 15, 2010 4:05 am

Hi Guys

I'm trying to decide if we are overusing varchar data types when doing joins/lookups in Datastage. In 99% of situations we cast data to same varchar types and lengths before doing a join, merge, lookup etc.

It may be we're not experienced in using decimals/integers etc, but my real question is 'what is the performance impact of converting and joining using varchar instead of a number datatype (decimal/integer etc)'?

in database joins joining by using varchar as keys is expensive, does the same apply in datastage?

are other developers experiencing similar 'over-use' of varchar?

thanks!

ArndW · Post by **ArndW** » Fri Oct 15, 2010 5:05 am

One of the basic tenets for performance is never to perform unnecessary conversions. If your source and target type is numeric for a given column then performing 2 conversion to VarChar on the way is inefficient. Likewise, a comparison of two numbers is far more efficient than comparing two strings. Numbers are far more compact than strings and by using them you save space; particularly as data streams between stages are not always just in memory but sometimes also get buffered to disk. Two bytes can store numbers from -32767 to +32768, but storing "-32767" in a string takes 7 or 8 bytes (depending upon what type of string you have). A BigInt representation of -2147483647 with 4 bytes is compact compared to 13 bytes for string "-2147483647" - with one million rows that is a difference in size of 9Mb.