CRC32 & Null values

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

CRC32 & Null values

Post by aravindk »

Hello all,

I am trying to setup a dirty dimension for which i dont have a natural key to specify. I am thinking of using the checksum value of the row as a field in this table and use that as a natural key.

Now, here are my questions:

1. is it a good idea to use that as an alternate key? (primary key would be a seq no).
2. sometimes i get a negative value as checksum. is it normal?
3. i am expecting null values in some of the fields. i have heard here before that if there are going to be null values, the best way to handle it to use a delimiter with crc32 function. How do i do that?

Many Thanks in advance,
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I placed the code for how to do this on the ADN. CRC32 can, and often does, return negative values since the return is a signed 32 bit number. You are correct in your assumption that CRC does not like nulls when cat'ing a string since this will likely return erroneous results or null.

There are many methods for passing columns which contain nulls to CRC.

One is to check each column for null as you cat them.

Another would be to define your metadata in such a way as to have two columns, one for the key and the other for all of the remaining columns and delimited with something that will not occur in the data.

And another would be to use the routine placed on the ADN by another developer that actually reads all of the metadata on a given input link via a BASIC routine and checks each for null and then cat's them together. (this is a neat way to do it, but is tremendously slow and not suited for large data volumes).

Using any of the return values from either CRC or checksum is a poor choice for use as a key value since the same CRC value can (and does) appear more than once in a given run and duplicate checksums are almost a given since the UV checksum returns a 16 bit integer.


Regards,

Michael Hester
Last edited by mhester on Fri Jun 25, 2004 9:52 am, edited 1 time in total.
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

surrogate

Post by 1stpoint »

I would not use a CRC32 algorithm as your "natural" key. Instead I would use a Surrogate/sequence generator as the key to ensure uniqueness.
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

Post by aravindk »

1stpoint,

I already have a surrogate key in this table which is the primary key. I was trying to the checksum as an alternate key.

Michael

I think I will go with the second method you suggested. That is concatenating all the fields and make that as one field. May be I will use the same field as the natural key.

Thanks & Regards,
Post Reply