CRC32 Routine

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA


Post by raju_chvr »

Can anyone point me to any links or information on the CRC32 routine?
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Go to Ascential Developernet and look in the download section. Craig has posted demos of it there.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Actually Vince, much as I'd like to take credit for something like that, it was posted by Michael Hester. The Proud Papa of the routine Michael Hester, in fact. :wink:

Here's a direct link to the correct section of the Library, assuming you are an ADN member.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Whoops, I even got the demo directly from Michael some time back, and here I am giving credit to someone else. Sorry, Michael. Nice function; I've used it at a couple of client sites now.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

No skin off my back :-)
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Also,

Some information I believe is important when dealing with CRC32 or any routine like this.

I have read some posts on this site and the ADN where a customer refuses to use CRC32 because they feel that a 1 in 4 billion chance of an incorrect CRC is too high. Keep in mind that if you process 4 billion rows of data in a single run and apply CRC32 to each row, you may or may not observe a failure. The premise is that each row has a 1 in 4 billion (1 in 2^32) chance of failure. I could go into the mathematics to support this, but this is not the right forum for it. If anyone is really interested, please contact me directly by email and I would be happy to send what I have.

Regards,

Michael Hester
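To make Michael's per-row premise concrete, here is a minimal Python sketch. It assumes, as the post does, that each changed row independently has a 1 in 2^32 chance of producing the same CRC32 as before (an idealized model, not something verified against Ascential's routine); the function name `undetected_change_stats` is hypothetical. For 4 billion changed rows it gives about 0.93 expected misses and a bit over a 60% chance of at least one, which fits "may or may not observe a failure".

```python
import math

CRC32_SPACE = 2 ** 32  # number of distinct CRC32 values

def undetected_change_stats(changed_rows: int) -> tuple[float, float]:
    """Idealized model: each changed row independently has a 1-in-2**32 chance
    of keeping the same CRC32, so the change goes undetected."""
    p_miss = 1 / CRC32_SPACE
    expected_misses = changed_rows * p_miss
    # P(at least one miss) = 1 - (1 - p)**n, computed stably via log1p/expm1
    p_any_miss = -math.expm1(changed_rows * math.log1p(-p_miss))
    return expected_misses, p_any_miss

expected, p_any = undetected_change_stats(4_000_000_000)
print(f"expected undetected rows: {expected:.2f}")            # 0.93
print(f"chance of at least one undetected row: {p_any:.2f}")  # 0.61
```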
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

Actually, the odds of a collision are far worse than 1 in 4B, especially if you are working with large datasets. This is an example of a class of problems called the 'Birthday Paradox' in statistics classes.

"Given a group of people, what are the chances that 2 folks will have the same birthday? The answer depends on the size of the group. If there are 2, the odds are about 1:364. If there are 23, the odds are roughly 1:1."

Don't believe me? Look it up for yourself.
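The birthday figures quoted above can be checked with a short Python sketch (the function names are hypothetical). It assumes CRC32 values behave like uniformly random 32-bit numbers, and it models a different question from the per-row one: the chance that any two rows in a set share a CRC32. With 2^32 possible values, that chance reaches about 50% at roughly 77,000 rows.

```python
import math

def p_collision_exact(n: int, space: int) -> float:
    """Exact chance that at least two of n values, drawn uniformly at random
    from `space` possibilities, are equal (the birthday problem)."""
    if n > space:
        return 1.0  # pigeonhole principle: a repeat is guaranteed
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (space - k) / space
    return 1.0 - p_all_distinct

def p_collision_approx(n: int, space: int) -> float:
    """Standard approximation 1 - exp(-n(n-1) / (2 * space)); fast for large n."""
    return -math.expm1(-n * (n - 1) / (2 * space))

CRC32_SPACE = 2 ** 32

print(p_collision_exact(2, 365))                 # about 0.0027, i.e. 1 in 365
print(p_collision_exact(23, 365))                # about 0.507, roughly even odds
print(p_collision_approx(77_163, CRC32_SPACE))   # about 0.5
print(p_collision_approx(100_000, CRC32_SPACE))  # about 0.69
```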

BTW, I found out the hard way with CRC32 on an EDW project, when asked to find some 'missing records'. Different records, same CRC32! (d'oh!).

Carter
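Carter's "different records, same CRC32" is easy to reproduce. This hypothetical Python sketch uses the standard library's `zlib.crc32`, which implements the common CRC-32 used by zip and gzip; whether it matches the DataStage routine bit for bit is an assumption. It generates made-up 16-character records from a seeded random generator until two different ones share a checksum, which typically takes on the order of 80,000 records.

```python
import random
import string
import zlib

ALPHABET = string.ascii_uppercase + string.digits

def find_crc32_collision(seed: int = 2003) -> tuple[bytes, bytes, int]:
    """Generate made-up 16-character records until two different ones share a CRC32."""
    rng = random.Random(seed)    # seeded so the run is repeatable
    seen: dict[int, bytes] = {}  # CRC32 value -> first record that produced it
    while True:
        record = "".join(rng.choices(ALPHABET, k=16)).encode("ascii")
        crc = zlib.crc32(record)
        earlier = seen.get(crc)
        if earlier is not None and earlier != record:
            return earlier, record, crc
        seen[crc] = record

first, second, crc = find_crc32_collision()
print(f"{first.decode()} and {second.decode()} both have CRC32 {crc:#010x}")
```

Random records are used deliberately: CRC-32 is guaranteed to detect any change confined to a burst of 32 bits or fewer, so very regular inputs (such as sequential numeric IDs of equal length) can behave quite differently from random data.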

mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Carter,

Thank you for your reply. While I understand the "Birthday Paradox" and many of its variations, I'm not sure how it would apply to CRC. I have forwarded your message to the author of the CRC32 algorithm employed by Ascential to get his take on this. I have communicated with him before, and he is very receptive to inquiries on this subject. He has advanced degrees in mathematics and statistics, so we should be able to better understand the mechanics once he replies.

I will reply to you when I receive this information.

Regards,

Michael Hester