CRC32 Routine

Posted: Tue Mar 30, 2004 8:46 pm
by raju_chvr
Can anyone forward me any links or information on the CRC32 routine?

Posted: Tue Mar 30, 2004 9:43 pm
by vmcburney
Go to Ascential Developernet and look in the download section. Craig has posted demos of it there.

Posted: Wed Mar 31, 2004 12:12 am
by chulett
Actually Vince, much as I'd like to take credit for something like that, it was posted by Michael Hester. The Proud Papa of the routine Michael Hester, in fact. :wink:

Here's a direct link to the correct section of the Library, assuming you are an ADN member.

Posted: Wed Mar 31, 2004 10:51 pm
by vmcburney
Whoops, I even got the demo directly from Michael some time back and here I am giving credit to someone else. Sorry Michael. Nice function, used it at a couple client sites now.

Posted: Thu Apr 01, 2004 8:42 am
by mhester
No skin off my back :-)

Posted: Thu Apr 01, 2004 8:55 am
by mhester
Also,

Some information I believe is important when dealing with CRC32 or any routine like this.

I have read some posts on this site and on ADN where a customer refuses to use CRC32 because they feel that a 1 in 4 billion chance of an incorrect CRC is too high. Keep in mind that if you process 4 billion rows of data in a single run and use CRC32 against each row, you may or may not observe a failure. The premise is that each row has a 1 in 4 billion chance of failure. I could go into the mathematics to support this, but this is not the correct forum for it. If anyone is really interested, please contact me directly (email) and I would be happy to send what I have.
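For anyone who wants rough numbers, here is a minimal Python sketch of that per-row premise. It assumes every compared row really has changed, that each changed row independently has a 1 in 2^32 chance of producing the same CRC as before, and that CRC values are uniformly distributed. The function name is made up for illustration and is not part of the routine.

```python
import math

CRC_SPACE = 2 ** 32  # number of distinct 32-bit CRC values (about 4.29 billion)


def prob_at_least_one_false_match(rows: int, space: int = CRC_SPACE) -> float:
    """Chance that at least one of `rows` changed rows keeps the same CRC.

    Assumption: each row independently has a 1/space chance of a false match.
    """
    # 1 - (1 - 1/space) ** rows, written with log1p/expm1 so the tiny
    # per-row probability is not lost to floating-point rounding.
    return -math.expm1(rows * math.log1p(-1.0 / space))


if __name__ == "__main__":
    for rows in (1, 1_000_000, 2 ** 32):
        print(rows, prob_at_least_one_false_match(rows))
```

Under those assumptions, a run of about 4 billion changed rows has roughly a 63% chance of at least one false match, which fits "may or may not observe a failure".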

Regards,

Michael Hester

Posted: Thu Apr 01, 2004 2:24 pm
by clshore
Actually, the odds are far worse than 1 in 4B, especially if you are working with large datasets. This is an example of a class of problems called the 'Birthday Paradox' in statistics classes.

"Given a group of people, what are the chances that 2 folks will have the same birthday? Answer depends on the size of the group. If there are 2, odds are about 1:364. If there are 23, the odds are 1:1."

Don't believe me? Look it up for yourself.
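Here is a small Python sketch of the birthday calculation, applied to both the 365-day case and a 32-bit CRC. It assumes all values are equally likely and independent, and it answers the question "do any two rows in the set share a value?", which is a different question from comparing one row's old CRC with its new one. The function name is made up for illustration.

```python
import math


def collision_probability(items: int, space: int) -> float:
    """Chance that at least two of `items` values coincide.

    Assumption: each value is drawn uniformly and independently from
    `space` possible values.
    """
    if items > space:
        return 1.0  # pigeonhole: more items than possible values
    # P(no collision) = product over k of (1 - k/space); sum the logs
    # instead of multiplying to keep precision for large `space`.
    log_no_collision = sum(math.log1p(-k / space) for k in range(items))
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    print(collision_probability(2, 365))         # two people
    print(collision_probability(23, 365))        # the classic group of 23
    print(collision_probability(77_163, 2 ** 32))  # rows hashed with a 32-bit CRC
```

With a 32-bit value space the even-odds point comes at only about 77,000 rows, so a collision somewhere in a large dataset is to be expected.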

BTW, I found out the hard way with CRC32 on an EDW project, when asked to find some 'missing records'. Different records, same CRC32! (d'oh!).

Carter

Posted: Thu Apr 01, 2004 3:32 pm
by mhester
Carter,

Thank you for your reply. While I understand the "Birthday Paradox" and many of its variations, I'm not sure how it would apply to CRC. I have forwarded your message to the author of the algorithm used in the CRC32 employed by Ascential to get his take on this. I have communicated with him before and he is very receptive to inquiries regarding this subject. He has advanced degrees in mathematics and statistics, so we should be able to better understand the mechanics once he replies.

I will reply to you when I receive this information.

Regards,

Michael Hester