CRC32

logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

CRC32

Post by logic »

Hi All, ... Mike,
Is there an upper limit on the number of columns that can be compared using the CRC32 function, and how will performance be affected by comparing around 80 columns with CRC32? I did a quick search but couldn't find where this was discussed; if it has been, please direct me to the link.
Thanks,
Ash.
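For context, one common way to compare many columns with CRC32 is to join them into a single delimited string per row and store one checksum. With that approach, 80 columns only make the hashed string longer, and the CRC work grows linearly with the total number of bytes. The Python sketch below assumes that pattern and uses `zlib.crc32` as a stand-in for DataStage's CRC32 function; `row_crc`, `has_changed` and the `|` delimiter are hypothetical names chosen for illustration.

```python
import zlib

def row_crc(values, sep="|"):
    """Hypothetical helper: one CRC32 over all compared columns of a row."""
    # The delimiter keeps ("ab", "c") and ("a", "bc") from becoming the same string.
    # NULLs are treated as empty strings here, which is an assumption.
    text = sep.join("" if v is None else str(v) for v in values)
    return zlib.crc32(text.encode("utf-8"))  # unsigned 32-bit value in Python 3

def has_changed(stored_crc, new_values):
    """Hypothetical helper: compare the stored checksum with a fresh one."""
    return row_crc(new_values) != stored_crc

# Usage: 80 columns still produce a single 4-byte checksum per row.
row = [f"value{i}" for i in range(80)]
stored = row_crc(row)
print(has_changed(stored, row))  # False: nothing changed
row[10] = "value1X"              # change a single character
print(has_changed(stored, row))  # True
```

A one-character change like this is always detected, because CRC-32 catches any change confined to 32 consecutive bits of equal-length input; larger changes are detected with high probability but not with certainty.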
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The CRC32 algorithm returns 4 bytes. The longer your input data string, the more likely it is that you will get an encoding collision. So there is no hard limit to adhere to, just the realization that CRC32 might produce the same value for rows that contain different data.
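To illustrate the point about duplicates: whatever the input length, the result is always one of 2^32 values, so different inputs must eventually share a CRC. The Python sketch below (again assuming `zlib.crc32` behaves like the DataStage function; `find_crc32_collision` is a hypothetical name) hashes random 16-byte inputs until two different ones collide.

```python
import random
import zlib

def find_crc32_collision(seed=42, max_tries=2_000_000):
    """Hash random 16-byte inputs until two different ones share a CRC32."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    seen = {}                  # CRC32 value -> first input that produced it
    for tries in range(1, max_tries + 1):
        data = rng.getrandbits(128).to_bytes(16, "big")
        crc = zlib.crc32(data)
        previous = seen.get(crc)
        if previous is not None and previous != data:
            return previous, data, crc, tries
        seen[crc] = data
    return None  # astronomically unlikely with max_tries this large

result = find_crc32_collision()
if result:
    a, b, crc, tries = result
    print(f"{a.hex()} and {b.hex()} both give CRC32 {crc:08x} after {tries} inputs")
```

By the birthday bound, the expected number of random inputs before the first duplicate is roughly sqrt(pi/2 * 2^32), about 82,000, which is why the loop finishes quickly. This counts a duplicate between any two rows, which is a different question from whether one row's changed data maps back to its own previous CRC.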
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Logic,

Here is an answer to your question from a previous post by Michael Hester:
"It's actually 2^32, or 1 in 4,294,967,296, and that is for every row. It does not mean that an incorrect CRC will be generated if you process 4,294,967,296 rows of data; rather, each row has a 1 in 4,294,967,296 chance. Not likely that this will fail for you.

Starbucks has been using this for 3+ years and to the best of my knowledge it has not yet failed."
_________________
Michael Hester
Thanks,
Naveen
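The arithmetic in the quote can be made concrete. Assuming a changed row's new CRC behaves like a uniformly random 32-bit value (an idealization, not something the thread establishes), each changed row is missed with probability 2^-32, so the expected number of missed changes is the number of changed rows divided by 2^32. A short Python sketch; the function names are hypothetical:

```python
import math

MISS_PER_ROW = 2.0 ** -32  # assumed chance that one changed row keeps the same CRC32

def expected_missed_changes(changed_rows):
    """Expected number of changed rows whose CRC32 does not change."""
    return changed_rows * MISS_PER_ROW

def prob_any_missed(changed_rows):
    """Probability that at least one changed row goes undetected."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(changed_rows * math.log1p(-MISS_PER_ROW))

for n in (1_000_000, 100_000_000, 2**32):
    print(f"{n:>13,} changed rows: expected misses {expected_missed_changes(n):.6f}, "
          f"P(at least one) {prob_any_missed(n):.6f}")
```

Under that assumption, 2^32 changed rows give one expected miss (about a 63% chance of at least one), not a guaranteed failure, while a million changed rows give roughly 0.0002.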
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

ArndW wrote: The crc32 algorithm returns 4 bytes. The longer your input data string, the more likely it will be that you will get an encoding collision.
Of course it returns 4 bytes or 32 bits.

I don't claim to fully understand the mathematics of CRC, but I do believe that string length does not really affect the chance of a collision. CRC algorithms (all variants) are commonly used to produce CRCs for files that can be very large. Each bit is evaluated and then shifted (either right or left), and then the next bit is evaluated, until all bits have been processed.
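The bit-at-a-time process described above can be written out directly. This Python sketch assumes DataStage's CRC32 uses the common CRC-32 variant found in zlib and Ethernet (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); the thread does not say which variant DataStage implements, and `crc32_bitwise` is a hypothetical name. What it shows is that the running register stays 32 bits wide no matter how long the input is.

```python
import zlib

POLY = 0xEDB88320  # bit-reversed form of the CRC-32 generator 0x04C11DB7

def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (slow, but mirrors the description)."""
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte  # bring the next 8 input bits into the register
        for _ in range(8):
            # Shift right one bit; if the bit shifted out was 1, XOR in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR

check = crc32_bitwise(b"123456789")
print(hex(check))                         # 0xcbf43926, the published CRC-32 check value
print(check == zlib.crc32(b"123456789"))  # True
```

Production code uses table-driven versions that process a byte per step, but the result is identical.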

Again, if anyone is interested in the mathematics or statistics behind CRC then drop me a line and I will point you in the right direction.

Craig - maybe it's time I write that FAQ????? :lol:
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

Thanks Mike, Naveen and Arnd.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

mhester wrote: Craig - maybe it's time I write that FAQ????? :lol:
Perhaps... perhaps... :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers