CRC Info

Posted: Fri Jul 14, 2006 1:02 am
by dssiddu
Hello

Could you please tell me the main use of the CRC32 function?

Thanks in advance.

:roll:

Posted: Fri Jul 14, 2006 2:21 am
by loveojha2
CRC stands for Cyclic Redundancy Code (also called Cyclic Redundancy Check).
For a given string, the function calculates that string's CRC.
In ETL we can use that value as an alternate (a restricted alternate) for the string passed to it.

Posted: Fri Jul 14, 2006 2:27 am
by loveojha2
Oh!
Forgot to mention that it generates a 32-bit code, hence the name.
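
To make the "32-bit code" point concrete, here is a minimal sketch. It assumes Python's zlib.crc32 as a stand-in for the DataStage CRC32 routine; the two are not guaranteed to return identical values, and the sample text is made up.

```python
import zlib

# Hypothetical column value; any string works.
text = "Some column value"

# zlib.crc32 takes bytes and returns an unsigned integer in Python 3.
crc = zlib.crc32(text.encode("utf-8"))

# Whatever the input length, the result fits in 32 bits.
assert 0 <= crc <= 0xFFFFFFFF

# Show it as 8 hex digits (32 bits).
print(f"{crc:#010x}")
```

Because every input, long or short, is mapped onto the same 32-bit range, the value can stand in for the string only in a restricted way.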

Posted: Fri Jul 14, 2006 3:36 am
by kumar_s
You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checks, deduplication, surrogate key generation, lookups....
You can easily get more information by doing a search on the same keyword.

Posted: Fri Jul 14, 2006 6:40 am
by kduke
Michael Hester recently posted about why not to use CRC32 for "Surrogate key generation". He wrote this routine with another Ascential developer. He has lots of posts about how to use this routine. It works best in SCD type 2 calculations.

Calculate CRC32 on all fields in an existing dimension record, and calculate CRC32 on all fields of the incoming record for that dimension. If the two values are the same, then most likely all fields in both records are the same. This is much more powerful and faster than comparing each field one at a time. I know one customer here in Dallas who put every column in a hashed file and made all fields part of the key, then did a lookup on every field. This was the design of an IBM employee (or Ascential at the time). Not good. If the record has changed, then do your SCD type 2 insert and update.
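
The comparison described here can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's zlib.crc32 stands in for the DataStage routine, and the helper names (row_crc, has_changed), the column names, and the field delimiter are all hypothetical.

```python
import zlib
from typing import Mapping, Sequence

# Hypothetical delimiter: a character unlikely to appear in the data, so that
# ("ab", "c") and ("a", "bc") do not produce the same joined string.
FIELD_SEP = "\x1f"

def row_crc(row: Mapping[str, object], columns: Sequence[str]) -> int:
    """CRC32 over all tracked columns of one dimension row, in a fixed order."""
    # Note: None and the empty string are treated the same in this sketch.
    joined = FIELD_SEP.join("" if row[c] is None else str(row[c]) for c in columns)
    return zlib.crc32(joined.encode("utf-8"))

def has_changed(stored_crc: int, incoming: Mapping[str, object],
                columns: Sequence[str]) -> bool:
    """True when the incoming row's CRC differs from the CRC stored earlier."""
    return row_crc(incoming, columns) != stored_crc

# Made-up example rows for one business key.
columns = ["customer_id", "name", "state"]
existing = {"customer_id": 42, "name": "Alice", "state": "TX"}
incoming = {"customer_id": 42, "name": "Alice", "state": "CA"}

stored_crc = row_crc(existing, columns)  # kept alongside the dimension row

if has_changed(stored_crc, incoming, columns):
    print("changed: do the SCD type 2 insert and update")
else:
    print("most likely unchanged: skip")
```

A differing CRC proves the row changed. An equal CRC only means the rows are most likely the same, which is why the value is compared for the same business key rather than used as a key itself.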

I am sure Michael will explain this better soon. Do a search for his posts.

Posted: Fri Jul 14, 2006 7:07 am
by chulett
kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checks, deduplication, surrogate key generation, lookups....
You can easily get more information by doing a search on the same keyword.
:? A unique number for each input? Nope. Useful for surrogate key generation? Nope. Duplicate check? Maybe.

As noted by Kim and recently brought back to raging life by Michael, use it for Change Data Detection - or CDD. When the CRC32 value for the 'same data' is different, something in it has changed. This can be used to ease the work of SCD jobs.

Posted: Fri Jul 14, 2006 8:38 am
by mhester
Craig and Kim,

Thanks! I thought I was going to have to open another can of CRC whoop butt!
kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checks, deduplication, surrogate key generation, lookups....
You can easily get more information by doing a search on the same keyword.
Again - statements like these truly show the ignorance people have about CRC and checksums. First - CRC is NOT a checksum. A checksum implies addition, and CRC is not additive. CRC is based on division and remainders etc...
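
For readers who want to see the difference between addition and division, here is a rough sketch. It assumes the common CRC-32 parameters used by zlib (reflected polynomial 0xEDB88320); whether the DataStage routine uses exactly these parameters is an assumption. The function names are made up for the example.

```python
import zlib

def additive_checksum(data: bytes) -> int:
    """A plain additive checksum: sum of the bytes, kept to 32 bits."""
    return sum(data) & 0xFFFFFFFF

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 computed bit by bit as the remainder of a polynomial division.

    Uses the reflected polynomial 0xEDB88320, the same parameters as zlib.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; "subtract" (XOR) the polynomial when it was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# Addition ignores byte order, so swapped characters go unnoticed.
print(additive_checksum(b"ab") == additive_checksum(b"ba"))  # True

# The division-based CRC depends on position, so the swap is detected.
print(crc32_bitwise(b"ab") == crc32_bitwise(b"ba"))  # False

# The bitwise version agrees with the library implementation.
print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))  # True
```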

As stated before, two totally different rows of data can/will/might generate the same CRC value and that's ok - perfectly normal.
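
A back-of-the-envelope way to see why collisions are normal is the usual birthday approximation. The sketch below assumes CRC32 values are spread evenly over the 2^32 possible codes, which is an idealization, and the function name is made up.

```python
import math

def collision_probability(n_rows: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n_rows share the same code.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)),
    assuming codes are uniformly distributed (an idealization).
    """
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-n_rows * (n_rows - 1) / (2 * space))

for n in (1_000, 77_163, 1_000_000):
    print(f"{n:>9} rows: {collision_probability(n):.4f}")
```

Under that assumption, about 77,000 rows already give roughly even odds of at least one shared value, and a million rows make it a near certainty. That does not matter for change detection on the same business key, but it rules the value out as a unique surrogate key.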

I spent an entire evening with a consultant a while back trying to help them fix their warehouse after they had used CRC32 to generate their "unique" surrogate keys argggggh :shock:
chulett wrote:raging life by Michael
Raging? Craig - I like to think of it as more frustration than anything. I have simply had an uphill battle educating developers and customers about the virtue of CRC in SCD processing. Every time a consultant uses the routine in a manner for which it was not developed, it simply helps to solidify the image that it doesn't work.

Regards,

Posted: Fri Jul 14, 2006 10:14 am
by chulett
Apologies Michael, perhaps a bad choice of word. I wasn't sure how to qualify your frustration in the post and that was the word that came to mind. Hope everyone realizes what I meant.

Perhaps a small edit is in order...

Posted: Fri Jul 14, 2006 10:35 am
by mhester
Craig,

No offense taken and I knew what you meant - I just don't want everyone to think I am a "raging" lunatic! :shock: which I am, but only a select group of people know that :D

Regards

Posted: Fri Jul 14, 2006 10:40 am
by shawn_ramsey
kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checks, deduplication, surrogate key generation, lookups....
You can easily get more information by doing a search on the same keyword.
There is a good Wikipedia entry for CRC. If you read it you will see how wrong you are. http://en.wikipedia.org/wiki/Cyclic_redundancy_check

Posted: Fri Jul 14, 2006 8:55 pm
by kumar_s
Thanks Shawn Ramsey, Craig, Michael,
for pointing out my mistakes. Apologies for the incorrect answer.

Posted: Sat Jul 15, 2006 12:21 am
by kduke
No big deal Kumar. Keep up the good work. It was just a minor point. This is a powerful and complex routine which is very useful when used correctly.

Posted: Sat Jul 15, 2006 8:58 pm
by shawn_ramsey
kduke wrote:No big deal Kumar. Keep up the good work. It was just a minor point. This is a powerful and complex routine which is very useful when used correctly.
...and very bad when used incorrectly. :)

Posted: Tue Jul 18, 2006 6:04 am
by kduke
All of us who answer enough questions on this forum have been corrected. Kumar has given a lot of great answers. I want that to continue and not slow down because something was learned on this post. I always want to encourage people to post whether or not the question seems important. The important thing is to interact and grow into better developers and communicators. If you challenge yourself to grow and be a voice on this web site instead of just a listener then all of us should benefit. If your answer is missing something then Ray will fill in the gaps. If you want to get to the next level in your job then maybe you need to be able to give an answer to a complicated issue. Maybe you have a good example of how to do something. Share it. Never be embarrassed or ashamed for trying to participate. Nobody knows it all.

Posted: Tue Jul 18, 2006 10:08 am
by DSguru2B
I agree totally with Kim. We have our top posters, very well experienced in this field, to fill in the blanks or even correct us. That's one reason I always come back and recheck my answers. And I have been corrected so many times, only to my benefit and that of the dsxchange users. We should welcome correction and keep an open attitude.
Regards,