CRC Info

dssiddu
Participant
Posts: 66
Joined: Mon Nov 07, 2005 10:28 pm

Post by dssiddu »

Hello

Could you please tell me the main use of the CRC32 function?

Thanks in advance.

:roll:
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

CRC stands for Cyclic Redundancy Code.
For a given string, the function calculates its CRC,
which in ETL we can use as a (restricted) stand-in for the string passed to it.
Success consists of getting up just one more time than you fall.
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Oh!
Forgot to mention that it generates a 32-bit code, hence the name.
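
A minimal sketch of what "a 32-bit code for a string" looks like in practice. It uses Python's `zlib.crc32`, not the DataStage routine, so whether the DataStage CRC32 function uses the same polynomial and encoding is an assumption; the helper name `crc32_of` is hypothetical.

```python
import zlib

def crc32_of(text: str) -> int:
    """Return the CRC-32 of a string as an unsigned 32-bit integer."""
    # Assumption: the string is encoded as UTF-8 before the CRC is taken.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF

value = crc32_of("123456789")
print(f"{value:#010x}")  # 0xcbf43926, the standard CRC-32 check value
```

However long the input string is, the result always fits in 32 bits, which is why it can only ever be a restricted stand-in for the string.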
Success consists of getting up just one more time than you fall.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checking, deduplication, surrogate key generation, lookup...
You can easily get more information by searching on the same keyword.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Hester recently posted why not to use CRC32 for "Surrogate key generation". He wrote this routine with another Ascential developer. He has lots of posts about how to use this routine. It works best in SCD type 2 calculations.

Calculate CRC32 across all fields of the existing dimension record, then calculate CRC32 across the same fields of the incoming record. If the two values are the same, then most likely all fields in both records are the same. This is much more powerful and faster than comparing each field one at a time. I know one customer here in Dallas who put every column in a hashed file, made all fields part of the key, and then did a lookup on every field. This was the design of an IBM employee, or Ascential at the time. Not good. If the record has changed, then do your SCD type 2 insert and update.
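
The comparison described above can be sketched roughly as follows. This is Python rather than a DataStage job, and the names `row_crc`, `classify`, `stored_crcs`, the sample columns, and the `\x1f` delimiter are all hypothetical choices for illustration, not the actual routine.

```python
import zlib

DELIM = "\x1f"  # assumed column separator, chosen because it rarely appears in data

def row_crc(row, columns):
    """CRC-32 over the listed columns of one record, joined with DELIM."""
    joined = DELIM.join("" if row[c] is None else str(row[c]) for c in columns)
    return zlib.crc32(joined.encode("utf-8")) & 0xFFFFFFFF

def classify(incoming, stored_crcs, key, columns):
    """Return 'insert', 'changed' or 'unchanged' for one incoming record."""
    natural_key = incoming[key]
    if natural_key not in stored_crcs:
        return "insert"
    if stored_crcs[natural_key] != row_crc(incoming, columns):
        return "changed"    # a differing CRC proves at least one column changed
    return "unchanged"      # an equal CRC means most likely unchanged, not guaranteed

columns = ["name", "zip"]
current = {"cust_id": 1, "name": "Ann", "zip": "75201"}
# One stored CRC per natural key, standing in for the hashed-file lookup.
stored_crcs = {current["cust_id"]: row_crc(current, columns)}

moved = {"cust_id": 1, "name": "Ann", "zip": "75204"}
brand_new = {"cust_id": 2, "name": "Bob", "zip": "85001"}

for record in (current, moved, brand_new):
    print(record["cust_id"], classify(record, stored_crcs, "cust_id", columns))
```

Only one value per natural key has to be stored and looked up, instead of every column. The "changed" branch is where the SCD type 2 insert and update would go.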

I am sure Michael will explain this better soon. Do a search for his posts.
Mamu Kim
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checking, deduplication, surrogate key generation, lookup...
You can easily get more information by searching on the same keyword.
:? A unique number for each input? Nope. Useful for surrogate key generation? Nope. Duplicate check? Maybe.

As noted by Kim and recently brought back to raging life by Michael, use it for Change Data Detection, or CDD. When the CRC32 value for the 'same data' is different, something in it has changed. This can make the work of SCD jobs much easier.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Craig and Kim,

Thanks! I thought I was going to have to open another can of CRC whoop butt!
kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checking, deduplication, surrogate key generation, lookup...
You can easily get more information by searching on the same keyword.
Again - statements like these truly show the ignorance people have about CRC and checksums. First - CRC is NOT a checksum. Checksum implies addition, and CRC is not additive. CRC is based on polynomial division and the remainder it leaves.

As stated before, two totally different rows of data can/will/might generate the same CRC value and that's ok - perfectly normal.
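
To make both points concrete, here is a hedged sketch in Python. `crc32_bitwise` computes the common CRC-32 as a bit-by-bit remainder (shift and XOR with the generator polynomial, no addition), and `find_collision` searches random 8-byte inputs until two different ones share a CRC. Both function names are hypothetical, and whether the DataStage routine uses this exact polynomial and these parameters is an assumption.

```python
import random
import zlib

POLY = 0xEDB88320  # bit-reflected form of the standard CRC-32 generator polynomial

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 as the remainder of polynomial division over GF(2), one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the low bit is set, "subtract" (XOR) the polynomial, then shift.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

def find_collision(seed=0, limit=1_000_000):
    """Return two different 8-byte inputs with the same CRC-32, or None."""
    rng = random.Random(seed)
    seen = {}
    for _ in range(limit):
        msg = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(msg)  # same function as crc32_bitwise, just faster
        other = seen.get(crc)
        if other is not None and other != msg:
            return other, msg
        seen[crc] = msg
    return None

# The bit-by-bit version agrees with the library implementation.
print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))

pair = find_collision()
if pair is not None:
    a, b = pair
    print(a.hex(), b.hex(), hex(zlib.crc32(a)), hex(zlib.crc32(b)))
```

There are only 2^32 possible CRC values, so by the birthday effect a collision among random inputs is expected after roughly a hundred thousand rows, which is well within the size of an ordinary dimension table.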

I spent an entire evening with a consultant a while back trying to help them fix their warehouse after they had used CRC32 to generate their "unique" surrogate keys argggggh :shock:
chulett wrote:raging life by Michael
raging? - Craig - I like to think of it as more frustration than anything. I have simply had an uphill battle educating developers and customers about the virtue of CRC in SCD processing. Every time a consultant uses the routine in a way it was not designed for, it simply helps to solidify the image that it doesn't work.

Regards,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Apologies Michael, perhaps a bad choice of word. I wasn't sure how to qualify your frustration in the post and that was the word that came to mind. Hope everyone realizes what I meant.

Perhaps a small edit is in order...
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Craig,

No offense taken and I knew what you meant - I just don't want everyone to think I am a "raging" lunatic! :shock: which I am, but only a select group of people know that :D

Regards
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Post by shawn_ramsey »

kumar_s wrote:You will get a unique number (checksum) for each input.
This is used for several purposes, for example duplicate checking, deduplication, surrogate key generation, lookup...
You can easily get more information by searching on the same keyword.
There is a good Wikipedia entry for CRC. If you read it you will see how wrong you are. http://en.wikipedia.org/wiki/Cyclic_redundancy_check
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Thanks Shawn Ramsey, Craig, Michael,
for pointing out my mistakes. Apologies for the wrong information.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

No big deal Kumar. Keep up the good work. It was just a minor point. This is a powerful and complex routine which is very useful when used correctly.
Mamu Kim
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Post by shawn_ramsey »

kduke wrote:No big deal Kumar. Keep up the good work. It was just a minor point. This is a powerful and complex routine which is very useful when used correctly.
...and very bad when used incorrectly. :)
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Anyone who answers enough questions on this forum gets corrected sooner or later. Kumar has given a lot of great answers. I want that to continue and not slow down because something was learned on this post. I always want to encourage people to post whether or not the question seems important. The important thing is to interact and grow into better developers and communicators. If you challenge yourself to grow and be a voice on this web site instead of just a listener, then all of us should benefit. If your answer is missing something, then Ray will fill in the gaps. If you want to get to the next level in your job, then maybe you need to be able to give an answer to a complicated issue. Maybe you have a good example of how to do something. Share it. Never be embarrassed or ashamed for trying to participate. Nobody knows it all.
Mamu Kim
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I agree totally with Kim. We have our top posters, very experienced in this field, to fill in the blanks or even correct us. That's one reason I always come back and recheck my answers. And I have been corrected so many times, only to my benefit and that of the other dsxchange users. We should welcome that and keep an open attitude.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.