Should I use checksum
Moderators: chulett, rschirm, roy
Should I use checksum
I need to compare a table row with a row in a file to see if any field has changed. I am thinking to concatenation required fields and do the comparison. I am not sure if I should calculate the checksum then do the comparison. Which one is more efficient? Thank you in advance.
Regards
Jie
Jie
If you plan on using a checksum for change management, then typically one would include a checksum field in the record with the value for the current row. Then you can generate one for the incoming row and compare the old v. new to see if anything had changed. Store the new checksum with the new / updated record. Note that this works just fine as long as you don't care which field changed.
I'd also suggest putting a standard delimiter between each field, anything not a string be converted to a string and ensuring that things like dates are standardized to avoid having something like "6/8/12" and "06/08/2012" being considered as different values. And the delimiter stops silly things like this happening:
Simply cat'd together, "12345" = "12345" so no change. A delimiter like a pipe would show that "12|345" != "123|45".
I'd also suggest putting a standard delimiter between each field, anything not a string be converted to a string and ensuring that things like dates are standardized to avoid having something like "6/8/12" and "06/08/2012" being considered as different values. And the delimiter stops silly things like this happening:
Code: Select all
Col1 Col2
Old: 12 345
New: 123 45
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
While checksum algorithms are pretty efficient, they still take CPU cycles. I would put the CRC value in the data with each record, the will at least halve the number of computations if each record is only compared one time.
Also remember that each field you concatenate needs to be checked for NULLs, which need to be replaced with something. Appending a NULL string to any other string results in a NULL string.
I have used the Generic Stage and the CRC32 stage to dynamically generate a checksum, it uses a list of column names in schema format as input to determine the concatenated string for CRC32 computation.
Also remember that each field you concatenate needs to be checked for NULLs, which need to be replaced with something. Appending a NULL string to any other string results in a NULL string.
I have used the Generic Stage and the CRC32 stage to dynamically generate a checksum, it uses a list of column names in schema format as input to determine the concatenated string for CRC32 computation.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
If you are not planning on storing it, then I don't see the point of computing it, especially twice - simply compare the two concatenated strings. And I forgot to mention the null issue, my bad. My only excuse is working with Informatica for the last two years. It doesn't seem to care about concatenating null fields together and automagically does some sort of 'NullToEmpty' equivalent behind the scenes, so we just paste'm together.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers