Checksum Stage not giving consistent output

Posted: Wed Oct 28, 2015 12:57 pm
by sg33
Hi -

I have a job that computes a checksum on certain columns. The Checksum stage has the "Use all columns except those specified" option set, and some columns are listed in the "Exclude Column" list.

As part of a change request I have to bypass a couple of additional columns, but I don't want the checksum to be affected. When I add these two columns (COL1 and COL2) to the Exclude Column list, the checksum changes.

My understanding is that the checksum shouldn't be impacted since the new columns are defined to be excluded from the checksum computation.
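
To illustrate that expectation, here is a minimal Python sketch. It is not how the Checksum stage is implemented internally: the MD5 hash, the `|` delimiter, the sorted column order, and every column name other than COL1 and COL2 are assumptions made only for the illustration.

```python
import hashlib


def row_checksum(row, exclude):
    """Hash the values of every column that is not in `exclude`.

    Columns are taken in sorted name order so the result does not
    depend on the order in which the columns were added to the row.
    """
    included = sorted(col for col in row if col not in exclude)
    payload = "|".join(f"{col}={row[col]}" for col in included)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Record and exclude list before the change (hypothetical column names).
old_row = {"ID": 1, "NAME": "abc", "AMOUNT": 10.5, "LOAD_TS": "2015-10-28"}
old_exclude = {"LOAD_TS"}

# After the change: COL1 and COL2 are added and also excluded.
new_row = dict(old_row, COL1="x", COL2="y")
new_exclude = old_exclude | {"COL1", "COL2"}

# Excluded columns never reach the hash, so the checksum is unchanged.
same = row_checksum(old_row, old_exclude) == row_checksum(new_row, new_exclude)
print(same)
```

If the two new columns were present in the record but missing from the exclude list, they would be hashed along with the others and the checksum would change.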

Any suggestions are appreciated.

Thanks

Posted: Wed Oct 28, 2015 1:16 pm
by ray.wurlod
I agree with your expectation.

It may be that you have discovered a bug, in which case you should report it.

Or it may be that you have RCP enabled on the output link of the stage immediately upstream of the Checksum stage.

Posted: Wed Oct 28, 2015 1:59 pm
by sg33
Thanks Ray. I checked the output link and RCP is not enabled. I will test this fragment of the code with a smaller dataset over the next couple of days.

Will let you know what I find.

Posted: Thu Oct 29, 2015 5:52 am
by qt_ky
Also see this technote in case it may be related.

Posted: Mon Nov 02, 2015 9:57 am
by sg33
I found something really weird during testing. The job I was working on filters records based on the checksum: after the sort, it checks whether the checksum for the record is equal to the previous value, and if it is not, it sends the record to the output file.
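
The filter logic can be sketched roughly as follows in Python. This is only an illustration of the compare-with-previous idea, not the actual job design; the record layout and the `key` and `checksum` field names are hypothetical.

```python
def changed_records(records):
    """Yield each record whose checksum differs from the previous record's.

    Assumes `records` is already sorted so that records with the same
    checksum are adjacent, as they would be after the sort in the job.
    """
    previous = None
    for rec in records:
        if rec["checksum"] != previous:
            yield rec
        previous = rec["checksum"]


# Hypothetical sorted input with repeated checksums.
rows = [
    {"key": 1, "checksum": "aa"},
    {"key": 2, "checksum": "aa"},
    {"key": 3, "checksum": "bb"},
    {"key": 4, "checksum": "bb"},
    {"key": 5, "checksum": "cc"},
]

out = list(changed_records(rows))
print([r["key"] for r in out])
```

With a filter like this, the output count depends entirely on how many distinct adjacent checksum values the input produces, so any change in how the checksum is computed shows up directly in the record count.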

If I create a copy of the job, compile it, and rerun it, the number of records going to the output differs significantly.
Original job sends 49452 records.
Copy job sends 135852 records to the output.

It doesn't matter how many times I create a copy or rerun the copy or the original job; the numbers remain the same as above.

I tried importing the job into another project and I see the same issue. I'm not sure why the copy job is giving a different number. Help please!

Posted: Wed Nov 25, 2015 8:17 am
by sg33
My observations on this were very unusual. When the original job was renamed with a suffix, compiled, and rerun, the record count was exactly the same as the copy job's.

When the job was renamed back to the original name and rerun, the count was again exactly the same as the copy job's. At least the count from both jobs was consistent.

I'm not sure why it worked after renaming the job to another name and then renaming it back to the original.

Anyway, the original problem reported for the Checksum stage is no longer there, so I will mark the topic as resolved.