Generate multiple Checksum/SK in a generic job

Posted: Sun Feb 19, 2017 10:28 pm
by rohit_mca2003
Hi,

I have a requirement to use a generic job to read a source file and load its data into a table.
Each new source file may map to a target with a different number of Checksum/Surrogate Key columns.

Example (Scenario 1):
---------------------------
Source --> File 1 (Col1, Col2, Col3, Col4)
Target --> Table1 (Col1, Col2, Col3, Col4, Checksum(Col1,Col2), Checksum(Col3,Col4))

Example (Scenario 2):
---------------------------
Source --> File 1 (Col1, Col2, Col3, Col4, Col5)
Target --> Table1 (Col1, Col2, Col3, Col4, Checksum(Col1,Col2), Checksum(Col3,Col4), Checksum(Col5))

So each time I receive a source, I have to generate a different number of checksum columns.
Is there any way to achieve this with a generic job, like
(Source --> Generate different number of Checksums --> Target)?

Thanks,
Rohit

Posted: Mon Feb 20, 2017 11:23 am
by UCDI
There are a couple of ways. The way we have been doing it is to set up for a comfortable maximum number of columns you might need; for example, if you need 3 or 4, you might set up for 6 or 8.

If a particular input is blank, then the related checksum column on the output will be blank (and no work is done, beyond carrying the empty columns around for a short time). The job that uses the shared code can drop the unused columns.

So if you need 2, the first 2 input columns to your shared code carry the data that will be run through the checksum, and the others are empty. If you need 3, the first 3, and so on. It's a little clunky, but it's flexible and has worked well for us.
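
The fixed-slot idea above can be sketched in Python. This is only an illustration of the logic, not DataStage code. The names `MAX_SLOTS` and `add_checksums`, the `Checksum1..N` column names, the `|` delimiter, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib

MAX_SLOTS = 6  # hypothetical upper bound on checksum columns


def add_checksums(row, slots):
    """Return a copy of row with Checksum1..Checksum<MAX_SLOTS> appended.

    slots is a list of column-name lists, one per checksum slot.
    Slots that are missing or empty produce a blank checksum value.
    """
    if len(slots) > MAX_SLOTS:
        raise ValueError("more checksum slots requested than the job provides")
    out = dict(row)
    for i in range(MAX_SLOTS):
        cols = slots[i] if i < len(slots) else []
        name = f"Checksum{i + 1}"
        if cols:
            # Join the configured columns with a delimiter, then hash them.
            joined = "|".join(str(row[c]) for c in cols)
            out[name] = hashlib.md5(joined.encode("utf-8")).hexdigest()
        else:
            out[name] = ""  # unused slot: carried along as a blank column
    return out


# Scenario 1 from the question: two checksums over four columns.
row = {"Col1": "a", "Col2": "b", "Col3": "c", "Col4": "d"}
result = add_checksums(row, [["Col1", "Col2"], ["Col3", "Col4"]])
```

Here `result` holds the four source columns plus six checksum columns, of which only the first two are populated; the remaining four are blank and can be dropped downstream.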

Posted: Mon Feb 20, 2017 11:25 am
by chulett
... and that is valid for a "generic" job? As in one using RCP, I assume.

Posted: Tue Feb 21, 2017 8:11 am
by UCDI
I can't say about valid.
It works with RCP. In this example, the columns you want checksums on have to be exposed and mapped into the inputs, but the rest can be RCP.

I am open to a better method ... I don't really like it, but my company has been doing it that way unquestioned for a while.

Posted: Tue Feb 21, 2017 8:14 am
by chulett
Good to know. Was mostly worried about the new source file comment where each "may have different number" of columns to checksum.

Posted: Wed Feb 22, 2017 8:23 am
by rohit_mca2003
Thanks for the replies.
We are following a similar approach too: defining a maximum number of checksums.
For each checksum we provide the column(s) through a parameter. The job uses RCP.
At the end we drop the unnecessary checksum columns (again controlled by parameters for the particular instance/value file).
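
A rough Python sketch of the parameter-driven part, for illustration only. The parameter format (`;` between checksums, `,` between columns), the `Checksum1..N` column names, and the function names `parse_slots` and `drop_unused` are hypothetical and not taken from the actual job.

```python
def parse_slots(param):
    """Parse 'Col1,Col2;Col3,Col4' into [['Col1', 'Col2'], ['Col3', 'Col4']]."""
    slots = []
    for part in param.split(";"):
        cols = [c.strip() for c in part.split(",") if c.strip()]
        if cols:  # ignore empty groups such as a trailing ';'
            slots.append(cols)
    return slots


def drop_unused(row, used_count, max_slots):
    """Return row without the ChecksumN columns for slots above used_count."""
    unused = {f"Checksum{i}" for i in range(used_count + 1, max_slots + 1)}
    return {k: v for k, v in row.items() if k not in unused}


# Scenario 2 from the question: three checksums, job built with six slots.
slots = parse_slots("Col1,Col2;Col3,Col4;Col5")
row = {"Col1": "a", "Checksum1": "x", "Checksum2": "y", "Checksum3": "z",
       "Checksum4": "", "Checksum5": "", "Checksum6": ""}
trimmed = drop_unused(row, len(slots), 6)
```

With these inputs, `trimmed` keeps `Checksum1` to `Checksum3` and loses the three blank slots.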

Thanks for all the help.