Surrogate Key using a state file or Database
Posted: Wed Oct 15, 2014 1:06 pm
Hi All,
I am trying to generate surrogate key or a sequence number for one of my job. As I understand there are 3 ways to do it.
Option # 1: Use Database Sequence if your target is a table. Unfortunately, I can't use this.
Option # 2: Use Surrogate key generator or the surrogate key tab in the stages like transformer and then use a Flat File or DB sequence. If I go for the flat file approach with "In blocks of : 1" the job is performing extremely slow. Its like taking 3 extra hours for processing 200K records. I am setting the block to 1 as I need the surrogate key to be generated sequentially when I am using 4 nodes. For DB Sequence, the only provided option are DB2 and Oracle. My XMETA is on DB2 and we also have Netezza for our DW but its not listed. How do I use the DB2 database for this?
Option # 3: Use the following derivation in the transformer. But to use this I am joining/looking up my target to fetch the max number generated in the previous run so as to continue the sequence. That's slowing down the job too.
@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + NullToValue(field_name, 1)
Is there any better option other than above where the performance doesn't take a hit; surrogate key is generated sequentially on a 4-node environment; and it also continues from the point where it stopped in the previous run?
Thanks
I am trying to generate surrogate key or a sequence number for one of my job. As I understand there are 3 ways to do it.
Option # 1: Use Database Sequence if your target is a table. Unfortunately, I can't use this.
Option # 2: Use Surrogate key generator or the surrogate key tab in the stages like transformer and then use a Flat File or DB sequence. If I go for the flat file approach with "In blocks of : 1" the job is performing extremely slow. Its like taking 3 extra hours for processing 200K records. I am setting the block to 1 as I need the surrogate key to be generated sequentially when I am using 4 nodes. For DB Sequence, the only provided option are DB2 and Oracle. My XMETA is on DB2 and we also have Netezza for our DW but its not listed. How do I use the DB2 database for this?
Option # 3: Use the following derivation in the transformer. But to use this I am joining/looking up my target to fetch the max number generated in the previous run so as to continue the sequence. That's slowing down the job too.
@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + NullToValue(field_name, 1)
Is there any better option other than above where the performance doesn't take a hit; surrogate key is generated sequentially on a 4-node environment; and it also continues from the point where it stopped in the previous run?
Thanks