Generate Sequence Number

Posted: Mon Oct 10, 2005 5:48 pm
by Abhinav
Hi

Is there a way I could generate a sequence number in parallel jobs, the way we do in server jobs using KeyManagementGetNextValue?

It should pick up the last sequence number generated by the previous run and continue from there.

Thanks

Abhinav

Posted: Mon Oct 10, 2005 8:11 pm
by vmcburney
Parallel jobs have the Surrogate Key Generator stage; you can also use a counter in a parallel Transformer stage, as shown in the FAQ forum. Both methods ensure the generated numbers are unique across all parallel instances.
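The counter technique from the FAQ forum boils down to an output column derivation along these lines (a sketch, not the only way to write it; InitialValue is an assumed job parameter, while @PARTITIONNUM, @NUMPARTITIONS and @INROWNUM are system variables available in the parallel Transformer):

InitialValue + @PARTITIONNUM + ((@INROWNUM - 1) * @NUMPARTITIONS)

Each partition starts at a different offset and steps by the number of partitions, so with four partitions and InitialValue = 101, partition 0 produces 101, 105, 109, ... and partition 1 produces 102, 106, 110, ... The sequences interleave and never collide.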

You have to make sure you seed the sequence with the correct starting value. You can do this at the sequence job level by retrieving the current maximum key value from the target database table with a shell script and passing it into your parallel job as a job parameter. Use that job parameter as the initial value of your counter or Surrogate Key Generator stage.
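A minimal sketch of that shell wrapper, assuming a DB2 target; the database, table, column, project and job names here (DWDB, CUST_DIM, CUST_SK, DWProject, LoadCustomerDim) are made-up examples, and you would substitute your own database's command-line client:

# Fetch the next starting key from the target table
db2 connect to DWDB > /dev/null
MAXKEY=`db2 -x "SELECT COALESCE(MAX(CUST_SK), 0) + 1 FROM CUST_DIM"`
db2 connect reset > /dev/null
MAXKEY=`echo $MAXKEY`   # trim whitespace from the CLI output

# Hand it to the parallel job as the InitialValue job parameter
dsjob -run -param InitialValue=$MAXKEY -jobstatus DWProject LoadCustomerDim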

Re: Generate Sequence Number

Posted: Tue Oct 11, 2005 2:15 pm
by sun rays
How about accomplishing this at the database level, say with a sequence generator? Does that have any disadvantages compared to a key generated by DataStage?

Re: Generate Sequence Number

Posted: Tue Oct 11, 2005 2:22 pm
by kcbland
sun rays wrote: How about accomplishing this at the database level, say with a sequence generator? Does that have any disadvantages compared to a key generated by DataStage?
That puts a choke point in the database. You have a single process handing out the next key, one at a time, the way a sequence does. If you are simultaneously loading multiple pipelines of data (partitioned parallelism) into the table, they will all congest around getting that sequence. You're better off using a generator in your tool to ensure that each pipeline can achieve maximum throughput, because each pipeline works from its own range of keys and need not worry that another pipeline is using the same number. Gaps in surrogate key assignments are okay, and gaps are what you end up with because of the range allocation: if, say, each pipeline is handed a block of 1,000 keys and only loads 800 rows, the unused 200 numbers are simply never assigned.

And I'm not even talking about having to look the just-assigned keys back up so that you can embed them as foreign surrogate keys. It's kind of silly to jam data into a database and then pull it back out again for the next layer of data to load. Better to jam it all in at the end.