Generate Sequence Number

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Abhinav
Premium Member
Posts: 65
Joined: Tue Jun 29, 2004 10:26 am
Location: California

Generate Sequence Number

Post by Abhinav »

Hi

Is there a way I could generate a sequence number in parallel jobs, as we do in Server jobs using KeyManagementGetNextValue?

It should pick up the last sequence number generated by the previous run and continue from there.

Thanks

Abhinav
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Parallel jobs have the Surrogate Key Generator stage, and you can also use a counter in a Transformer stage, as shown in the FAQ forum. Both methods ensure that the generated numbers are unique across your partitions.
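The Transformer-counter technique boils down to simple arithmetic over the partition number, so no two partitions can ever produce the same key. A minimal sketch of that arithmetic in Python (the real derivation is written with the DataStage system variables @PARTITIONNUM, @NUMPARTITIONS and @INROWNUM; the function name and starting value here are illustrative):

```python
# Sketch (not DataStage syntax) of the counter arithmetic a Transformer
# derivation typically uses for unique keys across N partitions:
#   key = start + @PARTITIONNUM + @NUMPARTITIONS * (@INROWNUM - 1)
# 'start' would come from a job parameter (see the seeding step below).

def partition_keys(start, partition_num, num_partitions, row_count):
    """Keys generated by one partition: an interleaved, collision-free series."""
    return [start + partition_num + num_partitions * (row - 1)
            for row in range(1, row_count + 1)]

# Two partitions, starting value 101 (e.g. max key in target + 1):
p0 = partition_keys(101, 0, 2, 3)   # -> [101, 103, 105]
p1 = partition_keys(101, 1, 2, 3)   # -> [102, 104, 106]
assert set(p0).isdisjoint(p1)       # no duplicates across partitions
```

Each partition steps through its own interleaved slice of the number line, so uniqueness holds no matter how rows are distributed.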

You have to make sure you have a correct starting value for your sequence. You can do this at the sequence job level by retrieving the current maximum value from the target database table via a shell script and passing it into your parallel job as a job parameter. Use that job parameter as the initial value of your counter or Surrogate Key Generator stage.
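The seeding step described above might look something like this as a shell sketch. The database query is stubbed out (get_max_key just echoes a fixed value; substitute your sqlplus/db2/etc. call), and the project and job names in the dsjob line are placeholders:

```shell
#!/bin/sh
# Sketch: seed the parallel job's counter from the target table.
get_max_key() {
    # Stand-in for the real query, e.g.:
    #   sqlplus -s user/pass <<< "SELECT MAX(cust_key) FROM dim_customer;"
    echo 100    # stubbed result for illustration
}

MAX_KEY=$(get_max_key)
START_KEY=$((MAX_KEY + 1))
echo "Starting value for this run: $START_KEY"

# Then pass it into the parallel job as a job parameter, e.g.:
#   dsjob -run -param StartKey=$START_KEY MyProject MyParallelJob
```

The dsjob invocation is left commented since project and job names vary; the point is simply that the computed value travels into the job as a parameter.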
sun rays
Charter Member
Posts: 57
Joined: Wed Jun 08, 2005 3:35 pm
Location: Denver, CO

Re: Generate Sequence Number

Post by sun rays »

How about accomplishing this at the database level, for example with a sequence generator? Does that have any disadvantages compared to generating the key in DataStage?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Re: Generate Sequence Number

Post by kcbland »

sun rays wrote:How about accomplishing this at the database level, for example with a sequence generator? Does that have any disadvantages compared to generating the key in DataStage?
That puts a choke point in the database. You have a single process handing out the next key, a la a sequence. If you are simultaneously loading multiple pipelines of data (partitioned parallelism) into the table, they will all contend for that sequence. You're better off using a generator in your tool to ensure that each pipeline achieves maximum throughput, because each works from its own range of keys to assign and need not worry that another pipeline is using the same number. Gaps in surrogate key assignment are okay, and gaps are what you end up with because of the range allocated to each pipeline.
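The range-allocation scheme described here can be sketched in a few lines of Python. Each pipeline is handed its own block of keys up front, so none of them ever waits on a shared sequence; the block size and pipeline count below are illustrative:

```python
# Sketch of per-pipeline key-range allocation: each pipeline gets a
# pre-assigned block, so there is no shared sequence to contend for.

BLOCK_SIZE = 1000  # illustrative; size it to exceed any pipeline's row count

def key_range(next_free, pipeline, block_size=BLOCK_SIZE):
    """Pre-allocated (first, last) key block for one pipeline."""
    first = next_free + pipeline * block_size
    return first, first + block_size - 1

# Three pipelines loading in parallel, next free key = 5001:
ranges = [key_range(5001, p) for p in range(3)]
# -> [(5001, 6000), (6001, 7000), (7001, 8000)]

# A pipeline that loads fewer than BLOCK_SIZE rows simply leaves unused
# keys behind -- the harmless gaps in the surrogate key sequence.
```

The ranges are disjoint by construction, which is exactly why each pipeline can assign keys at full speed without coordinating with the others.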

And I'm not even talking about having to look up the just-assigned keys so that you can embed them as foreign surrogate keys. It's kind of silly to jam data into a database and then pull it back out again for the next layer of data to load. Better to jam it all in at the end.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle