Sequence or Unique identifer across tables in PX ?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Sequence or Unique identifer across tables in PX ?

Post by Havoc »

Hi ,


I have a scenario in which a unique identifier or a sequence number must be generated across jobs i.e. there are multiple jobs inserting on the same table and should insert the next sequence number/unique identifier per job. This unique identifier should then be passed and be inserted in TWO other tables.Tables are residing in an Oracle database.

Can someone please provide a solution/method on how one can go about this ?
Last edited by Havoc on Fri Feb 02, 2007 7:59 am, edited 1 time in total.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Why not use JOBNO? This is stored in DS_JOBS. You can easily look it up based on the job name.
Mamu Kim
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

kduke wrote:Why not use JOBNO? This is stored in DS_JOBS. You can easily look it up based on the job name.
How can we leverage the JOBNO to create a unique identifier?
One more query is this JOBNO dynamic or static?

Basically there are multiple jobs and we have to avoid clashes when generating this unique identifer for this ONE table and pass same value to TWO other tables per Job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think using job name is better - since the JOBNO for a job may change (re-import to another project, saving of a copy, etc.).
Are you looking at something like an Oracle Sequence, but driven from DataStage?
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

ArndW wrote:I think using job name is better - since the JOBNO for a job may change (re-import to another project, saving of a copy, etc.).
Are you looking at something like an Oracle Sequence, but driven from DataStage?
Well that is the point but we have to load this same sequence number for three other jobs. I'll give you an example with sample data to give u a clearer understanding of what we're trying to achieve.

We have Table 1 and Table 2 which are loaded in the same job. Each of these tables share one common key which has to be a unique identifier/sequence number. I'll give u an example of the table data. It
looks like below:

ID COL1 COL2 COL3
----------------------------
1 A B C
2 D E F
3 G H I
4 J K L


ID COL
-----------
1 A
1 B
1 C
2 D
2 E
2 F

The above two tables are inserted in multiple jobs and the ID column should be uniquely generated across jobs.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

PX has a surrogate key stage; but it generates unique keys only within the job itself. But you can specify the starting value dynamically, i.e. through a parameter to the job. If you can think of an algorithm that ensures each job only uses a given range of numbers you can use that method. The generated key can be up to an int*64 so that gives you a lot of numbers to use.

Using a reference lookup to a unique Oracle sequence is another option, but it will severely impact performance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Will these jobs be running consecutively or simultaneously?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

ray.wurlod wrote:Will these jobs be running consecutively or simultaneously? ...
Hi ray,

these jobs will be running simultaneously, still havent found a solution as to how we can go about implementing this. The unique identifer should be a sequence too :(
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

ArndW wrote:...Using a reference lookup to a unique Oracle sequence is another option, but it will severely impact performance...
Havoc - this is a solution
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

ArndW wrote:
ArndW wrote:...Using a reference lookup to a unique Oracle sequence is another option, but it will severely impact performance...
Havoc - this is a solution
Hey Arnd,

Just tried out the above method. But got a question..On what key can we join the input data with the Oracle sequence to get the nextval?

Tried this method out with a lookup to an Oracle stage with the SQL:

select TEST_SEQUENCE.nextval from dual;

But this resulted in all 0s in the 'nextval' field in our output sequential file.

:(
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I'll assume that the sql works from the command line. You need to specify a lookup stage and use sparse lookups; I'm not at work so can't give you the exact syntax but you need to specify a dummy input key column and use that in the SQL (so that DataStage actually performs the query for each row).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The sequence does have to exist and to have been initialized.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Create a dummy column along with the Nextval, say with the constant value '1' and lookup on that. If you MINVALUE starts with negative, there is a possibility to get 0's, else no.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

kumar_s wrote:Create a dummy column along with the Nextval, say with the constant value '1' and lookup on that. If you MINVALUE starts with negative, there is a possibility to get 0's, else no. ...
Thanks ArndW and Kumar, this approach works. :) I get your point about the performance overhead but right now there is no other go.
Post Reply