How to read the max_key value and increment by 1 to the key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

How to read the max_key value and increment by 1 to the key

Post by laxmi_etl »

Hi,

Can someone help me on reading max_key from one table and increment by 1 to every row , and insert in the target table. After the increment i neeed to update the max_key value.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Most databases have sequences that you can use to do this in simpler manner; you could also manage your keys from within datastage. What database are you using? Can you implement sequences? If DataStage is the only application that is creating new keys for this table then the most efficient and fastest method is to use the builtin key management routines.
shamshad
Premium Member
Premium Member
Posts: 147
Joined: Wed Aug 25, 2004 1:39 pm
Location: Detroit,MI

How to Get Max Key

Post by shamshad »

If my table is Employee and I need to get MAX EMPID and then increment it, here is how I would do.

SELECT MAX(EMPID) from Employee
Send the EMPID to Transformer
Have a "DUMMYKEY" as another column in Transformer (value 'X')
Now send EMPID and DummyKey Column to Hash file
Lookup on above Hash file and get EMPID matching on DummyKey with 'X' as lookup value
In the main transformer do EMPID+@OUTROWNUM, This value goes to
the KEY COLUMN in TARGET.

This is a safest way to make sure the MAX KEY works EVERYTIME.
laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

Post by laxmi_etl »

Hi,

We are using Oracle database. we can not implement sequence in database. We need datastage to create the keys.
Can you give me more details on using the builtin key management routines.

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Oracle does have sequences - is that restriction due to policy at your site? Search this forum for "KeyMgt" and you'll find lots of posts on the subject. The KeyMgt routines are part of the SDK and I don't know where that is documented.
laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

Post by laxmi_etl »

Hi ArndW,

Yes, we have restriction at the site. I am not able find any posting on this issue. Any alternates, I am using 7.5 PX on AIX.
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

I implemented this the following way --

1. Select MAX_KEY from the table and put it in a HASHED file and use this HASHED file as a Dummy Lookup.
2. Declare a stage Variable say Count and If it is the first row which is getting processed then assign Count the MAX Value.
3. If it is not the first row then Assign Count as Count+1.

This way you do not have to store the MAX value eveytime when the job is finsihed. You will be picking it up when the job is triggered.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sorry, I just realized that this was a PX question, in which case neither hashed files nor the KeyMgt works.

In PX this is even easier, since you actually have a surrogate key stage to take care of this for you!
laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

Post by laxmi_etl »

I cant use surrogate key stage either as there is restrction. Let me explain the scenario in detail.

I have to move the data from source table to destination table on following condition.

Compare the IDNT field on source and target. if matches then do a update . if not,
get a maxvalue key from different table increment the key column by 1 and perform the insert to target table.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Create a (server?) job to select the max key value and park it somewhere, for example in a file, in that job's user status area.

Run this job and your main job from a job sequence. The job sequence picks up the value, and supplies it as a job parameter to your main job.

Increment the value in a Transformer stage in your main job either by using a derivation such as #StartingMaxValue# + @OUTROWNUM or by maintaining it in a stage variable initialized to the job parameter value and adding one for each row processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

Post by laxmi_etl »

Can you be more specific on this solution. Sorry about that.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Which part don't you understand?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Curious to know the restriction on Surrogatekey generator. Its an internal stage of datastage. Will they restrict @OUTROWNUM as well?
You need to pick up the maximum from a table. Create a seperate job to get the max of the value and pass it to the other job as parameter. To pass it as an parameter, the simple way is to store it in an external file. Read and pass it.
If you got the max value, you can increment the values using transformer.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Surrogate Key Generator stage is a special case of Column Generator stage; it can only generate int16 or int32 values.

My earlier answer assumed sequential operation (in using @OUTROWNUM). You should prefer an expression involving the partition number and count of partitions (both of which are exposed through system variables), as well as row number, to get unique numbers across all processing nodes. This can still be added to the externally-obtained value, since partition number and row number are counted from zero in parallel jobs.

For this case you ought to SELECT MAX(keyvalue) + 1 FROM table; in the preliminary job, as the "next available key".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
laxmi_etl
Charter Member
Charter Member
Posts: 117
Joined: Thu Sep 28, 2006 9:10 am

Post by laxmi_etl »

Adding the stage variable and increment by one, works fine with one node. It does not work with multiple nodes.
Post Reply