Generate seq number in transformer only when the key changes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Nsg
Premium Member
Premium Member
Posts: 37
Joined: Thu Jan 26, 2006 1:21 pm

Generate seq number in transformer only when the key changes

Post by Nsg »

Hello everyone,

Please help me achieve in datastage a looping logic to increment a counter when the values in the loop changes.
This can be very easily done in PL Sql or cobol but I need to do in datastage and donot know if datastage has good looping capabilities.

Here is an exaple of my input -

key1 Key2 Value
001 001 A
001 001 B
001 002 A
001 002 B
001 002 A

expected output -

key1 Key2 Value Generated_Key
001 001 A Key1 (as key1+Key2 has a new value)
001 001 B Key1 (as key1+Key2 doesnot have a new value)
001 002 A Key2 (as key1+Key2 has a new value)
001 002 B Key2 (as key1+Key2 doesnot have a new value)
001 002 A Key2 (as key1+Key2 doesnot have a new value)

Thanks,
nsg

s
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
You need to first sort based on key1 and create key change column say "keychane1" then again sort on key2 and create key change column say "changekey2",then apply below logic in transformer stage variable stage:-

Counter= if((keychange1=1 and keychange2=1) or (keychange1=0 and keychange2=1)) then counter+1 else counter
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

or simply sort on both keys and apply the logic using stage variables. Make sure your data is correctly partitioned and sorted.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You should be able to create the key change column over both keys at once since the number should be incremented if either of them change. The counter derivation becomes simpler then:

Code: Select all

Counter: If KeyChange then Counter+1 else Counter
'KeyChange' is the name of the key change column generated by the Sort stage and can be treated as a Boolean. Make sure your initial value of the Counter stage variable is set to zero.

And, of course, handle any partitioning if running on multiple nodes.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Nsg
Premium Member
Premium Member
Posts: 37
Joined: Thu Jan 26, 2006 1:21 pm

Post by Nsg »

Thanks for the response.
What you are saying will work if teh job is run on single node.
But with multiple partitions, it is not working correcttly -

Here are some tests that I had done -
svkeyNew=If Keychange=1 Then (OldKey+PartNum+NumPart) Else svOldKey
svGeneratedKey = svKeyNew
svOldKey = svKeyNew
This works for PartNum = 0 but not the other PartNum

In the above -
svKeyNew=If Keychange=1 Then (PartNum+(NumPart*(svOldKey-1)) +1) Else svOldKey
This works for all partitions but PartNum = 0

I have tried many such combinations as well, but in vain.

I can achieve what I want by using the If statement to condition the 1st row i.e. when svOldKey is null .. but then that means running that additional check for all the records that pass through the transformer. ( I am dealign with 70+ million rows and want to keep the logic as minimum as possible for effecient use of the transformer).

Please help find an effecient way of doing this.

Thansk,
nsg
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It will work on multiple nodes, provided that the data are partitioned using a key-based algorithm (hash or modulus) on the key being used in change detection, or at least the first column thereof.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

svKey Initial value: @PARTITIONNUM
svKey derivation: if keyChange=1 then svKey+@NUMPARTITIONS else svKey

Assuming you have properly partitioned and sorted your data, each combination of Key1 and Key2 values will be given a unique generated key value.

Optionally, you could just use the surrogate key functionality built into the transformer since v8.0 :)

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

prasson_ibm wrote:To avoid any repartition.
No need.

The engine will only insert a partition in front of a stage which would require it (sorts, joins, merges, lookups and other key-requiring stages), if it feels that it's required. Won't happen with a transformer except in certain situations (such as if it's following a seqfile stage running in sequential mode--then you will typically see a RoundRobin partitioner inserted).

In this situation, the data wouldn't be repartitioned unless the developer explicitly coded it to do so.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Nsg
Premium Member
Premium Member
Posts: 37
Joined: Thu Jan 26, 2006 1:21 pm

Post by Nsg »

Thanks everyone for the response.

I am currently coding with sort using the key fields and hash partitioned. Keychange column is also generated.
The transformer is set to same partition as well.

The logic for genrating a sequence number only when the key value changes is coded with a If then else statement to fisrt row separately and all the following rows differently.
However, I was hoping to find a way to avaoid the use of If Then Else statment. Since this condition will be correct for 1st row but false for all the following 70 + million rows.

It is always fun to know if there are other better ways of getting to the desired result. So thanks for all your responses.

Thanks,
Neha
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What makes you think you need to handle the first row differently? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Nsg
Premium Member
Premium Member
Posts: 37
Joined: Thu Jan 26, 2006 1:21 pm

Post by Nsg »

I have tried many versions of partnum, numpart, counter etc but have not been able to generate a rule that will handle first row and all teh subsequent rows for all the partitions.

Just try the two examples I have provided in my earlier email considerign there are 2partitions - num - 0 and num - 1 and generate 4 keys and you will kno wwhat I mean.

Thanks,
nsg
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Explain what you mean by "handle first row". What do you think you need to do differently with it? Is your requirement to start with a particular value for the generated key, or just to generate unique values?

If the former, you could modify the Initial Value logic in my earlier post:

Code: Select all

svKey Initial Value:  @PARTITIONNUM * -1
and you will start of with the value 0 for the generated key, without additional per-row logic to see if it's the first row.

Regards,
Last edited by jwiles on Thu Mar 21, 2013 9:10 am, edited 1 time in total.
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Two things. One is I can't try squat as I haven't had DataStage access for 3 years now, having moved over to the Dark Side. Second, doesn't the dang product allow you to set the initial value of a stage variable yet on the Parallel side? Been there since Day One for Server and being able to do so would solve your problem, hence my question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Been able to do that as long as I've been doing parallel (7.0)
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Good... then I don't see the need to play first row games and I see that you've posted to that effect. Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply