Using Stage variables to compare 2 records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vskr72
Premium Member
Premium Member
Posts: 128
Joined: Wed Apr 28, 2004 9:36 pm

Using Stage variables to compare 2 records

Post by vskr72 »

I have a data from src like this:

Code: Select all

ColA, ColB 
 100, ABC
 100, ABC
 100, DEF
 100, DEF
I want to tag a new column like below:

Code: Select all

ColA, ColB, ColC
 100, ABC,   1
 100, ABC,   0
 100, DEF,   1
 100, DEF,   0
I know that this can be acheived through Sort stage with the CreateKeyChange option. There are more then 50 Mil records in Src, so, I was trying to avoid Sort stage. But, I would like to use the transformer stage to acheive this. Pls let me know if this is possible. Thanks,

Satish
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can not avoid sorting data unless the data are already sorted, because you must first have the keys grouped. Unless you want to compare "this record with the previous record" irrespective of keys.

In either case, set up two stage variables. One compares the current row with the previous row, and the other (later in the grid) stores the value of the current row which becomes the "previous" row when the next row is being processed. You also need to handle the special case of row #1, which is always different from the "previous" row because the latter does not exist.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

How many unique values do you expect ?

You can achieve the result directly from your database (if it comes from one).

There are other ways if it is from non-db type.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still, based on your example it will need to be sorted by something, be it the Sort stage or your source (if a database) or something on the command line. If you do it in your source database, then you can set the stage to "Don't sort, already sorted" but still use it to create the column you need.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What I meant was, if the db can handle, we can obtain the whole result from one query. Nothing to be done in DataStage.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

True, if the proper analytical functions available, but it's simple enough to do it in the tool as well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vskr72
Premium Member
Premium Member
Posts: 128
Joined: Wed Apr 28, 2004 9:36 pm

Post by vskr72 »

The reason, I do not want to handle the comparision using DB is the volume is huge. Day 0 load will be close to 80 mil and every 5 mil incrementally. But, I will sort it using DB. Will try using Stage variabls and see how it goes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the column(s) by which the data are to be sorted or indexed using a B-tree index, then sorting is at no cost because a B-tree index is stored in sorted order.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply