Finding Similar records & assigning Groups

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Post by DSFreddie »

Thanks Ray for your reply. Yes I am appending "G".

Here is the output:
-------------------------
100 ABC 1 G1
100 ABC 0 G1
200 CDE 1 G1
200 CDE 0 G1
300 DEF 1 G2
300 DEF 0 G2
400 GHI 1 G3
400 DGI 0 G3
500 XYZ 1 G4
500 XYZ 0 G4

Here records with 100 & 200 key fields has come to the same group (G1) which is incorrect. I could see the group is repeating again for the non similar key values.

Will it be an issue due to the incorrect usage of Stage variable ? (the derivation given in Stage variable is : (if not(DSLink19.keyChange) then (StageVar) else StageVar+1)

Thanks.
Freddie
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That could be a partitioning problem. If you are running on more than one node, a simple experiment should answer that question - run the job on a single node. Is the result correct then?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

And if that works, partition on the key field. Modulus is a good choice for the partitioning algorithm, since you have an integer key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

I guess it could be called a partitioning problem... remember that is you are running your job in on multiple nodes then you'll basically be performing the same calculation on each node... that is starting from 1 and adding one each time you'll encounter a new key.

So, you'll need to include the partition # as well, for instance append it in front of the 'G' ....or run the job sequentially...
_________________
- Susanne
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

At the moment, I don't see any other explaination for why it seemingly works for all group changes except from the first to the second. :?

And I don't believe you'll need to worry about the partition number, just partitioning things properly by the group keys should do the trick.
-craig

"You can never have too many knives" -- Logan Nine Fingers
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

lets just assume that you are running on a 2-node config file and you have a key based partition on the "join" column.

lets that you'll have the key 100, 300, 400 and 500 running on one node/partition and the key 200 on the second node...

in this case the group numbers generated based on the boolean values would be:

Node 1:
Key Group
100: 1
300: 1+1 = 2
400: 1+2 = 3
500: 1+3 = 4

Node 2:
Key Group
200: 1

To confirm this you could look at the distribution in the monitor... and also, you should be able to get the correct result when running sequential
_________________
- Susanne
Arun Reddy
Participant
Posts: 5
Joined: Wed Nov 02, 2011 9:08 pm
Location: Hyderabad

Re: Finding Similar records & assigning Groups

Post by Arun Reddy »

Hi DSFreddie,

1.funnel stage
2.sort using key based
3.transformer take 2 stage variables stg1,stg2,
4. In stg1 initial value=1, stg2 take key colume which u want to group eg.100,200 key column
5.finally if key column=stg2 then stg1+1 else 1(write this condition at stg1)

u wil get group wise numbers...
100 1
100 2
200 1
200 2
200 3
300 1
300 2....so on ...
Arun
DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Re: Finding Similar records & assigning Groups

Post by DSFreddie »

Thanks a lot all for all the inputs. It worked fine when i selected the correct partioning.

Regards,
Freddie
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

And that would be...modulus or?
_________________
- Susanne
sanjayS
Participant
Posts: 16
Joined: Mon Apr 18, 2011 10:56 pm

Post by sanjayS »

Hi,

In transformer Use 3 stage variables sv1, sv2, sv3 with datatype as integer

sv1 : (Input column which need to be groped)

sv2 : If (sv1=sv3) Then sv2 Else (sv2+1)

sv3 : sv1

Map stage variable sv2 to o/p column GroupId---> 'G':sv2

In transformer select sequential mode in Advance tab

Thanks,
Sanjay.
Post Reply