Finding Similar records & assigning Groups
Moderators: chulett, rschirm, roy
Thanks Ray for your reply. Yes I am appending "G".
Here is the output:
-------------------------
100 ABC 1 G1
100 ABC 0 G1
200 CDE 1 G1
200 CDE 0 G1
300 DEF 1 G2
300 DEF 0 G2
400 GHI 1 G3
400 DGI 0 G3
500 XYZ 1 G4
500 XYZ 0 G4
Here records with 100 & 200 key fields has come to the same group (G1) which is incorrect. I could see the group is repeating again for the non similar key values.
Will it be an issue due to the incorrect usage of Stage variable ? (the derivation given in Stage variable is : (if not(DSLink19.keyChange) then (StageVar) else StageVar+1)
Thanks.
Freddie
Here is the output:
-------------------------
100 ABC 1 G1
100 ABC 0 G1
200 CDE 1 G1
200 CDE 0 G1
300 DEF 1 G2
300 DEF 0 G2
400 GHI 1 G3
400 DGI 0 G3
500 XYZ 1 G4
500 XYZ 0 G4
Here records with 100 & 200 key fields has come to the same group (G1) which is incorrect. I could see the group is repeating again for the non similar key values.
Will it be an issue due to the incorrect usage of Stage variable ? (the derivation given in Stage variable is : (if not(DSLink19.keyChange) then (StageVar) else StageVar+1)
Thanks.
Freddie
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I guess it could be called a partitioning problem... remember that is you are running your job in on multiple nodes then you'll basically be performing the same calculation on each node... that is starting from 1 and adding one each time you'll encounter a new key.
So, you'll need to include the partition # as well, for instance append it in front of the 'G' ....or run the job sequentially...
So, you'll need to include the partition # as well, for instance append it in front of the 'G' ....or run the job sequentially...
_________________
- Susanne
- Susanne
At the moment, I don't see any other explaination for why it seemingly works for all group changes except from the first to the second.
And I don't believe you'll need to worry about the partition number, just partitioning things properly by the group keys should do the trick.
And I don't believe you'll need to worry about the partition number, just partitioning things properly by the group keys should do the trick.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
lets just assume that you are running on a 2-node config file and you have a key based partition on the "join" column.
lets that you'll have the key 100, 300, 400 and 500 running on one node/partition and the key 200 on the second node...
in this case the group numbers generated based on the boolean values would be:
Node 1:
Key Group
100: 1
300: 1+1 = 2
400: 1+2 = 3
500: 1+3 = 4
Node 2:
Key Group
200: 1
To confirm this you could look at the distribution in the monitor... and also, you should be able to get the correct result when running sequential
lets that you'll have the key 100, 300, 400 and 500 running on one node/partition and the key 200 on the second node...
in this case the group numbers generated based on the boolean values would be:
Node 1:
Key Group
100: 1
300: 1+1 = 2
400: 1+2 = 3
500: 1+3 = 4
Node 2:
Key Group
200: 1
To confirm this you could look at the distribution in the monitor... and also, you should be able to get the correct result when running sequential
_________________
- Susanne
- Susanne
-
- Participant
- Posts: 5
- Joined: Wed Nov 02, 2011 9:08 pm
- Location: Hyderabad
Re: Finding Similar records & assigning Groups
Hi DSFreddie,
1.funnel stage
2.sort using key based
3.transformer take 2 stage variables stg1,stg2,
4. In stg1 initial value=1, stg2 take key colume which u want to group eg.100,200 key column
5.finally if key column=stg2 then stg1+1 else 1(write this condition at stg1)
u wil get group wise numbers...
100 1
100 2
200 1
200 2
200 3
300 1
300 2....so on ...
1.funnel stage
2.sort using key based
3.transformer take 2 stage variables stg1,stg2,
4. In stg1 initial value=1, stg2 take key colume which u want to group eg.100,200 key column
5.finally if key column=stg2 then stg1+1 else 1(write this condition at stg1)
u wil get group wise numbers...
100 1
100 2
200 1
200 2
200 3
300 1
300 2....so on ...
Arun
Re: Finding Similar records & assigning Groups
Thanks a lot all for all the inputs. It worked fine when i selected the correct partioning.
Regards,
Freddie
Regards,
Freddie
Hi,
In transformer Use 3 stage variables sv1, sv2, sv3 with datatype as integer
sv1 : (Input column which need to be groped)
sv2 : If (sv1=sv3) Then sv2 Else (sv2+1)
sv3 : sv1
Map stage variable sv2 to o/p column GroupId---> 'G':sv2
In transformer select sequential mode in Advance tab
Thanks,
Sanjay.
In transformer Use 3 stage variables sv1, sv2, sv3 with datatype as integer
sv1 : (Input column which need to be groped)
sv2 : If (sv1=sv3) Then sv2 Else (sv2+1)
sv3 : sv1
Map stage variable sv2 to o/p column GroupId---> 'G':sv2
In transformer select sequential mode in Advance tab
Thanks,
Sanjay.