Grouping in transformer

Posted: Mon May 19, 2014 9:41 am
by manoj_23sakthi
Hi,

Could you please help me solve this?

Within a group, if any value changes, I have to pass the whole group to Link 2. If the value doesn't change, the group goes to Link 1.

Input :
-------
Key | Value
A|01
A|01
A|01
B|01
B|01
B|02
C|01
D|02
D|03

Output:
-------

Link 1 :
A|01
A|01
A|01
C|01

Link 2:
B|01
B|01
B|02
D|02
D|03

I tried Transformer looping, detecting the last row in each group, but I have not been able to achieve this.

Posted: Mon May 19, 2014 10:07 am
by chulett
Since you need to process the entire group before you know what link any of them should go down, I'd suggest a "fork join" design where you evaluate the groups for the number of distinct values and then use that as a lookup for the main data flow. If that Key has 1 distinct value, then route all in that group to Link 1. More than 1 distinct value? Link 2.
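
The logic behind the fork join can be sketched outside DataStage. The Python below is only an illustration of the idea, not DataStage code; the function name `route_groups` and the list names `link1` and `link2` are assumptions made for this example. One pass collects the distinct values per key (the "fork" branch), and a second pass looks up that result for every row (the "join").

```python
from collections import defaultdict


def route_groups(rows):
    """Split (key, value) rows into two lists by distinct values per key.

    Hypothetical helper for illustration only; not part of DataStage.
    """
    # Fork branch: collect the set of distinct values seen for each key.
    distinct_values = defaultdict(set)
    for key, value in rows:
        distinct_values[key].add(value)

    # Join branch: look up each row's key and route the whole group together.
    link1, link2 = [], []
    for key, value in rows:
        if len(distinct_values[key]) == 1:
            link1.append((key, value))  # one distinct value in the group
        else:
            link2.append((key, value))  # more than one distinct value
    return link1, link2


# Sample input from the original question.
rows = [
    ("A", "01"), ("A", "01"), ("A", "01"),
    ("B", "01"), ("B", "01"), ("B", "02"),
    ("C", "01"),
    ("D", "02"), ("D", "03"),
]
link1, link2 = route_groups(rows)
```

With the sample input, groups A and C end up in `link1` and groups B and D in `link2`, which matches the requested output.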

Posted: Mon May 19, 2014 11:53 am
by manoj_23sakthi
Yes.
Even if I have more than one distinct value, I have to send it to Link 1.

Posted: Mon May 19, 2014 12:11 pm
by chulett
No, according to your text and samples more than one distinct value in the "Value" column for a Key makes it go to Link 2. So in your example, A has three values but only one distinct value -> Link 1. B has three values but two distinct values -> Link 2.

The fork join will handle all that for you. The lookup should have one row per Key value with a count of distinct Values that Key contains.

Posted: Mon May 19, 2014 12:36 pm
by manoj_23sakthi
Hi,
Whether the values are distinct or duplicates, if a key contains the same value throughout its group, I have to keep the group in Link 1. Otherwise, I have to keep the group in Link 2.

Posted: Mon May 19, 2014 12:45 pm
by chulett
I think there's some confusion around what the word "distinct" means. Regardless, as noted twice now, a fork join design will get you your desired outcome.

Posted: Mon May 19, 2014 1:18 pm
by manoj_23sakthi
Distinct (unique within the group).

Posted: Mon May 19, 2014 4:57 pm
by ray.wurlod
Use a fork join design, as already indicated. Search DSXchange for details about how to do fork join designs.

Posted: Tue May 20, 2014 1:19 am
by ssnegi
From the source, add a Copy stage with two outputs, output1 and output2.
From output2 --> Remove Duplicates stage, hash partitioned and sorted on Key, with Value as a sort-only column, so that one row remains per Key and Value combination.
From the Remove Duplicates stage --> Aggregator stage, partitioning Same, grouping on Key and counting rows.
Then join output1 from the Copy stage to the Aggregator output on Key.
Then filter on the count: count = 1 goes to link1 (the group has the same value throughout) and count > 1 goes to link2 (the group has different values).
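
To check the stage sequence above against the sample data, here is a rough Python walk-through that mirrors each stage in order. It is a sketch of the data flow only, not DataStage code; the function name `fork_join_stages` and all variable names are assumptions made for the example.

```python
from collections import Counter


def fork_join_stages(rows):
    """Mimic the Copy / Remove Duplicates / Aggregator / Join / Filter flow.

    Hypothetical illustration; the names here are not DataStage identifiers.
    """
    # Copy stage: two identical outputs of the source rows.
    output1 = list(rows)
    output2 = list(rows)

    # Remove Duplicates stage: sort on Key and Value, keep one row per pair.
    deduped = sorted(set(output2))

    # Aggregator stage: group on Key and count rows. After de-duplication
    # this count is the number of distinct values for that key.
    counts = Counter(key for key, _ in deduped)

    # Join stage: attach the per-key count to every row of output1.
    joined = [(key, value, counts[key]) for key, value in output1]

    # Filter stage: count = 1 goes to link1, count > 1 goes to link2.
    link1_rows = [(k, v) for k, v, c in joined if c == 1]
    link2_rows = [(k, v) for k, v, c in joined if c > 1]
    return counts, link1_rows, link2_rows


# Sample input from the original question.
sample_rows = [
    ("A", "01"), ("A", "01"), ("A", "01"),
    ("B", "01"), ("B", "01"), ("B", "02"),
    ("C", "01"),
    ("D", "02"), ("D", "03"),
]
counts, link1_rows, link2_rows = fork_join_stages(sample_rows)
```

For the sample data the per-key counts are A = 1, B = 2, C = 1 and D = 2, so A and C are routed to link1 and B and D to link2.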