Sorting in Datastage
Posted: Wed Feb 26, 2014 6:16 pm
I have a sequential file with the below data.
column names: a,b,c,d,e,f
101,5,A,20,30,X
101,4,S,10,26,Z
101,1,C,15,24,R
101,3,B,20,44,S
102,7,W,20,20,D
102,2,F,40,53,T
102,1,Z,75,34,A
102,3,V,10,34,S
103,4,V,20,31,X
103,11,W,90,22,D
103,2,C,35,32,D
103,9,S,30,45,S
I am trying to get highest and second highest values for each group. To do that I am using the following steps.
After reading from Seq_File, passing to Sort Stage (key column a, assigning key change column) , filtering 1's and 0's (2 out put links) based of value in key change column and passing records with value 0 in key change column to another sort stage (key column a, assigning key change column) , filtering out 1's and passing joining the records based on key.
I think there should be a better approach than this. Can any one please suggest me the best approach ?
Expected Output:
101,5,A,20,30,X,4,S,10,26,Z
102,7,W,20,20,D,3,V,10,34,S
103,4,V,20,31,X,9,S,30,45,S
Thanks,
Anu
column names: a,b,c,d,e,f
101,5,A,20,30,X
101,4,S,10,26,Z
101,1,C,15,24,R
101,3,B,20,44,S
102,7,W,20,20,D
102,2,F,40,53,T
102,1,Z,75,34,A
102,3,V,10,34,S
103,4,V,20,31,X
103,11,W,90,22,D
103,2,C,35,32,D
103,9,S,30,45,S
I am trying to get highest and second highest values for each group. To do that I am using the following steps.
After reading from Seq_File, passing to Sort Stage (key column a, assigning key change column) , filtering 1's and 0's (2 out put links) based of value in key change column and passing records with value 0 in key change column to another sort stage (key column a, assigning key change column) , filtering out 1's and passing joining the records based on key.
I think there should be a better approach than this. Can any one please suggest me the best approach ?
Expected Output:
101,5,A,20,30,X,4,S,10,26,Z
102,7,W,20,20,D,3,V,10,34,S
103,4,V,20,31,X,9,S,30,45,S
Thanks,
Anu