Need help on grouping and selecting the first record

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Need help on grouping and selecting the first record

Post by pdntsap »

Hello,

We have a file with about 37 columns. We need to group the data based on 15 columns (let us say Column1 - Column 15) and then choose the first record based on Column 13. We are relatively new to DataStage and please let us know the best method of implementing the above logic.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

Use a Sort stage to form the groups, and a Transformer stage to preserve the first from each group. Note that a Remove Duplicates stage is not what you want here, since you can't guarantee that the data are sorted by Column13.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Thanks Ray. I will try it and update my post accordingly.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

I was looking into the thread below:

viewtopic.php?t=119708

The input is sorted based on 15 keys and I was trying to implement the following transformer logic suggested by Chulett:

svCurrGroup derivation = KEY1 : KEY2: ...KEY15
svIsNewGroup derivation = svCurrGroup <> svPrevGroup
svPrevGroup derivation = svCurrGroup

I guess this method does not work if one of the columns has a null value. Am I on the right track? What is the best method of implementation if one of the columns has a null value?

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There should be a NullToEmpty() transform and perhaps a NullToValue() one as well, wrap each element that could be null with one. An empty string will work fine whereas a null will not, that or another value that won't show up on its lonesome in your data - I've used a '?' or a '*' in the past to make them easier to spot.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Enclosing the nullable column using the NullToEmpty() transform before concatenation seems to have fixed the problem.
Is this the most efficient method of implementing this?

Thanks for the help.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Depends of the "this" you are asking about.

As Ray noted, you can let the Sort stage add a "Key Change Column" to the data which makes it trivial to know the first record of a group. If you are already sorting the data, it would be a little more efficient to take this path but I can't imagine the difference between the two would be significant.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

By 'this", I meant enclosing the nullable column using the NullToEmpty() transform.

Thanks for the help.
Post Reply