Page 1 of 1

Need help on grouping and selecting the first record

Posted: Tue Jul 19, 2011 2:02 pm
by pdntsap
Hello,

We have a file with about 37 columns. We need to group the data based on 15 columns (let us say Column1 - Column 15) and then choose the first record based on Column 13. We are relatively new to DataStage and please let us know the best method of implementing the above logic.

Thanks.

Posted: Tue Jul 19, 2011 4:13 pm
by ray.wurlod
Welcome aboard.

Use a Sort stage to form the groups, and a Transformer stage to preserve the first from each group. Note that a Remove Duplicates stage is not what you want here, since you can't guarantee that the data are sorted by Column13.

Posted: Tue Jul 19, 2011 4:42 pm
by pdntsap
Thanks Ray. I will try it and update my post accordingly.

Posted: Wed Jul 20, 2011 3:40 pm
by pdntsap
I was looking into the thread below:

viewtopic.php?t=119708

The input is sorted based on 15 keys and I was trying to implement the following transformer logic suggested by Chulett:

svCurrGroup derivation = KEY1 : KEY2: ...KEY15
svIsNewGroup derivation = svCurrGroup <> svPrevGroup
svPrevGroup derivation = svCurrGroup

I guess this method does not work if one of the columns has a null value. Am I on the right track? What is the best method of implementation if one of the columns has a null value?

Thanks.

Posted: Wed Jul 20, 2011 3:55 pm
by chulett
There should be a NullToEmpty() transform and perhaps a NullToValue() one as well, wrap each element that could be null with one. An empty string will work fine whereas a null will not, that or another value that won't show up on its lonesome in your data - I've used a '?' or a '*' in the past to make them easier to spot.

Posted: Thu Jul 21, 2011 1:24 pm
by pdntsap
Enclosing the nullable column using the NullToEmpty() transform before concatenation seems to have fixed the problem.
Is this the most efficient method of implementing this?

Thanks for the help.

Posted: Thu Jul 21, 2011 2:32 pm
by chulett
Depends of the "this" you are asking about.

As Ray noted, you can let the Sort stage add a "Key Change Column" to the data which makes it trivial to know the first record of a group. If you are already sorting the data, it would be a little more efficient to take this path but I can't imagine the difference between the two would be significant.

Posted: Tue Jul 26, 2011 2:40 pm
by pdntsap
By 'this", I meant enclosing the nullable column using the NullToEmpty() transform.

Thanks for the help.