Need help on grouping and selecting the first record

pdntsap · Post by **pdntsap** » Tue Jul 19, 2011 2:02 pm

Hello,

We have a file with about 37 columns. We need to group the data based on 15 columns (let us say Column1 - Column 15) and then choose the first record based on Column 13. We are relatively new to DataStage and please let us know the best method of implementing the above logic.

Thanks.

ray.wurlod · Post by **ray.wurlod** » Tue Jul 19, 2011 4:13 pm

Welcome aboard.

Use a Sort stage to form the groups, and a Transformer stage to preserve the first from each group. Note that a Remove Duplicates stage is not what you want here, since you can't guarantee that the data are sorted by Column13.

pdntsap · Post by **pdntsap** » Tue Jul 19, 2011 4:42 pm

Thanks Ray. I will try it and update my post accordingly.

pdntsap · Post by **pdntsap** » Wed Jul 20, 2011 3:40 pm

I was looking into the thread below:

viewtopic.php?t=119708

The input is sorted based on 15 keys and I was trying to implement the following transformer logic suggested by Chulett:

svCurrGroup derivation = KEY1 : KEY2: ...KEY15
svIsNewGroup derivation = svCurrGroup <> svPrevGroup
svPrevGroup derivation = svCurrGroup

I guess this method does not work if one of the columns has a null value. Am I on the right track? What is the best method of implementation if one of the columns has a null value?

Thanks.

chulett · Post by **chulett** » Wed Jul 20, 2011 3:55 pm

There should be a NullToEmpty() transform and perhaps a NullToValue() one as well, wrap each element that could be null with one. An empty string will work fine whereas a null will not, that or another value that won't show up on its lonesome in your data - I've used a '?' or a '*' in the past to make them easier to spot.

pdntsap · Post by **pdntsap** » Thu Jul 21, 2011 1:24 pm

Enclosing the nullable column using the NullToEmpty() transform before concatenation seems to have fixed the problem.
Is this the most efficient method of implementing this?

Thanks for the help.

chulett · Post by **chulett** » Thu Jul 21, 2011 2:32 pm

Depends of the "this" you are asking about.

As Ray noted, you can let the Sort stage add a "Key Change Column" to the data which makes it trivial to know the first record of a group. If you are already sorting the data, it would be a little more efficient to take this path but I can't imagine the difference between the two would be significant.

pdntsap · Post by **pdntsap** » Tue Jul 26, 2011 2:40 pm

By 'this", I meant enclosing the nullable column using the NullToEmpty() transform.

Thanks for the help.