Complex sorting and setting first, last sort column flag

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Complex sorting and setting first, last sort column flag

Post by pdntsap »

Hello,

We have a requirement where we need to sort data based on a certain number of columns and then flag the first and last column values of each sort group. As an example the following data is sorted by city, stage and zipcode and the first and last flags for each of the sort column is set:



City State ZipCode Street FIRST.City LAST.City FIRST.State LAST.State FIRST.ZipCode LAST.ZipCode

Miami FL 33133 Rice St 1 0 1 0 1 0

Miami FL 33133 Tom Ave 0 0 0 0 0 0

Miami FL 33133 Surrey Dr 0 0 0 0 0 1

Miami FL 33146 Nervia St 0 0 0 0 1 0

Miami FL 33146 Corsica St 0 0 0 1 0 1

Miami OH 45056 Myrtle St 0 1 1 1 1 1

Tucson AZ 85730 Glen Pl 1 1 1 1 1 1

A transformer stage will be of help in comparing the present column value with the previous column value and determine if the present column value is the first record of a particular group. But, I am not sure how I can find if the present column value is the last for that particular group. I hope I was able to present my requirement clearly and any help is greatly appreciated.

Thanks.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Do you need to know specifically that it's the last record of the group, or just that it's not the first?

Using the keyChange column created by a sort stage you can determine whether or not it's the first record (keyChange = 1 is first record, = 0 is NOT first record).

To find the last record, you could copy them to a second output link and use remove duplicates to retain the last record. If you need to values from the first record, capture them in stage variables using keyChange to update, then copy to each output record going into the remove duplicates stage.

You could also add a row count to the output records and resort on all keys plus the count (with the count descending). This would place the last record in first place for a group after the resort.

If you were using IS 8.5 or above, you could use the LastRowInGroup() function to identify the last record.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Thanks James.
Do you need to know specifically that it's the last record of the group, or just that it's not the first?
I need to know that it's not the last record.
Using the keyChange column created by a sort stage you can determine whether or not it's the first record (keyChange = 1 is first record, = 0 is NOT first record).
Yes, this helps me in finding the first/last record based on all the sort keys. But I need to flag each key column as either first or last and not just the record as a whole.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Can you clarify what you mean by first and last columns? Do you mean the column values from the first and last records, or are you attempting to identify the minimum and maximum values of the columns, or perhaps something else?
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a technique shown in the version 7 Advanced DataStage class that uses three consecutive Sort stages to accomplish detection of the last record in a group. I'll need to track it down and post details.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Can you clarify what you mean by first and last columns? Do you mean the column values from the first and last records,....
Yes it is the first and last columns but not for the group as a whole but for each key column in the group. I may not be explaining it properly but please look at the example in my first post. I need a flag for each key column for each record in the group that identifies if that particular key column is the first/last of the group.
I'll need to track it down and post details.


Ray, if you can please do it, it would be great. Is this class different from the learning center videos?(http://shop.dsxchange.net).

Thanks.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Probably something like:

1) Sort in reverse order, apply keyChange--identifies and places last record first
2) Sort in forward order with keyChange as additional key column (Sort in ascending order)--places identified last record last
3) Sort in forward order (Don't Sort, Previously Sorted), apply second keyChange column--identifies first record

Had to do similar years ago to identify a (preferred) kept record among dupes, but overlay data from the dupes to the kept. Replaced a mainframe sort exit where I could just queue the records in memory.
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply