Page 1 of 1

Complex sorting and setting first, last sort column flag

Posted: Tue Nov 22, 2011 10:31 am
by pdntsap
Hello,

We have a requirement where we need to sort data based on a certain number of columns and then flag the first and last column values of each sort group. As an example the following data is sorted by city, stage and zipcode and the first and last flags for each of the sort column is set:



City State ZipCode Street FIRST.City LAST.City FIRST.State LAST.State FIRST.ZipCode LAST.ZipCode

Miami FL 33133 Rice St 1 0 1 0 1 0

Miami FL 33133 Tom Ave 0 0 0 0 0 0

Miami FL 33133 Surrey Dr 0 0 0 0 0 1

Miami FL 33146 Nervia St 0 0 0 0 1 0

Miami FL 33146 Corsica St 0 0 0 1 0 1

Miami OH 45056 Myrtle St 0 1 1 1 1 1

Tucson AZ 85730 Glen Pl 1 1 1 1 1 1

A transformer stage will be of help in comparing the present column value with the previous column value and determine if the present column value is the first record of a particular group. But, I am not sure how I can find if the present column value is the last for that particular group. I hope I was able to present my requirement clearly and any help is greatly appreciated.

Thanks.

Posted: Tue Nov 22, 2011 11:18 am
by jwiles
Do you need to know specifically that it's the last record of the group, or just that it's not the first?

Using the keyChange column created by a sort stage you can determine whether or not it's the first record (keyChange = 1 is first record, = 0 is NOT first record).

To find the last record, you could copy them to a second output link and use remove duplicates to retain the last record. If you need to values from the first record, capture them in stage variables using keyChange to update, then copy to each output record going into the remove duplicates stage.

You could also add a row count to the output records and resort on all keys plus the count (with the count descending). This would place the last record in first place for a group after the resort.

If you were using IS 8.5 or above, you could use the LastRowInGroup() function to identify the last record.

Regards,

Posted: Tue Nov 22, 2011 12:29 pm
by pdntsap
Thanks James.
Do you need to know specifically that it's the last record of the group, or just that it's not the first?
I need to know that it's not the last record.
Using the keyChange column created by a sort stage you can determine whether or not it's the first record (keyChange = 1 is first record, = 0 is NOT first record).
Yes, this helps me in finding the first/last record based on all the sort keys. But I need to flag each key column as either first or last and not just the record as a whole.

Posted: Tue Nov 22, 2011 1:29 pm
by jwiles
Can you clarify what you mean by first and last columns? Do you mean the column values from the first and last records, or are you attempting to identify the minimum and maximum values of the columns, or perhaps something else?

Posted: Tue Nov 22, 2011 2:22 pm
by ray.wurlod
There is a technique shown in the version 7 Advanced DataStage class that uses three consecutive Sort stages to accomplish detection of the last record in a group. I'll need to track it down and post details.

Posted: Tue Nov 22, 2011 3:13 pm
by pdntsap
Can you clarify what you mean by first and last columns? Do you mean the column values from the first and last records,....
Yes it is the first and last columns but not for the group as a whole but for each key column in the group. I may not be explaining it properly but please look at the example in my first post. I need a flag for each key column for each record in the group that identifies if that particular key column is the first/last of the group.
I'll need to track it down and post details.


Ray, if you can please do it, it would be great. Is this class different from the learning center videos?(http://shop.dsxchange.net).

Thanks.

Posted: Tue Nov 22, 2011 3:20 pm
by jwiles
Probably something like:

1) Sort in reverse order, apply keyChange--identifies and places last record first
2) Sort in forward order with keyChange as additional key column (Sort in ascending order)--places identified last record last
3) Sort in forward order (Don't Sort, Previously Sorted), apply second keyChange column--identifies first record

Had to do similar years ago to identify a (preferred) kept record among dupes, but overlay data from the dupes to the kept. Replaced a mainframe sort exit where I could just queue the records in memory.