Page 1 of 1

Using sorter and aggregator to group by 3 columns!!!

Posted: Wed Dec 08, 2004 6:52 pm
by arun_im4u
Hi,

If i have 3 columns with duplicate records in all 3 columns. how will I retrieve 1 record from column1 which has the max value in column2 which in turn has the max value in column3.

column1 column2 column3
6 500 25
6 400 30
6 500 10
7 300 20
7 300 30

From this group of records i have to retrieve row1 and row5. how can i do this in datastage. In my real scenario these 3 columns have more than 100000 records

Thank you.

Posted: Wed Dec 08, 2004 6:59 pm
by xcb
A quick way is to sort your data (with a sort stage or with an order by clause) in ascending order on all three columns and then load the data into a hash file where column1 is the key. As the data is written with destructive writes any previous value that is there for a given key is overwritten. The result will be unique values for column1 with max values for both column2 and column3.

Posted: Wed Dec 08, 2004 10:55 pm
by rasi
Put your values into a temp table. And then use the sql

Select Key_Column, Max(Col2), Max(Col3) from Table
Group by Key_Column;

Using datastage sort and aggregator stage will work fine for small volume of records. Using database for these sort will give you performance in your load.

Thanks

Posted: Thu Dec 09, 2004 12:50 am
by vmcburney
I've always been wary of multiple max functions within an aggregation as it can create fictitious records. Does your requirement call for the max of column B and C even if they occur on different records or does it need to keep a record together?

For example with:
column1 column2 column3
6 500 25
6 400 30
6 500 10

Do you want 6 500 25, a complete record, or 6 500 30, a mix of two records that represent the max values.

I would go with a single aggregation stage and see if the performance is acceptable. This is the easiest approach to build and maintain. If it's too slow then look at RDBMS aggregation and pre-sorting etc.

sort and hash file

Posted: Thu Dec 09, 2004 4:47 am
by changming
I have ever used the sort stag and hash file, as said in previous one said, it wor fine if the data amount is not huge.
if the data is too large, sort using unix script then use hash file or aggrerator to get the last one