Using sorter and aggregator to group by 3 columns!!!

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
arun_im4u
Premium Member
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Using sorter and aggregator to group by 3 columns!!!

Post by arun_im4u »

Hi,

If i have 3 columns with duplicate records in all 3 columns. how will I retrieve 1 record from column1 which has the max value in column2 which in turn has the max value in column3.

column1 column2 column3
6 500 25
6 400 30
6 500 10
7 300 20
7 300 30

From this group of records i have to retrieve row1 and row5. how can i do this in datastage. In my real scenario these 3 columns have more than 100000 records

Thank you.
xcb
Premium Member
Premium Member
Posts: 66
Joined: Wed Mar 05, 2003 6:03 pm
Location: Brisbane, Australia
Contact:

Post by xcb »

A quick way is to sort your data (with a sort stage or with an order by clause) in ascending order on all three columns and then load the data into a hash file where column1 is the key. As the data is written with destructive writes any previous value that is there for a given key is overwritten. The result will be unique values for column1 with max values for both column2 and column3.
Cameron Boog
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Put your values into a temp table. And then use the sql

Select Key_Column, Max(Col2), Max(Col3) from Table
Group by Key_Column;

Using datastage sort and aggregator stage will work fine for small volume of records. Using database for these sort will give you performance in your load.

Thanks
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I've always been wary of multiple max functions within an aggregation as it can create fictitious records. Does your requirement call for the max of column B and C even if they occur on different records or does it need to keep a record together?

For example with:
column1 column2 column3
6 500 25
6 400 30
6 500 10

Do you want 6 500 25, a complete record, or 6 500 30, a mix of two records that represent the max values.

I would go with a single aggregation stage and see if the performance is acceptable. This is the easiest approach to build and maintain. If it's too slow then look at RDBMS aggregation and pre-sorting etc.
changming
Participant
Posts: 68
Joined: Wed Oct 13, 2004 3:35 am

sort and hash file

Post by changming »

I have ever used the sort stag and hash file, as said in previous one said, it wor fine if the data amount is not huge.
if the data is too large, sort using unix script then use hash file or aggrerator to get the last one
Post Reply