aggrigating the data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
deva
Participant
Posts: 104
Joined: Fri Dec 29, 2006 1:54 pm

aggrigating the data

Post by deva »

Hi,
I have one flat file have 114 cols. I have to select the data based on the max(effective date), How to do this rek.

shall i need to use aggregator , if I use aggregator stage I need to group by all 113 columns except max(date) column? or its enough to group by key columns?

or any other simple way to do this?

Thanks in advance...
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: aggrigating the data

Post by sachin1 »

do you want all the columns to be propageted.
deva
Participant
Posts: 104
Joined: Fri Dec 29, 2006 1:54 pm

Re: aggrigating the data

Post by deva »

sachin1 wrote:do you want all the columns to be propageted.
I need all columns output
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: aggrigating the data

Post by sachin1 »

just to give u an example

SQL> select * from tbb1;

COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------

you want an output like


SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;

COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08

then for above case you don't need to group by col1, col2 in aggregate stage.
deva
Participant
Posts: 104
Joined: Fri Dec 29, 2006 1:54 pm

Re: aggrigating the data

Post by deva »

sachin1 wrote:just to give u an example

SQL> select * from tbb1;

COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------

you want an output like


SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;

COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08

then for above case you don't need to group by col1, col2 in aggregate stage.
If I did in this way I am getting rest of columns which are not group by are getting error'no derivation found' for the other cols.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: aggrigating the data

Post by gateleys »

deva wrote:
sachin1 wrote:I am getting rest of columns which are not group by are getting error'no derivation found' for the other cols.
You will have to use 'Last' or 'First', whichever is appropriate) with all the columns that are not used in the Group By clause.
gateleys
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: aggrigating the data

Post by sachin1 »

in derivation just put column name as it is and see what do you get without checking group by check box.
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai
Contact:

Post by Minhajuddin »

I have not used an aggregator in a server job. But I don't think you can pass the columns on which you are not grouping through aggregator.

You may have to split the flow into two streams, do aggregation on one stream and then join it back to the main stream.
Minhajuddin

<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you have lot many columns, as mentioned it would be easiar wiser to aggregate the required column based on any particular key alone and join it back with the existing stream.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You may find it easier to use an ODBC driver for text files and throw an appropriate SQL query at it.

Code: Select all

SELECT column_list FROM filename T1 WHERE column = (SELECT MAX(column) FROM filename T2 WHERE T1.key = T2.key);
Or you could use two streams, as others have suggested, or two jobs. The issue is joining the two streams back together. Something like:

Code: Select all

                  SeqFile  ----->  HashedFile
                                        :
                                        :
                                        V
SeqFile  ----->  Aggregator  ----->  Transformer  ----->
Both SeqFile stages read your text file. The Aggregator stage forms the Max of the column in question. The key used for grouping and for reference lookup is from the text file; if there isn't one suitable, insert a Transformer stage in each stream to generate one.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply