Aggregating the data

deva · Post by **deva** » Thu Jan 17, 2008 8:50 am

Hi,
I have a flat file with 114 columns. I have to select the data based on max(effective date). How can I meet this requirement?

Do I need to use the Aggregator stage? If I do, must I group by all 113 columns other than the max(date) column, or is it enough to group by the key columns?

Or is there a simpler way to do this?

Thanks in advance...

sachin1 · Post by **sachin1** » Thu Jan 17, 2008 9:18 am

Do you want all the columns to be propagated?

deva · Post by **deva** » Thu Jan 17, 2008 9:21 am

sachin1 wrote:Do you want all the columns to be propagated?

I need all the columns in the output.

sachin1 · Post by **sachin1** » Thu Jan 17, 2008 9:36 am

Just to give you an example:

SQL> select * from tbb1;

COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------

you want an output like

SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;

COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08

then for the above case you don't need to group by col1, col2 in the Aggregator stage.

deva · Post by **deva** » Thu Jan 17, 2008 9:51 am

If I do it this way, the rest of the columns, which are not in the group by, get the error 'no derivation found'.

gateleys · Post by **gateleys** » Thu Jan 17, 2008 9:57 am

deva wrote:The rest of the columns, which are not in the group by, get the error 'no derivation found'.

You will have to use 'Last' or 'First' (whichever is appropriate) with all the columns that are not used in the Group By clause.
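One thing to watch with this approach, sketched below in Python rather than in DataStage: if 'Last' simply returns the value from the last row read in each group (an assumption about the stage's behaviour, not something confirmed in this thread), it lines up with the max-date row only when the input is already sorted by key and date. The rows and the `last_per_group` helper are hypothetical.

```python
from datetime import date

# Hypothetical rows, deliberately NOT in date order within the key.
rows = [
    {"key": 1, "dt": date(2008, 1, 21), "note": "newest"},
    {"key": 1, "dt": date(2008, 1, 17), "note": "oldest"},
]

def last_per_group(rows, key_col):
    """Keep the last row seen for each key, mimicking a 'Last' derivation."""
    last = {}
    for r in rows:
        last[r[key_col]] = r  # later rows overwrite earlier ones
    return list(last.values())

# Unsorted input: 'Last' picks whatever row happened to arrive last.
unsorted_pick = last_per_group(rows, "key")

# Sorted by key and date first: 'Last' now coincides with the max-date row.
sorted_pick = last_per_group(sorted(rows, key=lambda r: (r["key"], r["dt"])), "key")

print(unsorted_pick[0]["note"], sorted_pick[0]["note"])
```

So if you go this way, it is probably worth sorting on the key columns and the effective date ahead of the Aggregator.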

sachin1 · Post by **sachin1** » Thu Jan 17, 2008 10:44 am

In the derivation, just put the column name as it is, without checking the Group By check box, and see what you get.

Minhajuddin · Post by **Minhajuddin** » Thu Jan 17, 2008 12:27 pm

I have not used an Aggregator in a server job, but I don't think you can pass columns that you are not grouping on through the Aggregator.

You may have to split the flow into two streams, do aggregation on one stream and then join it back to the main stream.

kumar_s · Post by **kumar_s** » Thu Jan 17, 2008 12:35 pm

If you have that many columns, then as mentioned it would be easier and wiser to aggregate the required column on a particular key alone and join the result back to the existing stream.
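The aggregate-then-join idea can be sketched outside DataStage as follows. This is only an illustration in Python, not job design: the rows are modelled on the tbb1 example earlier in the thread, `note` is a hypothetical stand-in for the 110-odd non-key columns, and `latest_rows` is a made-up helper name.

```python
from datetime import date

# Sample rows modelled on the tbb1 example; "note" is a hypothetical
# placeholder for all the columns that are not part of the key.
rows = [
    {"col1": 1, "col2": 1, "dt1": date(2008, 1, 17), "note": "a"},
    {"col1": 1, "col2": 1, "dt1": date(2008, 1, 18), "note": "b"},
    {"col1": 1, "col2": 1, "dt1": date(2008, 1, 19), "note": "c"},
    {"col1": 1, "col2": 2, "dt1": date(2008, 1, 19), "note": "d"},
    {"col1": 2, "col2": 2, "dt1": date(2008, 1, 19), "note": "e"},
    {"col1": 1, "col2": 1, "dt1": date(2008, 1, 21), "note": "f"},
]

def latest_rows(rows, key_cols, date_col):
    """Return the full rows whose date equals the max date for their key."""
    # Stream 1: aggregate only the key columns and the date.
    max_date = {}
    for r in rows:
        k = tuple(r[c] for c in key_cols)
        if k not in max_date or r[date_col] > max_date[k]:
            max_date[k] = r[date_col]
    # Stream 2: pass every full row and keep those matching the aggregate.
    return [
        r for r in rows
        if r[date_col] == max_date[tuple(r[c] for c in key_cols)]
    ]

result = latest_rows(rows, ["col1", "col2"], "dt1")
for r in result:
    print(r)
```

Only the key and the date go through the aggregation, so the other columns never need a derivation there. Note that if two rows share the same key and the same max date, both come through.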

ray.wurlod · Post by **ray.wurlod** » Thu Jan 17, 2008 3:58 pm

You may find it easier to use an ODBC driver for text files and throw an appropriate SQL query at it.

Code:

SELECT column_list FROM filename T1 WHERE column = (SELECT MAX(column) FROM filename T2 WHERE T1.key = T2.key);
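To see what that correlated subquery returns, here is a small sketch that runs the same pattern against an in-memory SQLite table from Python. SQLite is only a stand-in for the ODBC text-file driver; the table mirrors the tbb1 example from earlier in the thread, with ISO-formatted dates so that MAX compares them correctly as text, and `note` is a hypothetical extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbb1 (col1 INTEGER, col2 INTEGER, dt1 TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO tbb1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2008-01-17", "a"),
        (1, 1, "2008-01-18", "b"),
        (1, 1, "2008-01-19", "c"),
        (1, 2, "2008-01-19", "d"),
        (2, 2, "2008-01-19", "e"),
        (1, 1, "2008-01-21", "f"),
    ],
)

# Same shape as the query above: keep a row only if its date is the
# maximum date among the rows sharing its key (col1, col2).
query = """
SELECT t1.col1, t1.col2, t1.dt1, t1.note
FROM tbb1 AS t1
WHERE t1.dt1 = (
    SELECT MAX(t2.dt1)
    FROM tbb1 AS t2
    WHERE t2.col1 = t1.col1 AND t2.col2 = t1.col2
)
ORDER BY t1.col1, t1.col2
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Every column of the winning row comes back, with no need to list the non-key columns in a GROUP BY.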

Or you could use two streams, as others have suggested, or two jobs. The issue is joining the two streams back together. Something like:

Code:

                  SeqFile  ----->  HashedFile
                                        :
                                        :
                                        V
SeqFile  ----->  Aggregator  ----->  Transformer  ----->

Both SeqFile stages read your text file. The Aggregator stage forms the Max of the column in question. The key used for grouping and for reference lookup is from the text file; if there isn't one suitable, insert a Transformer stage in each stream to generate one.

DSXchange
