aggrigating the data
Moderators: chulett, rschirm, roy
aggrigating the data
Hi,
I have one flat file have 114 cols. I have to select the data based on the max(effective date), How to do this rek.
shall i need to use aggregator , if I use aggregator stage I need to group by all 113 columns except max(date) column? or its enough to group by key columns?
or any other simple way to do this?
Thanks in advance...
I have one flat file have 114 cols. I have to select the data based on the max(effective date), How to do this rek.
shall i need to use aggregator , if I use aggregator stage I need to group by all 113 columns except max(date) column? or its enough to group by key columns?
or any other simple way to do this?
Thanks in advance...
Re: aggrigating the data
do you want all the columns to be propageted.
Re: aggrigating the data
I need all columns outputsachin1 wrote:do you want all the columns to be propageted.
Re: aggrigating the data
just to give u an example
SQL> select * from tbb1;
COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------
you want an output like
SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;
COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
then for above case you don't need to group by col1, col2 in aggregate stage.
SQL> select * from tbb1;
COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------
you want an output like
SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;
COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
then for above case you don't need to group by col1, col2 in aggregate stage.
Re: aggrigating the data
If I did in this way I am getting rest of columns which are not group by are getting error'no derivation found' for the other cols.sachin1 wrote:just to give u an example
SQL> select * from tbb1;
COL1 COL2 DT1
---------- ---------- ---------
1 1 17-JAN-08
1 1 18-JAN-08
1 1 19-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
1 1 21-JAN-08
---------------------------------------above is input --------------------------
you want an output like
SQL> select col1,col2,max(dt1) from tbb1 group by col1,col2;
COL1 COL2 MAX(DT1)
---------- ---------- ---------
1 1 21-JAN-08
1 2 19-JAN-08
2 2 19-JAN-08
then for above case you don't need to group by col1, col2 in aggregate stage.
Re: aggrigating the data
You will have to use 'Last' or 'First', whichever is appropriate) with all the columns that are not used in the Group By clause.deva wrote:sachin1 wrote:I am getting rest of columns which are not group by are getting error'no derivation found' for the other cols.
gateleys
Re: aggrigating the data
in derivation just put column name as it is and see what do you get without checking group by check box.
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
I have not used an aggregator in a server job. But I don't think you can pass the columns on which you are not grouping through aggregator.
You may have to split the flow into two streams, do aggregation on one stream and then join it back to the main stream.
You may have to split the flow into two streams, do aggregation on one stream and then join it back to the main stream.
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You may find it easier to use an ODBC driver for text files and throw an appropriate SQL query at it.
Or you could use two streams, as others have suggested, or two jobs. The issue is joining the two streams back together. Something like:
Both SeqFile stages read your text file. The Aggregator stage forms the Max of the column in question. The key used for grouping and for reference lookup is from the text file; if there isn't one suitable, insert a Transformer stage in each stream to generate one.
Code: Select all
SELECT column_list FROM filename T1 WHERE column = (SELECT MAX(column) FROM filename T2 WHERE T1.key = T2.key);
Code: Select all
SeqFile -----> HashedFile
:
:
V
SeqFile -----> Aggregator -----> Transformer ----->
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.