Aggregator problem!!

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Aggregator problem!!

Post by splayer »

In my aggregator, I have 10 columns coming in. Under the Grouping Keys
section, I have:

Group = Col1
Group = Col2
Group = Col3
Group = Col4
Group = Col5

Under Aggregations I have:

Column For Calculation = Col6
Sum Output Column = OutputCol6
Column For Calculation = Col7
Sum Output Column = OutputCol7
Column For Calculation = Col8
Sum Output Column = OutputCol8
Column For Calculation = Col9
Sum Output Column = OutputCol9
Column For Calculation = Dummy
Non-Missing Values Count Output Column = tot_cnt
Column For Calculation = Col10
Sum Output Column = OutputCol10

Col1 through Col5 are also keys. I was forced to create a Dummy column because I could not include Col1 in the Group and also for 'Column For Calculation'. I was getting the error:

"It is not legal to set a reducer (Col1) to be the same as a key field"

Now, my problem is, the aggregator is producing 5 rows and I want it to produce 1 row. How do I produce 1 row from the aggregator? Also, I would like to use Col1 to do a 'Non-missing values count output column'. I cannot do it because it is in the group. How do I do the count on the Col1 column? The Dummy column has the exact same value as Col1 when it comes from the transformer. I tried do it on the Dummy column but the result is the same. Once interesting thing, the number of rows out of the aggregator is sometimes different and sometimes same.

Thanks to anyone who responds.
ukyrvd
Premium Member
Premium Member
Posts: 73
Joined: Thu Feb 10, 2005 10:59 am

Post by ukyrvd »

Col1 through Col5 are also keys. I was forced to create a Dummy column because I could not include Col1 in the Group and also for 'Column For Calculation'. I was getting the error:

"It is not legal to set a reducer (Col1) to be the same as a key field"

Now, my problem is, the aggregator is producing 5 rows and I want it to produce 1 row
I would try with creating dummyKeys as dKey1=col1, dKey2=col2 etc.
thank you
- prasad
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

That's what I mentioned in my post that I already created a Dummy column. However, I still got 5 rows out of the aggregator where I should get just one.
ukyrvd
Premium Member
Premium Member
Posts: 73
Joined: Thu Feb 10, 2005 10:59 am

Post by ukyrvd »

I am sorry.. I thought you have created just one dummyColumn like "a" ..
thank you
- prasad
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Yes, I did create only one column named Dummy. Can the partioning mode of the previous stage have anything to do with it? My previous stage to the aggregator is a Peek stage. After running, the link between the Peek and Aggregator stage shows a count of like this:

10000, 2000 rows/sec

and this causes 5 rows to be output. Sometime it is:

10000, 1666 rows/sec

and this causes 6 rows to be output. This is pretty strange!!
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Can you post few sample rows from your input data? Also, are you sorting the data going into the Aggregator stage?
Kris

Where's the "Any" key?-Homer Simpson
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

What's interesting is, I added one more aggregator stage after the first aggregator. In the 2nd one, I put all 9 columns in the group accept the total_count. I did a sum on the total_count. I still get the same number of rows, i.e., multiple rows. I checked sort. Still the same problem.

I am posting first 5 rows of the data.
---------------------------------------------------------------------------------
OHCLARK 010-00-00000-000-032 PENN CENTRAL TRANS CORP CO 00000000000000000000000000000002004 10 0000000000 0000000000 0000000000 32 BETHEL TOWNSHIP PUBLIC UTILITY REAL PROPERTY U 9216 000000000 39023052005 0000000000000000000000000000000 10 N N N
OHCLARK 010-00-00000-000-041 COLUMBIA GAS OF OHIO INC CO TAX DEPT 200 CIVIC CENTER DR 200 CIVIC CENTER DR COLUMBUS OH432154138 00000000000000000000000000000002004 10 0000000000 0000000000 0000000000 41 BETHEL TOWNSHIP PUBLIC UTILITY PERS PROPERTY P 0010 000000000 COLUMBUS, OH 43215 39023052005 0000000000000000000000000000000 10 N N Y C001
OHCLARK 010-00-00000-000-045 COLUMBIA GAS TRANSMISSION TAX DEPT 200 CIVIC CENTER DR 200 CIVIC CENTER DR COLUMBUS OH432154138 00000000000000000000000000000002004 10 0000000000 0000000000 0000000000 45 BETHEL TOWNSHIP PUBLIC UTILITY PERS PROPERTY P 0010 000000000 COLUMBUS, OH 43215 39023052005 0000000000000000000000000000000 10 N N Y C001
OHCLARK 010-00-00000-000-053 DAYTON POWER & LIGHT TAX DEPT PO BOX 8825 DAYTON OH45401 00000000000000000000000000000002004 10 0000000000 0000000000 0000000000 53 BETHEL TOWNSHIP PUBLIC UTILITY PERS PROPERTY P 0010 000000000 DAYTON, OH 45401 39023052005 0000000000000000000000000000000 10 N N N
OHCLARK 010-00-00000-000-073 OHIO BELL TELEPHONE SBC COMMUNICATIONS INC TAX DPT 1 BELL CTR 1 BELL CTR SAINT LOUIS MO631013004 00000000000000000000000000000002004 10 0000000000 0000000000 0000000000 73 BETHEL TOWNSHIP PUBLIC UTILITY PERS PROPERTY P 0010 000000000 ST LOUIS, MO 63101 39023052005 0000000000000000000000000000000 10 N N YRM 36-M-01 C074
---------------------------------------------------------------------------------

Thank you very much for your response.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

I am sorry but can you post those 5/6 rows from the output of the aggregator stage. I am not able to make out the keys from the input data you are grouping by. What I would like to see are the rows you are complaining about?(5 instead of 1)
Kris

Where's the "Any" key?-Homer Simpson
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Here is the output from the aggregator:

"IDMA ","OH","0.0","0.0","0.0","0.0","2006-08-29 11:15:39 ","PX_Extract_IDMA_Common ","023","0.0","1.66230000000000000E+04"
"IDMA ","OH","0.0","0.0","0.0","0.0","2006-08-29 11:15:37 ","PX_Extract_IDMA_Common ","023","0.0","1.65800000000000000E+04"
"IDMA ","OH","0.0","0.0","0.0","0.0","2006-08-29 11:15:38 ","PX_Extract_IDMA_Common ","023","0.0","1.66020000000000000E+04"
"IDMA ","OH","0.0","0.0","0.0","0.0","2006-08-29 11:15:36 ","PX_Extract_IDMA_Common ","023","0.0","1.26510000000000000E+04"
"IDMA ","OH","0.0","0.0","0.0","0.0","2006-08-29 11:15:40 ","PX_Extract_IDMA_Common ","023","0.0","4.94300000000000000E+03"

All lines start with "IDMA". Hope this helps. Thank you very much.
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Are you grouping on the date column? if so that is the one which is giving you 5 different columns.

Sam
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

What are the keys you are grouping by? If TimeStamp is not a key, how are you managing to get a Sum on the TimeStamp column?
Kris

Where's the "Any" key?-Homer Simpson
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

I am grouping on the date column. When I removed it, number of rows went down to 2. It is still 2 rows. I am not doing a sum on the timestamp column. The keys I am grouping by are SrcTag, State, CtyFips, JobName. The interesting thing is, the exact same job on the Server version allows grouping on timestamp and still returns 1 row only. There must be something different on the PX side.
Lotus26
Premium Member
Premium Member
Posts: 48
Joined: Tue Jul 13, 2004 2:09 pm

Post by Lotus26 »

Hi

you have to execute this job serially . I think you got 5 rows because the server may be having 5 partitioned. So in the input link of Aggregator stage set the Execution mode as sequential .Hope that works for you.
Thanks.
Regards
Lotus26
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Actually, I executed this job in parallel and it is working fine. us1aslam1us 's point was right. The 5 columns were coming due to grouping on date. Now, I am getting 1 column. The only problem, I need to show the timestamp in the output as well. I am using 'Minimum value output column' but that is just displaying the year. I would like to display the minimum of the 5 unique timstamps.
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Hi,

Think about this, when you specify 'minimum value output column' on the timestamp field, it will give you the sums for this particular timestamp.

Can you please specify the requirement in brief.

Thanks
Sam
Post Reply