Problem with Aggregator

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Problem with Aggregator

Post by somu_june »

Hi,

Please help me in getting out from this error . My input has a validfromdate columm with SQL Type is char and length of that field is 8.

When Iam using an aggregator Stage for calculatng Min(validfromdate). Iam getting an error

Aggr_Min_VFD: Error when checking operator: It is not legal to set a reducer (ValidFromDate) to be the same as a key field,
when the type of that field (string) is not a double.


How to over come this. Please help me . Iam using Auto partition and Groupby ValidFromDate
Aggregation Type : calculation and

another question is there any way so that I can remove Groupby and Ineed only Aggregation like Min(validfromDate)


Thanks,
Somaraju
somaraju
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

Re: Problem with Aggregator

Post by aramachandra »

The aggregator stage in the parellel engine does not work with the date and char datatype as far as i know. It works well with decimals and numbers only...

I have had that same issue as well, where i was trying to get the max date for a given record set..

In our environment folks have used the sort stage to achieve the same effect
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Problem with Aggregator

Post by kwwilliams »

You should create a static column, something like DUMMY. Give all of the rows for this column the same value. Sort by Dummy and your date column, with the date being a ascending order. Stream this output to a remove duplicate stage. Specify the key as dummy and to only retain the first one. Then you would have your min date. There may be other ways to do it, but that is the method I have used when all I really want is a min or max column.
somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Re: Problem with Aggregator

Post by somu_june »

Hi armachandra,

Can you explain in detail how to sort and get min Validfromdate.




Thanks,
Somaraju
somaraju
somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Re: Problem with Aggregator

Post by somu_june »

Hi Williams,

Thanks for reply, Let's see if I've got this right. Please correct me if Iam wrong. I have to create a static columm dummy in aggregator stage output and I should assaign all the input rows validfromdate columm to dummy columm. Then I have to sort in aggregator with dummy and validfromdate columm. Then I have to connect the aggregator output to remove duplicate stage and I have to specify key as dummy then I will have my min Validfromdate


Thanks,
somaraju
somaraju
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Re: Problem with Aggregator

Post by Ultramundane »

You could also get your dsadm to download patch 61040 from Ascential which allows you to perform min and max on char and varchar columns. If you get the patch and apply it a new option called preserveType will become available when choosing the columns to aggregate. Select this property and set it to yes to perform a min or max on a char or varchar column.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Somuraj,

But the actuall compliant of the Aggregator stage is due to the use of field
ValidFromDate as grouping key as well as value field.
You are not allowd to do this. If you just want to just remove the duplicates of the date filed and retain the first record, you can sort using sort stage and make use of the Allow duplicated option to set it to False so that it retains the first value.

-Kumar
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Problem with Aggregator

Post by kwwilliams »

I think you should forget the aggregator. Create a column in the modify stage (should be less overhead than a transofrmer), and populate this column with a string, you could use the jobname if you want. Your modify stage should have two output columns, stream these to a sort stage and sort by the column you created and the date you want the min on in ascending order. Output the sort to a remove dupicate stage. In the properties set the key equal to the column you created (not the column properties) and tell it to retain the first row. Because the data is presorted, it will take the first row and output it. You should be able to handle it from there. Let me know if you have any problems with it.
somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi Kumar,

My Actuall requirement is to get min value from (ValidFromdatecolumm)

Ex I have dates like 19942008
19942004
19942004
19800205
I want 19800205 as my output which is minum value and I used sort stage and remove duplicate stage but Iam getting min values for each duplicate values. I want min value from entire input. Is there any stage so that I can get min val with out an aggregator stage.

Thanks,
Somaraju
somaraju
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Did you create the dummy value??? Without it, you have no way to remove your duplicate. You need to create a static column so that your data would look like this:

Dummy Date
------------------------------------------
DUMMY 19942008
DUMMY 19942004
DUMMY 19942004
DUMMY 19800205

In the sort stage sort by Dummy and Date. Then remove duplicates based upon dummy and you will get 19942008. You need to have a value that you can use as a false key. You tried using date as the key, remove duplicates removed all of the duplicate dates, which is not what you want. Aggregator needs a column to group by as well. Because you should have all ready sorted before the data set before the aggregator, I said don't use an aggregator. the remove duplicate option will be more efficient.
MTA
Participant
Posts: 37
Joined: Thu Feb 02, 2006 2:25 pm

Post by MTA »

Somu jume,
Be sure to keep the value in the dummy field as a constant so that when you specify group by then all values in the vertual dataset are considered ( assuming you have removed duplicated in the sort stage or remove duplicate stage) then utilise the inbuilt function in Agg EE stage to get the minimum value of your input.
M.T.Anwer
The day the child realizes that all adults are imperfect he becomes an adolescent;
the day he forgives them, he becomes an adult; the day he forgives himself, he becomes wise.
-Aiden Nowlan
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Removing the duplicate will produce the minimum date you shouldn't an aggregator it would just be an extra step. The remove duplicate stage will keep the very first record it finds with DUMMY in the dummy column, which by having sorted the date in asending order will give you the min date.
somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi williams,

your logic worked thanks for your help in getting out from this problem




Thanks,
srinuraju
somaraju
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Not a problem. Thanks for posting the solution works so next time someone else can search for minimum field and get an answer to their question.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Post by Ultramundane »

As was mentioned earlier there are multiple solutions. I am just offering another solution (on Unix) for those that might be interested.

Use an external filter stage with the below code for MIN or MAX:

Code: Select all

## Author: Ryan Putnam
## Intent: Get MAX
awk 'BEGIN {getline; MAX=$0;} {if (( $0 > MAX )) { MAX=$0; } } END{print MAX;}'

Code: Select all

## Author: Ryan Putnam
## Intent: Get MIN
awk 'BEGIN {getline; MIN=$0;} {if (( $0 < MIN )) { MIN=$0; } } END{print MIN;}'
Post Reply