Aggregator stage

Posted: Tue Apr 27, 2004 2:19 pm
by iamrajy
I am aggregating/grouping on the first 4 keys to avoid duplicate values,
but for some reason I am still getting duplicate rows. Some rows are eliminated and some are still there.

In the Output tab I have specified grouping on the first 4 columns.

Duplicate records:
1026|1024|1000|1010|82|2004-04-27 12:48:33
1026|1024|1000|1010|82|2004-04-27 12:48:33
1026|1024|1000|1010|82|2004-04-27 12:48:33


1006|1022|1000|1010|0|2004-04-27 16:01:27
1006|1022|1000|1010|0|2004-04-27 16:01:27

Can somebody give me some advice?

-Thanks

Posted: Wed Apr 28, 2004 2:06 am
by roy
Hi,
check whether any other column in your table definition also has the Group By box checked.

IHTH,

Posted: Wed Apr 28, 2004 3:52 am
by ray.wurlod
Have you specified grouping on just these four columns, or others as well?

Posted: Wed Apr 28, 2004 8:18 am
by iamrajy
Ray,

I have grouping on only those 4 columns, and they are the primary key of the table I am trying to insert into through SQL*Loader.


Roy,
I am using SQL*Loader to load the data into the database, but in the Aggregator I load the column information from the table definition. I have verified that the table definition is in sync with the Aggregator.

The reason I have this Aggregator is to avoid duplicate values. If there is some other way to achieve this, please let me know.

Thanks for your help.

Please advise.

Posted: Wed Apr 28, 2004 4:19 pm
by ray.wurlod
If the input data are sorted on the four grouping columns, you can use stage variables to detect change. This is a much more efficient mechanism than the Aggregator stage. Search the Forum for examples.
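The stage-variable idea can be sketched in Python (DataStage server jobs implement this with BASIC stage variables in a Transformer, but the logic is the same; the function name here is illustrative):

```python
# Keep the previous row's key in a variable and emit a row only when the
# key changes. This assumes the input is already sorted on the key columns,
# just as the stage-variable technique requires.

def dedupe_sorted(rows, key_cols=4):
    """Yield the first row of each group of the first `key_cols` fields."""
    prev_key = None
    for row in rows:
        key = tuple(row[:key_cols])
        if key != prev_key:  # key changed: first row of a new group
            yield row
        prev_key = key

rows = [
    ["1026", "1024", "1000", "1010", "82", "2004-04-27 12:48:33"],
    ["1026", "1024", "1000", "1010", "82", "2004-04-27 12:48:33"],
    ["1006", "1022", "1000", "1010", "0",  "2004-04-27 16:01:27"],
]
print(list(dedupe_sorted(rows)))  # two distinct rows survive
```

Because only one previous key is held in memory, this scales to any input size, which is why it beats the Aggregator stage for plain de-duplication.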
Another approach is to use a hashed file keyed on these four columns. All writes to hashed files are destructive overwrites (based on key) so it's a fairly slick means of de-duplication.
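The hashed-file behaviour can be sketched the same way, using a Python dict to stand in for the hashed file (again, an illustration, not DataStage code):

```python
# Writes keyed on the first four columns destructively overwrite earlier
# writes with the same key, so exactly one row per key survives. Unlike
# the sorted approach, the input need not be sorted, and the *last* row
# per key wins.

def dedupe_hashed(rows, key_cols=4):
    hashed = {}
    for row in rows:  # later writes overwrite earlier ones for the same key
        hashed[tuple(row[:key_cols])] = row
    return list(hashed.values())

rows = [
    ["1026", "1024", "1000", "1010", "82", "2004-04-27 12:48:33"],
    ["1026", "1024", "1000", "1010", "82", "2004-04-27 12:48:33"],
    ["1006", "1022", "1000", "1010", "0",  "2004-04-27 16:01:27"],
]
print(dedupe_hashed(rows))  # two distinct rows survive
```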

Posted: Thu Apr 29, 2004 8:48 am
by iamrajy
Thanks Ray. I am using a hashed file and the process is working fine.