Aggregator stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Aggregator stage

Post by iamrajy »

I am aggregating/grouping on the first 4 keys to avoid the duplicate value
but for some reason I am getting a duplicate rows. Some rows are eliminted and some are still there.

In the O/P tab I have specified the grouping on first 4 columns

Duplicate record
1026|1024|1000|1010|82|2004-04-27 12:48:33
1026|1024|1000|1010|82|2004-04-27 12:48:33
1026|1024|1000|1010|82|2004-04-27 12:48:33


1006|1022|1000|1010|0|2004-04-27 16:01:27
1006|1022|1000|1010|0|2004-04-27 16:01:27

Does somebody can give some advise.

-Thanks
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
try to see if there is no other column that has the group by checked on it in any other column/s of your table definition.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you specified grouping on just these four columns, or others as well?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Post by iamrajy »

Ray,

I have only grouping on those 4 columns and they are the Primary key in table which I am trying to insert through SQL loader.


Roy,
I am using SQL loader to load the data in database but in the aggregator I load the column information from table defination. I have seen that the table defination is in sink with aggregator.




The reason why I have this aggregator is to avoid duplicate value. If their is some other way to achive it, do let me know

Thanks for your help.

Please advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the input data are sorted on the four grouping columns, you can use stage variables to detect change. This is a much more efficient mechanism than the Aggregator stage. Search the Forum for examples.
Another approach is to use a hashed file keyed on these four columns. All writes to hashed files are destructive overwrites (based on key) so it's a fairly slick means of de-duplication.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Post by iamrajy »

Thanks Ray. I am using hash file and the process is working fine.
Post Reply