handling duplicate records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
just4u_sharath
Premium Member
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

handling duplicate records

Post by just4u_sharath »

Guys,
I normally use remove duplicate stage to handle duplicates. There many other options such as sort stage, transformer stage, aggregator and hash file(server edition) to handle duplicates. My question are
1. which is the best way to handle duplictes if volume of data is huge.
2. Is it better to handle duplicate at database level or handle it through datastage.

thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It depends (as always).

If the columns that identify duplicates are indexed, then it's better done in the database, because it can be solved quickly in the index.

A unique sort of some kind removes duplicates fairly efficiently. The Remove Duplicates stage gives you the opportunity to specify the first or last in each group as the record to retain. If this is your business requirement, then use the Remove Duplicates stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I would say it is best done in the most economical environment which still meets your SLA agreement with the end customer.

If your Database is VERY busy servicing reports for your end users, every CPU second saved on that system is important, then do it in DataStage.

Statistically your database will perform that activity a lot faster and as a result you will transfer less data across your network. Your over all job will be faster due to the filtering happening as far upstream as it possibly could.
Post Reply