Remove duplicates in Datastage MVS edition

Sandeep.pendem · Post by **Sandeep.pendem** » Sat May 31, 2008 2:36 pm

Hi,
I have a fixed-width file in a DataStage mainframe job. When I use an Aggregator stage, my job fails with a SORT error. Even with a Sort stage ahead of the Aggregator stage, and an intermediate fixed-width file between them, the job fails with the same error message.

Can anyone tell me how to remove duplicates from a file in the mainframe edition of DataStage? There are very few processing stages available in the mainframe edition.

ray.wurlod · Post by **ray.wurlod** » Sat May 31, 2008 4:14 pm

Welcome aboard.

Add a sort ahead of the Aggregator stage, sorting by the grouping (duplicate identifier) keys. You can use a Sort stage or, if the data are coming from a relational table, specify the ordering in the extraction.

On the Output page General tab, select the Group By option rather than the Control Break option.
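For readers who want to see the sort-then-group idea outside DataStage, here is a minimal Python sketch. It is only an illustration of the logic, not what the mainframe job generates; the function name `remove_duplicates` and the field names `cust_id` and `name` are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key_fields):
    """Sort rows on the key fields, then keep the first row of each key group."""
    key = itemgetter(*key_fields)
    # Sort step: bring rows with identical keys next to each other.
    ordered = sorted(rows, key=key)
    # Group step: groupby only merges adjacent rows, so it relies on the sort.
    return [next(group) for _, group in groupby(ordered, key=key)]


# Hypothetical sample data: cust_id is the "duplicate identifier" key.
rows = [
    {"cust_id": "002", "name": "B"},
    {"cust_id": "001", "name": "A"},
    {"cust_id": "002", "name": "B"},
]
deduped = remove_duplicates(rows, ["cust_id"])
```

The point of the sketch is that the sort keys and the grouping keys are the same list, which is the same pairing suggested above for the Sort and Aggregator stages.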

Sandeep.pendem · Post by **Sandeep.pendem** » Sun Jun 01, 2008 11:46 am

Hi Ray,
Thanks for the support. I tried using a Sort stage before an Aggregator stage, but it seems we can't link the two directly: when a Sort stage is followed immediately by an Aggregator stage, I get a compilation error saying the input to the Aggregator stage should be a file or a relational table. Do I need to introduce another flat file after the Sort stage?

ray.wurlod · Post by **ray.wurlod** » Sun Jun 01, 2008 2:13 pm

Yes, you do need to stage the data. A flat file is as good a way as any.

Sandeep.pendem · Post by **Sandeep.pendem** » Mon Jun 02, 2008 8:46 am

Hi,

I have already put a flat file after the Sort stage, followed by an Aggregator stage, and I still get the same sort error message. Below is the job design. Do I need two separate jobs, one with a Sort stage and the other hob with an aggreagator stage, or anything specific?

Flat file (i/p) ---> Sort ---> Flat file ---> Aggregator ---> Transformer ---> Lookup ---> Flat file

Thanks,
Sandeep S Pendem

ray.wurlod · Post by **ray.wurlod** » Mon Jun 02, 2008 4:11 pm

As far as I can tell without seeing the detail of your design (sort keys etc.), it should be able to remove duplicates satisfactorily. There ought to be no need for more than one job. Are you sorting and aggregating (grouping) on the same keys (the ones that identify "duplicates")? How are you getting the other fields (if any) through the Aggregator stage?
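To illustrate why the last question matters, here is a rough Python sketch of a generic group-by, not of the Aggregator stage itself. It assumes that a non-key field only reaches the output if some aggregate function is applied to it. The function name `aggregate`, the `carry` mapping and the sample fields are hypothetical.

```python
from itertools import groupby


def aggregate(rows, key_fields, carry):
    """Group rows on key_fields; carry maps a non-key field to an aggregate function."""
    def key(row):
        return tuple(row[f] for f in key_fields)

    result = []
    for key_values, group in groupby(sorted(rows, key=key), key=key):
        group = list(group)
        record = dict(zip(key_fields, key_values))
        # Only fields named in carry survive; anything else is dropped.
        for field, func in carry.items():
            record[field] = func(r[field] for r in group)
        result.append(record)
    return result


# Hypothetical sample data: "name" is neither a key nor aggregated.
rows = [
    {"cust_id": "001", "amount": 10, "name": "A"},
    {"cust_id": "001", "amount": 30, "name": "A"},
    {"cust_id": "002", "amount": 5, "name": "B"},
]
grouped = aggregate(rows, ["cust_id"], {"amount": max})
```

In this sketch `name` does not appear in the output because nothing carries it through, which is the kind of gap the question above is probing for.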

rameshrr3 · Post by **rameshrr3** » Mon Jun 02, 2008 5:23 pm

A hob is what I need to bake an alleagator cut..

ray.wurlod · Post by **ray.wurlod** » Mon Jun 02, 2008 5:27 pm

It's the Spanish pronunciation of "job".

Do you get a lot of alligators in California?

rameshrr3 · Post by **rameshrr3** » Mon Jun 02, 2008 5:29 pm

Louisiana is the place to go..