Query on Sort / Remove Duplicates

ag_ram
Premium Member
Posts: 524
Joined: Wed Feb 28, 2007 3:51 am

Query on Sort / Remove Duplicates

Post by ag_ram »

Hello All

I have a parallel job whose input has 5 key columns and 2 value columns. A parallel Sort stage is followed by a Remove Duplicates stage. I sort the data on all 5 keys and then on the value columns, and I remove duplicates on the first 3 keys.

The expected behavior is to keep, for each set of duplicates, the record with the least values in the value columns. This works fine with a few thousand records, but the behavior becomes haphazard as the data volume increases (improper data appearing in the input).

Two questions:

a) Is the job design right (sort on 5 keys, remove duplicates on only 3)?

b) If the answer to (a) is no, what is a possible solution?
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

The answer to the first question is no. You may be getting a warning about this in the job log. You have to use the same keys, in the same order, for both the Sort and Remove Duplicates stages. To get the least value, set the sort order to ascending and set the duplicate to retain to First. You also need to hash partition the data on the same keys.
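
To illustrate why the partitioning keys matter, here is a minimal plain-Python simulation. It is not DataStage code. The row layout (5 integer keys followed by 2 values), the names `partition`, `sort_and_dedup`, `run` and `ROWS`, and the toy sum-modulo partitioner standing in for a real hash partitioner are all assumptions made for the sketch.

```python
from itertools import groupby

DEDUP_KEYS = 3   # leading key columns used to remove duplicates
VALUE_START = 5  # index of the first value column (assumed layout: k1..k5, v1, v2)

def partition(rows, key_count, n_parts):
    """Toy stand-in for hash partitioning on the first key_count columns."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[sum(row[:key_count]) % n_parts].append(row)
    return parts

def sort_and_dedup(part):
    """Sort ascending on the dedup keys, then on the value columns,
    and retain the first row of each dedup-key group."""
    sort_key = lambda r: r[:DEDUP_KEYS] + r[VALUE_START:]
    ordered = sorted(part, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[:DEDUP_KEYS])]

def run(rows, partition_key_count, n_parts=2):
    """Partition, then sort and de-duplicate each partition independently."""
    out = []
    for part in partition(rows, partition_key_count, n_parts):
        out.extend(sort_and_dedup(part))
    return sorted(out)

# Hypothetical sample data: (k1, k2, k3, k4, k5, v1, v2)
ROWS = [
    (1, 1, 1, 1, 1, 50, 5),
    (1, 1, 1, 2, 1, 10, 9),
    (1, 1, 1, 1, 2, 30, 1),
    (2, 2, 2, 1, 1, 7, 7),
    (2, 2, 2, 1, 2, 3, 8),
]

good = run(ROWS, partition_key_count=3)  # partitioned on the 3 dedup keys
bad = run(ROWS, partition_key_count=5)   # partitioned on all 5 keys

print(good)
print(bad)
```

With this sample data, partitioning on the 3 dedup keys leaves two rows, one per key group, each carrying the least values. Partitioning on all 5 keys leaves four rows, because rows that share the first 3 keys land in different partitions and are never compared. With small inputs such rows may happen to fall into the same partition, which would be consistent with the problem only showing up at higher volumes.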
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Use the Sort stage to sort (ascending) and partition on the 3 keys on which you want to remove duplicates, and specify remove duplicates to True in the Sort stage itself.
Joshy George
seshikumar
Participant
Posts: 44
Joined: Fri Mar 16, 2007 5:51 am

Re: How to remove duplicates or sort in server jobs

Post by seshikumar »

Please, can anybody help me?
How do I remove duplicates in server jobs?
seshu
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Seshikumar, you are asking the question in the wrong forum. Post the query in the server forum for better results. Anyway, here is a link with an answer: viewtopic.php?t=102106&highlight=remove+duplicates