Hello All
I have a parallel job whose input has a set of 5 key columns and 2 value columns. There is a parallel Sort stage followed by a Remove Duplicates stage. I sort the data on all 5 keys and then on the value columns, and I remove duplicates on the first 3 keys.
The expected behavior is to get the "least value columns". This works fine with a few thousand records, but the behavior is haphazard when the data volume increases (improper data appearing in the input).
Two questions:
a) Is the job design right? (Sort on 5 keys and Remove Duplicates on only 3)
b) What is a possible solution if the answer to (a) is no?
Query on Sort / Remove Duplicates
The answer to the first question is no. You might be getting a warning about this in the job log. You will have to use the same keys, in the same order, for both the Sort and the Remove Duplicates stages. To get the least value, set the sort order to ascending and the duplicate to retain to first. You also need to hash partition the data on the same keys.
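To illustrate why the partitioning matters, here is a minimal Python sketch that simulates the behavior outside DataStage. It is not DataStage code: the function names, the sample rows, and the CRC-based partitioner are all made up for the illustration. It assumes rows shaped as (k1, k2, k3, k4, k5, v1, v2), hash partitions them on the 3 duplicate keys, sorts each partition ascending on those keys and then on the value columns, and retains the first row of each key group.

```python
import zlib
from itertools import groupby

DEDUP_KEYS = 3  # assumption: the first three columns are the duplicate keys


def sort_key(row):
    # Sort on the 3 duplicate keys, then on the 2 value columns (ascending).
    return (row[:DEDUP_KEYS], row[5:])


def partition(rows, n_parts, n_key_cols):
    """Hash-partition rows on their first n_key_cols columns."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        h = zlib.crc32(repr(row[:n_key_cols]).encode())
        parts[h % n_parts].append(row)
    return parts


def sort_and_dedup(part):
    """Sort one partition and retain the first row per duplicate key."""
    ordered = sorted(part, key=sort_key)
    groups = groupby(ordered, key=lambda r: r[:DEDUP_KEYS])
    return [next(group) for _, group in groups]


def run(rows, n_parts, n_partition_cols):
    """Each partition is processed on its own, as a parallel job would do."""
    out = []
    for part in partition(rows, n_parts, n_partition_cols):
        out.extend(sort_and_dedup(part))
    return sorted(out)


# Hypothetical sample data: (k1, k2, k3, k4, k5, v1, v2)
rows = [
    ("A", "x", 1, "p", "q", 30, 5),
    ("A", "x", 1, "p", "r", 10, 7),
    ("A", "x", 1, "s", "q", 20, 1),
    ("B", "y", 2, "p", "q", 8, 8),
    ("B", "y", 2, "p", "q", 3, 9),
]

result = run(rows, n_parts=4, n_partition_cols=DEDUP_KEYS)
for row in result:
    print(row)
```

Because every row with the same 3 keys lands in the same partition, the result does not depend on the number of partitions. If the rows were instead partitioned on all 5 keys (`n_partition_cols=5`), rows sharing the first 3 keys could end up in different partitions, and each partition would keep its own "first" row. That would be consistent with results that look fine on small volumes and haphazard on large ones.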
Use the Sort stage to sort (ascending) and partition on the 3 keys on which you want to remove duplicates, and set remove duplicates to True in the Sort stage itself.
Joshy George
Re: how to remove duplicates or sort in server jobs
Please, can anybody help me?
How do I remove duplicates in server jobs?
seshu
Seshikumar, you are asking the question in the wrong forum. Post the query in the server forum for better results. Anyway, here is a link with your answer: viewtopic.php?t=102106&highlight=remove+duplicates