Query on Sort / Remove Duplicates
Posted: Thu May 17, 2007 2:42 am
Hello All
I have a parallel job whose input has 5 key columns and 2 value columns. A parallel Sort stage is followed by a Remove Duplicates stage. I sort the data on all 5 keys and then on the value columns, and I remove duplicates on the first 3 keys.
The expected behavior is to keep the row with the "least value columns". This works fine with a few thousand records, but the behavior is haphazard when the data volume increases (improper data appearing in the input).
Two queries:
a) Is the job design right? (Sort on 5 keys and Remove Duplicates on only 3)
b) What is a possible solution if the answer to (a) is no?
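To make the intended logic concrete, here is a minimal sketch in plain Python of what the design does when run on a single stream: sort on all 5 keys and then the 2 value columns, then keep the first row for each combination of the first 3 keys. This is not DataStage code. The sample rows, the column order (k1..k5, v1, v2) and the function name `sort_then_dedupe` are made up for illustration, and partitioning in the parallel job is not modelled at all.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows, laid out as (k1, k2, k3, k4, k5, v1, v2)
rows = [
    ("A", 1, "x", 10, 7, 1.0, 2.0),
    ("A", 1, "x", 9, 8, 9.0, 1.0),
    ("A", 1, "x", 9, 8, 3.0, 4.0),
    ("B", 2, "y", 1, 1, 6.0, 6.0),
]

def sort_then_dedupe(rows):
    # Sort ascending on all 5 keys, then on the 2 value columns
    # (plain tuple ordering compares the columns left to right).
    ordered = sorted(rows)
    # Keep the first row of each group that shares the first 3 keys,
    # i.e. Remove Duplicates on k1, k2, k3 retaining the first row.
    first_three_keys = itemgetter(0, 1, 2)
    return [next(group) for _, group in groupby(ordered, key=first_three_keys)]

for row in sort_then_dedupe(rows):
    print(row)
```

In this sketch the row kept for ("A", 1, "x") is the one with the smallest k4 and k5, and only then the smallest values. It has v1 = 3.0 even though another row in the same group has v1 = 1.0, so "least value columns" here means least within the lowest k4/k5 combination, not least across the whole 3-key group.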