Hash Partitioning - doubt
Posted: Wed Jan 17, 2007 7:25 am
Hi all,
I have a job which processes about 80 to 100 million records.
In a Join stage, I have selected hash partitioning on the key column, col1 (about 2000 distinct values), and the stage sorts and partitions on that key column.
I have two questions:
1. I have another column, col2, which has about 60 distinct values. Which column should I use for partitioning, col1 or col2?
2. I have not sorted the incoming data. I was told that if I sort the data with a Sort stage before this partitioning, performance will be better. Is that correct?
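To make question 1 concrete, here is a rough sketch in Python (not DataStage code) of how I understand hash partitioning to spread key values across partitions. The CRC32 hash, the 8 partitions, the key names and the equal number of rows per key are all my own assumptions for illustration; the real engine may hash differently and my data is certainly not uniform.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a deterministic hash.

    CRC32 is only a stand-in here; it is an assumption, not the
    hash function the DataStage engine actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def skew(distinct_keys: int, num_partitions: int, rows_per_key: int = 1000) -> float:
    """Return the busiest partition's row count divided by the ideal even share.

    A value of 1.0 means perfectly balanced; larger means more skew.
    Assumes every key value has the same number of rows (hypothetical).
    """
    load = Counter()
    for i in range(distinct_keys):
        load[partition_of(f"key{i}", num_partitions)] += rows_per_key
    ideal = distinct_keys * rows_per_key / num_partitions
    return max(load.values()) / ideal


if __name__ == "__main__":
    # Compare a col1-like key (2000 values) with a col2-like key (60 values).
    for n_keys in (2000, 60):
        print(n_keys, "distinct values -> skew", round(skew(n_keys, 8), 2))
```

All rows with the same key value land in the same partition, so a key with fewer distinct values gives the hash fewer groups to spread, which is why I suspect col2 could balance worse than col1.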
Thanks in advance!