Hi,
I am doing some performance tuning on one job. For that I am using the Link Partitioner stage to partition the data. If I use the Round Robin algorithm it runs fine, but when I use the Hash algorithm, with Sort/Merge in the Link Collector stage, the job runs for a long time. What could be the problem? Is it necessary to sort the data before hash partitioning it?
My question may be somewhat lengthy, but please bear with me and suggest a solution.
My job design:
seqfile ---> Link Partitioner ---> 3 XFM stages ---> Link Collector ---> DB2
thanks in advance.
Hashing algorithm in Link Partitioner
Hi Ravi
A sort always adds overhead while the job runs, and the cost depends on the data volumes. Is there a specific need to sort the data before sending it to DB2?
Regards
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
A sort is not necessary for partitioning.
The issue may be with the data. If you apply hash partitioning based on the key you specified, it will likely divide the data into three partitions, but not equally; much or even all of the data may fall into a single partition. Round robin is always better at splitting records (more or less) equally across all partitions compared to hash, unless hash is otherwise required.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
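To make the skew point concrete, here is a toy Python sketch (not DataStage's actual implementation) comparing how hash and round-robin partitioning spread rows across three partitions. The key values and helper names are illustrative assumptions only.

```python
# Illustrative sketch only: hash vs. round-robin distribution over 3
# partitions. Python's built-in hash() stands in for DataStage's
# (undocumented here) hashing algorithm.

def hash_partition(keys, n_parts):
    """Count rows per partition when each key is hashed; skew follows the keys."""
    counts = [0] * n_parts
    for k in keys:
        counts[hash(k) % n_parts] += 1
    return counts

def round_robin_partition(n_rows, n_parts):
    """Deal rows out in turn; partition sizes differ by at most one row."""
    return [n_rows // n_parts + (1 if i < n_rows % n_parts else 0)
            for i in range(n_parts)]

# Low-cardinality keys: every row sharing a key lands in the same
# partition, so one hot key can overload a single partition.
skewed_keys = ["A"] * 7 + ["B"] * 2 + ["C"] * 1
print(hash_partition(skewed_keys, 3))              # one partition holds at least 7 rows
print(round_robin_partition(len(skewed_keys), 3))  # [4, 3, 3]
```

With hash partitioning, balance depends entirely on the key values; round robin guarantees a near-even split regardless of the data.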
Hi Rasi,
Thanks for the reply. There is no need to sort the data. I am just splitting the data across 3 Transformer stages and collecting it into one DB2 table using a Link Collector. I want to improve performance. What is the performance overhead of the Hashing algorithm?
Thanks, Kumar. I am using 3 Transformer stages between the Link Partitioner and Link Collector stages. When I run the job with 10 records, using the Hashing algorithm with the key column being the PK, it distributes the records as 7 records to the 1st XFM stage, 1 record to the 2nd XFM, and 2 records to the 3rd XFM stage. How is it distributing the records? How many groups will it create by default?
Please explain patiently.
Thanks in advance.
Ravi
DataStage has its own inbuilt hashing algorithm. It is applied to the field you supply, and each record is then routed based on the remainder of dividing the hash result by the number of partitions. The effect can be something like: all keys whose hash ends in certain values go to the 1st partition, certain other values go to the 2nd partition, and so on.
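The remainder idea can be sketched as follows. Note this is a hypothetical stand-in hash (summing character codes), not DataStage's real algorithm, which is not documented in this thread; it only illustrates how "hash mod partition count" picks a partition.

```python
# Hypothetical sketch of remainder-based partitioning: partition number is
# a numeric hash of the key modulo the partition count. The hash used here
# (sum of character codes) is an illustrative stand-in, NOT DataStage's.

def partition_for(key, n_parts):
    # Sum the character codes of the key's string form as a toy hash,
    # then take the remainder to choose a partition.
    h = sum(ord(c) for c in str(key))
    return h % n_parts

# A handful of sample PK values can easily cluster on one remainder,
# which is how a 10-row run can split unevenly (e.g. 7/1/2) across 3 links.
for pk in [102, 114, 218, 327, 555]:
    print(pk, "-> partition", partition_for(pk, 3))
```

The number of output groups is simply the number of links out of the Link Partitioner; balance within them depends on how the key hashes distribute over the remainders.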
You should get more insight if you go through the documentation provided for parallel jobs.
Is it really necessary to collect the rows together before inserting them into the DB2 table? Why not have three parallel streams loading DB2? If the keys are unique (which they will be if you've partitioned on the key column) there will be no contention for locks.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.