Hashing algorithm in Link Partitioner

ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

Hi,

I am doing some performance tuning on one job, using the Link Partitioner stage to partition the data. With the Round Robin algorithm it runs fine. But when I use the Hash algorithm, with Sort/Merge in the Link Collector stage, the job runs for a long time. What could be the problem? Is it necessary to sort the data before hash partitioning it?
My question may be somewhat lengthy, but please bear with me and suggest a solution.
My job design:

seqfile--->LinkPartitioner-->3 XFM stages --> Linkcollector-->DB2

thanks in advance.
Ravi
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi Ravi

A sort always adds overhead while the job is running, and how much depends on the data volume. Is there a specific need to sort the data before sending it to DB2?
Regards
Siva

kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Sorting is not necessary for partitioning.
The issue may be with the data. Hash partitioning on the key you specified will divide the data into three partitions, but not necessarily equally: most, or even all, of the data may fall into a single partition. Round robin always splits the records (more or less) equally across all the partitions, so it is the better choice unless hash partitioning is specifically required.
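As a rough illustration of that difference, here is a minimal Python sketch. It is not DataStage code: CRC32 stands in for whatever hash DataStage uses internally, and the key column is a made-up, deliberately skewed sample.

```python
import zlib
from collections import Counter

def round_robin_partition(keys, n_partitions):
    """Assign rows to partitions in turn, ignoring the key value."""
    return [i % n_partitions for i, _ in enumerate(keys)]

def hash_partition(keys, n_partitions):
    """Assign each row by a hash of its key.

    CRC32 is an assumption used for illustration only; it is not
    DataStage's actual hashing algorithm.
    """
    return [zlib.crc32(str(k).encode("utf-8")) % n_partitions for k in keys]

def partition_counts(assignments, n_partitions):
    """Count how many rows landed in each partition."""
    counts = Counter(assignments)
    return [counts.get(p, 0) for p in range(n_partitions)]

# Hypothetical key column in which one value dominates.
keys = ["A"] * 7 + ["B"] * 2 + ["C"]

rr = partition_counts(round_robin_partition(keys, 3), 3)
hp = partition_counts(hash_partition(keys, 3), 3)

print("round robin:", rr)
print("hash       :", hp)
```

Round robin spreads the ten rows as evenly as it can, while every row with the same key value hashes to the same partition, so a dominant value drags most of the rows with it.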
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

Hi Rasi,

Thanks for the reply. There is no need to sort the data. I am just splitting the data across 3 Transformer stages and collecting it into one DB2 table using the Link Collector. I want to improve performance. What is the performance overhead of using the Hash algorithm?

Thanks, Kumar. I am using 3 Transformer stages between the Link Partitioner and Link Collector stages. When I ran the job with 10 records, using the Hash algorithm with the PK as the key column, it distributed the records as 7 to the 1st XFM stage, 1 to the 2nd XFM stage and 2 to the 3rd XFM stage. How is it distributing the records? How many groups will it create by default?

Please bear with me and suggest a solution.
Thanks in advance.
Ravi
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

DataStage has its own built-in hashing algorithm, which it applies to the field you supply. Each record is then routed according to the remainder left when the hash value is divided by the number of partitions. The effect can be something like this: all the numbers ending in 2, 4 or 8 go to the 1st partition, certain odd numbers go to the 2nd partition, and so on.
You should get more insight if you go through the documentation provided for parallel jobs.
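To make the hash-then-remainder idea concrete, here is a small Python sketch. It is only a sketch under assumptions: CRC32 is a stand-in for DataStage's internal hash, which is not described in this thread, and partition_for and distribute are hypothetical helper names.

```python
import zlib

def partition_for(key, n_partitions):
    """Return the partition index for a key: hash it, then take the remainder.

    CRC32 is an assumed stand-in, not DataStage's real hash function.
    """
    hashed = zlib.crc32(str(key).encode("utf-8"))
    return hashed % n_partitions

def distribute(keys, n_partitions):
    """Group keys into one bucket per partition."""
    buckets = [[] for _ in range(n_partitions)]
    for key in keys:
        buckets[partition_for(key, n_partitions)].append(key)
    return buckets

# Ten unique primary-key values split across three partitions.
buckets = distribute(range(1, 11), 3)
for index, bucket in enumerate(buckets, start=1):
    print(f"partition {index}: {len(bucket)} rows -> {bucket}")
```

The number of groups is simply the number of output partitions. Even with unique keys, nothing forces the remainders to be spread evenly over so few rows, which is consistent with an uneven split such as 7/1/2 on a 10-row test.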
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is it really necessary to collect the rows together before inserting them into the DB2 table? Why not have three parallel streams loading DB2? If the keys are unique (which they will be if you've partitioned on the key column), there will be no contention for locks.