Sort Merge

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

Sort Merge

Post by metlsas »

Hi,

I am reading data from a sequential file, sorting it with a Sort stage, and writing it to another sequential file. It is a very simple job.

seq-------Xfr-------sort------seq

When I ran it as a server job, it took 7 to 8 hours with no extra load on the server; when I use PX, it takes 1 hour.

My question is this: I have set the partitioning option in the Sort stage to (Auto) and the collection method in the target sequential file to (Round Robin).

I want to know whether there is any difference between Round Robin and Sort Merge.

We have a 2-node system.
baabi_26
Participant
Posts: 14
Joined: Mon Jan 24, 2005 5:31 pm

Post by baabi_26 »

Hi,

Definitely there is a difference between sort merge and round robin collection methods.

Sort Merge uses the key columns you have specified (one or more) to decide the order in which records are collected. Round Robin, as you know, just collects the records from each partition in turn.
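
To make the difference concrete, here is a minimal Python sketch of the two collection methods. It is only an illustration of the idea, not DataStage code; the function names and sample data are made up, and it assumes each partition arrives already sorted on the key.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # sentinel for partitions that run out of records early


def round_robin_collect(partitions):
    """Take one record from each partition in turn until all are exhausted."""
    out = []
    for row in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(rec for rec in row if rec is not _MISSING)
    return out


def sort_merge_collect(partitions, key):
    """Merge partitions that are each already sorted on `key`."""
    return list(heapq.merge(*partitions, key=key))


# Two partitions (as on a 2-node system), each already sorted on the key.
p0 = [1, 4, 7]
p1 = [2, 3, 9]

rr = round_robin_collect([p0, p1])                  # [1, 2, 4, 3, 7, 9]
sm = sort_merge_collect([p0, p1], key=lambda r: r)  # [1, 2, 3, 4, 7, 9]
```

Both collectors read the same records; only the sort-merge version keeps the overall output in key order.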

Since you have specified Auto as your partitioning scheme, DataStage will most likely use round robin to partition the data. Performance of the job will be good if you stick to Round Robin. I would like to know the run time of the job when you change it to Sort Merge.

Thanks
Naveen
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Warning: if you do not partition on the key you're sorting on (using the appropriate fields), you run the risk of not being able to group similar values, because they may be spread across multiple partitions.

Be careful here, and pay very close attention to your data as you experiment and understand how it works.
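
The risk described above can be sketched in a few lines of Python. This is a toy model, not DataStage behaviour: the helper names and the sample records are hypothetical, and Python's built-in `hash` stands in for whatever key-based partitioner the engine uses.

```python
from collections import defaultdict


def round_robin_partition(records, n):
    """Deal records out to n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def key_partition(records, n, key):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(key(rec)) % n].append(rec)
    return parts


def partitions_holding(parts, key):
    """Map each key value to the set of partition indexes where it appears."""
    seen = defaultdict(set)
    for idx, part in enumerate(parts):
        for rec in part:
            seen[key(rec)].add(idx)
    return dict(seen)


records = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]
by_key = lambda rec: rec[0]

rr = partitions_holding(round_robin_partition(records, 2), by_key)
hp = partitions_holding(key_partition(records, 2, by_key), by_key)
```

With round robin, both "A" and "B" end up on both partitions, so a per-partition sort cannot bring equal keys together. With key-based partitioning, each key value lives on exactly one partition.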
metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

8 hr to 35 min

Post by metlsas »

When I use PX with Round Robin, it takes 45 min.

I am now running it with Sort Merge and will let you know.

How exactly does Sort Merge select the keys? Is there any precedence when I have 3 keys, that is, which key comes first?


Thanks in Advance.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Round Robin = Split evenly across partitions.

Sort Merge = While merging into one partition, sort the data.

Sorting takes processing time that depends on the amount of data you're processing. Do you REALLY need to sort?
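
On the earlier question about three keys, here is a rough Python sketch of a sort merge on a compound key. It assumes the keys are compared in the order they are listed (first key is the primary one, later keys only break ties), which is how multi-key sorts usually behave; check the stage documentation for your version. The column names and data are hypothetical.

```python
import heapq
from operator import itemgetter

# Hypothetical three-key order: region first, then customer, then order_id.
sort_key = itemgetter("region", "customer", "order_id")

# Each partition is already sorted on the same three keys, in the same order.
p0 = [
    {"region": "EU", "customer": "a", "order_id": 2},
    {"region": "US", "customer": "a", "order_id": 1},
]
p1 = [
    {"region": "EU", "customer": "a", "order_id": 1},
    {"region": "EU", "customer": "b", "order_id": 1},
]

# Merge the sorted partitions into one sorted stream.
merged = list(heapq.merge(p0, p1, key=sort_key))
order = [sort_key(rec) for rec in merged]
```

Here `order` comes out grouped by region first; customer and order_id only matter among records that share the same region.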
metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

Post by metlsas »

Yes, we need to sort the data that goes into the final target table. That's why we are using Sort Merge as the collection method.

So what you are saying is that, if we don't need to sort the data, the job will run much faster with Round Robin than with Sort Merge.