
Sort Merge

Posted: Tue Jan 25, 2005 12:12 pm
by metlsas
Hi,

I am reading data from a sequential file, sorting it with a Sort stage, and writing it out to another sequential file. It is a very simple job.

seq-------Xfr-------sort------seq

When I ran it as a server job it took 7 to 8 hours with no extra load on the server; as a PX job it takes 1 hour.

My question: I am using the Auto partitioning option in the Sort stage and collecting in the target sequential file with Round Robin.

I want to know whether there is any difference between Round Robin and Sort Merge.

We have a 2-node system.

Posted: Tue Jan 25, 2005 3:34 pm
by baabi_26
Hi,

Definitely there is a difference between sort merge and round robin collection methods.

Sort Merge uses your columns (one or more that you have specified) to decide the order of collecting the results. Round-Robin, as you know, just collects the records in an orderly round robin fashion.
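To make the difference concrete, here is a minimal sketch in plain Python (not DataStage code) that mimics the two collection methods on a 2-node setup. The function names and sample values are made up for illustration, and it assumes each partition is already sorted on the collecting key, which is what a merge relies on.

```python
import heapq
from itertools import zip_longest

def round_robin_collect(partitions):
    """Take one record from each partition in turn; the result has no key ordering."""
    missing = object()  # sentinel for partitions that run out early
    out = []
    for group in zip_longest(*partitions, fillvalue=missing):
        out.extend(rec for rec in group if rec is not missing)
    return out

def sort_merge_collect(partitions, key):
    """Merge partitions that are each ALREADY sorted on `key` into one sorted stream."""
    return list(heapq.merge(*partitions, key=key))

# Two partitions (hypothetical data), each sorted on the key within itself.
p0 = [1, 4, 7]
p1 = [2, 3, 9]

rr = round_robin_collect([p0, p1])                  # [1, 2, 4, 3, 7, 9]
sm = sort_merge_collect([p0, p1], key=lambda r: r)  # [1, 2, 3, 4, 7, 9]
print(rr)
print(sm)
```

Round robin interleaves the partitions without looking at the data, so the collected output is not globally sorted even though each partition was. The merge compares keys at every step, which is where the extra cost comes from.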

Since you have specified Auto as your partitioning scheme, DataStage most likely will use round-robin to partition the data. Performance of the job will be good if you stick to Round-Robin. I would like to know the run-time of the job when you change it to Sort-Merge.

Thanks
Naveen

Posted: Wed Jan 26, 2005 12:24 am
by T42
Warning: if you do not partition on the key you're sorting on (using the appropriate fields), you run the risk of not being able to group records with similar field values, because they may be splayed across multiple partitions.

Be careful here, and pay very close attention to your data as you experiment and understand how it works.
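The warning above can be illustrated with a small Python sketch (again not DataStage code). The helper names, the sample records, and the use of `zlib.crc32` as a stand-in for key-based hash partitioning are all assumptions for illustration.

```python
import zlib
from collections import defaultdict

def round_robin_partition(records, n):
    """Deal records out to n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts

def key_partition(records, n, key):
    """Stand-in for hash partitioning: the same key always lands in the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[zlib.crc32(str(key(rec)).encode()) % n].append(rec)
    return parts

def partitions_per_key(parts, key):
    """Count how many partitions each key value appears in."""
    seen = defaultdict(set)
    for idx, part in enumerate(parts):
        for rec in part:
            seen[key(rec)].add(idx)
    return {k: len(v) for k, v in seen.items()}

# Hypothetical records: (customer, amount)
records = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]
by_customer = lambda rec: rec[0]

rr_spread = partitions_per_key(round_robin_partition(records, 2), by_customer)
key_spread = partitions_per_key(key_partition(records, 2, by_customer), by_customer)
print(rr_spread)   # {'A': 2, 'B': 2}: each customer is split across both partitions
print(key_spread)  # every customer sits in exactly one partition
```

With round robin, a per-partition sort or grouping never sees all the records for a customer together. Partitioning on the key avoids that.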

8 hr to 35 min

Posted: Wed Jan 26, 2005 10:17 am
by metlsas
When I use PX with Round Robin it takes 45 min.

I am now running it with Sort Merge and will let you know.

How exactly does Sort Merge choose among the keys? Is there any precedence when I have 3 keys, i.e. which key comes first?


Thanks in Advance.

Posted: Wed Jan 26, 2005 10:52 pm
by T42
Round Robin = Split evenly across partitions.

Sort Merge = While merging into one partition, sort the data.
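A rough Python sketch of a merge on more than one key, for the "which key comes first" question. It assumes the keys are compared in the order they are listed (first key is the major key), which is the usual convention for multi-key sorts but is not confirmed anywhere in this thread. The record layout and values are hypothetical, and each partition is assumed to be sorted on the same keys already.

```python
import heapq
from operator import itemgetter

# Hypothetical records: (region, customer, amount).
# Each partition is already sorted on (region, customer).
p0 = [("east", "amy", 10), ("east", "bob", 5), ("west", "amy", 7)]
p1 = [("east", "amy", 3), ("north", "zed", 1), ("west", "bob", 2)]

# Keys in precedence order: region first, then customer.
collect_key = itemgetter(0, 1)

merged = list(heapq.merge(p0, p1, key=collect_key))
merged_keys = [collect_key(rec) for rec in merged]
for rec in merged:
    print(rec)
```

The second key only decides the order among records that tie on the first key, so swapping the key order changes the output order.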

Sorting takes processing time that depends on the amount of data you're handling. Do you REALLY need to sort?

Posted: Thu Jan 27, 2005 2:04 pm
by metlsas
Yes, we need to sort the data that goes into the final target table. That's why we are using Sort Merge for collecting.

So what you are saying is that, if we don't need to sort the data, the job will run much faster with Round Robin than with Sort Merge.