Keeping data sorted whilst repartitioning.

dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

Keeping data sorted whilst repartitioning.

Post by dohertys »

I'm trying to understand what happens to sorted data as it gets repartitioned.

I'm fairly happy to assume that if I have sorted data on multiple nodes and then repartition it, it can no longer be relied on to be sorted. However, I'm not sure what would happen if I had data sorted on a single node and then repartitioned it to multiple nodes.

For example...
If I have a dataset which contains sorted data and was written using just 1 node, and I then read that dataset using a job that is running on multiple nodes, will that data still be sorted?

Is there a way I can confirm this? If I use a Sort stage with the setting 'Don't sort - already sorted', will it generate an error if the data is not sorted correctly?

Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Depends on your partitioning method...

Hash partition will certainly disrupt the sort order. Round robin partitioning will maintain sort order within the partitions, but round robin is not suitable for key-based operations.

In general, when you go from sequential to parallel, it's best to make no assumptions about sort order.
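
To illustrate the round robin point, here is a minimal Python sketch. It is not DataStage code; the helper names `round_robin` and `is_sorted` are hypothetical and it only simulates dealing a single sorted stream out to several partitions. It suggests that each partition stays sorted on its own, while simply appending the partitions back together does not give a globally sorted result.

```python
def round_robin(records, n_partitions):
    """Deal records out to partitions in turn, like dealing cards (simulation only)."""
    partitions = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        partitions[i % n_partitions].append(rec)
    return partitions


def is_sorted(records):
    """True if records are in non-descending order."""
    return all(a <= b for a, b in zip(records, records[1:]))


# A single sorted stream, as if read from a dataset written on one node.
sorted_input = list(range(1, 11))

# Simulate reading it on three nodes with round robin partitioning.
parts = round_robin(sorted_input, 3)

# Each partition keeps the relative order of the records it received.
per_partition_ok = all(is_sorted(p) for p in parts)

# Appending the partitions one after another loses the global order.
concatenated = [rec for p in parts for rec in p]
globally_ok = is_sorted(concatenated)

print(parts)
print(per_partition_ok, globally_ok)
```

So "sorted within each partition" and "sorted overall" are different guarantees, which is one reason not to assume anything when going from sequential to parallel.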

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Keeping data sorted whilst repartitioning.

Post by ray.wurlod »

dohertys wrote: If I use a sort stage, with the setting 'Don't sort - already sorted' will it generate an error if the data is not sorted correctly?
Yes
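
As a rough picture of what such a check amounts to, here is a small Python sketch. It is an assumption-laden illustration, not how the Sort stage is implemented: `verify_sorted` is a hypothetical helper that passes records through unchanged and raises an error at the first record whose key is lower than the previous one.

```python
def verify_sorted(records, key=lambda rec: rec):
    """Yield records unchanged; raise ValueError at the first out-of-order key.

    Hypothetical helper for illustration only, not a DataStage API.
    """
    previous = None
    seen_any = False
    for position, rec in enumerate(records):
        current = key(rec)
        if seen_any and current < previous:
            raise ValueError(
                f"record {position} out of order: {current!r} follows {previous!r}"
            )
        yield rec
        previous = current
        seen_any = True


# Correctly sorted input passes straight through.
passed = list(verify_sorted([1, 2, 2, 5]))

# Unsorted input stops with an error at the offending record.
try:
    list(verify_sorted([1, 3, 2]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(passed)
print(error_message)
```

In a parallel job a check like this would apply per partition, so it confirms order within each partition rather than across the whole data set.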
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

Post by dohertys »

Thanks