Performing total sort

zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

I was trying to implement a total sort using Range partitioning and, as a prerequisite, created a range map file using the Write Range Map stage. Out of curiosity I wanted to see what this file contains, but it turned out to be a binary file. Is there any documentation that states what layout/format this file follows?

Also, when range partitioning is applied, does DataStage ensure that key values are assigned in ascending order to the nodes as they appear in the configuration file? If so, an Ordered collector could be used; otherwise I would have to go with a Sort Merge collector, which per my understanding would sort data across partitions (similar to sorting on a single node).
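
To make the difference between the two collection methods concrete, here is a rough sketch in plain Python. It is only a conceptual model, not DataStage code: the function names `ordered_collect` and `sort_merge_collect` and the sample partitions are made up for illustration, and each partition is assumed to be sorted already.

```python
import heapq
from itertools import chain

def ordered_collect(partitions):
    """Read all of partition 0, then all of partition 1, and so on."""
    return list(chain.from_iterable(partitions))

def sort_merge_collect(partitions):
    """Merge partitions that are each already sorted on the key."""
    return list(heapq.merge(*partitions))

# Assumed case 1: key ranges ascend with the partition number.
ascending = [[1, 3, 4], [10, 12], [20, 25]]
# Assumed case 2: same sorted partitions, but ranges do not follow partition order.
shuffled = [[10, 12], [20, 25], [1, 3, 4]]

print(ordered_collect(ascending))    # globally sorted
print(ordered_collect(shuffled))     # not globally sorted
print(sort_merge_collect(shuffled))  # globally sorted regardless of range order
```

In this model, ordered collection only yields a total sort when the lowest key range sits in the first partition, the next range in the second, and so on, which is exactly the point in question.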

Thanks
- Zulfi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Structure of the range map is not documented in the public domain. You can be fairly certain that it contains the limit values for each node.

Range partitioning does not perform a sort, though it will preserve any sorted order of the partitioning key that happens to exist on the input link. This is purely an artefact of the FIFO nature of record processing. Therefore, if this is your scenario, then an Ordered collection algorithm will be apposite.
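
The point about range partitioning not sorting can be sketched in plain Python as follows. This is a conceptual model under stated assumptions, not the engine's implementation: `range_partition` and the boundary values are hypothetical, and each boundary is assumed to be the upper limit of its partition.

```python
from bisect import bisect_left

def range_partition(records, boundaries):
    """Route each key to a partition by range, keeping arrival (FIFO) order.

    boundaries holds the assumed upper limit of every partition except the last.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key in records:
        partitions[bisect_left(boundaries, key)].append(key)
    return partitions

boundaries = [10, 20]  # hypothetical limits: up to 10, 11 to 20, above 20
sorted_input = [1, 5, 10, 11, 15, 30]
unsorted_input = [15, 1, 30, 11, 10, 5]

print(range_partition(sorted_input, boundaries))    # each partition stays sorted
print(range_partition(unsorted_input, boundaries))  # partitions keep input order, unsorted
```

With sorted input every partition comes out sorted simply because records are appended in arrival order. With unsorted input the ranges are still respected, but each partition would need its own sort before collection.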
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zulfi123786

Post by zulfi123786 »

What particularly interests me is how the key boundaries are defined for each node, for example which node gets the keys with the lowest values when the range map is created. This would determine the link ordering during ordered collection when the actual input is not sorted, in order to achieve a total sort.
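
As a way to reason about the question, here is a minimal sketch of a total sort in plain Python. Everything in it is an assumption for illustration: the real range map layout is not publicly documented, `build_boundaries` and `total_sort` are hypothetical names, and the model simply assumes that boundaries are ascending and that partition 0 takes the lowest range.

```python
from bisect import bisect_left

def build_boundaries(sample_keys, num_partitions):
    """Pick num_partitions - 1 cut points that split the sample roughly evenly."""
    if len(sample_keys) < num_partitions:
        raise ValueError("need at least one sample key per partition")
    ordered = sorted(sample_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i) - 1] for i in range(1, num_partitions)]

def total_sort(records, boundaries):
    """Range partition, sort each partition, then collect partitions in order."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key in records:
        # Assumption: partition 0 holds the lowest key range.
        partitions[bisect_left(boundaries, key)].append(key)
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result

sample = [7, 42, 3, 19, 88, 23, 51, 64, 11, 35, 70, 95]
boundaries = build_boundaries(sample, 3)
print(boundaries)
print(total_sort([60, 5, 20, 51, 19, 100], boundaries))
```

If the lowest range did not land in the first partition, the ordered concatenation at the end would no longer be globally sorted, which is why the node assignment matters for ordered collection.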

I am also wondering how DataStage would dynamically adjust if the range map is created on x nodes and the same map is then used for range partitioning in a job running on y nodes. I would have tested this myself but don't have access to DataStage right now :(
- Zulfi
ray.wurlod

Post by ray.wurlod »

Your proposed scenario, in which the range map is created with a different number of nodes than the job that uses it, will fail.