Sorting on voluminous data

mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

Hi,
We are processing more than 100 million records, each with 304 columns. The business rules require the data to be sorted.
So, are there any performance improvements we could make in the Sort stage? We tried setting "Stable Sort" to False and increasing the "Restrict Memory Usage" option, but haven't got more than 4000 rows/sec.

Thanks in advance !
Rgds,
MB
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just out of curiosity, how fast does the UNIX sort command sort this volume of data? Do you have any third-party sort utilities, such as SyncSort or CoSort, available?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

Actually, the sort is required in the middle of the job, and some transformation rules are then applied to the sorted data.
We don't have any third-party sort utilities either :(
Rgds,
MB
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still, as a test, hack off the end of the job and land the data. Use your UNIX command line to sort the data appropriately and time it. If it is significantly better, then it could behoove you to incorporate it into your processing.
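
For anyone wanting to try that timing test, here is a rough sketch in Python. It is only an illustration under assumptions, not anything from the job in question: the landed file is assumed to be pipe-delimited with the sort key in the first field, the helper names (`time_unix_sort`, `write_sample`, `run_demo`) are made up, and the sample data is randomly generated. It runs the standard `sort` command with `LC_ALL=C` and reports rows/sec so the figure can be set against the 4000 rows/sec seen in the Sort stage.

```python
import os
import random
import subprocess
import tempfile
import time


def time_unix_sort(in_path, out_path, key_field=1, delimiter="|"):
    """Run the UNIX sort command on a delimited file; return elapsed seconds."""
    # LC_ALL=C forces plain byte-order comparison, which is usually faster
    # than locale-aware collation.
    env = dict(os.environ, LC_ALL="C")
    cmd = ["sort", "-t", delimiter, "-k", f"{key_field},{key_field}",
           "-o", out_path, in_path]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, env=env)
    return time.perf_counter() - start


def write_sample(path, rows, columns=304, delimiter="|"):
    """Write a made-up sample file: a zero-padded random key plus filler columns."""
    rng = random.Random(0)
    filler = delimiter.join(["x"] * (columns - 1))
    with open(path, "w") as f:
        for _ in range(rows):
            f.write(f"{rng.randrange(10**9):09d}{delimiter}{filler}\n")


def run_demo(rows=20000):
    """Generate a sample, sort it, and return (elapsed seconds, sorted keys)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "landed.txt")   # stand-in for the landed data
        dst = os.path.join(tmp, "sorted.txt")
        write_sample(src, rows)
        elapsed = time_unix_sort(src, dst)
        with open(dst) as f:
            keys = [line.split("|", 1)[0] for line in f]
    return elapsed, keys


if __name__ == "__main__":
    rows = 20000
    elapsed, keys = run_demo(rows)
    rate = rows / elapsed
    print(f"sorted {rows} rows in {elapsed:.2f}s ({rate:,.0f} rows/sec)")
    # Naive linear projection to 100 million rows; a real run will spill to
    # disk and merge, so treat this as a rough guide only.
    print(f"projected time for 100M rows: {100_000_000 / rate / 3600:.1f} hours")
```

A small sample sorts entirely in memory, so the rate will flatter the command line; the fair comparison is to point `time_unix_sort` at the real landed file and time that.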
-craig

"You can never have too many knives" -- Logan Nine Fingers