Sorting on voluminous data

mdbatra · Post by **mdbatra** » Fri May 01, 2009 7:34 am

Hi,
We are processing more than 100 million records, each with 304 columns. The business rules require the data to be sorted.
Are there any performance improvements we could apply in the Sort stage? We tried setting "Stable Sort" to False and increasing the "Restricting Memory Usage" option, but have not got more than 4000 rows/sec.

Thanks in advance !

ray.wurlod · Post by **ray.wurlod** » Fri May 01, 2009 7:11 pm

Just out of curiosity, how fast does the UNIX sort command sort this volume of data? Do you have any third-party sort utilities, such as SyncSort or CoSort, available?

mdbatra · Post by **mdbatra** » Sat May 02, 2009 6:09 am

Actually, the sort is required in the middle of the job, and some transformation rules are applied after it.
We don't have any third-party sorting utilities either.

chulett · Post by **chulett** » Sat May 02, 2009 7:21 am

Still, as a test, hack off the end of the job and land the data. Use your UNIX command line to sort the data appropriately and time it. If it is significantly faster, then it could behoove you to incorporate it into your processing.
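
For anyone wanting to run that test, here is a minimal sketch of a timing harness in Python. It assumes the landed data is a delimited text file with one record per line, that a `sort` command is on the PATH, and that the sort key is a single field. The function names, the comma delimiter and the key position are all illustrative assumptions, not anything taken from the job in question.

```python
import os
import subprocess
import time


def time_unix_sort(in_path, out_path, key_field=1, delimiter=","):
    """Sort in_path on one delimited field using the UNIX sort command.

    Returns the elapsed wall-clock time in seconds.
    key_field and delimiter are assumptions about the landed file layout.
    """
    cmd = [
        "sort",
        "-t", delimiter,                    # field separator
        "-k", f"{key_field},{key_field}",   # sort on this field only
        "-o", out_path,                     # write the result to out_path
        in_path,
    ]
    # LC_ALL=C makes sort compare raw bytes, which avoids locale overhead.
    env = {**os.environ, "LC_ALL": "C"}
    start = time.perf_counter()
    subprocess.run(cmd, check=True, env=env)
    return time.perf_counter() - start


def rows_per_second(row_count, elapsed_seconds):
    """Convert a row count and elapsed time into a throughput figure."""
    return row_count / elapsed_seconds
```

Calling `time_unix_sort` on the landed file and passing the result to `rows_per_second` along with the record count gives a figure that can be compared directly with the 4000 rows/sec seen in the Sort stage. Note that the byte-order comparison forced by `LC_ALL=C` may not match the collation the business rules expect, so check the ordering before relying on the timing.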