How to estimate scratch disk space for sorting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


jiegao
Premium Member
Posts: 46
Joined: Fri Sep 22, 2006 6:12 pm

How to estimate scratch disk space for sorting

Post by jiegao »

I have to sort data with a volume of 900 million records. The sort is on two keys totalling 26 bytes, and it is not a stable sort. If I multiply 900 million by 26, that comes to about 23 GB of data, but the sort consumes more than 200 GB of scratch disk space. Where does the extra space come from? Does it come from buffer overflow? We do not specify a buffer resource in the configuration file. Thanks
Regards
Jie
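The keys-only arithmetic in the question above checks out, but it counts only the key bytes, not whole rows. A quick sanity check (all figures are taken from the post):

```python
# Keys-only volume from the question: 900 million rows, 26 key bytes each.
rows = 900_000_000
key_bytes = 26
print(rows * key_bytes / 1e9)  # ~23.4 GB -- keys alone, far below the 200 GB observed
```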
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

What is the total length of your row? Not just the keys.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As Paul noted, the entire Data Set has to be sorted, not just the keys. Estimate not less than two times the entire volume of the Data Set as the requirement for scratch space when sorting. More is always better.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
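Ray's rule of thumb above can be sketched as a small helper. The 2x multiplier is just that rule of thumb, not a DataStage-published figure; actual usage varies with partitioning, data types, and record overhead:

```python
def scratch_estimate_gb(rows, row_bytes, multiplier=2.0):
    """Rough scratch-disk estimate for sorting: the full data
    volume (all columns, not just keys) times a safety multiplier."""
    return rows * row_bytes * multiplier / 1e9

# 900 million rows at the 50-byte row length given later in this thread:
print(scratch_estimate_gb(900_000_000, 50))  # 90.0 GB at the 2x minimum
```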
jiegao
Premium Member
Posts: 46
Joined: Fri Sep 22, 2006 6:12 pm

Post by jiegao »

PaulVL wrote: What is the total length of your row? Not just the keys.
The total length of the row is 50 bytes.
Regards
Jie
jiegao
Premium Member
Posts: 46
Joined: Fri Sep 22, 2006 6:12 pm

Post by jiegao »

Hi Ray, it looks like way more than double the entire volume of the data set. It requires 4 or 5 times the size of the entire dataset. I need to justify the space requested. Thanks
Regards
Jie
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, I was being simplistic. For example, are your VarChar data types bounded (do they have a maximum length) or unbounded? There are also other overheads associated with storage in a Data Set - approximately 80 bits per row, last time I looked (version 7.x).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
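Folding Ray's ~80-bit (10-byte) per-row overhead into the estimate lands near the ~200 GB observed, if the safety multiplier is raised to around 4x. Note the 4x figure here is an assumption fitted to the observed usage, not a documented constant:

```python
OVERHEAD_BYTES = 10  # ~80 bits per row, per the version 7.x observation above

def scratch_with_overhead_gb(rows, row_bytes, multiplier, overhead=OVERHEAD_BYTES):
    """Scratch estimate that adds a fixed per-row Data Set overhead."""
    return rows * (row_bytes + overhead) * multiplier / 1e9

# 900 million 50-byte rows at an assumed 4x multiplier:
print(scratch_with_overhead_gb(900_000_000, 50, 4.0))  # 216.0 GB, close to the ~200 GB seen
```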
jiegao
Premium Member
Posts: 46
Joined: Fri Sep 22, 2006 6:12 pm

Post by jiegao »

Thanks. 50 bytes is the maximum length of the row.
Regards
Jie