Resource Estimates for Scratch and Resource Disk

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vvsaucse
Participant
Posts: 16
Joined: Thu Aug 27, 2009 11:23 pm
Location: Bangalore

Resource Estimates for Scratch and Resource Disk

Post by vvsaucse »

Hello,

I am using DataStage version 8.1 on a Unix (AIX) platform. We have a requirement for a project in which the source and targets are Sequential File stages, with Transformer, Lookup, Sort (hash partitioned on 4 columns) and Remove Duplicates as intermediate stages.

The source is a fixed-width file of roughly 6-8 GB, with approximately 70 million records (58 characters per record).

I can estimate the size of the output files being generated (approximately 8 GB).

What I would like to know, however, is the approximate (maximum) size of scratch disk and resource disk that the DataStage server would need for this job, and the total file system size requirement. How do we arrive at these figures?
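
For a first approximation, a back-of-envelope calculation along the following lines may help. The 2.5x multiplier is only a hypothetical rule of thumb, not an official IBM figure, and it assumes the sort spills the full dataset to scratch:

    # Back-of-envelope scratch sizing (assumptions, not an official formula):
    # - a sort that overflows its in-memory buffer spills its partition's
    #   data to scratch disk;
    # - budgeting 2-3x the sorted data volume is a common rule of thumb.
    records = 70_000_000
    record_len = 58                      # bytes per fixed-width record
    data_bytes = records * record_len    # ~4 GB of raw record data
    safety_factor = 2.5                  # hypothetical rule-of-thumb multiplier
    print(f"raw data:       {data_bytes / 2**30:.1f} GiB")
    print(f"scratch budget: {data_bytes * safety_factor / 2**30:.1f} GiB")

Note that 70 million x 58 bytes works out to about 4 GB, noticeably below the 6-8 GB quoted above, so it is worth re-checking the record length or count before committing to a number.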
Last edited by vvsaucse on Fri Apr 15, 2011 9:58 am, edited 2 times in total.
Subbu
TCS Bangalore
vvsaucse
Participant
Posts: 16
Joined: Thu Aug 27, 2009 11:23 pm
Location: Bangalore

Post by vvsaucse »

Can someone please help me out with this?
Subbu
TCS Bangalore
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I think your scratch disk estimation really depends on your current and future job design.

If you sort or merge the data, for instance, you'll use scratch space. So your job design directly affects how much of that space is used.

Other jobs running in your environment might also use that scratch space.
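
For reference, scratch and resource disks are declared per node in the parallel configuration file (APT_CONFIG_FILE), and every job run under that configuration shares those file systems. A minimal single-node sketch, with a hypothetical host name and paths:

    {
        node "node1"
        {
            fastname "etl_host"
            pools ""
            resource disk "/ds/resource" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
    }

Adding more scratchdisk entries per node lets the engine spread temporary files across file systems, which is one way to "go big" without a single huge volume.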

My advice... go big.

It is true that the dollar cost of hardware should not drive your code design, but for scratch space... more is better.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use the DataStage resource estimation tool.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vvsaucse
Participant
Posts: 16
Joined: Thu Aug 27, 2009 11:23 pm
Location: Bangalore

Post by vvsaucse »

I was trying to use the Resource Estimation option, but I get the error messages below for a job that runs to completion without this option. It would help if anyone could shed some light on this.

main_program: Current heap size: 1,554,287,264 bytes in 6,709,915 blocks
main_program: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.
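
For a sense of scale, the heap reported in that message is already close to the roughly 2 GiB that a 32-bit process can typically address, which is one plausible (assumed, not confirmed here) reason the next allocation failed:

    # Sanity check on the logged heap size (assumes a 32-bit DataStage
    # process, whose usable heap is typically capped near 2 GiB):
    reported_heap = 1_554_287_264    # bytes, from the job log above
    limit_32bit = 2**31              # 2 GiB user address space (typical)
    print(f"heap in use:       {reported_heap / 2**30:.2f} GiB")   # ~1.45 GiB
    print(f"fraction of 2 GiB: {reported_heap / limit_32bit:.0%}") # ~72%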
Subbu
TCS Bangalore
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

You are running out of heap, which for the most part indicates some sort of programming error. You'd need to contact your service provider and get them to run diagnostics on the job while it executes, to determine why it is breaking and how to patch it.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020