ETL Capacity Planning


Post by sswarup »

Hi,

I am struggling with ETL Capacity Planning and Sizing for DataStage jobs. Maybe this is just wishful thinking, but are there any templates to estimate how much space (in-process and landing) an ETL job will require?

Please note that I am interested only in ETL capacity sizing for final datasets/sequential files and the scratch space required by a job.

Regards,
Saurabh

Post by ArndW »

Disk sizing for similar jobs can vary greatly. If you read from a database and pre-sort the data in the SELECT, then a typical job might not need any interim storage, even for gigabytes of data. If the incoming data isn't sorted, then an aggregation or similar operation might require temporary storage larger than the original incoming data stream.
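
To make that concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not a DataStage-supplied formula) that ballparks peak scratch space for sorting unsorted input from row count and average row width, padded for overhead and partition skew:

Code:

# Illustrative scratch-space estimate for sorting unsorted input.
# Assumption (mine, not from DataStage docs): peak scratch usage is
# roughly the full data volume flowing through the sort, padded by a
# safety factor for overhead and partition skew.

def sort_scratch_bytes(rows: int, avg_row_bytes: int,
                       safety_factor: float = 1.5) -> int:
    """Estimate peak scratch disk needed to sort `rows` records of
    `avg_row_bytes` each."""
    return int(rows * avg_row_bytes * safety_factor)

# Example: 50 million rows of ~200 bytes each -> roughly 14 GiB of scratch
print(f"{sort_scratch_bytes(50_000_000, 200) / 2**30:.1f} GiB")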

Disk capacity planning is a function of what you need to do and how you do it; that matters more than the raw amount of data to be processed.

If you could elaborate on what your ETL does, a rough estimate might be possible. That would include your projected lookups, whether you need Changed Data Detection, and whether any type of historical data is stored. With those figures, plus the actual expected data volumes (total and daily delta), some rough estimates could be computed.
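
For instance, a rough landing-space estimate might combine the full load, the daily delta, and the history retention. The figures and function below are hypothetical placeholders meant only to show the arithmetic, not values from this thread:

Code:

# Hypothetical landing-space estimate; every name and figure here is a
# placeholder to illustrate the arithmetic, not a value from this thread.

def landing_bytes(total_rows: int, delta_rows_per_day: int,
                  avg_row_bytes: int, history_days: int) -> int:
    """Final dataset plus `history_days` of retained daily deltas."""
    full_load = total_rows * avg_row_bytes
    retained_deltas = delta_rows_per_day * avg_row_bytes * history_days
    return full_load + retained_deltas

# Example: 100M rows at ~150 bytes, 2M-row daily delta kept for 30 days
est = landing_bytes(100_000_000, 2_000_000, 150, 30)
print(f"{est / 2**30:.1f} GiB")  # -> 22.4 GiB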