ETL Capacity Planning


Post by sswarup »

Hi,

I am struggling with ETL Capacity Planning and Sizing for DataStage jobs. Maybe this is just wishful thinking, but are there any templates to estimate how much space (in-process and landing) an ETL job will require?

Please note that I am interested only in ETL capacity sizing for final datasets/sequential files and the scratch space required by a job.

Regards,
Saurabh

Post by ArndW »

Disk sizing for similar jobs can vary greatly. If you read from a database and pre-sort the data in the SELECT, then a typical job might not need any interim storage, even for gigabytes of data. If the incoming data isn't sorted, then an aggregation or similar operation might require temporary storage larger than the original incoming data stream.
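
To make that concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not a DataStage-supplied formula) that ballparks peak scratch space for sorting unsorted input from row count and average row width, padded for overhead and partition skew:

Code:

# Illustrative scratch-space estimate for sorting unsorted input.
# Assumption (mine, not from DataStage docs): peak scratch usage is
# roughly the full data volume flowing through the sort, padded by a
# safety factor for overhead and partition skew.

def sort_scratch_bytes(rows: int, avg_row_bytes: int,
                       safety_factor: float = 1.5) -> int:
    """Estimate peak scratch disk needed to sort `rows` records of
    `avg_row_bytes` each."""
    return int(rows * avg_row_bytes * safety_factor)

# Example: 50 million rows of ~200 bytes each -> roughly 14 GiB of scratch
print(f"{sort_scratch_bytes(50_000_000, 200) / 2**30:.1f} GiB")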

Disk capacity planning is a function of what you need to do and how you do it; that matters more than the raw amount of data to be processed.

If you could elaborate on what your ETL does, a rough estimate might be possible. That would include your projected lookups, whether you need Changed Data Detection, and whether any type of historical data is stored. With those figures, plus the actual expected data volumes (total and daily delta), some rough estimates could be computed.
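
For instance, a rough landing-space estimate might combine the full load, the daily delta, and the history retention. The figures and function below are hypothetical placeholders meant only to show the arithmetic, not values from this thread:

Code:

# Hypothetical landing-space estimate; every name and figure here is a
# placeholder to illustrate the arithmetic, not a value from this thread.

def landing_bytes(total_rows: int, delta_rows_per_day: int,
                  avg_row_bytes: int, history_days: int) -> int:
    """Final dataset plus `history_days` of retained daily deltas."""
    full_load = total_rows * avg_row_bytes
    retained_deltas = delta_rows_per_day * avg_row_bytes * history_days
    return full_load + retained_deltas

# Example: 100M rows at ~150 bytes, 2M-row daily delta kept for 30 days
est = landing_bytes(100_000_000, 2_000_000, 150, 30)
print(f"{est / 2**30:.1f} GiB")  # -> 22.4 GiB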