Wanted: Comments on the expected run time

Post questions here related to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

theverma
Participant
Posts: 91
Joined: Tue Jul 18, 2006 10:41 am
Location: India

Wanted: Comments on the expected run time

Post by theverma »

Hello friends,
I have a Job Sequence consisting of two Jobs.
Below are the number of stages and the data volumes each job is expected to handle:

First job: 17 lookups (12 against hashed files and 5 against database tables). The hashed files average around 2,000-3,000 rows each; the database reference tables have fewer than 200 rows.
7 Transformers.
8 IPC stages.
The source is a sequential file of around 5,000 rows; the targets are two database tables.

Second job: 12 lookups (9 against hashed files and 3 against database tables). Six of the hashed files average around 800-1,000 rows and the other three hold 2,000-3,000 rows; the database reference tables have fewer than 200 rows.
6 Transformers.
3 IPC stages.
The source is a sequential file of around 30 million rows; the targets are three tables.

We are in the development phase right now, and the above are the data volumes the jobs are expected to handle.

Please post your comments on how long the Job Sequence can be expected to take to complete.

Your comments are very important to me.

Thanks in advance!
Arun Verma
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

The expected time depends on your hardware.

Why do you want to know the expected time in advance? Run the jobs in the development environment and you will get the actual time, rather than spending effort calculating or guessing it.
Regards,
S. Kirtikumar.
jhmckeever
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia

Post by jhmckeever »

theverma,

That's a bit like asking "How fast does a car go?" It really does depend on many factors: the hardware/software landscape, job design, and so on.

I've worked alongside IBM when clients have asked the same question, and they'll tell you the only way to infer likely performance characteristics is to extrapolate from a run of sample data through the hardware, software, and jobs you're planning to use in production.

If you're being pushed for a figure, I would suggest the runtime will be somewhere in the 3-second to 30-minute range. ;-)

J.
John McKeever
Data Migrators (http://www.datamigrators.com/)
MettleCI (https://www.mettleci.com) - DevOps for DataStage
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

At your site you have single-CPU PCs and (if I recall correctly) 36-CPU SunFire servers. Do you expect the same throughput on each? Do you really need all those IPC stages? Enabling inter-process row buffering will (in general) have the same effect. Are all the lookups in a single Transformer stage, or are there multiple Transformer stages? All these, and other, factors will affect the overall throughput.

Build some demo jobs to establish baseline measurements. For example, the first of these would read the source file and discard every row (use a Transformer stage with an output link constraint of @FALSE); that gives you the upper limit on read throughput. Then add the remaining pieces incrementally and measure the effect of each.
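If you want to capture those baseline timings programmatically rather than reading them from the Director log, a small job-control routine in DataStage BASIC can do it. This is a minimal sketch using the standard DSAttachJob/DSRunJob family; the job name "BaselineRead" is made up, so substitute your own demo job:

    * Sketch only: run a baseline job and log its start/finish window.
    * "BaselineRead" is a hypothetical job name - substitute your own.
    hJob = DSAttachJob("BaselineRead", DSJ.ERRFATAL)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    * The start and last-activity timestamps bracket the elapsed time.
    Started = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)
    Finished = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)
    Call DSLogInfo("BaselineRead ran ":Started:" to ":Finished, "Timing")
    ErrCode = DSDetachJob(hJob)

Run the same harness after each incremental addition and the deltas tell you roughly what each group of stages costs.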

Your table update strategy will also affect performance. If it's all inserts, prefer a bulk loader; in that case write the output to a Sequential File stage and then execute a tuned control script to load it. If it's a mix of inserts and updates, separate them into insert-only and update-only streams for the database server.
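To make the bulk-load suggestion concrete: assuming an Oracle target (the thread never names the database), the "tuned control script" could be a SQL*Loader control file for the insert-only stream along these lines. The file, table, and column names are invented for illustration:

    -- Sketch only: direct-path load of the insert-only stream.
    -- File, table, and column names here are hypothetical.
    OPTIONS (DIRECT=TRUE, ERRORS=0)
    LOAD DATA
    INFILE 'insert_stream.dat'
    APPEND
    INTO TABLE target_table
    FIELDS TERMINATED BY '|'
    (cust_id,
     cust_name,
     load_date DATE 'YYYY-MM-DD')

Invoked with something like sqlldr control=insert_stream.ctl, the DIRECT=TRUE option bypasses conventional SQL inserts, which is where most of the speed-up over row-by-row insert statements comes from.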
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.