Wanted: Comments on the expected run time

Post questions here related to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

theverma
Participant
Posts: 91
Joined: Tue Jul 18, 2006 10:41 am
Location: India

Wanted: Comments on the expected run time

Post by theverma »

Hello friends,
I have a Job Sequence consisting of two Jobs.
Below are the number of stages and the data volumes each job is expected to handle:

First job: 17 lookups (12 against hashed files and 5 against database tables). The hashed files average around 2,000-3,000 rows each; the database reference tables have fewer than 200 rows.
7 Transformers.
8 IPC stages.
The source is a sequential file of around 5,000 rows; the targets are two database tables.

Second job: 12 lookups (9 against hashed files and 3 against database tables). Six of the hashed files average around 800-1,000 rows and the other three hold 2,000-3,000 rows; the database reference tables have fewer than 200 rows.
6 Transformers.
3 IPC stages.
The source is a sequential file of around 30 million rows; the targets are three tables.

We are in the development phase right now, and the above are the data volumes the jobs are expected to handle.

Please post your comments on how long the Job Sequence can be expected to take to complete.

Your comments are very important to me.

Thanks in advance!
Arun Verma
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

The expected time depends on your hardware.

Why do you want to know the expected time in advance? Run the jobs in the development environment and you will get the actual time, rather than spending effort calculating or guessing it.
Regards,
S. Kirtikumar.
jhmckeever
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia

Post by jhmckeever »

theverma,

That's a bit like asking "How fast does a car go?" It really does depend on many factors: the hardware/software landscape, job design, and so on.

I've worked alongside IBM when clients have asked the same question, and they'll tell you the only way to infer likely performance characteristics is to extrapolate from a run of sample data through the hardware, software, and jobs you're planning to use in production.

If you're being pushed for a figure, I would suggest the runtime will be somewhere in the 3-second to 30-minute range. ;-)

J.
John McKeever
Data Migrators (http://www.datamigrators.com/)
MettleCI (https://www.mettleci.com) - DevOps for DataStage
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

At your site you have single-CPU PCs and (if I recall correctly) 36-CPU SunFire servers. Do you expect the same throughput on each? Do you really need all those IPC stages? Enabling inter-process row buffering will (in general) have the same effect. Are all the lookups in a single Transformer stage, or are there multiple Transformer stages? All these, and other, factors will affect the overall throughput.

Build some demo jobs to establish baseline measurements. For example, the first of these would read the source file and discard every row (use a Transformer stage with an output link constraint of @FALSE); that gives you the upper limit on read throughput. Then add the remaining pieces incrementally and measure the effect of each.
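If you want to capture those baseline timings programmatically rather than reading them from the Director log, a small job-control routine in DataStage BASIC can do it. This is a minimal sketch using the standard DSAttachJob/DSRunJob family; the job name "BaselineRead" is made up, so substitute your own demo job:

    * Sketch only: run a baseline job and log its start/finish window.
    * "BaselineRead" is a hypothetical job name - substitute your own.
    hJob = DSAttachJob("BaselineRead", DSJ.ERRFATAL)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    * The start and last-activity timestamps bracket the elapsed time.
    Started = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)
    Finished = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)
    Call DSLogInfo("BaselineRead ran ":Started:" to ":Finished, "Timing")
    ErrCode = DSDetachJob(hJob)

Run the same harness after each incremental addition and the deltas tell you roughly what each group of stages costs.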

Your table update strategy will also affect performance. If it's all inserts, prefer a bulk loader; in that case write the output to a Sequential File stage and then execute a tuned control script to load it. If it's a mix of inserts and updates, separate them into insert-only and update-only streams for the database server.
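To make the bulk-load suggestion concrete: assuming an Oracle target (the thread never names the database), the "tuned control script" could be a SQL*Loader control file for the insert-only stream along these lines. The file, table, and column names are invented for illustration:

    -- Sketch only: direct-path load of the insert-only stream.
    -- File, table, and column names here are hypothetical.
    OPTIONS (DIRECT=TRUE, ERRORS=0)
    LOAD DATA
    INFILE 'insert_stream.dat'
    APPEND
    INTO TABLE target_table
    FIELDS TERMINATED BY '|'
    (cust_id,
     cust_name,
     load_date DATE 'YYYY-MM-DD')

Invoked with something like sqlldr control=insert_stream.ctl, the DIRECT=TRUE option bypasses conventional SQL inserts, which is where most of the speed-up over row-by-row insert statements comes from.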
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.