Bottleneck Advice

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

Hi Guys

Just something I was pondering while tuning a job. I have a sequential (SEQ) stage doing a lookup against a HASH stage and writing to a different HASH stage. It has to work this way because I am using that second HASH stage as the lookup/reference for another transformer.

The SEQ stage has 19'000'000 records in it, 3 columns wide (char 16 [key], integer 10, decimal 4); this is how it comes from the DB.

The lookup HASH stage has 300'000 records in it and is two columns wide (decimal 4 [key], char 3). It is a dynamic (type 30) HASH.

The output HASH stage I am writing is 3 columns wide and will have all 19'000'000 records in it (char 16 [key], integer 10, char 3 [derived from the lookup]). It is a static (type 10) HASH at the moment.

Unfortunately, I need to process all these records, so limiting the number of records to be written is not an option.

The problem I have is this: when I created the SEQ file I was writing at 25'000 rows per second, and the lookup HASH was also created at a good speed. Now, when I do the matching between the lookup HASH and the SEQ stage, I am getting 6'000 rows per second. I know I can improve performance by tuning the output HASH stage, but at what point do I recognise that the bottleneck is in reading the SEQ stage?
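
To put the two rates in perspective, here is a small back-of-the-envelope sketch in Python. It only uses the figures quoted above (19'000'000 rows, 25'000 and 6'000 rows per second); the function name is made up for illustration, and it assumes the rate stays constant for the whole run, which is unlikely to hold exactly for a hashed file that grows.

```python
# Rough elapsed-time estimate from the row counts and rates quoted in the post.
# Assumption: the rows/second figure stays constant for the whole run.
ROWS = 19_000_000


def elapsed_minutes(rows, rows_per_second):
    """Return the estimated run time in minutes at a constant row rate."""
    return rows / rows_per_second / 60


for label, rate in (("writing the SEQ file", 25_000),
                    ("SEQ -> lookup -> output HASH", 6_000)):
    print(f"{label}: {elapsed_minutes(ROWS, rate):.1f} minutes")
```

At these rates the matching step would take roughly four times as long as writing the SEQ file did.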
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you're on Solaris, use prstat -a and see if your job is using 100% of a CPU. If it is, then the only way to speed up that single job is to remove logic. Chances are, if your job is seq --> xfm --> hash with a hash reference, it is already running at 100% CPU unless you're disk thrashing.

If you're not trashing, then the ONLY solution is to use multiple job instances and divide up your source sequential file using a partitioning constraint. If you have 8 cpus, then use 8 instances of your job to each handle 1/8th of the source data. You'll be finished 8X sooner.
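
As a rough illustration of the partitioning idea, here is a sketch in Python rather than DataStage syntax. It assumes a round-robin split on the row number (a modulo test); the function names and the sample rows are hypothetical, and in a real job the same test would live in each instance's constraint, with the instance number passed in as a job parameter.

```python
# Sketch of a round-robin partitioning constraint (not DataStage syntax).
# Assumptions: rows are numbered from 1, instances are numbered from 0,
# and every instance reads the whole source but keeps only its own share.

def belongs_to_instance(row_number, instance, instances):
    """True when this row should be processed by the given instance."""
    return row_number % instances == instance


def partition(rows, instance, instances):
    """Yield only the rows that the given instance is responsible for."""
    for row_number, row in enumerate(rows, start=1):
        if belongs_to_instance(row_number, instance, instances):
            yield row


# Hypothetical sample data standing in for the sequential file.
rows = [f"key{n:04d}" for n in range(1, 17)]
parts = [list(partition(rows, i, 8)) for i in range(8)]

for i, part in enumerate(parts):
    print(i, part)
```

Every row lands in exactly one partition, so the instances never process the same row twice.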
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you using read cache for the lookups?

Can you split the input stream (maybe using a link partitioner stage, maybe using a Transformer stage), running the separate streams through separate Transformer stages which will run in separate processes thereby achieving "partition parallelism" as well as using more CPUs? (This is essentially what Ken suggested.)

To determine exactly where the bottleneck(s) may be, use an incremental approach such as you have already described, but make sure that you've identified all possible obstructions (network bandwidth among them, if relevant). For example, are the source and target on the same physical disk spindle? If so, try separating them. (Of course, if you're using a SAN, you have no control over this, but can usually assume they're on separate disks; you can, however, use different logical volumes or different connection channels if there's a bandwidth issue for disk I/O.)
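
The incremental approach can be sketched like this in Python: time the read on its own, then the read plus lookup, then the read plus lookup plus write, and compare the rows per second at each step. Everything here is a stand-in (in-memory lists and dicts instead of the SEQ file and hashed files, made-up names, a smaller row count), so it only shows the method, not real numbers.

```python
import time

# Stand-ins for the real stages. Assumptions: an in-memory list plays the
# SEQ file, and dicts play the lookup and output hashed files.
lookup = {n: "ABC" for n in range(300)}
target = {}
source = [(f"K{n:015d}", n, n % 300) for n in range(100_000)]


def read_only(rows):
    """Step 1: just read the source."""
    yield from rows


def read_and_lookup(rows):
    """Step 2: read, then resolve the reference column."""
    for key, amount, code in rows:
        yield key, amount, lookup.get(code)


def read_lookup_write(rows):
    """Step 3: read, look up, then write to the output store."""
    for key, amount, ref in read_and_lookup(rows):
        target[key] = (amount, ref)
        yield key


def measure(label, pipeline, rows):
    """Run one pipeline and return (label, row count, rows per second)."""
    start = time.perf_counter()
    count = sum(1 for _ in pipeline(rows))
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return label, count, rate


results = [measure(label, fn, source) for label, fn in (
    ("read only", read_only),
    ("read + lookup", read_and_lookup),
    ("read + lookup + write", read_lookup_write),
)]

for label, count, rate in results:
    print(f"{label}: {count} rows at {rate:,.0f} rows/sec")
```

The step at which the rate drops the most is the first place to look; if "read only" is already slow, the bottleneck is the source itself.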
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
denzilsyb

Post by denzilsyb »

thanks Ray/Ken - I'll give your suggestions a shot and post the results.
ray.wurlod

Post by ray.wurlod »

I think Ken meant "thrashing" when he went with "trashing".

But, since he lives in the hurricane state (Florida), who knows? :lol: