Bottleneck Advice

denzilsyb · Post by **denzilsyb** » Tue Sep 28, 2004 6:54 am

Hi Guys

Just something I was pondering while tuning a job. I have a SEQuential stage looking up against a HASH stage and writing to a different HASH stage. It has to work this way because that second HASH stage is used as the lookup/reference for another transformer.

The SEQ stage has 19'000'000 records in it, 3 columns wide (char 16 [key], integer 10, decimal 4); this is how it comes from the DB.

The lookup HASH stage has 300'000 records in it and is two columns wide (decimal 4 [key], char 3) (dynamic/type 30 HASH).

The output HASH stage I am writing is 3 columns wide and will have all 19'000'000 records in it as (char 16 [key], integer 10, char 3 [derived from the lookup]) (static/type 10 HASH at the moment).

Unfortunately, I need to process all these records, so limiting the number of records to be written is not an option.

The problem I have is that when creating the SEQ stage I was writing at 25'000 rows per second. The lookup HASH was created at a good speed. Now, when I do the matching between the lookup HASH and the SEQ stage, I am getting 6'000 rows per second. By tuning the output HASH stage I know I can improve performance, but at which stage do I realise that the bottleneck is resulting from reading the SEQ stage?

kcbland · Post by **kcbland** » Tue Sep 28, 2004 8:08 am

If you're on Solaris, use prstat -a and see if your job is using 100% of a cpu. If it is, then the only way to speed up that single job is to remove logic. Chances are if your job is seq-->xfm --> hash with a hash reference, your job is at 100% cpu speed unless you're disk thrashing.

If you're not trashing, then the ONLY solution is to use multiple job instances and divide up your source sequential file using a partitioning constraint. If you have 8 cpus, then use 8 instances of your job to each handle 1/8th of the source data. You'll be finished 8X sooner.
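As a rough illustration of the partitioning constraint idea, here is a minimal Python sketch (not DataStage code). It assumes a simple round-robin split on the row number; the names `partition_of` and `rows_for_instance` are hypothetical and only show how each of N instances could pick its own 1/N of the source rows.

```python
def partition_of(row_number: int, num_partitions: int) -> int:
    """Return the 0-based partition for a 1-based row number (round-robin)."""
    return (row_number - 1) % num_partitions


def rows_for_instance(rows, instance: int, num_partitions: int):
    """Yield only the rows that job instance `instance` (0-based) should process."""
    for row_number, row in enumerate(rows, start=1):
        # This test plays the role of the partitioning constraint.
        if partition_of(row_number, num_partitions) == instance:
            yield row


if __name__ == "__main__":
    source = [f"key{n:02d}" for n in range(1, 17)]  # stand-in for the sequential file
    for instance in range(8):
        print(instance, list(rows_for_instance(source, instance, 8)))
```

Every row lands in exactly one instance, so the instances never process the same record twice. With the figures from the first post, 19'000'000 rows at 6'000 rows per second is roughly 53 minutes in a single job; eight instances would ideally bring that to under 7 minutes, assuming each instance sustains the same rate and the disks keep up.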

ray.wurlod · Post by **ray.wurlod** » Tue Sep 28, 2004 4:05 pm

Are you using read cache for the lookups?

Can you split the input stream (maybe using a link partitioner stage, maybe using a Transformer stage), running the separate streams through separate Transformer stages which will run in separate processes thereby achieving "partition parallelism" as well as using more CPUs? (This is essentially what Ken suggested.)

To determine exactly where the bottleneck(s) may be, use an incremental approach such as you have already described, but make sure that you've identified all possible obstructions (network bandwidth among them, if relevant). For example, are the source and target on the same physical disk spindle? If so, try separating them. (Of course, if you're using a SAN, you have no control over this, but can usually assume they're on separate disks; you can, however, use different logical volumes or different connection channels if there's a bandwidth issue for disk I/O.)
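The incremental approach can be sketched in Python as follows. This is only an illustration, not DataStage code: the three pipelines (`read_only`, `read_and_lookup`, `read_lookup_write`), the in-memory `LOOKUP` and `OUTPUT` dictionaries, and `sample_rows` are all hypothetical stand-ins for the sequential file, the lookup hashed file, and the output hashed file. The idea is to time the read alone, then the read plus lookup, then the full path, and see where the rows per second drops.

```python
import time

# Hypothetical stand-in for the lookup hashed file: code -> 3-character value.
LOOKUP = {code: "ABC" for code in range(300)}
# Hypothetical stand-in for the output hashed file, keyed on the 16-char key.
OUTPUT = {}


def sample_rows(n):
    """Build n fake source rows shaped like (char 16 key, integer, lookup code)."""
    return [(f"{i:016d}", i, i % 300) for i in range(n)]


def read_only(rows):
    for row in rows:
        yield row


def read_and_lookup(rows):
    for key, amount, code in rows:
        yield key, amount, LOOKUP.get(code)


def read_lookup_write(rows):
    for key, amount, ref in read_and_lookup(rows):
        OUTPUT[key] = (amount, ref)
        yield key


def measure(pipeline, rows):
    """Run a pipeline to completion and return (row count, rows per second)."""
    start = time.perf_counter()
    count = sum(1 for _ in pipeline(rows))
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    rows = sample_rows(50_000)
    for pipeline in (read_only, read_and_lookup, read_lookup_write):
        count, rate = measure(pipeline, rows)
        print(f"{pipeline.__name__}: {count} rows at {rate:,.0f} rows/s")
```

If the read-only rate is already close to the rate of the full job, the source read is the limit; if the rate only falls once the write is added, the output side is the place to tune.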

denzilsyb · Post by **denzilsyb** » Wed Sep 29, 2004 12:51 am

thanks Ray/Ken - I'll give your suggestions a shot and post the results.

ray.wurlod · Post by **ray.wurlod** » Wed Sep 29, 2004 1:38 am

I think Ken meant "thrashing" when he went with "trashing".

But, since he lives in the hurricane state (Florida), who knows?