LCP & LCC Performance Tuning

ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

LCP & LCC Performance Tuning

Post by ravij »

Hi,

I have run two different jobs using the same source file: SeqFile-->Tfm-->Db2UDB and SeqFile-->LCP-->Tfm-->LCC-->Db2UDB
(LCP: Link Partitioner, LCC: Link Collector)

This is the log info for the two jobs.

Seqfile-->Tfm-->Db2UDB
LoadCSKATbl..XfmTrimAndDateConversion: DSD.StageRun Active stage finishing.
3460 rows read from ToXfmTrimAndDateConversion
3460 rows written to ToTgtCSKATbl
1.570 CPU seconds used, 3.130 seconds elapsed.


SeqFile-->LCP-->Tfm-->LCC-->Db2UDB

CopyOfLoadCSKATbl_Test..LCP: DSD.StageRun Active stage finishing.
1730 rows read from FromXFM1
1730 rows read from FromXFM2
3460 rows written to TgtTbl
1.090 CPU seconds used, 3.490 seconds elapsed.
Now my question is: how can I judge which job is better tuned, by CPU seconds or by elapsed time?
And one more doubt: why is there not much difference when I use the Link Partitioner and Link Collector? These stages are meant for performance tuning, aren't they?
Please clarify my doubt.
Thanks in advance.
Ravi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Ravi,

I have two main points to make:

1. Your runtime of 3-4 seconds is far too short to give any meaningful figures for performance measurement. The startup and finish times are a large chunk of this, and small differences in system load will make huge differences in the measured times. Use a job runtime of several minutes at an absolute minimum (a small worked example follows point 2 below).

2. The Link Collector and Link Partitioner stages are meant to let the developer explicitly control parallelism in data processing. You have used them to split and then merge a single stream of data, so they are not going to improve performance at all; in fact, since they add extra processing overhead, they are likely to make the job slower (albeit just a very small bit). If you used the Link Partitioner to split your job into 4 parallel streams you might see a difference in performance, but only if you are being limited by CPU. Your job is most likely being limited by the write to DB/2, and you might get more benefit by dispensing with the Link Collector stage altogether and doing parallel writes to separate DB/2 stages from the streams split out by the Link Partitioner.
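On point 1, here is a back-of-the-envelope sketch in plain Python (not DataStage code; the 2-second fixed overhead is an assumed figure purely for illustration) of how startup/finish time distorts a 3-second benchmark but disappears into the noise on a long run:

```python
# Illustration only: how a fixed startup/finish overhead distorts short benchmarks.
# The 2.0-second overhead is an assumed figure, not measured from DataStage.

def apparent_rows_per_sec(rows, elapsed_sec):
    """Throughput as you would naively read it from the job log."""
    return rows / elapsed_sec

def steady_state_rows_per_sec(rows, elapsed_sec, overhead_sec=2.0):
    """Throughput once the assumed fixed startup/finish cost is removed."""
    return rows / max(elapsed_sec - overhead_sec, 0.001)

for label, rows, elapsed in [
    ("plain job (3,460 rows)", 3460, 3.13),
    ("LCP/LCC job (3,460 rows)", 3460, 3.49),
    ("hypothetical long run (3,460,000 rows)", 3_460_000, 3000.0),
]:
    print(
        f"{label}: apparent {apparent_rows_per_sec(rows, elapsed):,.0f} rows/s, "
        f"steady-state {steady_state_rows_per_sec(rows, elapsed):,.0f} rows/s"
    )
```

At three seconds of elapsed time the apparent and steady-state rates differ by roughly a factor of three, while on the long run they converge, which is why a run of several minutes is needed before the numbers mean anything.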
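On point 2, the pattern the Link Partitioner and Link Collector implement is the generic split / work-in-parallel / merge pattern. As a rough analogy outside DataStage (this is ordinary Python multiprocessing, not the DataStage API, and the worker count and per-row cost are made up), a 1-way versus a 4-way split looks like this:

```python
# Conceptual sketch of the partition -> parallel transform -> collect pattern.
# The per-row cost and the stream counts are assumptions chosen for illustration.
from multiprocessing import Pool
import time

def transform(row):
    # Stand-in for a CPU-heavy Transformer stage derivation.
    time.sleep(0.001)
    return row * 2

def run(rows, n_streams):
    start = time.time()
    if n_streams == 1:
        results = [transform(r) for r in rows]      # no partitioning
    else:
        with Pool(n_streams) as pool:               # the "Link Partitioner"
            results = pool.map(transform, rows)     # parallel streams
        # pool.map returns the results merged back in order -- the "Link Collector"
    return len(results), time.time() - start

if __name__ == "__main__":
    rows = list(range(2000))
    for n in (1, 4):
        count, elapsed = run(rows, n)
        print(f"{n} stream(s): {count} rows in {elapsed:.2f}s")
```

The 4-way run only finishes faster because the per-row work is the bottleneck in this toy example; if every row had to be handed to a slow database at the end, the parallel section would simply sit waiting on it.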
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

ArndW
Thank you.
ArndW wrote: "You have used them to split and then merge a single stream of data, so they are not going to improve performance at all; in fact, since they add extra processing overhead, they are likely to make the job slower (albeit just a very small bit)."
But as I read in the Server developer guide, we can achieve better performance by using the Link Partitioner and Link Collector stages in Server jobs, while you are saying they add overhead. I am a bit confused. Can you please explain in more detail?
It may look simple, but I need some clear information.

Thanks in advance.
Ravi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I don't think you've quite understood what the Link Partitioner is there to achieve. It will split your data stream into one or more processes that run in parallel. Your example used a Link Partitioner to split a single stream into... a single stream. So effectively the partitioner has done nothing towards increasing performance, and since it is in the design it has some overhead, so it will most likely have slowed your job down, or at least added some processing overhead to it.

If you had split your stream into 2 or more streams in the partitioner, then you might have increased performance.

Let's assume you have a factory production line making widgets. The conveyor belt can move at 25 units per minute at top speed. Somewhere in the production line there is a step where a worker has to screw 4 bolts into each widget. If the worker needs 15 seconds per widget to fit those bolts, then the top speed of widget production is going to be 4 per minute at best because of this one station.

If we split the incoming widgets at this station onto 4 separate belts and have 4 workers each doing their 4 bolts, one worker per belt, and then merge the belts again, production increases to 16 per minute. The Link Partitioner and Link Collector are analogous to the split and merge of the production line conveyor belt in this example.

Since this worked so well, let's change the conveyor belt to have an 8-way split in the widget bolting phase. These hard workers can now make 32 widgets per minute - but since the main conveyor belt can only move at 25 units per minute the extra stations are useless.
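Putting numbers on the analogy (a sketch using the figures from the story above, nothing measured):

```python
# Throughput of the production line is capped by its slowest stage,
# so extra parallel workers stop helping once another stage becomes the limit.

def line_rate(belt_rate, workers, rate_per_worker):
    """Widgets per minute for the whole line."""
    return min(belt_rate, workers * rate_per_worker)

BELT_RATE = 25     # conveyor belt top speed, widgets/minute
WORKER_RATE = 4    # one worker fits all four bolts in 15 seconds

for workers in (1, 4, 8):
    print(f"{workers} worker(s): {line_rate(BELT_RATE, workers, WORKER_RATE)} widgets/minute")
# 1 worker -> 4, 4 workers -> 16, 8 workers -> 25 (the belt, not the workers, is now the limit)
```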

This is what often happens in DataStage jobs - you have to know what is limiting your job's speed. If you use a Link Partitioner (and a Link Collector later on) you can perform the actions after the partitioner in parallel, one process per partition - but it will only make a difference if the limiting factor is something a parallel run can improve.

In your example the most likely bottleneck is the load into DB/2. I doubt that using a partitioner and collector around a Transformer stage is going to make a difference in speed unless that Transformer does a very large amount of complex CPU processing.
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

Hi ArndW,

I am very sorry. In the example I showed only one Transformer stage, but I am actually using 3 Transformer stages.
Here are the job run details for the different designs.

No LCP or LCC used here:
3460 rows read from ToXfmTrimAndDateConversion
3460 rows written to ToTgtCSKATbl
1.570 CPU seconds used, 3.130 seconds elapsed.

2 Transformer stages used here, between LCP and LCC:
1730 rows read from FromXFM1
1730 rows read from FromXFM2
3460 rows written to TgtTbl
1.090 CPU seconds used, 3.490 seconds elapsed.

3 Transformer stages used here:
1153 rows read from ToXFM
1153 rows written to FromXFM
0.070 CPU seconds used, 0.280 seconds elapsed.
Quoting my earlier question:
"Now my question is: how can I judge which job is better tuned, by CPU seconds or by elapsed time?"
Can you comment on it now, please?
Ravi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Ravi,

No, I can't comment. The samples are still far too small for any sort of analysis. If you have just one Transformer stage, have you looked at the %CPU usage in the job's log? If you are testing on a system that isn't overloaded, you will only get a performance benefit from parallel streams behind a Link Partitioner stage if your %CPU is close to 100%.

Using the earlier analogy, you don't know your conveyor belt's top speed, so you can't really decide how many parallel streams you need. In other words, if your write to DB/2 has a maximum speed of 1,000 rows per second and you are already getting about that speed with just one Transformer and no Link Partitioner, you can add partitioners and collectors around your Transformer stages until your Designer canvas fills up without making the job go any faster.
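The same min() reasoning from the conveyor-belt sketch applies to the job itself. The 1,000 rows per second figure is taken from the example above; the Transformer rate is a purely hypothetical assumption:

```python
# Hypothetical figures: if DB/2 will only accept ~1,000 rows/s, adding parallel
# Transformer streams behind a Link Partitioner cannot push the job past that.

DB2_WRITE_RATE = 1000    # rows/second the target table accepts (figure from the example)
TRANSFORM_RATE = 5000    # rows/second one Transformer stream can produce (assumed)

for streams in (1, 2, 4, 8):
    job_rate = min(DB2_WRITE_RATE, streams * TRANSFORM_RATE)
    print(f"{streams} Transformer stream(s): job limited to {job_rate} rows/s")
# Every line prints 1000 -- the partitioner/collector pair adds stages but no speed.
```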