
Partitioner/Collector Performance

Posted: Thu Nov 13, 2003 11:00 am
by Rahul
I am just trying to run a performance test: extracting data from a flat file and loading it into another flat file, with and without the partitioner and collector.

Seq -> partitioner -> xfm ------> collector -> Seq
           |                          ^
           +--------> xfm ------------+

I have enabled the inter-process buffer and left it at its default size. What I observe is that when the process starts it writes to the target at around 5,000 rows/sec, but as time progresses the rate drops gradually until it reaches around 15-20 rows/sec.

Any thoughts from the experts?

Rahul

Re: Partitioner/Collector Performance

Posted: Thu Nov 13, 2003 12:17 pm
by crouse
Have you tried writing to /dev/null in the Seq stage? Disk writing/caching (or lack thereof) may be getting in the way, and writing to /dev/null (the bit bucket) takes disk performance out of the picture.
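
To put a rough number on how much the disk alone contributes, you could time the same write pattern outside DataStage. A minimal sketch in plain Python (the row count and file paths are made up for illustration):

Code:

import time

# Rough, outside-of-DataStage check of raw write throughput.
def time_write(path, rows=1_000_000):
    start = time.time()
    with open(path, "w") as f:
        for i in range(rows):
            f.write(f"{i},some,sample,row,data\n")
    return rows / (time.time() - start)

print("rows/sec to /dev/null :", int(time_write("/dev/null")))      # disk out of the picture
print("rows/sec to real file :", int(time_write("/tmp/perf.out")))  # disk included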

Posted: Thu Nov 13, 2003 1:10 pm
by Rahul
I have set the target as a flat file. If I do a simple test with no partitioner/collector links, the performance is far better.

seq->xfm->seq

With this I consistently receive around 5,000 rows/sec, but the same job done with the Link Partitioner/Collector slows down. I had carried this out to show the folks around here that data partitioning is better if we can afford to do it.

Any suggestions ??

Rahul

Posted: Thu Nov 13, 2003 2:32 pm
by Creo
Hi Rahul,

What type of collector are you using? Some just write to the flat file in whatever order the rows arrive (e.g. round robin), while others require that you sort the data (e.g. sort merge), which might slow down the process as the file gets bigger... but that's just a wild guess on my part.
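
To illustrate the difference outside DataStage, here is a small sketch in plain Python with made-up partition data: round robin just interleaves rows as they arrive, while a sort merge does extra work per row to keep the output ordered, and that work grows with the data volume.

Code:

import heapq
from itertools import zip_longest

# Three "partitions", as if three transformer links were feeding the collector.
partitions = [
    [1, 4, 7, 10],
    [2, 5, 8, 11],
    [3, 6, 9, 12],
]

def round_robin_collect(parts):
    # Take one row from each partition in turn; no ordering work at all.
    out = []
    for group in zip_longest(*parts):
        out.extend(row for row in group if row is not None)
    return out

def sort_merge_collect(parts):
    # Merge already-sorted partitions into one ordered stream; every row
    # costs a heap operation, so the overhead grows with the volume.
    return list(heapq.merge(*parts))

print(round_robin_collect(partitions))  # rows in arrival order
print(sort_merge_collect(partitions))   # rows in key order: 1..12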

Hope it helps!

Creo

Posted: Fri Nov 14, 2003 8:20 am
by Rahul
I have left the policy at its default (round robin); I have not used hash or the other options.

Any suggestions ?

Rahul

Posted: Fri Nov 14, 2003 9:12 pm
by kcbland
Uggh. If what you are doing is attempting "instantiation" without incurring the overhead of extra job clones (Agent Smith from Matrix 2 & 3), this is not a good approach.

The Link Partitioner and Link Collector stages are a more elegant way of splitting processing and collecting it back into a single output stream. They make a design that used to rely on multiple sequential files and an after-job concatenation command much more seamless. I would not recommend them as a way of increasing the net rows/second through your ETL application.
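
For comparison, the older pattern would leave you with one file per stream and an after-job step to stitch them together; a minimal sketch in Python (rather than an actual after-job shell command), with hypothetical file names:

Code:

import shutil

# Each parallel stream wrote its own sequential file; concatenate them afterwards.
part_files = ["target_part1.txt", "target_part2.txt", "target_part3.txt"]

with open("target_final.txt", "wb") as final:
    for name in part_files:
        with open(name, "rb") as part:
            shutil.copyfileobj(part, final)  # append each part in turn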

You're better off with a seq-->xfm-->seq job design using instantiated clones to handle the "MORE of ME Agent Smith" approach. You can implement a round-robin split in the transformer constraint using a simple expression:

Code:

MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1

If you set the NumberOfJobClones parameter to the number of instances you are going to run, and give each instance's ThisJobClonesNumber parameter a value from 1 through NumberOfJobClones, you get a simple round-robin distribution of the rows in the source sequential file.
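
To see what that constraint does, here is a quick simulation in plain Python (not DataStage BASIC) with made-up row numbers, assuming three clones:

Code:

number_of_job_clones = 3
rows = range(1, 13)  # @INROWNUM is 1-based, so simulate row numbers 1..12

for clone in range(1, number_of_job_clones + 1):
    # Each clone keeps only the rows whose constraint evaluates true for it:
    # MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1
    kept = [n for n in rows if n % number_of_job_clones == clone - 1]
    print(f"clone {clone}: rows {kept}")

# clone 1: rows [3, 6, 9, 12]   (remainder 0)
# clone 2: rows [1, 4, 7, 10]   (remainder 1)
# clone 3: rows [2, 5, 8, 11]   (remainder 2)

Every row lands in exactly one clone, so the clones together cover the whole source file.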