Partitioner/Collector Performance
Posted: Thu Nov 13, 2003 11:00 am
by Rahul
I am just running a performance test: extracting data from a flat file and loading it into another flat file, with and without the Link Partitioner and Link Collector.
seq -> partitioner --> xfm --> collector -> seq
                  \--> xfm --/
I have left the inter-process buffering at its default. What I observe is that when the job starts it writes to the target at around 5000 rows/sec, but as time progresses the rate drops gradually until it reaches around 15-20 rows/sec.
Any thoughts from the experts?
Rahul
Re: Partitioner/Collector Performance
Posted: Thu Nov 13, 2003 12:17 pm
by crouse
Have you tried writing to /dev/null in the Seq stage? Disk writing/caching (or the lack thereof) may be getting in the way. Writing to /dev/null (the bit bucket) takes disk performance out of the picture.
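Outside DataStage, the same idea can be sanity-checked from the shell by timing a copy to a real file versus the bit bucket (file names here are hypothetical, just for illustration):

```shell
# Compare throughput with and without the disk write in the path.
# /tmp/source.dat is a stand-in for your source flat file.
printf 'row1\nrow2\nrow3\n' > /tmp/source.dat

time cp /tmp/source.dat /tmp/target.dat   # read + disk write
time cp /tmp/source.dat /dev/null         # read cost only
```

If the /dev/null copy is dramatically faster, the target disk (not the job design) is the bottleneck.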
Rahul wrote: I am just running a performance test: extracting data from a flat file and loading it into another flat file, with and without the Link Partitioner and Link Collector. [...]
Posted: Thu Nov 13, 2003 1:10 pm
by Rahul
I have set the target as a flat file. If I do a simple test with no partitioner/collector links, the performance is far better:
seq -> xfm -> seq
On this I receive a consistent rate of around 5000 rows/sec, but the same job done with the Link Partitioner/Collector is slow. I had set this up to show the folks here that data partitioning is worthwhile if we can afford to do it.
Any suggestions?
Rahul
Posted: Thu Nov 13, 2003 2:32 pm
by Creo
Hi Rahul,
What type of collector are you using? Some just write to the flat file in whatever order rows arrive (e.g. round robin), while others require that you sort the data (e.g. sort merge), which might slow down the process as the file gets bigger... but it's just a wild guess on my part.
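For illustration, the difference between the two collector styles can be sketched in a few lines of Python (the function names are mine, not DataStage APIs):

```python
import heapq

def round_robin_collect(partitions):
    """Take one row from each partition in turn; cheap, no ordering guarantee."""
    out = []
    iters = [iter(p) for p in partitions]
    while iters:
        for it in list(iters):
            try:
                out.append(next(it))
            except StopIteration:
                iters.remove(it)  # this partition is exhausted
    return out

def sort_merge_collect(partitions):
    """Merge partitions that are each pre-sorted on the key; output stays
    sorted, but every row costs comparisons, which adds up on large files."""
    return list(heapq.merge(*partitions))

a, b = [1, 4, 7], [2, 3, 9]
print(round_robin_collect([a, b]))  # interleaved: [1, 2, 4, 3, 7, 9]
print(sort_merge_collect([a, b]))   # ordered:     [1, 2, 3, 4, 7, 9]
```

The sketch shows why a sort-merge collector can degrade as volume grows while a round-robin collector stays flat.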
Hope it helps!
Creo
Posted: Fri Nov 14, 2003 8:20 am
by Rahul
I have left the policy at its default (round robin); I have not used hash or any of the other algorithms.
Any suggestions?
Rahul
Posted: Fri Nov 14, 2003 9:12 pm
by kcbland
Uggh. If what you are attempting is "instantiation" without incurring the overhead of extra job clones (think Agent Smith from The Matrix 2 and 3), this is not a good approach.
The Link Partitioner and Collector stages are a more elegant way of splitting processing and collecting it back into a single output stream. They make a design that once relied on multiple sequential files with an after-job concatenation command much more seamless. I would not recommend them as a way of increasing the net rows/second through your ETL application.
You're better off with a seq -> xfm -> seq job design using instantiated clones to handle the "MORE of ME, Agent Smith" approach. You can implement a round-robin algorithm in the transformer constraint using a simple expression:
MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1
If you set the NumberOfJobClones parameter to the number of instances you're going to run, and give each instance clone a ThisJobClonesNumber value from 1 through NumberOfJobClones, you get a simple round-robin distribution of the rows in the source sequential file.
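As a quick sanity check of that constraint, here is the same MOD logic sketched in Python (parameter names mirror the job parameters above; @INROWNUM is the 1-based row counter):

```python
def keep_row(inrownum: int, number_of_job_clones: int, this_job_clones_number: int) -> bool:
    """True if this clone's constraint passes the given row, per
    MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1."""
    return inrownum % number_of_job_clones == this_job_clones_number - 1

# With 3 clones, rows 1..6 spread evenly and no row is claimed twice:
rows = range(1, 7)
assignment = {n: [r for r in rows if keep_row(r, 3, n)] for n in (1, 2, 3)}
print(assignment)  # {1: [3, 6], 2: [1, 4], 3: [2, 5]}
```

Each row satisfies the constraint for exactly one clone, so the clones partition the source file cleanly with no coordination between them.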