I am just trying to run a performance test extracting data from a flat file and loading it into another flat file, with and without the Link Partitioner and Link Collector.
Seq -> Link Partitioner -> xfm -> Link Collector -> Seq
                       \-> xfm ->/
I have left the inter-process buffering at its default. What I observe is that when the process starts it writes to the target at around 5000 rows/sec, but as time progresses the rate drops gradually until it reaches around 15-20 rows/sec.
Any thoughts from the experts?
Rahul
Partitioner/Collector Performance
Moderators: chulett, rschirm, roy
Re: Partitioner/Collector Performance
Have you tried writing to /dev/null in the Seq stage? Disk writing/caching (or the lack thereof) may be getting in the way. Writing to /dev/null (the bit bucket) takes disk performance out of the picture.
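This isn't DataStage itself, but the idea behind the /dev/null test can be sketched in a few lines of Python (the row count and file paths here are arbitrary, chosen just for illustration):

```python
import os
import tempfile
import time

def time_writes(path, rows=100_000):
    """Write `rows` short lines to `path` and return the rows/sec achieved."""
    start = time.perf_counter()
    with open(path, "w") as f:
        for i in range(rows):
            f.write(f"row {i}\n")
    return rows / (time.perf_counter() - start)

# /dev/null discards the data, taking disk speed and filesystem
# caching out of the measurement entirely.
null_rate = time_writes(os.devnull)

# A real file on disk is subject to buffering and flushing behaviour.
with tempfile.NamedTemporaryFile(suffix=".out", delete=False) as tmp:
    disk_rate = time_writes(tmp.name)

print(f"/dev/null: {null_rate:,.0f} rows/sec  disk: {disk_rate:,.0f} rows/sec")
```

If the /dev/null run holds a steady rate while the disk run degrades over time, the bottleneck is the target's disk I/O rather than the partitioner/collector design.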
I have set the target as a flat file. If I do a simple test with no Link Partitioner/Collector, the performance is far better:
seq->xfm->seq
On this I consistently receive around 5000 rows/sec, but the same job becomes slow when done with the Link Partitioner/Collector. I carried this out to show folks that data partitioning is better if we can afford to do it.
Any suggestions?
Rahul
Uggh. If what you are doing is attempting "instantiation" without incurring the overhead of extra job clones (Agent Smith from The Matrix 2 & 3), this is not a good approach.
The Link Partitioner and Link Collector stages are a more elegant way of splitting processing and collecting it back into a single output stream. They make a design that once relied on multiple sequential files with an after-job concatenation command much more seamless. I would not recommend them as a way of increasing the net rows/second through your ETL application.
You're better off with a seq-->xfm-->seq job design using instantiated clones to handle the "MORE of ME" Agent Smith approach. You can do round-robin distribution in the transformer constraint using a simple expression:
Code: Select all
MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1
If you give the NumberOfJobClones parameter a value equal to how many instances you're going to run, and give each instance clone's ThisJobClonesNumber parameter a value of 1 through NumberOfJobClones, you achieve a simple round-robin distribution of the rows in the source sequential file.
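To see how that constraint spreads the rows, here is a small Python sketch of the same modulo arithmetic (the function name is mine; `@INROWNUM` is 1-based in DataStage, which is why clone N keeps rows where the remainder equals N - 1):

```python
def rows_for_clone(total_rows, num_clones, clone_number):
    """Return the 1-based row numbers a given clone instance would keep,
    mirroring MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1."""
    return [row for row in range(1, total_rows + 1)
            if row % num_clones == clone_number - 1]

# With 3 clones over 9 rows, each clone keeps every third row:
print(rows_for_clone(9, 3, 1))  # -> [3, 6, 9]
print(rows_for_clone(9, 3, 2))  # -> [1, 4, 7]
print(rows_for_clone(9, 3, 3))  # -> [2, 5, 8]
```

Every row lands in exactly one clone, so no rows are duplicated or dropped across the instances.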
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle