Partitioner/Collector Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


Rahul
Participant
Posts: 19
Joined: Wed Oct 29, 2003 10:21 pm

Partitioner/Collector Performance

Post by Rahul

I am just trying to do a performance test on extracting data from a flat file and loading it into another flat file, with and without a partitioner and collector.

Seq -> Partitioner -> Xfm -> Collector -> Seq
            |                    ^
            +-----> Xfm ---------+

I have enabled the inter-process buffer with its default settings. What I observe is that when the process starts it writes to the target at around 5000 rows/sec, but as time goes on the rate drops gradually until it reaches around 15-20 rows/sec.

Any thoughts from the experts?

Rahul
crouse
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm

Re: Partitioner/Collector Performance

Post by crouse

Have you tried writing to /dev/null in the Seq stage? Disk writing/caching (or lack thereof) may be getting in the way, and writing to /dev/null (the bit bucket) should take disk performance out of the picture.

Rahul
Participant
Posts: 19
Joined: Wed Oct 29, 2003 10:21 pm

Post by Rahul

The target is a flat file. If I run a simple test without the partitioner/collector links, the performance is far better:

Seq -> Xfm -> Seq

With this design I consistently get around 5000 rows/sec. The same job built with the Link Partitioner/Collector, however, becomes slow. I set this up to show people here that partitioning the data is better when we can afford to do it.

Any suggestions?

Rahul
Creo
Participant
Posts: 34
Joined: Wed Mar 19, 2003 1:12 pm
Location: Canada

Post by Creo

Hi Rahul,

What type of collector are you using? Some simply take rows from each input link in turn and write them out (e.g. round robin); others require the data to be sorted (e.g. sort/merge), which might slow the process down as the file gets bigger... but that's just a wild guess on my part.
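To make the difference between the two collection policies concrete, here is a minimal Python sketch of each. It is only a model of the idea, not the Link Collector's actual implementation, and the function names are hypothetical: round robin takes the next row from each input link in turn, while sort/merge assumes every input link is already sorted on the key and merges them into one sorted stream.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # filler for links that run out of rows early


def round_robin_collect(*links):
    """Hypothetical model of round-robin collection: one row from each link in turn."""
    collected = []
    for group in zip_longest(*links, fillvalue=_MISSING):
        # Skip the filler once a shorter link is exhausted.
        collected.extend(row for row in group if row is not _MISSING)
    return collected


def sort_merge_collect(*links, key=None):
    """Hypothetical model of sort/merge collection.

    Assumes each input link is already sorted on `key`; heapq.merge
    interleaves the inputs into one sorted output without re-sorting them.
    """
    return list(heapq.merge(*links, key=key))


if __name__ == "__main__":
    print(round_robin_collect(["a1", "a2", "a3"], ["b1", "b2"]))
    print(sort_merge_collect([1, 4, 9], [2, 3, 10]))
```

In this model the round-robin output simply alternates between links, while the sort/merge output is only correct if each input was sorted beforehand.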

Hope it helps!

Creo
Rahul
Participant
Posts: 19
Joined: Wed Oct 29, 2003 10:21 pm

Post by Rahul

I have left the policy at its default (round robin) and have not used hash or any of the other options.

Any suggestions?

Rahul
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland

Uggh. If what you are doing is attempting "instantiation" without incurring the overhead of extra job clones (Agent Smith from Matrix 2 & 3), this is not a good approach.

The Link Partitioner and Link Collector stages are a more elegant way of splitting processing and collecting it back into a single output stream. They make a design that would otherwise use multiple sequential files plus an after-job concatenation command much more seamless. I would not recommend them as a way of increasing the net rows/second throughput of your ETL application.

You're better off with a seq-->xfm-->seq job design, using instantiated clones for the "MORE of ME Agent Smith" approach. You can implement a round-robin algorithm in the Transformer constraint with a simple expression:

MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1

If you set the NumberOfJobClones parameter to the number of instances you are going to run, and give each instance's ThisJobClonesNumber parameter a distinct value from 1 through NumberOfJobClones, you get a simple round-robin distribution of the rows in the source sequential file across the instances.
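One way to convince yourself that this constraint splits the file cleanly is to simulate it. The Python sketch below assumes that @INROWNUM counts input rows starting at 1, and the helper name rows_for_clone is hypothetical. It applies the same test to every row for each clone and checks that every row lands in exactly one clone.

```python
def rows_for_clone(row_count, number_of_job_clones, this_job_clones_number):
    """Row numbers passing MOD(@INROWNUM, NumberOfJobClones) = ThisJobClonesNumber - 1.

    Assumption: @INROWNUM is modelled as a 1-based counter over the source
    rows. Python's % matches MOD here because both operands are positive.
    """
    return [
        inrownum
        for inrownum in range(1, row_count + 1)
        if inrownum % number_of_job_clones == this_job_clones_number - 1
    ]


if __name__ == "__main__":
    row_count, clones = 10, 3
    assigned = {n: rows_for_clone(row_count, clones, n) for n in range(1, clones + 1)}
    for clone, rows in assigned.items():
        print(clone, rows)
    # Every source row should be picked up by exactly one clone.
    all_rows = sorted(r for rows in assigned.values() for r in rows)
    assert all_rows == list(range(1, row_count + 1))
```

With 10 rows and 3 clones this prints clone 1 receiving rows 3, 6 and 9, clone 2 receiving rows 1, 4, 7 and 10, and clone 3 receiving rows 2, 5 and 8. Because the counter starts at 1, clone 2 gets the first row, but the split is still an even round robin.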
Kenneth Bland
