Pivot Stage Performance

rleishman · Post by **rleishman** » Thu Jan 12, 2006 1:13 am

I have a bunch of jobs where I want to convert financial transactions into journals. ie. For each transaction coming in, I generate 2 transactions with mostly the same column vals except the amount column where the sign is reversed.

Being a newbie (and not knowing about the Pivot stage, which was not installed at the time), I used a Transformer to create the 2 journal transactions, wrote them to files, merged the files with a Link Collector into a third file, and then proceeded with further processing.

ie.

Code: Select all

         ----> SEQ ----
...TX ---+            +---> LC ---> SEQ ---> TX ---> ....
         ----> SEQ ----

This works great, but its seems sub-optimal because it writes the data to disk twice. I tried doing it without the files, but DS won't let you link the Collector to an Active stage on either the input or the output (which is what I need to do).

Reading the doco

I found the Pivot stage (actually, I read about it here, but it sounds better when I say I discovered it myself). Installed it, tried it again, but it's SLOOOOOOOOOW!!!!!!

Without the Pivot, the job processes at 1500 rows/sec (but writes only one row per input row). With Pivot, I was prepared to accept half pace because I'm doubling the transactions, but it went down to 150 rows/sec.

Much research later, I put derivations on every column to stop them being used as grouping keys, and wrapped the Pivot stage in InterProcess stages to mitigate the row-buffering problem with grouping stages. This brought it up to 220 rows/sec. Still too shabby.

Q1. Is there anything else I can do to make Pivot faster?
Q2. Is Pivot just rubbish? Should I go back to my old method with the Collector?
Q3. Is there another method on Server 7.5.1a I could use? (Without breaking it up into multiple jobs that communicate through named pipes - too much trouble)

Thanks in advance for your advice.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 12, 2006 3:17 am

How about

Code: Select all

          ----> IPC ---- 
...TX ---+              +---> LC ---> IPC ---> TX ---> .... 
          ----> IPC ----

Or, assuming that the results are going to a table

Code: Select all

           ---> TX2A ---> ....  
...T1X ---+             
           ---> TX2B ---> ....

The two Transformer stages TX2A and TX2B perform identical processing (apart from the negative sign) and construct appropriate insert rows.

rleishman · Post by **rleishman** » Thu Jan 12, 2006 3:58 pm

Excellent! That first solution is brilliant. All I need to do is swap my files for IPCs - child's play. It's running at a writeback rate of 600-700 rows/s, which is pretty creditable given the number of IPC's and the miserable config of my dev box. It should do better on a box with more procs.

Am I to take it then that the answers to my questions are therefore:

Q1. Is there anything else I can do to make Pivot faster? Buy a bigger box.
Q2. Is Pivot just rubbish? Yes. Should I go back to my old method with the Collector? Kind of.
Q3. Is there another method on Server 7.5.1a I could use? (Without breaking it up into multiple jobs that communicate through named pipes - too much trouble) See Ray's post.

The only thing I don't like about this is that it's heavier on maintenance. If a column changes or is added, I have to make the change in both IPCs. Not such a huge problem here, but I have two other jobs that pivot one row into twelve. But that's support's problem...

ray.wurlod · Post by **ray.wurlod** » Thu Jan 12, 2006 5:30 pm

It occured to me that the IPC stages are not required in my second design, which will make maintenance even easier. I've editied it to eliminate them.