Pivot Stage Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Pivot Stage Performance

Post by rleishman »

I have a bunch of jobs where I want to convert financial transactions into journals. ie. For each transaction coming in, I generate 2 transactions with mostly the same column vals except the amount column where the sign is reversed.

Being a newbie (and not knowing about the Pivot stage, which was not installed at the time), I used a Transformer to create the 2 journal transactions, wrote them to files, merged the files with a Link Collector into a third file, and then proceeded with further processing.

ie.

Code: Select all

         ----> SEQ ----
...TX ---+            +---> LC ---> SEQ ---> TX ---> ....
         ----> SEQ ----
This works great, but its seems sub-optimal because it writes the data to disk twice. I tried doing it without the files, but DS won't let you link the Collector to an Active stage on either the input or the output (which is what I need to do).

Reading the doco :idea: I found the Pivot stage (actually, I read about it here, but it sounds better when I say I discovered it myself). Installed it, tried it again, but it's SLOOOOOOOOOW!!!!!!

Without the Pivot, the job processes at 1500 rows/sec (but writes only one row per input row). With Pivot, I was prepared to accept half pace because I'm doubling the transactions, but it went down to 150 rows/sec. :(

Much research later, I put derivations on every column to stop them being used as grouping keys, and wrapped the Pivot stage in InterProcess stages to mitigate the row-buffering problem with grouping stages. This brought it up to 220 rows/sec. Still too shabby.

Q1. Is there anything else I can do to make Pivot faster?
Q2. Is Pivot just rubbish? Should I go back to my old method with the Collector?
Q3. Is there another method on Server 7.5.1a I could use? (Without breaking it up into multiple jobs that communicate through named pipes - too much trouble)

Thanks in advance for your advice.
Ross Leishman
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How about

Code: Select all

          ----> IPC ---- 
...TX ---+              +---> LC ---> IPC ---> TX ---> .... 
          ----> IPC ----
Or, assuming that the results are going to a table

Code: Select all

           ---> TX2A ---> ....  
...T1X ---+             
           ---> TX2B ---> .... 
The two Transformer stages TX2A and TX2B perform identical processing (apart from the negative sign) and construct appropriate insert rows.
Last edited by ray.wurlod on Thu Jan 12, 2006 5:30 pm, edited 2 times in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Excellent! That first solution is brilliant. All I need to do is swap my files for IPCs - child's play. It's running at a writeback rate of 600-700 rows/s, which is pretty creditable given the number of IPC's and the miserable config of my dev box. It should do better on a box with more procs.

Am I to take it then that the answers to my questions are therefore:

Q1. Is there anything else I can do to make Pivot faster? Buy a bigger box.
Q2. Is Pivot just rubbish? Yes. Should I go back to my old method with the Collector? Kind of.
Q3. Is there another method on Server 7.5.1a I could use? (Without breaking it up into multiple jobs that communicate through named pipes - too much trouble) See Ray's post.


The only thing I don't like about this is that it's heavier on maintenance. If a column changes or is added, I have to make the change in both IPCs. Not such a huge problem here, but I have two other jobs that pivot one row into twelve. But that's support's problem... :)
Ross Leishman
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It occured to me that the IPC stages are not required in my second design, which will make maintenance even easier. I've editied it to eliminate them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply