Calling DB2 Stored Proc multiple times in Parallel from PX

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Calling DB2 Stored Proc multiple times in Parallel from PX

Post by gagan8877 »

Hi All

My requirement is to call a DB2 stored proc 25 times in parallel with different INPUT parameters. The sp returns 2 status columns in a single row. We are going to use Stored Procedure Stage to run SPs. We have an after job subroutine that tests if the value returned by the sp was a success or a failure. Each sp execution might take 5 minutes or so. The questions is: which is a better approach and why?

1. Run the job with STP stage multiple times with different invocation Ids in a loop in the sequence. i.e. Start SPs in a sequential fashion, without waiting for the previous execution (sp) to finish.

2. Can we acheive the above when passing rows to STP stage in a single job run? Can STP stage start stored procs with each input row, without waiting on the previous sp run (is there a property that can be set) to finish or does it run each row sequentially i.e. waits for the previous execution to finish before calling the next one (the next input parameters row)?

3. Is there another easier or better way in terms of performance?

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

DataStage will run the procedure sequentially, per processing node. So if you had a 25-way configuration and used round-robin partitioning you would get parallelism. Otherwise option (1) would be the way to go. Both assume that the stored procedure is more time-consuming than the DataStage overhead - otherwise you will be getting a net loss by having so much DataStage job management going on.

Does your database reside on the same machine as the DataStage server?
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

Hi ArndW

Thanks for your valuable answer.

I have a question about option 2 answer. You said "DataStage will run the procedure sequentially, per processing node." Does that mean Stored Procedure Stage will run an instance of stored proc on each node with different parameters coming from the source link or will it run multiple instances of the stored proc on different nodes with the same parameter on all nodes (1 src row parameter on all nodes) at a time?

I would think that it is sending different parameters to different nodes, but I remember reading a post that said that STG should be run sequentially or otherwise it would invoke the same sp with same parameters multiple times, as there are nodes.

DS Server (windows) and DB2 Server (linux) are on different machines.
ArndW wrote:DataStage will run the procedure sequentially, per processing node. So if you had a 25-way configuration and used round-robin partitioning you would get parallelism. Otherwise option (1) would be the way to go. Both assume that the stored procedure is more time-consuming than the DataStage overhead - otherwise you will be getting a net loss by having so much DataStage job management going on.

Does your database reside on the same machine as the DataStage server?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I am not sure how the database stage would get the same parameters in different nodes, as those parameters will be values from the data and thus each instance would be separate and distinct.
Post Reply