Link Partitioning/Collecting or Coded Round Robin Partitions

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
simong
Participant
Posts: 6
Joined: Mon Mar 08, 2004 9:06 am

Link Partitioning/Collecting or Coded Round Robin Partitions

Post by simong »

All,

I am attaining 5,000 Rows Per Second (I know it's bytes that really matter) from DB2 (Using DB2/UDB Object) through and IPC and out to a SQL Bulk/Load Object in DataStage.

I am looking to introduce a parallel alternative (Do not have PX yet) in order to use more than 1 CPU (I have 8 in total).

My question is should I use 4 SEPARATE jobs with the MOD(@INROWNUM, PartitionNumber) = Partitioncount - 1 method or should I be using Link Partitioner and Link Collector objects in a single job complete with a separate Active stage for each flow (such as a 1-to-1-mapping Transform Stage).

I have tried the later and no apparent performance improved occured.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Optimize your spool of data out of DB2. Write a job that is just DB2 --> xfm --> seq and see how fast that runs. Then, pick a numeric column, something like the primary key, and instantiate that job passing in the parameters to divide the data in the SQL. You'll find that the degree of instantiantion for spooling data from a database will be different than the degree of instantiation for transformation, and also the loading. You want the flexibility to maybe run the extract X 3, and the transformation X 10, and the loading via a bulk loader.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
santhu
Participant
Posts: 20
Joined: Fri Mar 12, 2004 3:07 am

Re: Link Partitioning/Collecting or Coded Round Robin Partit

Post by santhu »

santhu wrote:
Hi,

When you tried using IPC with Link partitioner and collector, did you enable "Inter-process" buffering option under "Performance" tab at the job level? This is must to get the best performance for multi-process jobs.

I agree completely with Kenneth's sugestion.

Write a job that is just DB2 --> xfm --> seq , and try testing the performance with "Multi job Instance" option enabled under Job properties.
Each instance of job with sequential file name, and different range of source key column data as parameters

Regards,
Santhosh S
simong
Participant
Posts: 6
Joined: Mon Mar 08, 2004 9:06 am

Post by simong »

Ok,

thanks for the advice. I will try this and get back to you early next week with my findings.

It is also worth mentioning that the SQL Bulk Loader seems fine in this process in terms of performance. I have heard a lot on this forum suggesting that the Bulk Loaders with DataStage are very poorly designed.

Is this still the case with Version 7 as I guess the alternative is to create a control file (based on a single output record) and upload via a sequential file instead?

Simon
simong
Participant
Posts: 6
Joined: Mon Mar 08, 2004 9:06 am

Post by simong »

I have each job set with the appropriate properties in terms of multiple instancing and inter-process buffering.

Are you guys talking about a Control job calling these 4-5 identical jobs (with different selection criteria) or executing them as 5 separate jobs from director? In fact does it make a difference?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

simong wrote:Are you guys talking about a Control job calling these 4-5 identical jobs (with different selection criteria) or executing them as 5 separate jobs from director? In fact does it make a difference?
You can either make physical clones, or use job instances. The affect is the same. An instance is just a copy of another job that is intimately linked to the master (shares the same job log, compiling the master deletes all instances, etc).
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
santhu
Participant
Posts: 20
Joined: Fri Mar 12, 2004 3:07 am

Post by santhu »

[quote="santhu"]

I think it's better to declare the job as Multi job instance rather than making N copies as any change in design in the latter scenario, you need to change in all 5 jobs, i.e redundant work.

In multi job instance, the same job design runs with different job parameters and can run concurrently.

Rgds,
Santhosh S
santhu
Participant
Posts: 20
Joined: Fri Mar 12, 2004 3:07 am

Post by santhu »

DB2 Bulk loader or any Bulk loader provided by Datastage internally calls the respective DB's Bulk loading utility. So it's not slow as it inserts without DB logging.

Thanks,
Santhosh S
Post Reply