SMPs and process distribution

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

SMPs and process distribution

Post by gateleys »

Our DataStage server runs on an SMP with 4 processors. Now, let's say I have a simple job with a flat file as source and a Transformer with output links to ORAOCI9 (just inserts) and a reject file. With the in-process and inter-process buffers left at their defaults, will the server split the load for the inserts among the processors? I am using 7.0 Server edition on Windows. For such "simple design, but huge data volume" jobs, is there any way to maximize throughput through optimal utilization of these processors? Something in the direction of partitioning the data and assigning it to nodes as in parallel systems, but with Server edition resources.

Thanks.
gateleys
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Did you try using the Link Partitioner stage? From what I have read in Ray's posts, in Server the only way to use parallelism is the Link Partitioner stage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Absolutely. Use job instances with a constraint that divides the rows evenly among the instances. Each job is responsible for a portion of the data. You can run 4 copies of the job, each inserting 1/4 of the rows. This is quite easy and elegant and we were doing this long before PX. :wink:

I've got a document on my website called Performance Analysis which covers this effort.
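
A minimal sketch of such an instance-aware constraint (the parameter names PartitionCount and PartitionNumber are illustrative only, not necessarily those used in the document) placed on the Transformer's output link:

Code: Select all

Mod(@INROWNUM, PartitionCount) = PartitionNumber - 1

With 4 instances started with PartitionCount = 4 and PartitionNumber = 1 through 4, each instance picks up a disjoint quarter of the input rows.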
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

DSguru2B wrote:Did you try using the Link Partitioner stage? From what I have read in Ray's posts, in Server the only way to use parallelism is the Link Partitioner stage.
I have used the Link Partitioner and Link Collector in situations where the input rows can be segregated based on certain conditions applied at the Transformer in each partition (between the partitioner and collector). However, if I just need to read from a sequential file and, via a Transformer, load a table, I do NOT have any constraint that would be the basis for partitioning my data. If I used something like row numbers (or some IDs) to split them, it would not be seamless. Can you suggest a design whereby the partitioner/collector pair could be used in my case?

Thanks,
gateleys
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

kcbland wrote:Absolutely. Use job instances with a constraint that divides the rows evenly among the instances.
I've got a document on my website called Performance Analysis which covers this effort.
Hi Kenneth,
Sorry about yanking only part of your response. Can you provide some more guidelines on creating multiple instances of the kind of job I was talking about? What would the design look like? And can you provide a link to your website also, please!!

Thanks,
gateleys
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

gateleys wrote:And can you provide a link to your website also, please!!
At the bottom of every one of his posts. :wink:
Last edited by chulett on Mon Mar 20, 2006 9:11 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DSguru2B wrote:From what I have read in Ray's posts, in Server the only way to use parallelism is the Link Partitioner stage.
You have misconstrued whatever it was you read. There are at least five different ways to effect partition parallelism in server jobs, including multiple independent streams in one job, multiple jobs, multiple instances of multi-instance jobs, the Link Partitioner stage, and using a Transformer stage to split the input stream into many, each with active processing downstream.
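As a sketch of that last option only (the link names Out1 through Out3 are illustrative), the constraints on three Transformer output links could be:

Code: Select all

Out1:  Mod(@INROWNUM, 3) = 0
Out2:  Mod(@INROWNUM, 3) = 1
Out3:  Mod(@INROWNUM, 3) = 2

Each link then drives its own downstream active stages, giving three concurrent processing streams within the one job.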
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

oh!!
Last edited by gateleys on Tue Mar 21, 2006 7:30 am, edited 1 time in total.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

I really appreciate your comments. However, will someone please take me a step beyond what Kenneth has suggested in terms of using job instances? How do I create one, and how do I specify constraints to split my input rows into different portions that can feed each processor? All I can see is the "Allow multiple instances" checkbox in the job properties.

Thanks,
gateleys
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Allowing multiple instances means you can run any number of copies of the same job at the same time.

Let's assume you are reading from a database table into a hashed file and that your key is a numeric field in both files. If you have a single instance of a job you would have your source stage do a SELECT from the table and write to the hashed file.

In a simple multi-instance version of the same job you could add a parameter called INSTANCENUMBER and call this job from a sequencer 3 times in parallel, each instance getting its own unique name and passing 0, 1, or 2 as the INSTANCENUMBER.

Your user SQL SELECT clause contains a

Code: Select all

WHERE MOD(KEY,3) = #INSTANCENUMBER#
which ensures that each instance gets 1/3 of the records selected, assuming your KEY values have an even distribution.
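
The calling side of such a setup could be sketched in DataStage BASIC job control along these lines (the job name LoadJob and the minimal error handling are illustrative only):

Code: Select all

* Start three instances of LoadJob in parallel, passing 0, 1 and 2 as INSTANCENUMBER
Dim hJob(3)
For i = 0 To 2
   hJob(i + 1) = DSAttachJob("LoadJob." : i, DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob(i + 1), "INSTANCENUMBER", i)
   ErrCode = DSRunJob(hJob(i + 1), DSJ.RUNNORMAL)
Next i
* Wait for every instance to finish before continuing
For i = 1 To 3
   ErrCode = DSWaitForJob(hJob(i))
Next i

A graphical Sequence job with three Job Activity stages (invocation IDs 0, 1 and 2) achieves the same thing without hand-written job control.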
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I guess you don't want the 10-page document that SHOWS you how to do this.

Do you see underneath all of my posts there's a "Posters Website" link?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

I tried parallelizing my process with 3 separate parallel links from the input feeding the output link. MOD(@INROWNUM,3) = PartitionNumber - 1 was used as the constraint in the Transformer, where the partition number is a job parameter, as specified in Kenneth's doc. The problem is that I get an error saying the different processes cannot write to the same sequential file (which is my output). It seems there is a conflict over the target resource. How do I get past this? And yes, my job is defined with 'Allow multiple instance' enabled.

Thanks,
gateleys
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Read closer and you'll see that the output sequential file has to be uniquely named. In the filename, include either a job parameter that is instance-number aware (in my example I use PartitionNumber) or the invocation ID (available as a macro).
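
For example (the directory parameter and file name are purely illustrative), the Sequential File stage's file name could be set to:

Code: Select all

#TargetDir#/load_output_#PartitionNumber#.txt

Each instance then writes its own file (load_output_1.txt through load_output_4.txt for four instances), which can be concatenated afterwards if a single output file is needed.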
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Oooops!! Missed that. Yeah, Kenneth, thanks again. The need came to me intuitively, the possibility was addressed here in this forum, and the direction was given both here and in your performance doc. Thanks, all of you.

gateleys
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Okay, but how about the good stuff? Are you moving more data and using more of your server now? Are ya happy? :lol:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply