SMPs and process distribution
Our DataStage server runs on an SMP with 4 processors. Say I have a simple job: a flat file as source and a Transformer with output links to an ORAOCI9 stage (inserts only) and a reject file. With the in-process and inter-process buffers left at their defaults, will the server split the insert load among the processors? I am using Server Edition 7.0 on Windows. For simply-designed jobs that handle huge volumes of data, is there any way to maximize throughput through optimal use of these processors? Something along the lines of partitioning the data and assigning it to nodes as in parallel systems, but using only Server Edition resources.
Thanks.
gateleys
Absolutely. Use job instances with a constraint that divides the rows evenly among the instances, so each instance is responsible for a portion of the data. You can run 4 copies of the job, each inserting 1/4 of the rows. It is easy and elegant, and we were doing this long before PX.
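The divide-among-instances idea can be illustrated outside DataStage. This is a minimal Python sketch (not DataStage syntax; the `rows_for_instance` name and row layout are invented for illustration) showing how N copies of the same filter, each given a different instance number, split one source evenly:

```python
def rows_for_instance(rows, instance_number, instance_count):
    """Keep only the rows this job instance is responsible for.

    Mirrors the job-instance constraint: each of the N running
    copies filters the same source down to its own 1/N share.
    """
    return [row for row in rows if row["key"] % instance_count == instance_number]

all_rows = [{"key": k, "data": f"row-{k}"} for k in range(12)]

# Simulate 4 parallel instances of the same job.
shares = [rows_for_instance(all_rows, i, 4) for i in range(4)]

assert sum(len(s) for s in shares) == len(all_rows)  # no row is lost
assert all(len(s) == 3 for s in shares)              # even 1/4 split
```

Each instance applies the same constraint with a different parameter value, so no coordination between the copies is needed at run time.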
I've got a document on my website called Performance Analysis which covers this effort.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B wrote: Did you try using Link Partitioner? As far as I have read from Ray's posts, in Server the only way to use parallelism is the Link Partitioner stage.

I have used Link Partitioner and Link Collector in situations where the input rows can be segregated by conditions applied in a Transformer within each partition (between the partitioner and the collector). However, if I just need to read from a sequential file and load a table via a Transformer, I do NOT have any constraint that would be the basis for partitioning my data. If I used something like row numbers (or some IDs) to split the rows, it would not be seamless. Can you suggest a design in which the partitioner/collector pair could be used in my case?
Thanks,
gateleys
kcbland wrote: Absolutely. Use job instances with a constraint that divides the rows evenly among the instances. I've got a document on my website called Performance Analysis which covers this effort.

Hi Kenneth, sorry about yanking part of your response. Can you provide some more guidance on creating multiple instances of the kind of job I described? What would the design look like? And can you provide a link to your website also, please!!
Thanks,
gateleys
gateleys wrote: And can you provide a link to your website also, please!!

At the bottom of every one of his posts.
Last edited by chulett on Mon Mar 20, 2006 9:11 pm, edited 1 time in total.
-craig
"You can never have too many knives" -- Logan Nine Fingers
DSguru2B wrote: As far as I have read from Ray's posts, in Server the only way to use parallelism is the Link Partitioner stage.

You have misconstrued whatever it was you read. There are at least five different ways to effect partition parallelism in server jobs, including multiple independent streams in one job, multiple jobs, and multiple instances of multi-instance jobs, as well as Link Partitioner, and using a Transformer stage to split the input stream into many, each with active processing downstream.
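One of the approaches listed above, splitting a single stream into several sub-streams and merging them again, can be sketched in plain Python. This is only an illustration of the partition/collect pattern, not DataStage code; the function names are invented:

```python
from itertools import cycle

def partition_round_robin(rows, n):
    """Split one input stream into n sub-streams, in the spirit of
    the Link Partitioner stage's round-robin mode."""
    streams = [[] for _ in range(n)]
    turn = cycle(range(n))
    for row in rows:
        streams[next(turn)].append(row)
    return streams

def collect(streams):
    """Merge the sub-streams back into a single stream, in the spirit
    of the Link Collector stage."""
    return [row for stream in streams for row in stream]

rows = list(range(10))
streams = partition_round_robin(rows, 4)
merged = collect(streams)
assert sorted(merged) == rows  # nothing lost or duplicated
```

Between the partition and the collect, each sub-stream can carry its own active processing, which is where the parallelism comes from.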
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I really appreciate your comments. However, could someone take me a step beyond what Kenneth has suggested with job instances? How do I create them, and how do I specify constraints to split my input rows into portions that can feed each processor? All I can see is the 'Allow multiple instances' checkbox in the job properties.
Thanks,
gateleys
Allowing multiple instances means you can run any number of copies of the same job at the same time.
Let's assume you are reading from a database table into a hashed file and that your key is a numeric field in both files. If you have a single instance of a job you would have your source stage do a SELECT from the table and write to the hashed file.
In a simple multi-instance version of the same job you could add a parameter called INSTANCENUMBER and call the job from a sequencer 3 times in parallel, each instance getting its own unique name and passing 0, 1, or 2 as the INSTANCENUMBER.
Your user-defined SQL SELECT clause contains a clause such as

Code: Select all
WHERE MOD(KEY, 3) = #INSTANCENUMBER#

which ensures that each instance gets 1/3 of the records selected, assuming your KEY has an even distribution.
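The MOD-based WHERE clause partitions the key space cleanly: every key satisfies exactly one instance's predicate, so there are no gaps and no overlaps. A quick Python check of that property, using a hypothetical range of numeric key values and 3 instances:

```python
INSTANCE_COUNT = 3
keys = range(1, 1001)  # hypothetical numeric key values

# The rows each instance's WHERE MOD(KEY, 3) = #INSTANCENUMBER# selects.
shares = [[k for k in keys if k % INSTANCE_COUNT == i]
          for i in range(INSTANCE_COUNT)]

# Every key is selected by exactly one instance: no gaps, no overlaps.
assert sorted(k for share in shares for k in share) == list(keys)

# The split is near-even when the keys themselves are evenly distributed.
sizes = [len(share) for share in shares]
assert max(sizes) - min(sizes) <= 1
```

As Kenneth notes, the evenness of the split depends entirely on the distribution of KEY values; a skewed key (e.g. one residue far more common than the others) skews the workload the same way.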
I guess you don't want the 10-page document that SHOWS you how to do this.
Do you see underneath all of my posts there's a "Posters Website" link?
Kenneth Bland
I tried parallelizing my process with 3 separate parallel links from the input feeding the output link. I used MOD(@INROWNUM, 3) = PartitionNumber - 1 as the constraint in the Transformer, where PartitionNumber is a job parameter, as specified in Kenneth's document. The problem is that I get an error saying the different processes cannot write to the same sequential file (which is my output); there is a conflict over the target resource. How do I get past this? And yes, my job is defined with 'Allow multiple instances'.
Thanks,
gateleys
Read more closely: the output sequential file has to be uniquely named. Include in the filename either a job parameter that is instance-number aware (in my example I use PartitionNumber) or the invocation ID (available as a macro value).
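The per-instance naming scheme is simple to sketch. A minimal Python illustration, assuming an invented helper name and path layout, showing that each instance derives a distinct target file from its own parameter value:

```python
def instance_output_path(base_dir, job_name, invocation_id):
    """Return an instance-aware target filename so parallel instances
    never contend for the same sequential file."""
    return f"{base_dir}/{job_name}_{invocation_id}.out"

# Three parallel instances, three distinct output files.
paths = [instance_output_path("/data/out", "LoadJob", i) for i in range(3)]
assert len(set(paths)) == 3
```

After all instances finish, a downstream step (a Link Collector, or simply concatenating the files) merges the per-instance outputs into the single result file.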
Kenneth Bland
Okay, but how about the good stuff? Are you moving more data and using more of your server now? Are ya happy?
Kenneth Bland