Parallelism in DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am


Post by goutam »

This question is about the architecture of DataStage. If I am correct, DataStage creates as many partitions as there are nodes specified in the configuration file. So if I have a one-node configuration file, can DataStage achieve partition parallelism? I ask because I found that every partitioning algorithm partitions the input data according to the number of nodes specified in the configuration file.
Goutam Sahoo
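The relationship described in the question (number of partitions = number of nodes in the configuration file) can be illustrated with a small Python sketch. This is not DataStage code: `make_config`, `count_nodes` and `round_robin` are hypothetical helpers, the config text is a simplified imitation of the parallel configuration file format (real files also declare `resource disk` and `resource scratchdisk` entries), and the host name `etlhost` is a placeholder. The point is only that a one-node file yields a degree of partitioning of 1, so all rows land in a single partition.

```python
import re


def make_config(num_nodes: int) -> str:
    """Build a simplified N-node config text (hypothetical helper, not a real file)."""
    nodes = "\n".join(
        f'  node "node{i}"\n  {{\n    fastname "etlhost"\n    pools ""\n  }}'
        for i in range(1, num_nodes + 1)
    )
    return "{\n" + nodes + "\n}\n"


def count_nodes(config_text: str) -> int:
    """Count `node "<name>"` declarations with a naive regex (not a real parser)."""
    return len(re.findall(r'\bnode\s+"[^"]*"', config_text))


def round_robin(rows, num_partitions):
    """Deal rows out to num_partitions lists in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


for nodes in (1, 4):
    n = count_nodes(make_config(nodes))  # degree of partitioning = node count
    print(n, [len(p) for p in round_robin(range(10), n)])
# 1 [10]
# 4 [3, 3, 2, 2]
```

With one node every partitioning method degenerates to a single partition, which is why the answers below say you still get parallelism, but not partition parallelism.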
Scope
Premium Member
Posts: 63
Joined: Wed Jun 06, 2007 6:38 am
Location: Chennai

Post by Scope »

You can get parallelism, but not partition parallelism.
Kumarez
goutam
Premium Member

Post by goutam »

Scope wrote: you can get Parallelism not Partition Parallelism
Do you mean that I can get pipeline parallelism but not partition parallelism?
Goutam Sahoo
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Yes.
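To make "pipeline parallelism without partition parallelism" concrete, here is a minimal Python sketch under stated assumptions: threads stand in for the separate operator processes the engine starts, bounded `queue.Queue` objects stand in for the links between stages, and `stage`/`run_pipeline` are hypothetical names. There is only one partition, yet the source, each transform and the final consumer all run at the same time on different rows.

```python
import queue
import threading

END = object()  # end-of-data marker sent down each link


def stage(transform, inbox, outbox):
    """One 'stage': read rows from its input link, transform them, write to its output link."""
    while True:
        row = inbox.get()
        if row is END:
            outbox.put(END)  # propagate end-of-data downstream
            return
        outbox.put(transform(row))


def run_pipeline(rows, transforms, buffer_size=8):
    """Run a source plus one thread per transform concurrently, on ONE partition."""
    # links[i] feeds stage i; the small maxsize gives pipe-like back-pressure.
    links = [queue.Queue(maxsize=buffer_size) for _ in range(len(transforms) + 1)]

    def source():
        for row in rows:
            links[0].put(row)
        links[0].put(END)

    workers = [threading.Thread(target=source)]
    workers += [
        threading.Thread(target=stage, args=(t, links[i], links[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for w in workers:
        w.start()

    # The main thread plays the target stage, draining the last link.
    results = []
    while (row := links[-1].get()) is not END:
        results.append(row)
    for w in workers:
        w.join()
    return results


print(run_pipeline(range(5), [lambda r: r * 10, lambda r: r + 1]))
# [1, 11, 21, 31, 41]
```

In CPython the GIL limits any CPU-bound speedup from threads; the sketch only imitates the structure of separate stage processes connected by pipes, not their performance.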
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... unless, of course, you design partition parallelism into the job yourself, much as you would in server jobs. But please don't go down this path! Stop thinking like a server job designer and wrap your mind around how the Orchestrate engine works: it will look after data partitioning for you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
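Ray's point that the engine looks after partitioning can be sketched in Python, assuming a deliberately simplified model. The hypothetical `run` function plays the "framework": it hash-partitions rows on a key into as many partitions as there are nodes and runs the same per-partition logic on each. The stage logic (`sum_by_key`) never refers to the node count. `zlib.crc32` is a stand-in for whatever hash the engine actually uses, and the column names `cust` and `amt` are invented for the example.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Send each row to partition crc32(key) % num_partitions (crc32 is a stand-in hash)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    return partitions


def sum_by_key(partition, key, value):
    """Per-partition 'stage logic': it has no idea how many partitions exist."""
    totals = defaultdict(int)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)


def run(rows, num_nodes, key="cust", value="amt"):
    """The 'framework': partition by key, run the logic per partition, collect results."""
    combined = {}
    for part in hash_partition(rows, key, num_nodes):
        # Each key lives in exactly one partition, so update() never overwrites a partial sum.
        combined.update(sum_by_key(part, key, value))
    return combined


rows = [
    {"cust": "A", "amt": 5},
    {"cust": "B", "amt": 3},
    {"cust": "A", "amt": 2},
    {"cust": "C", "amt": 7},
    {"cust": "B", "amt": 1},
]
print(sorted(run(rows, 4).items()))
# [('A', 7), ('B', 4), ('C', 7)]
```

The result is identical for 1, 2 or 4 nodes; only the number of partitions the "framework" creates changes, not the logic you wrote.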
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Ray is correct. You want to let the framework do the work for you... and technically speaking, as it relates to parallelism, you get both. Operators (stages, for the purpose of this discussion) run in their own processes [not always, and this can be tweaked, but let's keep it simple]... so even with a single-node config you will have several processes running, giving you pipeline parallelism.

Once you start adding nodes, you get partitioning as well. Of course, this is where you decide what "kind" of partitioning to use (hash on a value, round-robin, etc.), but the "degree" of partitioning is dictated by the number of nodes (and, again, there are many ways to tweak this). The key point is that the job you see in the Designer remains largely the same whether you have 1, 4 or 8 "nodes". It's too complex a subject for a paragraph here and there, but this is the general idea.

Ernie
Ernie Ostic
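Ernie's distinction between the "kind" and the "degree" of partitioning can be shown with a short Python sketch. It is illustrative only: `round_robin`, `hash_on` and `partitions_holding` are hypothetical helpers, `zlib.crc32` stands in for the engine's hash, and `NUM_NODES` stands in for the node count that would come from the configuration file. Both partitioners produce the same number of partitions (the degree); they differ in where rows go (the kind). Round-robin spreads rows with the same key across partitions, while hashing on the key keeps them together, which is what key-based stages rely on.

```python
import zlib

NUM_NODES = 4  # stand-in for the node count read from the configuration file


def round_robin(rows, n):
    """Kind #1: deal rows out in turn; balanced, but ignores key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_on(key):
    """Kind #2: return a partitioner that places rows by a hash of row[key]."""
    def partitioner(rows, n):
        parts = [[] for _ in range(n)]
        for row in rows:
            parts[zlib.crc32(str(row[key]).encode("utf-8")) % n].append(row)
        return parts
    return partitioner


def partitions_holding(parts, key, value):
    """Indices of partitions containing at least one row with row[key] == value."""
    return {i for i, p in enumerate(parts) if any(r[key] == value for r in p)}


rows = [{"cust": c} for c in "AABBAACC"]
rr = round_robin(rows, NUM_NODES)
hashed = hash_on("cust")(rows, NUM_NODES)
print(len(rr), len(hashed))                  # same degree
print(partitions_holding(rr, "cust", "A"))   # customer A split across partitions
print(len(partitions_holding(hashed, "cust", "A")))  # customer A in one partition
# 4 4
# {0, 1}
# 1
```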
