Parallelism in DataStage

goutam · Post by **goutam** » Thu Feb 21, 2008 12:22 am

This question is about the architecture of DataStage. If I am correct, DataStage creates as many partitions as there are nodes specified in the configuration file. So if I have a one-node configuration file, can DataStage achieve partition parallelism? I ask because I found that every partitioning algorithm partitions the input data according to the number of nodes specified in the configuration file.

Scope · Post by **Scope** » Thu Feb 21, 2008 12:32 am

you can get Parallelism not Partition Parallelism

goutam · Post by **goutam** » Thu Feb 21, 2008 12:53 am

Scope wrote: you can get Parallelism not Partition Parallelism

Do you mean to say I can get pipeline parallelism but not partition parallelism?

Maveric · Post by **Maveric** » Thu Feb 21, 2008 1:04 am

ray.wurlod · Post by **ray.wurlod** » Thu Feb 21, 2008 1:37 am

... unless, of course, you design in partition parallelism, in a similar fashion as you would in server jobs. But please don't go down this path! Stop thinking like a server job designer and wrap your mind around how the Orchestrate engine works - it will look after data partitioning for you.

eostic · Post by **eostic** » Thu Feb 21, 2008 12:27 pm

Ray is correct. You want to let the framework do the work for you... and technically speaking, as it relates to parallelism, you get both. Operators (stages, for the purpose of this discussion) run in their own processes [not always, and this can be tweaked, but let's keep it simple]... so even with a single-node config, you will have several processes running, performing pipeline parallelism. Once you start adding nodes, you will get partition parallelism as well. Here is where you make decisions on what "kind" of partitioning to use (hash on a value, round-robin, etc.), but the "degree" of partitioning is dictated by the number of nodes (and, again, there are many ways to tweak this). The key is that the Job you see in the Designer remains largely the same, whether you have 1 or 4 or 8 "nodes." It's too complex a subject for a paragraph here and there, but this is the general idea.

Ernie
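
To make the "kind" versus "degree" distinction concrete, here is a rough sketch in plain Python. It is not DataStage or Orchestrate code: the function names `round_robin` and `hash_partition`, the `cust_id` column, and the use of `crc32` as the hash are all assumptions made for illustration. The only point is that the partitioning method decides which partition a row goes to, while the node count decides how many partitions exist, so a one-node configuration always yields a single partition.

```python
from zlib import crc32


def round_robin(rows, node_count):
    """Deal rows out to the partitions in turn (illustrative only)."""
    partitions = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        partitions[i % node_count].append(row)
    return partitions


def hash_partition(rows, node_count, key):
    """Send rows with the same key value to the same partition.

    crc32 is a stand-in hash chosen because it is deterministic;
    it is not the hash the real engine uses.
    """
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        bucket = crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[bucket].append(row)
    return partitions


# Hypothetical input: six rows with three distinct cust_id values.
rows = [{"cust_id": i % 3, "amount": 10 * i} for i in range(6)]

one_node = round_robin(rows, 1)    # single-node config: one partition
four_nodes = round_robin(rows, 4)  # four-node config: four partitions
by_key = hash_partition(rows, 4, "cust_id")

print(len(one_node), [len(p) for p in four_nodes])
```

With `node_count` set to 1, both functions return one partition holding every row, which matches the single-node case discussed above: the stages can still run as separate processes, but there is nothing to split the data across.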