Diff between Server and parallel job in case of multiple CPUs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Diff between Server and parallel job in case of multiple CPUs

Post by dsedi »

Hi All

Just need to get my concept cleared here.

In Server jobs on multiple-processor machines, we are able to achieve parallelism by using the IPC stage, and partitioning by using the Link Partitioner and Link Collector stages.

Are these stages overheads that we do not come across in PX? Secondly, is it correct that there is no pipeline parallelism in Server jobs, while it is present in PX jobs?

Could anyone share their input on this.

Thanks in advance.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Server jobs can use multiple job instances to multi-process (partitioned parallelism). They have row buffering (quasi-pipeline parallelism) and inter-process buffering (active stages run as independent processes communicating through FIFOs, which gives true pipeline parallelism). An IPC stage is just a user-placed FIFO; it is not needed if inter-process buffering is enabled, because then all active stages already communicate through FIFOs.
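To picture inter-process buffering, here is a minimal sketch (illustrative Python only, not DataStage internals) of two independent processes linked by a queue, the way two active stages are linked by a FIFO:

```python
# Sketch of pipeline parallelism: two independent processes linked by a
# queue, the way two active stages are linked by a FIFO.
from multiprocessing import Process, Queue

def reader(out_q):
    # "Upstream" stage: produces rows and pushes each one downstream
    # as soon as it is ready.
    for row in range(5):
        out_q.put(row)
    out_q.put(None)  # end-of-data marker

def transformer(in_q, result_q):
    # "Downstream" stage: consumes rows as they arrive instead of
    # waiting for the upstream stage to finish.
    total = 0
    while (row := in_q.get()) is not None:
        total += row * 10
    result_q.put(total)

if __name__ == "__main__":
    rows_q, result_q = Queue(), Queue()
    stages = [Process(target=reader, args=(rows_q,)),
              Process(target=transformer, args=(rows_q, result_q))]
    for p in stages:
        p.start()
    for p in stages:
        p.join()
    print(result_q.get())  # 10 * (0+1+2+3+4) = 100
```

Both stages run concurrently; the queue plays the role of the FIFO buffer between them.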

A PX job using a 1-node node pool acts like a Server job. The more nodes in the node pool, the more degrees of partitioned parallelism. In a Server job, this would be the number of job instances needed for partitioned parallelism.

PX does all of the partitioning, data alignment to nodes, and high-performance database access seamlessly, so yes, there's a significantly more optimized route in the PX engine for this type of processing.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

And IPC becomes an overhead only if the increase in the number of processes (due to the use of IPC stages) grows into a nightmare for the CPUs' capacity.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
atul sharma
Premium Member
Posts: 17
Joined: Thu Jun 30, 2005 6:52 am
Location: United States
Contact:

Post by atul sharma »

Kenneth thanks for your reply

According to your reply, it won't be wrong to say that both parallel and Server jobs can have both forms of parallelism (partitioned and pipeline).

In a parallel job it depends on the nodes specified in the configuration file, while in a Server job it depends on the presence of multiple CPUs, which indirectly gives rise to multiple processes.

Just a thought: in a PX job too, we can achieve partitioned parallelism only when we have more than one CPU, right?

How then are the two kinds of job different, apart from the way we achieve parallelism in them?

I am getting a bit confused :roll:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

I expect to have a white paper called Parallelism in Server Jobs available in about a month. Please be patient. All will be revealed!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Partitioned parallelism simply means that more than one identical process works simultaneously on different groups of rows in a data set.

If you have 10 rows of data and 1 process, then you don't have partitioned parallelism. If you have 10 rows of data and 2 processes each handling 5 rows of data, you have partitioned parallelism. The means of dividing up which process gets which row is the "partitioning algorithm", which can be either an "every other row" (round-robin) or intelligent (hash) division of the rows.
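The two partitioning algorithms described above can be sketched in a few lines of illustrative Python (PX implements these internally; the helper names here are made up):

```python
# Illustrative sketch of the two partitioning algorithms: round-robin
# ("every other row") and hash (key-based) division of rows.

def round_robin(rows, n_partitions):
    """Deal rows out in turn: row i goes to partition i mod n."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

def hash_partition(rows, n_partitions, key):
    """Send each row to the partition chosen by hashing its key, so
    rows with the same key always land in the same partition."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(key(row)) % n_partitions].append(row)
    return parts

rows = list(range(10))
print(round_robin(rows, 2))  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Round-robin guarantees an even spread; hash guarantees that related rows stay together, which matters for operations like aggregation and joins.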

An example is 5 people with shovels digging the same hole. That's parallelism. Someone must tell the 5 people what order they each take a shovel of dirt, that's partitioning. If it's each person takes a turn over and over, that's round-robin. If someone is putting some decision making into who goes next, that's hash.


Pipeline parallelism is different: it means that your process is actually a chain of processes. If one person digs into the hole to get the dirt, then dumps the dirt into a wheelbarrow that another person takes away, you have a pipeline. So each of your 5 diggers has a wheelbarrow person to take away their dirt. Now you have 10 processes in 5 pipelines. The diggers are digging simultaneously and the wheelbarrows are moving simultaneously, but within each digger-wheelbarrow pipeline there is coordination.

When someone puts their dirt into someone else's wheelbarrow, that's what you call re-partitioning: you're moving your data outside your pipeline and giving it up.
Kenneth Bland
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

Kenneth, thanks a lot for explaining partitioned and pipeline parallelism with a day-to-day example.

Could you please let me know, for the case below:

3 processors, 3 stages (source: Oracle stage; a Transformer stage; target: Data Set in the PX job and Sequential File in the Server job).

Suppose that in these two similar jobs we have inter-process row buffering turned on in the Server job, and a 3-node configuration file in the PX job.

Will we have partitioned and pipeline parallelism in both jobs, and if yes, which will give better results with 1 million records?

Thanks in advance.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

In your example, you would have partitioned parallelism in the PX job because you would have 3 Transformer stages active, whereas in the Server job you would have 1. The PX job would also have the chance to have 3 readers accessing Oracle using some partitioning means, whereas the Server job would have 1. There's also a pipeline, because the 3 readers would be talking to the 3 Transformers. On the Server side, neither inter-process nor row buffering would help, because there's only one active stage and you need at least two to gain that advantage.
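That topology, three reader-to-transformer pipelines running side by side, can be sketched like this (illustrative Python only, not how the engine is actually built):

```python
# Three partitions, each a reader feeding its own transformer through a
# queue: partitioned parallelism across partitions, pipeline parallelism
# within each one.
from multiprocessing import Process, Queue

def reader(partition_rows, out_q):
    for row in partition_rows:
        out_q.put(row)
    out_q.put(None)  # end-of-data marker

def transformer(in_q, result_q):
    total = 0
    while (row := in_q.get()) is not None:
        total += row
    result_q.put(total)

def run_partitioned(rows, n_nodes):
    # Round-robin the rows across n_nodes reader/transformer pairs.
    partitions = [rows[i::n_nodes] for i in range(n_nodes)]
    result_q = Queue()
    procs = []
    for part in partitions:
        link = Queue()  # the "link" between reader and transformer
        procs += [Process(target=reader, args=(part, link)),
                  Process(target=transformer, args=(link, result_q))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sum(result_q.get() for _ in range(n_nodes))

if __name__ == "__main__":
    print(run_partitioned(list(range(10)), 3))  # 45 regardless of node count
```

With `n_nodes=1` the same sketch degenerates to a single pipeline, which mirrors the Server-job case with inter-process buffering on.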
Kenneth Bland
atul sharma
Premium Member
Posts: 17
Joined: Thu Jun 30, 2005 6:52 am
Location: United States
Contact:

Post by atul sharma »

Hi Kenneth

As you mentioned, in the above example there is a 3-node configuration file, so there will be 3 Transformer stages active at the same time; similarly there will be 3 reader processes and 3 loader processes.

If, instead of a 3-node configuration file, I use a 1-node file, how will I then distinguish between the Server job and the parallel job in the above example?

Thanks
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

For me the major difference is more in the configuration: in a parallel job you define the number of parallel processes at runtime (by supplying the configuration file), whereas in a Server job you define this when designing the job.

So when you move from a 2-CPU machine to a 4-CPU machine, most of the work for a parallel job is just doubling the nodes in the configuration file (and checking, of course); for a Server job you have to update and recompile the jobs.
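For illustration, a minimal 2-node configuration file might look like the following (the hostname and disk paths are placeholders; check your own environment's values). Moving to 4 CPUs is mostly a matter of adding two more node blocks:

```
{
    node "node1" {
        fastname "etl_host"
        pools ""
        resource disk "/data/ds/resource" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
    }
    node "node2" {
        fastname "etl_host"
        pools ""
        resource disk "/data/ds/resource" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
    }
}
```

No job recompilation is needed; the same compiled PX job picks up whatever configuration file it is pointed at when it runs.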
atul sharma
Premium Member
Posts: 17
Joined: Thu Jun 30, 2005 6:52 am
Location: United States
Contact:

Post by atul sharma »

I just designed a simple job with a source Oracle stage reading around 5 lakh (500,000) records, a Transformer stage, and a target Data Set stage.

I used Round Robin partitioning and found that:

3 node === 26016 rows per sec
2 node === 24932 rows per sec
1 node === 23016 rows per sec

Time taken for the job to finish in each case was as follows:

3 node === 30 sec
2 node === 30 sec
1 node === 30 sec

Could you please let me know which is giving me the best performance in this case. :roll:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Define performance.

500,000 rows is not really many for this kind of test. Try with 5 crore (50 million) rows and you will begin to see some differences.

It is probably also the case that at least one of your stages (Oracle) is operating in sequential mode. Are you using the Oracle Enterprise stage? Is the Oracle table partitioned? Are you specifying this in the job design?
atul sharma
Premium Member
Posts: 17
Joined: Thu Jun 30, 2005 6:52 am
Location: United States
Contact:

Post by atul sharma »

Yes Ray, you must be right that one of the stages (in fact the Oracle Enterprise stage) must be sequential. The information I obtained from Director after setting the APT_DUMP_SCORE environment variable shows:

ds0: {op0[1p] (sequential Oracle_Enterprise_0)

The create script for the table I am calling in the above stage mentions no partitions.

Are we supposed to have the table partitioned (range, hash, etc.) to gain the benefits of partitioned parallelism?

But I am explicitly specifying Round Robin partitioning on the input tab of the Transformer. Will that not help the cause?

warm regards
Atul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Without using a partitioned table, you are limited to the speed at which Oracle can deliver rows. For SELECT, the Oracle Enterprise stage can only operate in sequential mode, so that is and will remain your principal bottleneck.

Round robin and random are the best algorithms for even distribution of rows.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

atul sharma wrote:I just designed a simple job having Source Oracle stage which has around 5 lakh records. A transformer stage and a target stage Dataset.

blah blah blah

Could you please let me know which is giving me best performance in this case. :roll:
In your example, you simply allowed PX to run more processes to do the same work. At some point the return diminishes because the OS spends more time starting, stopping, and managing tasks. You can't double performance with 2 nodes over 1 without adding more CPUs if your CPUs were already heavily utilized. If the 1-to-2 node improvement was about +1,900 rows/sec and the 2-to-3 improvement about +1,100 rows/sec, then a 4-node run should be about the same as a 3-node run, and 5 nodes should be worse than 3.
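The diminishing return is visible directly in the figures posted earlier in the thread; as a quick arithmetic check:

```python
# Throughput gains per added node, from the figures posted earlier.
rates = {1: 23016, 2: 24932, 3: 26016}  # rows/sec at each node count

gain_1_to_2 = rates[2] - rates[1]
gain_2_to_3 = rates[3] - rates[2]
print(gain_1_to_2, gain_2_to_3)  # 1916 1084 -- each extra node buys less
```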

Just because PX can automatically spawn multiple processes to tackle the data doesn't mean it's going to be faster. A beginner ETL person with PX will always be beaten by an experienced ETL person with just about any other tool (Server, SQL, Java, hell, even Informatica :shock: ). The point is you have to look at everything and tune the whole equation.

As Ray points out, the database has to be designed, modeled, implemented, tuned, and maintained. A seasoned ETL developer knows what the database architecture should look like, since they've probably been in environments set up for high-performance processing and high-volume storage. It makes my teeth itch when someone doesn't understand that "SELECT * FROM bigtable" will take a long time to spool to a file, whereas many "SELECT * FROM bigtable PARTITION (partitionX)" queries spooling simultaneously would be a better design.
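A sketch of that per-partition approach (illustrative Python; `fetch_partition` and the toy table are stand-ins for real database access):

```python
# Sketch: extracting one large table as several per-partition queries
# running simultaneously, instead of one big sequential SELECT.
from concurrent.futures import ThreadPoolExecutor

TABLE = {"p1": [1, 2], "p2": [3, 4], "p3": [5, 6]}  # toy "partitioned table"

def fetch_partition(name):
    # In a real job this would run something like
    #   SELECT * FROM bigtable PARTITION (<name>)
    # against the database; here it just returns the toy rows.
    return TABLE[name]

def parallel_extract(partitions):
    # One worker per partition, all spooling at the same time.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        chunks = pool.map(fetch_partition, partitions)
    return [row for chunk in chunks for row in chunk]

print(parallel_extract(["p1", "p2", "p3"]))  # [1, 2, 3, 4, 5, 6]
```

The design point is the same one PX exploits with a partitioned Oracle table: each reader works on its own slice, so the extract is bounded by the slowest partition rather than the whole table.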
Kenneth Bland