Degree of parallelism for Oracle Entreprise

c_wf · Post by **c_wf** » Tue Feb 28, 2006 4:17 pm

The following comes from Datastage EE's document:

Partition Table. Specifies execution of the SELECT in parallel on the
processing nodes containing a partition derived from the named table. If
you do not specify this, the stage executes the query sequentially on a
single node.

Here is my question:
Say I define 4 nodes in my APT_CONFIG_FILE. And I use Oracle Enterprise Stage to read a CONSUMER table to a dataset. How will Partition Table option affect the degree of parallelism? There should be 4 query processes reading CONSUMER in EE, are there any slave process under each EE's process?

Thx

ray.wurlod · Post by **ray.wurlod** » Tue Feb 28, 2006 4:39 pm

If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.

jasper · Post by **jasper** » Wed Mar 01, 2006 2:26 am

c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)

So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.

There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.

kwwilliams · Post by **kwwilliams** » Wed Mar 01, 2006 8:41 am

I agree with Jasper, I ran a test on it and found that my very specific job ran a bit slower when reading the table parallel than it did when reading it sequentially. But that was my job. You could try the same thing. In your Oracle stage try using it and post your results.

c_wf · Post by **c_wf** » Wed Mar 01, 2006 8:49 am

ray.wurlod wrote:If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.

Hi Ray, you said SELECT will operate in sequential mode, do you mean DataStage will just send one SELECT statement to Oracle EE? And how Oracle EE handle this query? Oracle EE has parallel query function, will it use parallel query?

Thx

c_wf · Post by **c_wf** » Wed Mar 01, 2006 8:50 am

jasper wrote:c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)

So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.

There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.

Sweet. Thx!