Degree of parallelism for Oracle Entreprise

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
c_wf
Participant
Posts: 14
Joined: Fri Mar 04, 2005 5:35 pm

Degree of parallelism for Oracle Entreprise

Post by c_wf »

The following comes from Datastage EE's document:

Partition Table. Specifies execution of the SELECT in parallel on the
processing nodes containing a partition derived from the named table. If
you do not specify this, the stage executes the query sequentially on a
single node.


Here is my question:
Say I define 4 nodes in my APT_CONFIG_FILE. And I use Oracle Enterprise Stage to read a CONSUMER table to a dataset. How will Partition Table option affect the degree of parallelism? There should be 4 query processes reading CONSUMER in EE, are there any slave process under each EE's process?

Thx
bored to death
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)

So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.

There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

I agree with Jasper, I ran a test on it and found that my very specific job ran a bit slower when reading the table parallel than it did when reading it sequentially. But that was my job. You could try the same thing. In your Oracle stage try using it and post your results.
c_wf
Participant
Posts: 14
Joined: Fri Mar 04, 2005 5:35 pm

Post by c_wf »

ray.wurlod wrote:If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.
Hi Ray, you said SELECT will operate in sequential mode, do you mean DataStage will just send one SELECT statement to Oracle EE? And how Oracle EE handle this query? Oracle EE has parallel query function, will it use parallel query?

Thx
bored to death
c_wf
Participant
Posts: 14
Joined: Fri Mar 04, 2005 5:35 pm

Post by c_wf »

jasper wrote:c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)

So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.

There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.
Sweet. Thx!
bored to death
Post Reply