The following comes from Datastage EE's document:
Partition Table. Specifies execution of the SELECT in parallel on the
processing nodes containing a partition derived from the named table. If
you do not specify this, the stage executes the query sequentially on a
single node.
Here is my question:
Say I define 4 nodes in my APT_CONFIG_FILE. And I use Oracle Enterprise Stage to read a CONSUMER table to a dataset. How will Partition Table option affect the degree of parallelism? There should be 4 query processes reading CONSUMER in EE, are there any slave process under each EE's process?
Thx
Degree of parallelism for Oracle Entreprise
Moderators: chulett, rschirm, roy
Degree of parallelism for Oracle Entreprise
bored to death
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)
So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.
There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)
So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.
There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
I agree with Jasper, I ran a test on it and found that my very specific job ran a bit slower when reading the table parallel than it did when reading it sequentially. But that was my job. You could try the same thing. In your Oracle stage try using it and post your results.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
Hi Ray, you said SELECT will operate in sequential mode, do you mean DataStage will just send one SELECT statement to Oracle EE? And how Oracle EE handle this query? Oracle EE has parallel query function, will it use parallel query?ray.wurlod wrote:If you do not specify the partition table, then your SELECT will operate in sequential mode. If there is an Oracle table that specifies that your CUSTOMER data can be read four-way parallel then naming that table will allow DataStage to take advantage of it.
Thx
bored to death
Sweet. Thx!jasper wrote:c_wf,
from experience:this is not as powerfull as it looks. It is quick and dumb partitioning.
Example: if you extract from one table you mention the same table as partition table. Datastage will look for min and max rowid, split this up in even parts and then fire 4 queries with the added clause rowid between X and Y.
If you have a join-query defined, you pick one of the tables in the query and again DS will add rowid between. (check your plans to find which table is the best candidate)
So if you add a whereclause, DS will not partition on what you expect but on the full table, mostly leading to one or two partitions returning all data, and other partitions returning 0 rows.
There's numberous things that can improve here, why can't it look at the way the table is partitioned for example.
bored to death