Parallel Extender usage on non-partitioned database

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm

Parallel Extender usage on non-partitioned database

Post by sathyanveshi »

Hi,

My source database from where I extract the data has no partitions. If I use Parallel Extender for loading tables from source to target, and my source doesn't have partitions on it, then whether the use of Parallel Extender will add any value? I heard some contradictory statements which say that: when we use Parallel Extender on non-paritioned database and allow DataStage to parition, then that would add significant overhead on the DataStage processing.

Anyways, please give me your inputs.

Cheers,
Mohan
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

There are some databases where you can partition on the source (Oracle, DB2...), but the advantage of PX lies in the transformation stage. PX is capable of breaking up the data into good set of partitions (as defined by your $APT_CONFIG_FILE), and handle each set individually.

This is why we strongly advise against using massive SQL statements in your source database. Basic SQL statements (where a = b) that minimize data transfer over networks are good, but nesting SQL statements usually do not perform as well as EE on equally-configurated computer systems.
jyc168
Participant
Posts: 2
Joined: Wed Dec 01, 2004 1:15 pm

Parallel Extender usage on non-partitioned database

Post by jyc168 »

T42 wrote:There are some databases where you can partition on the source (Oracle, DB2...), but the advantage of PX lies in the transformation stage. PX is capable of breaking up the data into good set of partitions (as defined by your $APT_CONFIG_FILE), and handle each set individually.

This is why we strongly advise against using massive SQL statements in your source database. Basic SQL statements (where a = b) that minimize data transfer over networks are good, but nesting SQL statements usually do not perform as well as EE on equally-configurated computer systems.
This is a question to T42, would you please expand a little as to the definition of "massive SQL statement" and "nesting SQL statement" in your reply. I'm not strong on SQL, some example will help me a lot. Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Had T42 used the adjectives "simple" and "complex" to describe SQL perhaps his meaning would have been clearer.

He suggests that you use simple, straightfoward SQL statements to extract data, and let DataStage EE worry about the partitioning of those data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply