Parallelism in Oracle

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Parallelism in Oracle

Post by dspxlearn »

Hi all,


In one of our jobs we extract data from Oracle using the Oracle Enterprise stage. The extract currently runs in sequential mode, with the stages after it running in parallel, and this works fine. Now we want the extract from the source to run in parallel as well. However, when I set the Oracle Enterprise stage to parallel mode, it retrieves double the number of records, since we have two nodes... :shock: A Transformer follows the stage. Is there any other way to achieve this?
If I change to parallel mode and use only one node in the 'node map constraint' option, so that the records are processed on the first node only, it works fine. But will this cause problems later? :?: :?:
Thanks and Regards!!
dspxlearn
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi,


Do you have any input on my requirement? :D
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think no answers have been forthcoming because the question is based on a misconception of how PX jobs read from a database in parallel. You stated
So, when i give parallel mode in the oracle enterprise stage it will retrive the double no. of records
which is incorrect but forms the basis of your question. There is no need to reduce the number of nodes in your APT_CONFIG file to 1; that would defeat the advantages of PX.
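To make the distinction concrete, here is a minimal Python sketch (not DataStage code; the Oracle table is simulated as a list) of the difference between every node running the same full query and each node reading a disjoint partition:

```python
# Simulated source table of 100 rows (stand-in for the Oracle table).
rows = list(range(100))
num_nodes = 2

# Misconception: a "parallel" read where every node runs the same full
# query. Each node returns all rows, so downstream stages see
# num_nodes copies of the data.
naive = [row for node in range(num_nodes) for row in rows]
print(len(naive))        # 200: double the source rows on 2 nodes

# Partitioned read: each node reads only its share, e.g. via a
# MOD(key, node_count) = node_number predicate added to its query.
partitioned = [row for node in range(num_nodes)
               for row in rows if row % num_nodes == node]
print(len(partitioned))  # 100: every row read exactly once, no duplicates
```

A properly partitioned parallel read therefore produces the same row count as the sequential read, just spread across nodes.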
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi ArndW,



That was just a trial; I did it only to test, and it worked. But if I remove the 'node map constraint' option and still run in parallel mode, I get double the number of records (2 nodes). I could put a Remove Duplicates stage between the Oracle stage and the Transformer that follows it, but that doesn't work in cases where we get genuine duplicate records from the source itself.
So, is there any other way to achieve this? 8)
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I can't call up DS right now, but I do know you have to enter the partition table information on one tab, and then configure serial/parallel mode on another. You will have to supply the information on this tab so that each node gets its own distinct data when running in parallel, but you don't need to reduce your active nodes to 1. Since I don't have Oracle available at my site I can't test it, but you should be able to get this working; the documentation for this part of the stage would help as well.
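As a rough illustration of what "each node gets its distinct data" means at the SQL level, here is a hedged Python sketch that builds one query per node with a disjoint MOD predicate. The table name is hypothetical, and the exact SQL the stage generates from the partition table option will differ; this only shows the shape of the idea:

```python
base_sql = "SELECT * FROM my_table"   # hypothetical table name
num_nodes = 2

def node_query(node_number: int) -> str:
    # Append a predicate so the per-node result sets are disjoint and
    # together cover the whole table exactly once. ORA_HASH over ROWID
    # is one common way to split rows evenly in Oracle.
    return (f"{base_sql} WHERE MOD(ORA_HASH(ROWID), {num_nodes})"
            f" = {node_number}")

for n in range(num_nodes):
    print(node_query(n))
```

Because the predicates partition the table, running the two queries on two nodes returns each row exactly once in total, not twice.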
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi ArndW,

Since the Oracle stage is my source and has no input links, it doesn't show an Input tab where I could select the different partitioning types. Let me elaborate.

The flow is

DB2/Oracle stage --> Transformer --> aggregator,transformer etc..

I can implement parallelism from the first Transformer onwards, but not inside the DB2/Oracle stage.
If I implement it in DB2 I will generate duplicates (2 * the number of records, because we have two nodes) carried into the Transformer, which is an additional processing burden. :shock:

So, if there are 100 records in the source, only 100 records should be passed from the DB2/Oracle stage to the Transformer.

Can you suggest which documents I should refer to on this?
:idea:
Thanks and Regards!!
dspxlearn
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

Post by aramachandra »

Hi

According to the Orchestrate manual:

"By default the oraread operator runs sequentially. However, you can direct it to
read from multiple partitions and run on all processing nodes in the default node"

Typically, most reads from Oracle are sequential. I tested a simple job that takes data from an Oracle stage into a Lookup, and looking at the session/connection information I did not see multiple Oracle connections being established.

I have not tested with a parallel query, as I am not quite sure how that would be channelled through the partitions.

Hope this helps; the Orchestrate Operators manual has some good information on this.

arvind
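A toy model of the behaviour the manual quote describes (an assumption-laden sketch, not Orchestrate code): a sequential oraread opens a single database session, while a partitioned parallel read opens roughly one session per processing node, which is what you would expect to see reflected in the Oracle session information:

```python
def sessions_opened(parallel: bool, num_nodes: int) -> int:
    # Sequential read (the oraread default): one session regardless of
    # how many nodes the configuration file defines.
    # Partitioned parallel read: roughly one session per processing node.
    return num_nodes if parallel else 1

print(sessions_opened(False, 2))  # 1: the sequential default observed above
print(sessions_opened(True, 2))   # 2: one connection per node
```

Seeing only one Oracle connection, as in the test above, is therefore consistent with the stage running its read sequentially.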
Post Reply