Parallelism in Oracle

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Parallelism in Oracle

Post by dspxlearn »

Hi all,


In one of our jobs we extract data from Oracle using the Oracle Enterprise stage. The extract currently runs in sequential mode, with the stages after it running in parallel, and this works fine. Now we want the extract from the source to run in parallel as well. However, when I set the Oracle Enterprise stage to parallel mode, it retrieves double the number of records, since we have two nodes... :shock: A Transformer follows the stage. Is there any other way to achieve this?
If I change to parallel mode and use only one node in the 'node map constraint' option, so that the records are processed on the first node only, it works fine. But will this cause problems later? :?: :?:
Thanks and Regards!!
dspxlearn
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi,


Do you have any input on my requirement? :D
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think no answers have been forthcoming because the question is based on a misconception of how PX jobs read from a database in parallel. You stated
So, when i give parallel mode in the oracle enterprise stage it will retrive the double no. of records
which is incorrect but forms the basis of your question. There is no need to reduce the number of nodes in your APT_CONFIG file to 1; that would defeat the advantages of PX.
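To make the distinction concrete, here is a minimal Python sketch (not DataStage code; the Oracle table is simulated as a list) of the difference between every node running the same full query and each node reading a disjoint partition:

```python
# Simulated source table of 100 rows (stand-in for the Oracle table).
rows = list(range(100))
num_nodes = 2

# Misconception: a "parallel" read where every node runs the same full
# query. Each node returns all rows, so downstream stages see
# num_nodes copies of the data.
naive = [row for node in range(num_nodes) for row in rows]
print(len(naive))        # 200: double the source rows on 2 nodes

# Partitioned read: each node reads only its share, e.g. via a
# MOD(key, node_count) = node_number predicate added to its query.
partitioned = [row for node in range(num_nodes)
               for row in rows if row % num_nodes == node]
print(len(partitioned))  # 100: every row read exactly once, no duplicates
```

A properly partitioned parallel read therefore produces the same row count as the sequential read, just spread across nodes.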
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi ArndW,



That was just a trial; I did it only to test, and it worked. But if I remove the 'node map constraint' option and still run in parallel mode, I get double the number of records (2 nodes). I could put a Remove Duplicates stage between the Oracle stage and the Transformer that follows it, but that doesn't work in cases where we get genuine duplicate records from the source itself.
So, is there any other way to achieve this? 8)
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I can't call up DS right now, but I do know you have to enter the partition table information on one tab, and then configure serial/parallel mode on another. You will have to supply the information on this tab so that each node gets its own distinct data when running in parallel, but you don't need to reduce your active nodes to 1. Since I don't have Oracle available at my site I can't test it, but you should be able to get this working; the documentation for this part of the stage would help as well.
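As a rough illustration of what "each node gets its distinct data" means at the SQL level, here is a hedged Python sketch that builds one query per node with a disjoint MOD predicate. The table name is hypothetical, and the exact SQL the stage generates from the partition table option will differ; this only shows the shape of the idea:

```python
base_sql = "SELECT * FROM my_table"   # hypothetical table name
num_nodes = 2

def node_query(node_number: int) -> str:
    # Append a predicate so the per-node result sets are disjoint and
    # together cover the whole table exactly once. ORA_HASH over ROWID
    # is one common way to split rows evenly in Oracle.
    return (f"{base_sql} WHERE MOD(ORA_HASH(ROWID), {num_nodes})"
            f" = {node_number}")

for n in range(num_nodes):
    print(node_query(n))
```

Because the predicates partition the table, running the two queries on two nodes returns each row exactly once in total, not twice.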
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi ArndW,

Since the Oracle stage is my source and has no input links, it doesn't show an Input tab where I could select the different partitioning types. Let me elaborate.

The flow is

DB2/Oracle stage --> Transformer --> aggregator,transformer etc..

I can implement parallelism from the first Transformer onwards, but not inside the DB2/Oracle stage.
If I implement it in DB2 I will generate duplicates (2 * the number of records, because we have two nodes) carried into the Transformer, which is an additional processing burden. :shock:

So, if there are 100 records in the source, only 100 records should be passed from the DB2/Oracle stage to the Transformer.

Can you suggest which documents I should refer to on this?
:idea:
Thanks and Regards!!
dspxlearn
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

Post by aramachandra »

Hi

According to the Orchestrate manual:

"By default the oraread operator runs sequentially. However, you can direct it to
read from multiple partitions and run on all processing nodes in the default node"

Typically, most reads from Oracle are sequential. I tested a simple job that takes data from an Oracle stage into a Lookup, and looking at the session/connection information I did not see multiple Oracle connections being established.

I have not tested with a parallel query, as I am not quite sure how that would be channelled through the partitions.

Hope this helps; the Orchestrate Operators manual has some good information on this.

arvind
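A toy model of the behaviour the manual quote describes (an assumption-laden sketch, not Orchestrate code): a sequential oraread opens a single database session, while a partitioned parallel read opens roughly one session per processing node, which is what you would expect to see reflected in the Oracle session information:

```python
def sessions_opened(parallel: bool, num_nodes: int) -> int:
    # Sequential read (the oraread default): one session regardless of
    # how many nodes the configuration file defines.
    # Partitioned parallel read: roughly one session per processing node.
    return num_nodes if parallel else 1

print(sessions_opened(False, 2))  # 1: the sequential default observed above
print(sessions_opened(True, 2))   # 2: one connection per node
```

Seeing only one Oracle connection, as in the test above, is therefore consistent with the stage running its read sequentially.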
Post Reply