duplicate rows from oracle when running in parallel

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

duplicate rows from oracle when running in parallel

Post by hiltsmi »

My source is a table on Oracle 9i. In the Oracle 9i stage I have set it to work in parallel. When I run the job I am getting duplicate records from Oracle.

How can I set up the stage so that each partition gets a unique set of rows to process?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

By default that is what PX does, it doesn't send all rows down all streams unless you explicitly tell it do do so by specifying "entire" for the partitioning algorithm.

How are you ascertaining the row duplications in your job?
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

Post by hiltsmi »

I have an Oracle stage going into a transformer stage which writes out to a flat file.

In the oracle stage I have changed from the default of sequential to parallel partitioning. It does not give me any options to specify entire , round robin etc...

In the transformer stage partitioning is set to the defaults which are parallel and auto.


The sequential file is set to the default which is auto collector.

I am seeing rows twice in the flat file.


I don't have a lot of experience with DataStage but I have used Informatica and in Informatica the query for each partition would be set up with a where clause to ensure that each partition got a unique set of rows from the source. But I don't see that functionality in DataStage.

What am I missing?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Odd.

Do you have a 2-node configuration and do the duplicate rows show up together in your text file? You should be using the "Auto" partitioning going into your transformer stage, and I cannot change my Oracle enterprise stage away from sequential here :(.
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

Post by hiltsmi »

Yes it is a 2 node configuration. No the records are not always together, When I do a sort at the windows command line I see 2 of everything.

I can definitely change my oracle 9i oci stage from sequential to parallel.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I am mystified. Can you run the job on a temporary 1-node configuration file and see if the duplicates go away (or on a 4-node to see if they multiply). Also, do you specify the Oracle stage "partition" value?
Post Reply