duplicate rows from oracle when running in parallel

hiltsmi · Post by **hiltsmi** » Fri Nov 04, 2005 7:05 am

My source is a table on Oracle 9i. In the Oracle 9i stage I have set it to work in parallel. When I run the job I am getting duplicate records from Oracle.

How can I set up the stage so that each partition gets a unique set of rows to process?

ArndW · Post by **ArndW** » Fri Nov 04, 2005 8:07 am

By default that is what PX does: it doesn't send all rows down all streams unless you explicitly tell it to do so by specifying "entire" for the partitioning algorithm.
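To make the difference concrete, here is a toy Python model of the two behaviours. It is only an illustration, not DataStage code: the function names `round_robin` and `entire` and the sample rows are made up for this sketch.

```python
def round_robin(rows, nodes):
    """Deal rows out one at a time, so each row lands on exactly one node."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire(rows, nodes):
    """Give every node its own full copy of the input."""
    return [list(rows) for _ in range(nodes)]


rows = ["r1", "r2", "r3", "r4"]  # hypothetical sample rows
rr_parts = round_robin(rows, 2)  # each row appears once across the nodes
en_parts = entire(rows, 2)       # each row appears once per node
```

With two nodes, the "entire" style hands out twice as many rows in total as the input contains, which is the kind of doubling that would show up as every row appearing twice downstream.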

How are you ascertaining the row duplications in your job?

hiltsmi · Post by **hiltsmi** » Fri Nov 04, 2005 8:38 am

I have an Oracle stage going into a transformer stage which writes out to a flat file.

In the Oracle stage I have changed from the default of sequential to parallel partitioning. It does not give me any options to specify entire, round robin, etc.

In the transformer stage partitioning is set to the defaults which are parallel and auto.

The sequential file is set to the default which is auto collector.

I am seeing rows twice in the flat file.

I don't have a lot of experience with DataStage but I have used Informatica and in Informatica the query for each partition would be set up with a where clause to ensure that each partition got a unique set of rows from the source. But I don't see that functionality in DataStage.
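As a rough sketch of the per-partition where clause idea described above, the following Python builds one query per partition using Oracle's `MOD` function. The helper name `partition_queries`, the table `src_table`, and the key column `id` are all hypothetical, and the sketch assumes the key is a non-null, non-negative integer. It shows the general technique only, not what either tool generates internally.

```python
def partition_queries(table, key, nodes):
    """Build one SELECT per partition; MOD(key, nodes) sends each row to one partition.

    Assumes `key` is a non-null, non-negative integer column (an assumption of
    this sketch), so every row matches exactly one of the generated predicates.
    """
    return [
        f"SELECT * FROM {table} WHERE MOD({key}, {nodes}) = {n}"
        for n in range(nodes)
    ]


# Hypothetical table and column names, 2 partitions.
queries = partition_queries("src_table", "id", 2)
```

Because the predicates are mutually exclusive and together cover every key value, no row can be read by more than one partition.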

What am I missing?

ArndW · Post by **ArndW** » Fri Nov 04, 2005 8:52 am

Odd.

Do you have a 2-node configuration and do the duplicate rows show up together in your text file? You should be using the "Auto" partitioning going into your transformer stage, and I cannot change my Oracle enterprise stage away from sequential here.

hiltsmi · Post by **hiltsmi** » Fri Nov 04, 2005 9:09 am

Yes, it is a 2-node configuration. No, the records are not always together. When I do a sort at the Windows command line I see 2 of everything.
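A check like the sort described above can also be scripted. This is a minimal sketch that counts how often each line occurs, so it works even when the duplicates are not adjacent. The function name `duplicate_lines` and the sample data are invented for illustration.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return {line: count} for every line that occurs more than once."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical sample standing in for the flat file's contents.
sample = ["a,1\n", "b,2\n", "a,1\n", "c,3\n", "b,2\n"]
dupes = duplicate_lines(sample)
```

An open file object can be passed in place of the sample list, since iterating over a text file yields its lines. If every line comes back with a count of 2 on a 2-node run, that points to each node reading the full table.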

I can definitely change my Oracle 9i OCI stage from sequential to parallel.

ArndW · Post by **ArndW** » Fri Nov 04, 2005 9:42 am

I am mystified. Can you run the job on a temporary 1-node configuration file and see if the duplicates go away (or on a 4-node to see if they multiply)? Also, do you specify the Oracle stage "partition" value?