My source is a table on Oracle 9i. In the Oracle 9i stage I have set it to work in parallel. When I run the job I am getting duplicate records from Oracle.
How can I set up the stage so that each partition gets a unique set of rows to process?
duplicate rows from oracle when running in parallel
Moderators: chulett, rschirm, roy
By default that is what PX does, it doesn't send all rows down all streams unless you explicitly tell it do do so by specifying "entire" for the partitioning algorithm.
How are you ascertaining the row duplications in your job?
How are you ascertaining the row duplications in your job?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
I have an Oracle stage going into a transformer stage which writes out to a flat file.
In the oracle stage I have changed from the default of sequential to parallel partitioning. It does not give me any options to specify entire , round robin etc...
In the transformer stage partitioning is set to the defaults which are parallel and auto.
The sequential file is set to the default which is auto collector.
I am seeing rows twice in the flat file.
I don't have a lot of experience with DataStage but I have used Informatica and in Informatica the query for each partition would be set up with a where clause to ensure that each partition got a unique set of rows from the source. But I don't see that functionality in DataStage.
What am I missing?
In the oracle stage I have changed from the default of sequential to parallel partitioning. It does not give me any options to specify entire , round robin etc...
In the transformer stage partitioning is set to the defaults which are parallel and auto.
The sequential file is set to the default which is auto collector.
I am seeing rows twice in the flat file.
I don't have a lot of experience with DataStage but I have used Informatica and in Informatica the query for each partition would be set up with a where clause to ensure that each partition got a unique set of rows from the source. But I don't see that functionality in DataStage.
What am I missing?
Odd.
Do you have a 2-node configuration and do the duplicate rows show up together in your text file? You should be using the "Auto" partitioning going into your transformer stage, and I cannot change my Oracle enterprise stage away from sequential here .
Do you have a 2-node configuration and do the duplicate rows show up together in your text file? You should be using the "Auto" partitioning going into your transformer stage, and I cannot change my Oracle enterprise stage away from sequential here .
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
I am mystified. Can you run the job on a temporary 1-node configuration file and see if the duplicates go away (or on a 4-node to see if they multiply). Also, do you specify the Oracle stage "partition" value?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>