Performance issue reading data from Oracle connector

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

ROWID is Oracle's row identifier and is unique for every row in your table. Without knowing more specifics of your job, there are only two reasons I can think of that you would have duplicates:

1. As Ray said, you have duplicates in your source table (you can check for this with a query like the one sketched after this list).
2. You are not partitioning and/or sorting the data properly prior to your join stage.
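
For reason 1, a quick sanity check is to count rows per join key straight in Oracle. This is only a sketch; source_table and join_key are placeholders for whatever your job actually uses:

Code:

    -- Any rows returned are keys that occur more than once in the source.
    -- source_table and join_key are placeholders, not your real names.
    SELECT join_key, COUNT(*) AS dup_count
    FROM source_table
    GROUP BY join_key
    HAVING COUNT(*) > 1;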

I'm thinking it's probably number 2. How are you partitioning and sorting the data prior to joining to the 7 million records? Partitioning should be Hash, using the field or fields that you are using in the Join stage. Sorting should also be on the fields used in the join.
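
To make that concrete, the input link settings would look something like the sketch below. CUST_ID is just a stand-in for your actual join column, and the exact labels may differ a little between versions:

Code:

    Join stage -> each input link -> Partitioning tab:
        Partition type : Hash
        Key            : CUST_ID   (stand-in for your join column)
        Perform sort   : checked, ascending on CUST_ID

Both inputs need the same partitioning and sort keys; if matching keys land in different partitions, the join will not match them correctly.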
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

I'm stretching here

I presume you are sorting before the join. If you're using a DataStage Sort stage, have you considered adjusting the memory it is allowed to use, to handle the larger record lengths?
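
If memory serves (worth checking against your version's docs), that's an option on the Sort stage itself:

Code:

    Sort stage -> Options:
        Restrict Memory Usage (MB) : defaults to 20; try something
                                     larger, e.g. 256, for wide rows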
Bob