
Posted: Tue Nov 12, 2013 7:53 am
by kwwilliams
ROWID is Oracle's row identifier and is unique for every row in your table. Without knowing more specifics of your job, there are only two reasons I can think of for why you would have duplicates:

1. As Ray said, you have duplicates in your source table.
2. You are not partitioning and/or sorting the data properly prior to your join stage.
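
To rule out number 1, a quick check for repeated join keys could look something like the sketch below. This is plain Python just to show the idea, not DataStage code, and the sample rows and the `cust_id` key are made up for illustration.

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return join-key values that occur more than once, with their counts."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample rows; in practice these would come from the source table.
rows = [
    {"cust_id": 1, "name": "a"},
    {"cust_id": 2, "name": "b"},
    {"cust_id": 2, "name": "c"},
]
print(duplicate_keys(rows, ["cust_id"]))
```

If this comes back empty for the real join keys, the source side is probably not where the duplicates come from.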

I'm thinking it's probably number 2. How are you partitioning and sorting the data prior to joining to the 7 million records? Partitioning should be hash, keyed on the field or fields that you are using in the join stage. Sorting should also be by the fields used in the join.
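
To illustrate why both inputs need the same hash partitioning on the join fields, here is a rough Python sketch. It is not how DataStage implements it internally; the function name, the `id` field, the CRC32 hash and the partition count of 4 are all assumptions chosen for the example.

```python
import zlib

def hash_partition_and_sort(rows, key_fields, num_partitions):
    """Hash-partition rows on the join key, then sort each partition on it."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        # Stable hash: the same key always maps to the same partition number.
        index = zlib.crc32(repr(key).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    for part in partitions:
        part.sort(key=lambda r: tuple(r[f] for f in key_fields))
    return partitions

# Hypothetical inputs standing in for the two links feeding the join.
left = [{"id": i} for i in range(20)]
right = [{"id": i, "value": i * 2} for i in range(0, 20, 2)]

left_parts = hash_partition_and_sort(left, ["id"], 4)
right_parts = hash_partition_and_sort(right, ["id"], 4)
```

Because both sides use the same key and the same hash, every `id` on the right lands in the same partition number as its match on the left, so each partition can be joined on its own. If one side is partitioned on different fields, or not hashed at all, that guarantee goes away.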

Posted: Tue Nov 12, 2013 11:24 am
by bobyon
I'm stretching here.

I presume you are sorting before the join. If you're using a DataStage Sort stage, have you considered adjusting the memory it uses to handle the larger record lengths?