Hi there,
I have a Sequential File stage from which I want to read data into two Oracle tables.
Table 1 will contain records that satisfy the PK (product#, vendor).
I'd like bad records (records missing PK data, or duplicates) to be routed to table 2.
I've already got bad records with missing PK data going over to table 2, but I don't know how to get duplicates to go there as well.
I'm not even sure where to begin with recognizing duplicates within DataStage.
Handling PK violations
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Pre-load a single-column hashed file with the PK values from the Oracle table and perform a reference lookup to determine whether the PK already exists.
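The lookup logic is equivalent to a set-membership test. A minimal Python sketch (the set stands in for the pre-loaded hashed file; all names are illustrative, not DataStage API):

```python
# `existing_pks` plays the role of the single-column hashed file
# pre-loaded with the PK values already in the Oracle table.
existing_pks = {("P100", "V01"), ("P200", "V02")}

def pk_exists(product, vendor):
    """Reference lookup: True if this PK is already in the target table."""
    return (product, vendor) in existing_pks
```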
To determine whether there are any duplicates in the source data, sort it (use UNIX sort or, if your budget allows, SyncSort) then use stage variables to determine whether the PK value in this row is the same as that in the previous row. Or have your job update the hashed file containing PK values.
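In Python terms, the stage-variable approach on sorted input looks roughly like this (`prev_key` plays the role of a stage variable holding the previous row's PK, and `is_different` the IsDifferent value driving the two output constraints; names are illustrative):

```python
def split_duplicates(sorted_rows):
    """Route first occurrences to table 1, duplicates to table 2.
    Assumes rows are already sorted on the PK columns."""
    table1, table2 = [], []
    prev_key = None                       # stage variable: previous row's PK
    for row in sorted_rows:
        key = (row["product"], row["vendor"])
        is_different = key != prev_key    # stage variable: IsDifferent
        (table1 if is_different else table2).append(row)
        prev_key = key                    # carry the PK forward to the next row
    return table1, table2
```

The key point is that this only works on sorted input, which is why the sort step comes first.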
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Thanks for the reply, guys.
What's happening is that when I load the data from the flat file into table 1, the process simply stops once a duplicate (product#, vendor#) is encountered, rather than continuing to process the rest of the file.
I'm handling missing data with an expression in the constraints header. So, before the PK can even be enforced by Oracle, DataStage has already handled records with missing data by outputting them to table 2.
The problem is that I don't know how to create an expression in the constraints header (like the one that traps records with missing data) that can trap a duplicate record so I can redirect it to table 2.
Thanks!
Do you know how to use stage variables to compare current row against previous row? With sorted input you can use stage variables to check for duplicates, and the value of the IsDifferent stage variable (if you called it that) can drive the output constraint expression.
You can construct a more complex mechanism using DataStage BASIC via a transform function, whereby a searchable list is maintained in a variable declared COMMON so that it persists between calls. This technique is taught in the Ascential "Programming with DataStage BASIC" class.
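The idea behind the COMMON variable is simply state that survives between function calls, so the input need not be sorted. A hedged Python sketch (module-level `_seen_keys` persists between calls the way a COMMON variable persists between transform-function invocations; names are illustrative, not DataStage API):

```python
_seen_keys = set()   # plays the role of the searchable list held in COMMON

def is_duplicate(product, vendor):
    """Return False the first time a PK is seen, True on every later sighting."""
    key = (product, vendor)
    if key in _seen_keys:
        return True
    _seen_keys.add(key)
    return False
```

The return value would then drive the output constraint, just as IsDifferent does in the stage-variable approach.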
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518