handling pk violations

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

handling pk violations

Post by sweetleaf »

Hi there,

I have a sequential file stage from which I want to load data into 2 Oracle tables.
Table 1 will contain records that satisfy the PK (product#, vendor#).
I'd like bad records (records missing PK data, or duplicates) to be routed to table 2.
I already have records with missing PK data going to table 2, but I don't know how to get duplicates to go there as well.
I'm not even sure where to begin with recognizing duplicates within DataStage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Pre-load a single-column hashed file with the PK values from the Oracle table and perform a reference lookup to determine whether the PK already exists.
To determine whether there are any duplicates in the source data, sort it (use UNIX sort or, if your budget allows, SyncSort) then use stage variables to determine whether the PK value in this row is the same as that in the previous row. Or have your job update the hashed file containing PK values.
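The two checks Ray describes (a reference lookup against pre-loaded PK values, plus a current-row-vs-previous-row comparison on sorted input) can be modelled outside DataStage. This is a hedged Python sketch, not DataStage code; the column names and sample values are made up for illustration:

```python
# Pre-loaded PK values, standing in for the single-column hashed file
# built from the Oracle table.
existing_pks = {("P100", "V1"), ("P200", "V2")}

# Source rows, sorted on the PK columns (the UNIX sort / SyncSort step).
rows = sorted([
    ("P100", "V1", "widget"),
    ("P300", "V3", "gadget"),
    ("P300", "V3", "gadget again"),
])

good, bad = [], []
prev_pk = None  # plays the role of a stage variable holding the previous row's PK
for product, vendor, desc in rows:
    pk = (product, vendor)
    if pk in existing_pks or pk == prev_pk:
        bad.append((product, vendor, desc))   # already in Oracle, or a duplicate: table 2
    else:
        good.append((product, vendor, desc))  # new, unique PK: table 1
    prev_pk = pk
```

In the job itself, `pk in existing_pks` corresponds to the hashed-file lookup succeeding, and `pk == prev_pk` is the stage-variable comparison; either condition routes the row down the reject link.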


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis

Post by trobinson »

Why aren't the duplicates being treated the same way as the bad PK records? If the Oracle PK unique constraint is active on table 1, then the bad PKs and the duplicates should fail on an INSERT into table 1. If so, they can be routed to table 2 via a constraint. What am I missing?
sweetleaf
Participant
Posts: 30
Joined: Fri Jan 24, 2003 3:28 pm
Location: Canada

Post by sweetleaf »

Thanks for the reply guys,

What's happening is that when I load the data from the flat file into table 1, the process simply stops once a duplicate (product#, vendor#) is encountered, rather than continuing to process the rest of the file.

I'm handling missing data with an expression in the constraints header. So, before the PK can even be enforced by Oracle, DataStage has already handled records with missing data by outputting them to table 2.

The problem is that I don't know how to write an expression in the constraints header (like the one that traps records with missing data) that can trap a duplicate record, so that I can redirect it to table 2.

Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you know how to use stage variables to compare the current row against the previous row? With sorted input you can use stage variables to check for duplicates, and the value of the IsDifferent stage variable (if you called it that) can drive the output constraint expression.
You can construct a more complex mechanism in DataStage BASIC via a transform function, whereby a searchable list is maintained in a variable declared COMMON. This technique is taught in the Ascential "Programming with DataStage BASIC" class.
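The COMMON-variable idea is that a transform function keeps state between calls, so it can remember every PK it has seen so far in the run, even on unsorted input. A hedged sketch in Python (not DataStage BASIC), using a module-level set in place of a COMMON variable; the function name is illustrative:

```python
# Stands in for a searchable list held in a COMMON variable: it persists
# across calls to the transform function for the life of the run.
_seen_pks = set()

def is_duplicate(product, vendor):
    """Return True if this PK has already appeared earlier in the run."""
    pk = (product, vendor)
    if pk in _seen_pks:
        return True       # duplicate: the constraint routes the row to table 2
    _seen_pks.add(pk)
    return False          # first occurrence: the row goes to table 1
```

The function's return value would drive the output constraint expression, just as the IsDifferent stage variable does in the sorted-input approach.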


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518