Hi there,
I have a Sequential File stage from which I want to read data into two Oracle tables.
Table 1 will contain records that satisfy the PK (product#, vendor).
I'd like bad records (records missing PK data, or duplicates) to be routed to table 2.
I've already got bad records with missing PK data going over to table 2, but I don't know how to get duplicates to go there as well.
I'm not even sure where to begin with recognizing duplicates within DataStage.
Handling PK violations
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Pre-load a single-column hashed file with the PK values from the Oracle table and perform a reference lookup to determine whether the PK already exists.
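The lookup logic is equivalent to a set-membership test. A minimal Python sketch (the set stands in for the pre-loaded hashed file; all names are illustrative, not DataStage API):

```python
# `existing_pks` plays the role of the single-column hashed file
# pre-loaded with the PK values already in the Oracle table.
existing_pks = {("P100", "V01"), ("P200", "V02")}

def pk_exists(product, vendor):
    """Reference lookup: True if this PK is already in the target table."""
    return (product, vendor) in existing_pks
```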
To determine whether there are any duplicates in the source data, sort it (use UNIX sort or, if your budget allows, SyncSort) then use stage variables to determine whether the PK value in this row is the same as that in the previous row. Or have your job update the hashed file containing PK values.
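In Python terms, the stage-variable approach on sorted input looks roughly like this (`prev_key` plays the role of a stage variable holding the previous row's PK, and `is_different` the IsDifferent value driving the two output constraints; names are illustrative):

```python
def split_duplicates(sorted_rows):
    """Route first occurrences to table 1, duplicates to table 2.
    Assumes rows are already sorted on the PK columns."""
    table1, table2 = [], []
    prev_key = None                       # stage variable: previous row's PK
    for row in sorted_rows:
        key = (row["product"], row["vendor"])
        is_different = key != prev_key    # stage variable: IsDifferent
        (table1 if is_different else table2).append(row)
        prev_key = key                    # carry the PK forward to the next row
    return table1, table2
```

The key point is that this only works on sorted input, which is why the sort step comes first.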
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Thanks for the reply, guys.
What's happening is that when I load the data from the flat file into table 1, the process simply stops once a duplicate (product#, vendor#) is encountered, rather than continuing to process the rest of the file.
I'm handling missing data with an expression in the constraints header. So, before the PK can even be enforced by Oracle, DataStage has already handled records with missing data by outputting them to table 2.
The problem is that I don't know how to create an expression in the constraints header (like the one that traps records with missing data) that can trap a duplicate record so I can redirect it to table 2.
Thanks!
Do you know how to use stage variables to compare current row against previous row? With sorted input you can use stage variables to check for duplicates, and the value of the IsDifferent stage variable (if you called it that) can drive the output constraint expression.
You can construct a more complex mechanism using DataStage BASIC via a transform function, whereby a searchable list is maintained in a variable declared COMMON so that it persists between calls. This technique is taught in the Ascential "Programming with DataStage BASIC" class.
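The idea behind the COMMON variable is simply state that survives between function calls, so the input need not be sorted. A hedged Python sketch (module-level `_seen_keys` persists between calls the way a COMMON variable persists between transform-function invocations; names are illustrative, not DataStage API):

```python
_seen_keys = set()   # plays the role of the searchable list held in COMMON

def is_duplicate(product, vendor):
    """Return False the first time a PK is seen, True on every later sighting."""
    key = (product, vendor)
    if key in _seen_keys:
        return True
    _seen_keys.add(key)
    return False
```

The return value would then drive the output constraint, just as IsDifferent does in the stage-variable approach.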
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518