Change Capture and unique constraints

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Change Capture and unique constraints

Post by ASU_ETL_DEV »

Hello,
In order to discover the source table's latest inserts, updates and deletes (the table has a two-column primary key plus an additional unique index on another column), I coded a change capture job flow composed of the steps outlined below.
Job 5) aborts after more than 50 unique constraint warnings on the secondary unique index.
All primary key and secondary unique index columns have been flagged as key columns throughout the jobs. The jobs run with a 4-node configuration file.

I am trying to figure out whether I need to sequence the updates by first applying the deletes, then the inserts and then the updates, or whether the flow and options I chose have fundamental flaws.

Any help is appreciated.

Thanks.
Marco

Code:

1) 
Sequence job that controls the four parallel jobs: it executes jobs 2) and 3) concurrently and then 4) and 5) sequentially. 

Code:

2) 
After job
   An Oracle Connector (Preserve partitioning=propagate; 
                        Execution mode=parallel) with Round Robin partitioned reads links to a 
   Transformer (Preserve partitioning=propagate; 
                Combinability mode=auto; 
                Execution mode=propagate) which links to an 
   (After) Dataset stage (Partition type=auto; 
                          Combinability mode=auto; 
                          Execution mode=parallel) 

Code:

3) 
Before job
   An Oracle Connector (Preserve partitioning=propagate; 
                        Execution mode=parallel) with Round Robin partitioned reads links to a 
   (Before) Dataset stage (Partition type=auto; 
                           Combinability mode=auto; 
                           Execution mode=parallel)

Code:

4) 
Change Capture job
   After Dataset stage (Preserve partitioning=propagate; 
                        Combinability mode=auto; 
                        Execution mode=parallel) together with the 
   Before Dataset stage (Preserve partitioning=propagate; 
                         Combinability mode=auto; 
                         Execution mode=parallel) link to a 
   Change Capture stage (Change keys=primary key and unique index columns; 
                         Change mode=Explicit keys, All values; 
                         Drop out for Copy=True; 
                         Drop out for Delete/Edit/Insert=False;
                         Execution mode=parallel; 
                         Combinability mode=auto; 
                         Preserve partitioning=Default(set);
                         INPUT TAB (both After and Before links): Partitioning type=Hash; 
                                                                  Perform sort=Yes, on the three key columns with usage=Sorting, Partitioning/Ascending) which links to a 
   (CCapture) Dataset stage (Partition type=auto; 
                             Combinability mode=auto; 
                             Execution mode=parallel) 

Code:

5) 
Update target job
   CCapture Dataset stage (Execution mode=parallel; 
                           Combinability mode=auto; 
                           Preserve partitioning=propagate) links to a 
   Transformer stage (Execution mode=parallel; 
                      Combinability mode=auto; 
                      Preserve partitioning=propagate;
                      INPUT TAB: Partition type=Auto) which links to three
   (INS, UPD, DEL) Oracle Connectors (Execution mode=parallel;
                                      Preserve partitioning=propagate)
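
For reference, the comparison that the Change Capture stage in job 4) performs once both inputs are hash partitioned and sorted on the change keys boils down to matching before and after rows by key. The Python below is only a conceptual sketch of that classification (dictionary-based rather than a sorted merge, and not DataStage code); the column names and data are made up.

Code:

# Conceptual sketch of key-based change detection, not the stage's implementation.
def change_capture(before_rows, after_rows, keys):
    """Classify rows as insert, edit, copy or delete by comparing on the key columns."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    before = {key_of(r): r for r in before_rows}
    after = {key_of(r): r for r in after_rows}

    changes = []
    for k, row in after.items():
        if k not in before:
            changes.append(("insert", row))
        elif row != before[k]:
            changes.append(("edit", row))
        else:
            changes.append(("copy", row))    # dropped when "Drop out for Copy=True"
    for k, row in before.items():
        if k not in after:
            changes.append(("delete", row))
    return changes

# Tiny example with a two-column key:
before = [{"pk1": 1, "pk2": 1, "val": "x"}, {"pk1": 2, "pk2": 1, "val": "y"}]
after  = [{"pk1": 1, "pk2": 1, "val": "x2"}, {"pk1": 3, "pk2": 1, "val": "z"}]
print(change_capture(before, after, keys=["pk1", "pk2"]))
# -> edit for (1,1), insert for (3,1), delete for (2,1)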
ASU Developer
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Did you verify that the unique constraint violation is on the columns you are specifying as keys in the Change Capture stage? Is there any other unique key, such as a surrogate key, that is generating old values?

Does it run without problems on one node?
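
To pin down which columns the violated constraint actually covers, one option is to list the unique indexes on the target table straight from the Oracle dictionary. A sketch using cx_Oracle; the connect string, schema and table names are placeholders.

Code:

# List the columns behind each unique index on the target table.
# Connect string, owner and table name are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("user/password@host:1521/service")
cur = conn.cursor()
cur.execute(
    """
    SELECT i.index_name, i.uniqueness, ic.column_name, ic.column_position
      FROM all_indexes i
      JOIN all_ind_columns ic
        ON ic.index_owner = i.owner AND ic.index_name = i.index_name
     WHERE i.table_owner = :owner AND i.table_name = :tab
       AND i.uniqueness = 'UNIQUE'
     ORDER BY i.index_name, ic.column_position
    """,
    owner="MYSCHEMA", tab="MY_TARGET_TABLE",
)
for index_name, uniqueness, column_name, position in cur:
    print(index_name, uniqueness, column_name, position)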
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

Thank you for answering.

I verified and the unique constraint error is actually happening on the primary key index (they have very similar names), not the secondary unique one. It is a two-column primary key that is correctly defined in the table and throughout the jobs.

I am running the job on one node now and will report back with the results.
ASU Developer
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So if that does "fix" it and you want to go back to running on multiple nodes, we'll need to look at your partitioning.
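
The reason partitioning matters here (a rough sketch, not DataStage's actual hash function): with Hash partitioning on the change keys, the before and after versions of the same key always land on the same node, so the Change Capture stage can pair them up. With Auto or Round Robin they may land on different nodes and be reported as a spurious delete plus insert.

Code:

# Rough sketch of hash partitioning by key; not DataStage's actual hash function.
def partition_of(key_values, num_nodes=4):
    return hash(tuple(key_values)) % num_nodes

before_key = (101, 7)   # two-column key from the before dataset
after_key  = (101, 7)   # same key from the after dataset
# Hash partitioning sends both to the same node, so they can be compared:
assert partition_of(before_key) == partition_of(after_key)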
-craig

"You can never have too many knives" -- Logan Nine Fingers
ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

The run on one node did not fix it. Same story. I changed the secondary index to non-unique and did not flag its column as a key in any of the jobs, and the whole thing started working without further problems. I am wondering whether the Change Capture stage does not support multiple unique indexes, so that we would need to drop all but one.
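
One mechanical effect worth noting (a hypothetical illustration, not a statement about this particular data): if a column that is flagged as a change key changes value between the before and after images, the two images no longer share a key, so the stage sees a delete plus an insert rather than an edit.

Code:

# Hypothetical: the same source row before and after an update that changes
# the secondary, uniquely indexed column sec_col.
before = {"pk1": 1, "pk2": 1, "sec_col": "A", "val": "old"}
after  = {"pk1": 1, "pk2": 1, "sec_col": "B", "val": "new"}

def key_of(row, keys):
    return tuple(row[k] for k in keys)

# Keyed on the primary key only: both images share a key -> an edit (update).
print(key_of(before, ["pk1", "pk2"]) == key_of(after, ["pk1", "pk2"]))           # True

# Keyed on primary key + sec_col: the keys differ -> a delete plus an insert.
print(key_of(before, ["pk1", "pk2", "sec_col"]) ==
      key_of(after,  ["pk1", "pk2", "sec_col"]))                                 # False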
ASU Developer
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

That means there is a logic issue: the records that were unique on the primary index were not unique on the secondary index. You should look at the data to see whether you can actually have both unique indexes.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'm confused (which doesn't happen often). What have indexes got to do with the Change Capture stage, which processes streams of data?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

In the property page of the Change Capture stage, in the Change Keys section, I specified the primary key columns. In the options section, as Change Mode, I selected "Explicit keys, all values".
In the Input tab I specified Hash partitioning with sort on the primary key columns for both Before and After datasets.
I fed the resulting dataset to three Oracle Connectors, which processed the data according to the value in ChangeCode() (insert, update, delete). The "insert" connector is the one that produced the unique constraint messages.
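
For completeness, the routing that the job-5 Transformer performs on the change code amounts to the split sketched below (assuming the stage's default codes of copy=0, insert=1, delete=2, edit=3; this is not DataStage syntax).

Code:

# Sketch of splitting rows by change code for the INS, UPD and DEL connectors.
# Assumes the default Change Capture codes: copy=0, insert=1, delete=2, edit=3.
INSERT, DELETE, EDIT = 1, 2, 3

def route(rows):
    inserts = [r for r in rows if r["change_code"] == INSERT]   # -> INS connector
    deletes = [r for r in rows if r["change_code"] == DELETE]   # -> DEL connector
    edits   = [r for r in rows if r["change_code"] == EDIT]     # -> UPD connector
    return inserts, deletes, edits

rows = [{"pk": 1, "change_code": 1}, {"pk": 2, "change_code": 2}, {"pk": 3, "change_code": 3}]
print(route(rows))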
ASU Developer
ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

If I completely reload the target with the same set of jobs, I do not get this unique constraint error on the insert, so I am wondering why it happens on an incremental load.
Note that I am just staging the source data to the target table, and the source table has the same primary index and secondary unique index defined, so the data should be good, with no duplicate keys.
If I update the target through Oracle Connectors that are all active at the same time, is there a specific sequence in which the updates are applied to the target table (e.g. deletes first, then inserts and updates)?
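
On the ordering question: if the three connectors all write concurrently there is no guaranteed cross-link order, so one defensive pattern is to force a fixed order, deletes before inserts before updates, so that a key value released by a delete is free before the corresponding insert runs. A sketch of that ordering only; the apply_* functions are hypothetical stand-ins for the three target writes.

Code:

# Sketch of applying the change groups in a fixed order.  The apply_* arguments
# are hypothetical stand-ins for the DEL, INS and UPD target writes.
def apply_changes(deletes, inserts, edits, apply_deletes, apply_inserts, apply_updates):
    apply_deletes(deletes)   # free key values that may be re-used
    apply_inserts(inserts)   # new rows no longer collide with just-deleted keys
    apply_updates(edits)

# Usage with no-op stand-ins:
apply_changes([], [], [], print, print, print)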
ASU Developer
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

None of that tells me anything about why you believe the stage makes any use of indexes. What it actually makes use of is the fact that the data are sorted by the defined key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

And the change capture stage expects unique keys on both of its input links.

You won't get a warning or fatal error if there are duplicate keys, but the generated change_code may be inaccurate.

For debugging purposes, check for duplicate keys on each input link.
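
A quick way to run that duplicate check outside DataStage is to export each input link (or its source query) to a flat file and count the key combinations; a sketch, with the file path and column names as placeholders.

Code:

# Count duplicate key combinations in an exported copy of one input link.
# File path and key column names are placeholders.
import csv
from collections import Counter

KEYS = ["PK_COL1", "PK_COL2", "SEC_UNIQUE_COL"]

with open("before_link_export.csv", newline="") as f:
    counts = Counter(tuple(row[k] for k in KEYS) for row in csv.DictReader(f))

dups = {key: n for key, n in counts.items() if n > 1}
print(len(dups), "duplicate key value(s)")
for key, n in sorted(dups.items()):
    print(key, n)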

Mike
ASU_ETL_DEV
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

Ray, I was not implying that the stage uses the Oracle indexes or any other index. But the stage sorts the before/after data according to key columns that, in my case, reflect the Oracle tables' primary key columns.
During the change discovery the stage assigns the insert change code to records that later cause unique constraint violations.

Mike, NLS is turned on and the source database has a different character set than the target, so maybe the after and before datasets are not created equal in the key columns (and in other columns as well).
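
A minimal illustration of that suspicion (hypothetical values, not taken from this data): when a key value goes through a lossy or mismatched character set conversion, the before and after images can hold strings that look similar but no longer compare equal, which is enough to turn an edit into a delete plus insert.

Code:

# Hypothetical illustration of a character-set mismatch on a key value.
src = "Müller"                                                        # value as read from the source
converted = src.encode("latin-1").decode("utf-8", errors="replace")   # bytes reinterpreted in another charset
print(src == converted)                                               # False
print(repr(src), repr(converted))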
ASU Developer