What defines a 'copy' in Change Capture

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

What defines a 'copy' in Change Capture

Post by bcarlson »

I have created the following process:

Code: Select all

before_dataset
               > ChangeCapture > change_dataset
after_dataset
I specified a change key (cust_acct_id), Change Mode is Explicit Keys, All Values, and I have 2 excluded value fields (period_dt, load_ts).

When I run this, I get change_code values of 1-3 (insert, delete, edit), but no 0's (copy). I DID specify Drop Output for Copy = FALSE, so I would have thought I'd get. I double checked the data and, with the exception of period_dt and load_ts (which should have been excluded), the data was identical in many rows.

So what's going on? I need to be able to account for all input records, including copies.

Any help would be greatly appreciated. Thanks!
Brad.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Do your before and after datasets have exactly the same metadata definitions? Did your before dataset have any additional transformations applied when it was originally populated such as data conversion or string trimming?
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

The inputs to the Change Capture stage are reads from 2 different DB2 tables with identical layouts (select *). The only other thing that I do to the input stream is sort and hash partition the data on the primary key.
kapil008
Participant
Posts: 20
Joined: Wed Sep 14, 2005 1:49 pm

Re: What defines a 'copy' in Change Capture

Post by kapil008 »

Actually, we too are having difficulty with DUPLICATE IDENTIFIER fatal error:

I have created the following process for Change Capture:


processing_table
> ChangeCapture > delta_table
archive_table

The change dataset has an additional column change_code. Run time propagation is set to on
and the change capture stage in parallel datastage job works fine, with the change_code
populated with either 0-3 values.


In sync with Change Capture, I am also using Change Apply stage to update
the archive table with the contents of delta_table


archive_table
> ChangeApply > archive_table
delta_table

Setup:

Archive_dataset table is exactly the same as the delta_table except that the delta_table table
has one additional column change_code as it should be. I have run time propogation set to on.
Also in the change_mode of Change_Apply stage I have set "Change Mode" value to "All Keys, Explict Values".
This set up is imperative for my processing to compare all avaliable keys and values.


Issue:

When I run the Change Apply stage, I get a "Fatal" error :

main_program: Added Field has duplicate indentifier(): CHANGE_CODE

So what's going on?

Any help would be greatly appreciated. Thanks!

Kapil
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Brad,

We had a similar issue and what we found was a problem with string columns.

Both the tables had string columns declared as varchar but during the processing of the job(after CDC stage) the metadata of the string column(s) had changed from varchar to char and back to char from varchar and loaded into the database.

Now eventhough the data as well as the metadata was same the CDC was not able to detect it. Pls ensure that the metadata throughout the entire job remains the same.

Are you getting any weird warnings from the job? Moreover what is the change_code for those rows which are identical. If it is 1 then monitor the metadata of change keys and if it is 3 monitor the metadata of change values.

HTH
--Rich
Post Reply