Change Capture stage vs Difference stage driving me nuts

djbarham · Post by **djbarham** » Tue Aug 02, 2011 1:14 am

I'm trying to compare a set of potential updates with existing data on a table. The updates have a natural key which is also on the table but is not the primary key (it has a surrogate key).

the catch is that I also want to bring the primary (surrogate) key through for changed records so I can use it to apply updates.

I have done extensive searching of this forum, without a solution.

The documentation and behavior of the Difference Stage seems to indicate that it can also bring through non-key and non-value columns whereas the Change Capture stage only allows columns that appear on both inputs to be used at the output.

The doco for the difference stage says:

If the before and after data have the same column names, then one data set effectively overwrites the other data set and so you only see one set of columns in the output. Which data set is output is controlled by the settings on the Link Order tab and the Mapping tab. If your before and after data sets have different column names, columns from both data sets are available to be output as set on the Mapping tab. Any columns that are designated as key or value columns in the input data sets must have the same names.

So, it is saying that

columns whose names don't match will be available on output
columns whose names do match will have the value available at output based on the input ordering AND you will get a warning
"value" columns (used for comparing) MUST have matching names

End result ... I get a warning for every value column on every row. Surely it is not meant to do this. Am I doing something wrong?

Can I get additional columns to the output using the Change Capture stage?

ray.wurlod · Post by **ray.wurlod** » Tue Aug 02, 2011 3:46 pm

Why not just use the Slowly Changing Dimensions stage?

djbarham · Post by **djbarham** » Tue Aug 02, 2011 4:33 pm

OK, I'll admit that this had not occurred to me. The table is not a dimension, it is not tracking changes over time, nor is the structure in any way a star schema, but sure, if you think the SCD stage might still be useful for this, I'll have a look.

Still interested in feedback on the Difference stage because apart from the warnings it logs, its functionality seems to meet my requirements.

djbarham · Post by **djbarham** » Tue Aug 02, 2011 4:45 pm

OK, I had a quick look at the SCD stage. I'm not sure I'm seeing how it applies. Maybe I didn't describe my requirements very well ... actually, I didn't really describe my requirements at all.

So, here is the functionality I require.

I have from an external source an image of how my record should look. I need to compare this with the existing record in our database.

If the record exists and the fields match, no action is taken.

If the record exists and fields differ, it is updated.

If the record does not exist, a new one is created.

Note that the existing record has additional columns that are not part of this comparison. The natural key used to find the record is not the primary key to the table.

Seems to me that this is what the Difference and the Change Capture stages are meant to achieve.