Page 1 of 1

CDC and comparing NULLS

Posted: Tue Jul 03, 2007 10:44 pm
by pbatchu
Dear DataStager's

***************
Problem definition
***************

We have a third party tool change data capture tool called OMNI that identifies all changes to our SAP source data and generates .csv files for consumption into data warehouse

Under a given situation, I need to compare all the change data generated in csv by the OMNI tool to our persistant stage table data for the same source table to identify true changes.

For example

I could have a flat file .csv called TESTTABLE.csv that has 10 lines of data indicating 10 changes in the source system

I am reading the TESTTABLE from the database for those 10 keys as in the csv to compare the table and the file data to identify which is an insert,update delete etc

But when comparing NULLS for a nullable field the data when read from the CSV and data read from the database are not viewed by the cdc stage as being same and it generates a update change code thinking the record is changed

IN other words my .csv in the above example having 10 lines could have only one line that really changed in the source

Bottom line is as follows

I read one record from the table and read the same record from the .csv file and fields which are NULL seem to be represented differently and hence my CDC FAILs to mark them as a NON change

Any one run into a similar issue.

I am trying to figure a generic solution so that i do not have to go column by column for each table to define a consistent NULL value representation

I am sorry for the really long post but please let me know if there are situations like above that anyone has addressed

Thanks
arvind

Posted: Tue Jul 03, 2007 10:53 pm
by ArndW
Are you working with a PX or a Server environment (you posted in PX but the product is listed as server). The solution depends significantly on which flavour you are using.

The .csv file will have empty strings instead of nulls. You will need to convert these values to null in order to do your comparison with the actual database contents.

Posted: Wed Jul 04, 2007 6:39 am
by pbatchu
We are working on parallel edition

-

Posted: Wed Jul 04, 2007 6:39 am
by pbatchu
We are working mainly with px jobs.

This particular issue is when building a px job

PX jobs

Posted: Wed Jul 04, 2007 6:41 am
by pbatchu
We are working with px jobs

Re: CDC and comparing NULLS

Posted: Wed Jul 04, 2007 6:52 am
by JeroenDmt
Is the data you read in from the csv file and the data you read in from the database table both indeed NULL after you read them in datastage?

If you insert a peek stage for both of them and look at the actual values, do they show NULL or may an empty string?

For the csv file to regard an empty string value as a NULL value, you would have to set the 'null field value' property to an empty string.

Otherwise you might be comparing a NULL value from the table with an empty string from the csv file, which would indeed be an update.

Posted: Wed Jul 04, 2007 1:25 pm
by ray.wurlod
If you can have both "" and NULL, I would use a non-default representation of null, such as "<NULL>" - something that will never occur in data.

CDC works fine for null value

Posted: Tue Jul 10, 2007 3:18 pm
by thamark
We have tested the null comparison using CDC and it just works fine.

I guess one of the data is actually not having a null value, so this CDC returns as change. Please see the internal representation of the field, which is having a problem to identify the real issue.