CDC and comparing NULLS
Posted: Tue Jul 03, 2007 10:44 pm
Dear DataStager's
***************
Problem definition
***************
We have a third party tool change data capture tool called OMNI that identifies all changes to our SAP source data and generates .csv files for consumption into data warehouse
Under a given situation, I need to compare all the change data generated in csv by the OMNI tool to our persistant stage table data for the same source table to identify true changes.
For example
I could have a flat file .csv called TESTTABLE.csv that has 10 lines of data indicating 10 changes in the source system
I am reading the TESTTABLE from the database for those 10 keys as in the csv to compare the table and the file data to identify which is an insert,update delete etc
But when comparing NULLS for a nullable field the data when read from the CSV and data read from the database are not viewed by the cdc stage as being same and it generates a update change code thinking the record is changed
IN other words my .csv in the above example having 10 lines could have only one line that really changed in the source
Bottom line is as follows
I read one record from the table and read the same record from the .csv file and fields which are NULL seem to be represented differently and hence my CDC FAILs to mark them as a NON change
Any one run into a similar issue.
I am trying to figure a generic solution so that i do not have to go column by column for each table to define a consistent NULL value representation
I am sorry for the really long post but please let me know if there are situations like above that anyone has addressed
Thanks
arvind
***************
Problem definition
***************
We have a third party tool change data capture tool called OMNI that identifies all changes to our SAP source data and generates .csv files for consumption into data warehouse
Under a given situation, I need to compare all the change data generated in csv by the OMNI tool to our persistant stage table data for the same source table to identify true changes.
For example
I could have a flat file .csv called TESTTABLE.csv that has 10 lines of data indicating 10 changes in the source system
I am reading the TESTTABLE from the database for those 10 keys as in the csv to compare the table and the file data to identify which is an insert,update delete etc
But when comparing NULLS for a nullable field the data when read from the CSV and data read from the database are not viewed by the cdc stage as being same and it generates a update change code thinking the record is changed
IN other words my .csv in the above example having 10 lines could have only one line that really changed in the source
Bottom line is as follows
I read one record from the table and read the same record from the .csv file and fields which are NULL seem to be represented differently and hence my CDC FAILs to mark them as a NON change
Any one run into a similar issue.
I am trying to figure a generic solution so that i do not have to go column by column for each table to define a consistent NULL value representation
I am sorry for the really long post but please let me know if there are situations like above that anyone has addressed
Thanks
arvind