CDC and comparing NULLS

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pbatchu
Charter Member
Charter Member
Posts: 20
Joined: Thu Aug 17, 2006 11:53 am
Location: Boise

CDC and comparing NULLS

Post by pbatchu »

Dear DataStager's

***************
Problem definition
***************

We have a third party tool change data capture tool called OMNI that identifies all changes to our SAP source data and generates .csv files for consumption into data warehouse

Under a given situation, I need to compare all the change data generated in csv by the OMNI tool to our persistant stage table data for the same source table to identify true changes.

For example

I could have a flat file .csv called TESTTABLE.csv that has 10 lines of data indicating 10 changes in the source system

I am reading the TESTTABLE from the database for those 10 keys as in the csv to compare the table and the file data to identify which is an insert,update delete etc

But when comparing NULLS for a nullable field the data when read from the CSV and data read from the database are not viewed by the cdc stage as being same and it generates a update change code thinking the record is changed

IN other words my .csv in the above example having 10 lines could have only one line that really changed in the source

Bottom line is as follows

I read one record from the table and read the same record from the .csv file and fields which are NULL seem to be represented differently and hence my CDC FAILs to mark them as a NON change

Any one run into a similar issue.

I am trying to figure a generic solution so that i do not have to go column by column for each table to define a consistent NULL value representation

I am sorry for the really long post but please let me know if there are situations like above that anyone has addressed

Thanks
arvind
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Are you working with a PX or a Server environment (you posted in PX but the product is listed as server). The solution depends significantly on which flavour you are using.

The .csv file will have empty strings instead of nulls. You will need to convert these values to null in order to do your comparison with the actual database contents.
pbatchu
Charter Member
Charter Member
Posts: 20
Joined: Thu Aug 17, 2006 11:53 am
Location: Boise

Post by pbatchu »

We are working on parallel edition
pbatchu
Charter Member
Charter Member
Posts: 20
Joined: Thu Aug 17, 2006 11:53 am
Location: Boise

-

Post by pbatchu »

We are working mainly with px jobs.

This particular issue is when building a px job
pbatchu
Charter Member
Charter Member
Posts: 20
Joined: Thu Aug 17, 2006 11:53 am
Location: Boise

PX jobs

Post by pbatchu »

We are working with px jobs
JeroenDmt
Premium Member
Premium Member
Posts: 107
Joined: Wed Oct 26, 2005 7:36 am

Re: CDC and comparing NULLS

Post by JeroenDmt »

Is the data you read in from the csv file and the data you read in from the database table both indeed NULL after you read them in datastage?

If you insert a peek stage for both of them and look at the actual values, do they show NULL or may an empty string?

For the csv file to regard an empty string value as a NULL value, you would have to set the 'null field value' property to an empty string.

Otherwise you might be comparing a NULL value from the table with an empty string from the csv file, which would indeed be an update.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you can have both "" and NULL, I would use a non-default representation of null, such as "<NULL>" - something that will never occur in data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thamark
Premium Member
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

CDC works fine for null value

Post by thamark »

We have tested the null comparison using CDC and it just works fine.

I guess one of the data is actually not having a null value, so this CDC returns as change. Please see the internal representation of the field, which is having a problem to identify the real issue.
Hmm i will fill this when ever i get one
Post Reply