Deletion of records from a CDC File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
mwilli422
Participant
Posts: 4
Joined: Mon Jul 12, 2004 5:01 am

Deletion of records from a CDC File

Post by mwilli422 »

I wonder if could anyone can help with my CDC problem. I have a CDC file, that is already productionised. The CDC file contains a list of Sales Rep Id's and the CRC value of the Rep Name and Team Name. The Sales Reps when they leave the company are removed from the input file into the CDC processing and are left hanging as existing Sales Reps on the CDC File. Is there any way to delete these rows easily from the CDC file, as the Sales Rep Id could possibly be re-used in the future and I would want to treat these as 'IsNew' records in subsequent processing. Any pointers would be greatly appreciated.

Many Thanks
Mark
:)
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Deletion of records from a CDC File

Post by ogmios »

You're a little vague on what you do and what you want. E.g. what is "productionised"... :wink:

Are you trying to say that when Sales Reps are deleted from your production system, they are not deleted in your datamart/warehouse.
If this is the case I think you might miss the point of kimball that you shouldn't use the primary key of your production system as key in your warehouse. Then again, since the question is somewhat vague ...

Ogmios
In theory there's no difference between theory and practice. In practice there is.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

I hope that what you write to with DataStage is not a data warehouse, or I have hunch you have some architectural problems. In that case, run to the library/bookstore and get all Kimball books :wink:

For your current problem, the problem is that you can't process data which is not there.

The standard way to do stuff like that is sandboxing: you keep a copy of the data, compare everything on a new run and make rows for record deletions/inserts/updates to be applied to the real data. (also described in the Kimball books e.g.)

Alternatively you could change your business process to give you actual input when sales reps leave the company (rows you can use to delete them).
The second alternative would be better as it would minimize "guessing", e.g. once upon a time you only get half an input file for sales reps and your job decides to close off half of your sales reps.

Ogmios
In theory there's no difference between theory and practice. In practice there is.
mwilli422
Participant
Posts: 4
Joined: Mon Jul 12, 2004 5:01 am

Post by mwilli422 »

Hi!

We do keep a copy of the previous Sales Rep File. We compare the latest against this to identify those 'departed' Sales Reps. It is this file of departees that I wish to remove from the CDC file (used in the processing to Identify New and Changed records) at the end of the Batch Run.

The Data Warehouse is already in place and has been for nearly 18 months. This is just an additional requirement to tidy up a potential loophole that is not currently catered for.

From my attempts so far with a server job in Datastage, it seems impossible to remove records from a CDC file.

Cheers
Mark :)
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

Depends what you mean with CDC file. I assumed you meant "Change Data Capture". If it's just a regular ASCII file the only way to remove entries is to read the original file, write to a new file skipping the rows you don't want and copying it back to the original.

Or load the file to a database table, delete the required rows, and write it back to the original file.

There's no way to remove lines from an sequential file in place, without the above "tricks".

Ogmios
In theory there's no difference between theory and practice. In practice there is.
mwilli422
Participant
Posts: 4
Joined: Mon Jul 12, 2004 5:01 am

Post by mwilli422 »

It's a Hash File used in the DSSCD Transform that I wish to delete records from. I have been able to delete records from other Hash files, but not any used in DSSCD processing so far.

Thanks
Mark
:)
Post Reply