Incremental Loading

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
seeta
Participant
Posts: 8
Joined: Mon Nov 07, 2005 8:10 pm

Incremental Loading

Post by seeta »

Hi,

I am new to DataStage.I want to do Incremental loading that is I want to see the history data from the last run and if any update is there in data I want to update the data and insert the newdata in latest run.How can I do this
any suggestion would apriciated

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Incremental Loading is more of a general approach and methodology than a DataStage specific question.

The general term for what you are looking to do is "Change Data Detection" and a search on the web and in this forum will give you a number of hints. I also think that the product documentation goes into the subject area and describes some approaches. I do know that the DS class covers this as part of the standard material.

The PX product actually has a stage that does this for you :)
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

In parallel jobs there is the changed data capture stage which identifies inserts, updates, deletes and unchanged rows. Have a look at the Parallel Job Developers Guide for a description.

You can also read my DataStage blog on Incremental Loads save the day which links to a wiki entry on the subject.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,

There are various methods avaiable. As Arnd and Vincent (in this blogs) you can use the readily avaialble stage Change Capture us acheive this.
But be aware of numerous warnigs still been unhandled for that stage. Another method using simpile stages (workaround instead of Cange Capture) would be,
Do a full outer join between the Current month file and the target(Full file/Data base) based on the key.
Using Transformer Check for teh constraints of leftRec.Key ="" and rightRec.Key="". So that it can be New and Deleted records.
New would be your current month record. Delete would be your previous month record which doesnt comes on current month.
If leftRec.Key=RightRec.Key And If any other Sensitive information changed, then its actually an updated record else it is an Duplicate record.

-Kumar
Post Reply