how to load incremental loading

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
4friends
Participant
Posts: 71
Joined: Tue May 01, 2007 12:51 pm
Location: Mumbai

how to load incremental loading

Post by 4friends »

I have one job that contains 100 records in initial loading.

after that i have to do incremental loading, while doing this

if any new record came i have to insert and if i found any existing record

just upsert,if i have any changed record i have to update.

how can i do this?
thanks & regards
kk
xanupam
Participant
Posts: 6
Joined: Sun Nov 25, 2007 11:10 am
Location: India

Post by xanupam »

the need is to identify the delta from the source systems. Is there any field (Date etc) which indicates that this record is updated. If yes than you can use the same field for identification.,

Else you need to compare the whole source records to target and find out which all records are getting updated and what all are the new records, based on the condition you could have 2 output links one for inserts and other for update. This is basically implementation of SCD. You could use CheckSum kind of function for comparision in the stage variable and set a flag for insert and update.
Cheers !!!
An S
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In short, a lookup against the target, or a copy of it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

ray.wurlod wrote:In short, a lookup against the target, or a copy of it. ...
If you are looking up against the target make sure you insert a sequential file stage (write all records to a sequential file so that whole process of lookup and update breaks into two seperate processes) before you update the target. This is to avoid deadlocks.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

If there are no audit fields in the source tables or you can't scrape the redo logs of these tables, then you are left with -

If Source.Natural_Key = Target.Natural_Key Then
Write to a file that will be used as source to insert into the target table.
Else
Write to a file that will be used as source to update the table.
End

Of course, in reality it is not as simple as this, since apart from checking for existence of a row, you will be performing SCDs. This entails comparing the corresponding fields or their crc values.

Why am I even saying all this? One has to have a solid foundation of such basics before performing any datawarehouse development activity. So, make sure you get hold of a good book on Datawarhousing concepts... may be one by Ralph Kimball.
gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Krazykoolrohit wrote:If you are looking up against the target make sure you insert a sequential file stage (write all records to a sequential file so that whole process of lookup and update breaks into two seperate processes) before you update the target. This is to avoid deadlocks.
"a copy of it" - for example into a hashed file - satisfies this requirement, since a Hashed File stage is a passive stage it can not open its output until its inputs are closed.

You could also insert an IPC stage to force a process boundary - there's no real need to use an actual file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply