How to do incremental loading

4friends · Post by **4friends** » Mon Nov 26, 2007 1:43 am

I have a job that loads 100 records in the initial load.

After that I have to do incremental loading. While doing this, if any new record comes in I have to insert it, if I find an existing record I just upsert it, and if I have any changed record I have to update it.

How can I do this?

xanupam · Post by **xanupam** » Mon Nov 26, 2007 1:48 am

The first step is to identify the delta from the source systems. Is there any field (a date, etc.) that indicates a record has been updated? If so, you can use that field to identify the changed records.
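Where such an audit column exists, extracting the delta amounts to filtering on it. A minimal Python sketch, assuming a hypothetical `last_updated` column and a previous-run timestamp that the job persists somewhere between runs:

```python
from datetime import datetime

# Hypothetical source rows; "last_updated" stands in for whatever audit
# column (date/timestamp) the source system actually provides.
source_rows = [
    {"id": 1, "name": "Alice", "last_updated": datetime(2007, 11, 20, 9, 0)},
    {"id": 2, "name": "Bob", "last_updated": datetime(2007, 11, 26, 8, 30)},
    {"id": 3, "name": "Carol", "last_updated": datetime(2007, 11, 27, 1, 15)},
]


def extract_delta(rows, last_run):
    """Return only the rows created or changed after the previous load."""
    return [r for r in rows if r["last_updated"] > last_run]


# Timestamp of the previous successful run (normally stored between runs).
last_run = datetime(2007, 11, 26, 1, 43)
delta = extract_delta(source_rows, last_run)
print([r["id"] for r in delta])  # [2, 3]
```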

Otherwise you need to compare the whole source records against the target and find out which records have been updated and which are new. Based on that condition you could have two output links, one for inserts and the other for updates. This is basically an implementation of a slowly changing dimension (SCD). You could use a CheckSum-type function for the comparison in a stage variable and set a flag for insert or update.
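A rough Python sketch of the checksum comparison, assuming the target's checksums are available keyed by natural key (for example from a lookup against the target). `zlib.crc32` stands in for whatever CheckSum function the tool provides; the column names and helper names are hypothetical. Rows whose checksum is unchanged go to neither link:

```python
import zlib


def row_checksum(row, columns):
    """CRC32 over the non-key columns, joined with a separator."""
    payload = "|".join(str(row[c]) for c in columns)
    return zlib.crc32(payload.encode("utf-8"))


def classify(source_rows, target_checksums, key, columns):
    """Split source rows into an insert link and an update link.

    target_checksums maps natural key -> checksum of the current target row.
    """
    inserts, updates = [], []
    for row in source_rows:
        crc = row_checksum(row, columns)
        existing = target_checksums.get(row[key])
        if existing is None:
            inserts.append(row)  # key not in target: insert link
        elif existing != crc:
            updates.append(row)  # key exists but data differs: update link
        # otherwise the row is unchanged and is dropped
    return inserts, updates


cols = ["name", "city"]
target = {
    1: row_checksum({"name": "Alice", "city": "Leeds"}, cols),
    2: row_checksum({"name": "Bob", "city": "York"}, cols),
}
source = [
    {"id": 1, "name": "Alice", "city": "Leeds"},  # unchanged
    {"id": 2, "name": "Bob", "city": "Hull"},     # changed
    {"id": 3, "name": "Carol", "city": "Bath"},   # new
]
ins, upd = classify(source, target, "id", cols)
print([r["id"] for r in ins], [r["id"] for r in upd])  # [3] [2]
```

A CRC can in principle collide, so a changed row could occasionally be missed; comparing the fields directly avoids that at the cost of more work per row.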

ray.wurlod · Post by **ray.wurlod** » Mon Nov 26, 2007 3:42 am

In short, a lookup against the target, or a copy of it.

Krazykoolrohit · Post by **Krazykoolrohit** » Tue Nov 27, 2007 3:24 pm

ray.wurlod wrote: In short, a lookup against the target, or a copy of it. ...

If you are looking up against the target, make sure you insert a Sequential File stage (write all records to a sequential file so that the lookup and the update are split into two separate processes) before you update the target. This avoids deadlocks.
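The same two-phase idea, sketched in Python with `sqlite3` standing in for the target database and a CSV file standing in for the Sequential File stage (the table, column, and file names are hypothetical). Phase 1 finishes reading the target before phase 2 writes anything to it:

```python
import csv
import os
import sqlite3
import tempfile


def stage_changes(conn, source_rows, stage_path):
    """Phase 1: look up against the target and write actions to a file.

    The lookup cursor is fully consumed into a dict before the file is
    written, so reading the target never overlaps with writing to it.
    """
    existing = {rid: name for rid, name in conn.execute("SELECT id, name FROM target")}
    with open(stage_path, "w", newline="") as f:
        writer = csv.writer(f)
        for rid, name in source_rows:
            if rid not in existing:
                writer.writerow(["I", rid, name])  # new key
            elif existing[rid] != name:
                writer.writerow(["U", rid, name])  # changed row


def apply_changes(conn, stage_path):
    """Phase 2: read the staged file and apply inserts/updates to the target."""
    with open(stage_path, newline="") as f:
        for action, rid, name in csv.reader(f):
            if action == "I":
                conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", (int(rid), name))
            else:
                conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, int(rid)))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.commit()

stage_path = os.path.join(tempfile.mkdtemp(), "target_changes.csv")
stage_changes(conn, [(1, "Alice"), (2, "Robert"), (3, "Carol")], stage_path)
apply_changes(conn, stage_path)
print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
# [(1, 'Alice'), (2, 'Robert'), (3, 'Carol')]
```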

gateleys · Post by **gateleys** » Tue Nov 27, 2007 3:40 pm

If there are no audit fields in the source tables, or you can't scrape the redo logs of these tables, then you are left with:

If Source.Natural_Key = Target.Natural_Key Then
Write to a file that will be used as source to update the target table.
Else
Write to a file that will be used as source to insert into the target table.
End
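The key-existence routing above, as a small Python sketch. It assumes the target's natural keys have already been loaded into a set (a copy of the target, in the sense used earlier in the thread); the file names and the helper name are hypothetical:

```python
import csv
import os
import tempfile


def route_by_natural_key(source_rows, target_keys, insert_path, update_path):
    """Write each source row to the insert file or the update file.

    target_keys is the set of natural keys already present in the target.
    """
    with open(insert_path, "w", newline="") as ins_f, \
         open(update_path, "w", newline="") as upd_f:
        ins_w, upd_w = csv.writer(ins_f), csv.writer(upd_f)
        for key, *rest in source_rows:
            if key in target_keys:
                upd_w.writerow([key, *rest])  # key found: existing row, update
            else:
                ins_w.writerow([key, *rest])  # key not found: new row, insert


workdir = tempfile.mkdtemp()
insert_path = os.path.join(workdir, "inserts.csv")
update_path = os.path.join(workdir, "updates.csv")
route_by_natural_key(
    [("C001", "Alice"), ("C002", "Bob"), ("C003", "Carol")],
    target_keys={"C001", "C003"},
    insert_path=insert_path,
    update_path=update_path,
)
with open(insert_path, newline="") as f:
    print(list(csv.reader(f)))  # [['C002', 'Bob']]
with open(update_path, newline="") as f:
    print(list(csv.reader(f)))  # [['C001', 'Alice'], ['C003', 'Carol']]
```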

Of course, in reality it is not as simple as this, since apart from checking for the existence of a row, you will be performing SCDs. This entails comparing the corresponding fields or their CRC values.
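As one illustration of that extra work, here is a Python sketch of a Type 2 SCD (history is kept by closing the current row and adding a new version), driven by a field-by-field comparison. The choice of Type 2, the housekeeping columns `effective_from`, `effective_to` and `is_current`, and all helper names are assumptions for the example:

```python
from datetime import date


def changed_fields(source_row, target_row, columns):
    """Field-by-field comparison: names of the columns whose values differ."""
    return [c for c in columns if source_row[c] != target_row[c]]


def apply_scd2(dimension, source_row, key, columns, load_date):
    """Type 2 handling on an in-memory dimension (a list of dicts).

    Returns "insert", "new_version" or "unchanged".
    """
    current = next(
        (r for r in dimension if r[key] == source_row[key] and r["is_current"]),
        None,
    )
    if current is None:
        action = "insert"  # brand-new natural key
    elif changed_fields(source_row, current, columns):
        current["effective_to"] = load_date  # close off the old version
        current["is_current"] = False
        action = "new_version"
    else:
        return "unchanged"
    dimension.append({**source_row, "effective_from": load_date,
                      "effective_to": None, "is_current": True})
    return action


dim = [{"cust_id": "C001", "city": "Leeds", "effective_from": date(2007, 1, 1),
        "effective_to": None, "is_current": True}]
print(apply_scd2(dim, {"cust_id": "C001", "city": "York"}, "cust_id", ["city"],
                 date(2007, 11, 27)))  # new_version
print(len(dim), dim[0]["is_current"], dim[1]["city"])  # 2 False York
```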

Why am I even saying all this? One needs a solid foundation in such basics before doing any data warehouse development. So make sure you get hold of a good book on data warehousing concepts, maybe one by Ralph Kimball.

ray.wurlod · Post by **ray.wurlod** » Tue Nov 27, 2007 4:48 pm

Krazykoolrohit wrote: If you are looking up against the target make sure you insert a sequential file stage (write all records to a sequential file so that whole process of lookup and update breaks into two separate processes) before you update the target. This is to avoid deadlocks.

"A copy of it" (for example, in a hashed file) satisfies this requirement: because a Hashed File stage is a passive stage, it cannot open its output until its inputs are closed.

You could also insert an IPC stage to force a process boundary; there's no real need to use an actual file.