I have one job that contains 100 records in initial loading.
after that i have to do incremental loading, while doing this
if any new record came i have to insert and if i found any existing record
just upsert,if i have any changed record i have to update.
how can i do this?
how to load incremental loading
Moderators: chulett, rschirm, roy
how to load incremental loading
thanks & regards
kk
kk
the need is to identify the delta from the source systems. Is there any field (Date etc) which indicates that this record is updated. If yes than you can use the same field for identification.,
Else you need to compare the whole source records to target and find out which all records are getting updated and what all are the new records, based on the condition you could have 2 output links one for inserts and other for update. This is basically implementation of SCD. You could use CheckSum kind of function for comparision in the stage variable and set a flag for insert and update.
Else you need to compare the whole source records to target and find out which all records are getting updated and what all are the new records, based on the condition you could have 2 output links one for inserts and other for update. This is basically implementation of SCD. You could use CheckSum kind of function for comparision in the stage variable and set a flag for insert and update.
Cheers !!!
An S
An S
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
If you are looking up against the target make sure you insert a sequential file stage (write all records to a sequential file so that whole process of lookup and update breaks into two seperate processes) before you update the target. This is to avoid deadlocks.ray.wurlod wrote:In short, a lookup against the target, or a copy of it. ...
If there are no audit fields in the source tables or you can't scrape the redo logs of these tables, then you are left with -
If Source.Natural_Key = Target.Natural_Key Then
Write to a file that will be used as source to insert into the target table.
Else
Write to a file that will be used as source to update the table.
End
Of course, in reality it is not as simple as this, since apart from checking for existence of a row, you will be performing SCDs. This entails comparing the corresponding fields or their crc values.
Why am I even saying all this? One has to have a solid foundation of such basics before performing any datawarehouse development activity. So, make sure you get hold of a good book on Datawarhousing concepts... may be one by Ralph Kimball.
If Source.Natural_Key = Target.Natural_Key Then
Write to a file that will be used as source to insert into the target table.
Else
Write to a file that will be used as source to update the table.
End
Of course, in reality it is not as simple as this, since apart from checking for existence of a row, you will be performing SCDs. This entails comparing the corresponding fields or their crc values.
Why am I even saying all this? One has to have a solid foundation of such basics before performing any datawarehouse development activity. So, make sure you get hold of a good book on Datawarhousing concepts... may be one by Ralph Kimball.
gateleys
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
"a copy of it" - for example into a hashed file - satisfies this requirement, since a Hashed File stage is a passive stage it can not open its output until its inputs are closed.Krazykoolrohit wrote:If you are looking up against the target make sure you insert a sequential file stage (write all records to a sequential file so that whole process of lookup and update breaks into two seperate processes) before you update the target. This is to avoid deadlocks.
You could also insert an IPC stage to force a process boundary - there's no real need to use an actual file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.