incremental loading

Post questions here relating to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vijaykumar
Participant
Posts: 228
Joined: Tue Oct 03, 2006 7:08 pm

Post by vijaykumar »

Hi Gurus,

In incremental loading, we load the records between the last successful load and the current run.

Actually, I have never done incremental loading. How do we know when the last successful load was?

Can the gurus please help me analyze this?

cheers;
Vijay
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Create a table and manage "Process Metadata" with it. Store information about every run, and use that metadata to feed subsequent processes. That metadata could be configuration information, such as a "From Date" column.
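A minimal sketch of what such a table could look like (all names here are illustrative, not something prescribed above):

CREATE TABLE etl_process_metadata (
    job_name    VARCHAR(64) NOT NULL,
    run_start   TIMESTAMP   NOT NULL,
    run_end     TIMESTAMP,
    run_status  VARCHAR(16),          -- e.g. RUNNING / SUCCESS / FAILED
    rows_loaded INTEGER,
    from_date   TIMESTAMP             -- the "From Date" fed to the next run
);

-- The next run derives its extraction window from the last successful run:
SELECT MAX(run_end)
FROM   etl_process_metadata
WHERE  job_name   = 'LOAD_ORDERS'
AND    run_status = 'SUCCESS';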
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Usually two timestamps are stored in a control table. These timestamps are passed as job parameters to the SQL select (select * from table where date between #date1# AND #date2#). Once your cycle finishes, or before your next run, the control table is updated/inserted with new date values and selected again to be passed as parameters. This will work if the records you are selecting have a date/timestamp column.
The other way is to do a lookup against the target table; new records go in as incremental inserts.
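A rough sketch of that flow in SQL (etl_control and its column names are mine; #date1#/#date2# stand in for the DataStage job parameters as in the select above):

-- 1. Read this run's window from the control table:
SELECT last_load_ts, current_load_ts
FROM   etl_control
WHERE  job_name = 'LOAD_ORDERS';

-- 2. Plug those values into the extraction SQL as job parameters:
SELECT *
FROM   orders
WHERE  update_ts BETWEEN '#date1#' AND '#date2#';

-- 3. After a successful cycle, roll the window forward for the next run:
UPDATE etl_control
SET    last_load_ts    = current_load_ts,
       current_load_ts = CURRENT_TIMESTAMP
WHERE  job_name = 'LOAD_ORDERS';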
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

For DSguru2B's second option, you can use a staging table to retain the last run's data, rather than doing a lookup against the full target data, if your requirement only concerns the last and current loads.
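As a sketch (table names invented for illustration), the new extract is anti-joined against the previous run's staging copy instead of the whole target:

-- Rows not present in the previous run's staging table are the new inserts:
SELECT s.*
FROM   stage_current s
LEFT JOIN stage_previous p
       ON p.order_id = s.order_id
WHERE  p.order_id IS NULL;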
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia

Post by rleishman »

IMO this is better done at the extraction phase, not at load time.

For delta-extraction, you can use one of the methods described above, or if extracting from an Oracle database you can look into Change Data Capture - it's a non-invasive way to capture all the changes on a table without timestamp columns, full table scans, or - with 10g - even accessing the source table itself (deltas are scraped straight from Archive Logs).
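For anyone curious what the subscriber side of Oracle CDC looks like, here is a rough sketch of the 10g DBMS_CDC_SUBSCRIBE flow from memory (the change set, table, and column names are invented, and parameter names differ between 9i and 10g, so verify against the Oracle documentation before relying on this):

BEGIN
   -- Register a subscription against an existing change set:
   DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
      change_set_name   => 'ORDERS_SET',
      description       => 'Orders delta feed for the warehouse',
      subscription_name => 'ORDERS_SUB');

   -- Subscribe to the columns of interest on the source table:
   DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
      subscription_name => 'ORDERS_SUB',
      source_schema     => 'APP',
      source_table      => 'ORDERS',
      column_list       => 'ORDER_ID, AMOUNT, UPDATE_TS',
      subscriber_view   => 'ORDERS_DELTA_V');

   -- Activate, then extend the window to expose the latest changes;
   -- the ORDERS_DELTA_V view is then read like any other source table:
   DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'ORDERS_SUB');
   DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'ORDERS_SUB');
END;
/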
Ross Leishman
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I did a blog post on this a long time ago. I think IBM's Oracle Change Data Capture and Replication Server products both add triggers to the Oracle tables, so you might have to go for an Oracle product to get log scraping. Even then you still need to do change data determination: at the end of transformation, you compare your delta data to your target table to see if there really are any changes, and split those into inserts and updates for optimised database loads.

So some type of delta off the database and change data capture in DataStage works well.

You can build a change data capture capability in DataStage using the CRC32 function; search the archives for examples. It is less messy than trying to compare dozens of columns in a Transformer stage. In parallel jobs you have the Change Capture and Change Apply stages to do this.
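The comparison itself is usually done in a Transformer or lookup, but the determination logic amounts to something like this sketch (names are illustrative: ref_crc holds each key's CRC32 from the last load, and the staging-side crc_value is the CRC32 computed over the concatenated columns in the job):

-- Unmatched keys become inserts; matched keys with a changed CRC become updates:
SELECT s.order_id,
       CASE
          WHEN r.order_id  IS NULL        THEN 'INSERT'
          WHEN r.crc_value <> s.crc_value THEN 'UPDATE'
          ELSE 'SKIP'
       END AS change_type
FROM   stage_orders s
LEFT JOIN ref_crc r
       ON r.order_id = s.order_id;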
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia

Post by rleishman »

I think the CDC triggers/packages are a 9i thing, and 10g does it straight from the logs. I've never used it, so I can't swear to it.
Ross Leishman