
Need to implement Incremental load logic

Posted: Fri Sep 05, 2008 6:12 am
by ds29
Hi All,

I want to implement an incremental load in DataStage.

My case is something like this:

Suppose my source table has 10 rows and the target is initially empty.

1) On the first run, the job should load all rows from source to target.
2) From the second run onwards, it should fetch only those records that were updated after the last run of the job.
--There is a Datetime field in the source that is updated whenever the source data changes.


Thanks in advance

Re: Need to implement Incremental load logic

Posted: Fri Sep 05, 2008 6:27 am
by sachin1
1. Store your last execution date either in a database or in a sequential file. During the next execution, read this value and select from your source table with the condition Datetime > last_execution_date; that gives you only the records that were updated or newly inserted.

2. Alternatively, scan for the max value of the date in the target table, which avoids storing the date in a file or database; use this value the same way to fetch the recent records.
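A minimal sketch of both variants in SQL, with hypothetical table and column names (src_orders, tgt_orders, last_updated); the exact syntax depends on your database:

    -- Variant 1: filter on a stored last-execution date
    SELECT * FROM src_orders
    WHERE last_updated > :last_execution_date;

    -- Variant 2: derive the cut-off from the target itself
    SELECT * FROM src_orders
    WHERE last_updated > (SELECT MAX(last_updated) FROM tgt_orders);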

Re: Need to implement Incremental load logic

Posted: Fri Sep 05, 2008 6:34 am
by ds29
sachin1 wrote: 1. Store your last execution date either in a database or in a sequential file. During the next execution, read this value and select from your source table with the condition Datetime > last_execution_date ...

Hi Sachin, thanks a lot for the reply.

Could you tell me how to generate last_execution_date and read it back from a sequential file?

Re: Need to implement Incremental load logic

Posted: Fri Sep 05, 2008 6:49 am
by sachin1
Sorry, by last_execution_date I actually meant working with the date values in the data itself.

With the same concept, read it like this:

Suppose you have 10 records dated 31-12-2006 in the source. You run the chain and load those 10 records, with the value 31-12-2006, into your target table.

The next time you run the chain, say on 02-02-2007, suppose some of your records were updated or inserted in the meantime. Now you can select records from the source with date_time > 31-12-2006 (the max value in the target database), which gives you the most recent records modified/inserted in the source.
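In SQL terms, that second-run select might look like the sketch below, assuming a hypothetical date_time column and Oracle-style TO_DATE; 31-12-2006 is the max value already in the target:

    SELECT * FROM src_table
    WHERE date_time > TO_DATE('31-12-2006', 'DD-MM-YYYY');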

Re: Need to implement Incremental load logic

Posted: Tue Sep 09, 2008 4:28 am
by ds29
Can anybody tell me how to read data from a sequential file to implement this logic?


sachin1 wrote: Sorry, by last_execution_date I actually meant working with the date values in the data itself. With the same concept, read it like this ...

Posted: Tue Sep 09, 2008 4:35 am
by ray.wurlod
If the source is a sequential file, even though it has the date field in it, you have three options:
  • read through the whole file with a Sequential File stage and filter the records downstream of that
  • use a filter command in the Sequential File stage to select on that date field - you'll have to create the command yourself (see the sketch below)
  • use an ODBC stage with the text file driver - this is surprisingly good with a simple WHERE clause
There may be more options; those three were what I thought of immediately.
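For the filter-command option, a rough sketch, assuming a comma-delimited file whose third field holds the timestamp in a lexically sortable format such as YYYY-MM-DD HH:MM:SS, and a hypothetical job parameter #LastRunDate#; the command goes in the stage's Filter property:

    awk -F',' -v cutoff='#LastRunDate#' '$3 > cutoff'

Only rows whose timestamp sorts after the cut-off are passed into the job.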

Re: Need to implement Incremental load logic

Posted: Tue Sep 09, 2008 4:44 am
by ds29
Hi Ray,


As described in my first post, I want to implement an incremental load: load everything on the first run, then from the second run onwards fetch only the records updated after the last run of the job, based on the Datetime field in the source.

What I am thinking: store the last run datetime of the job in a sequential file, compare it in the source DRS stage with the datetime field of the source database, and fetch only those records that were updated after the last run datetime.

But I want to know how to read that value from the sequential file and compare it in the DRS stage.

Or is there any other way to implement this?

Please let me know ASAP.


Thanks in advance

ray.wurlod wrote: If the source is a sequential file, even though it has the date field in it, you have three options: • read through the whole file with a Sequential File stage and filter the records downstream ...

Posted: Tue Sep 09, 2008 5:54 am
by ray.wurlod
If both sources are sequential files, you can set the DRS stage to use ODBC and have a DSN set up using the text file driver. But a far more sensible (and fast) approach would be to read the file into a hashed file and do the lookups and comparisons in a Transformer stage.

Posted: Tue Sep 09, 2008 6:38 am
by shaimil
Could you clarify whether the source is a db or a sequential file?

Posted: Tue Sep 09, 2008 6:42 am
by ray.wurlod
Me? No idea whatsoever!

Posted: Tue Sep 09, 2008 11:52 pm
by ds29
The source is a db.
shaimil wrote: Could you clarify whether the source is a db or a sequential file?

Posted: Wed Sep 10, 2008 12:29 am
by ds29
Hi, the source is an Oracle database and the target is a sequential file.

Our requirement is that whenever we run the job, it should fetch only the delta (changed records), based on the datetime field in the source.

shaimil wrote: Could you clarify whether the source is a db or a sequential file?

Posted: Wed Sep 10, 2008 1:30 am
by ray.wurlod
Add a suitable WHERE clause. Get the most recent update time from the target and use that as a job parameter, which you add to the WHERE clause.
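For example, user-defined SQL in the DRS/Oracle stage might look like the sketch below, with hypothetical table and column names; #LastRunDate# is the job parameter that DataStage substitutes at run time:

    SELECT * FROM src_table
    WHERE last_updated > TO_DATE('#LastRunDate#', 'DD-MM-YYYY HH24:MI:SS');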

Posted: Wed Sep 10, 2008 9:41 pm
by ds29
ray.wurlod wrote:Add a suitable WHERE clause. Get the most recent update time from the target and use that as a job parameter, which you add to the WHERE clause. ...


Hi Ray,

I understand your suggestion, but my question is how to read the most recent update time from the target and apply it as a job parameter.

Could you please advise?
Do I need to write a routine for this?


Thanks a lot in advance.

Posted: Wed Sep 10, 2008 9:45 pm
by ray.wurlod
You don't need a routine. You can use a DataStage job. Pick up the value and stick it in a file or (better) in that job's user status area. (OK, you will need a two-line routine for that, but the code exists on DSXchange.)
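The two-line routine is typically written as a server Transform function in DataStage BASIC, along these lines; Arg1 would be the max update time read from the target:

    * Store Arg1 in the running job's user status area and pass it through
    Call DSSetUserStatus(Arg1)
    Ans = Arg1

Calling it from a Transformer derivation in the first job makes the value available to the sequence.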

Pick up the value of the user status area when assigning the value to the job parameter in the Job activity from which you run the main job. Make sure that the first job is upstream of the second in your job sequence.
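In the sequence, assuming the first Job activity is named GetMaxDate, the value expression for the main job's parameter in the second Job activity would be something like:

    GetMaxDate.$UserStatus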