Need to implement Incremental load logic

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Need to implement Incremental load logic

Post by ds29 »

Hi All,

I want to implement an incremental load in DataStage.

My case is something like this:

Suppose my source table has 10 rows and the target is blank initially.

1) On the first run, the job should load all the rows from source to target.
2) From the second run onwards, it should fetch only those records which were updated after the last run of the job.
-- There is a Datetime field in the source which gets updated whenever the source data changes.


Thanks in advance
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Need to implement Incremental load logic

Post by sachin1 »

1. Store your last execution date either in a database or in a sequential file. During the next execution, read this value and select from your source table with the condition Datetime > last_execution_date; you will get only those records which were updated or newly inserted.

2. Alternatively, scan for the max value of the date in the target table, which avoids storing the date in a file or database. You can use this value to get your recent records in the same way as above.
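The first approach can be sketched outside DataStage like this. This is a minimal illustration only, assuming the cutoff lives in a small text file; the file name, table name `source_table`, and column name `datetime_col` are all hypothetical, and the query shape assumes an Oracle-style `TO_DATE`:

```python
import os
from datetime import datetime

STATE_FILE = "last_execution_date.txt"   # illustrative file name
EPOCH = "1900-01-01 00:00:00"            # floor value forces a full load on the first run

def read_last_execution_date(path=STATE_FILE):
    """Return the stored cutoff, or the floor value when no state file exists yet."""
    if not os.path.exists(path):
        return EPOCH
    with open(path) as f:
        return f.read().strip()

def build_source_query(last_run):
    """Select only rows touched since the previous run (hypothetical names)."""
    return ("SELECT * FROM source_table "
            "WHERE datetime_col > TO_DATE('%s', 'YYYY-MM-DD HH24:MI:SS')" % last_run)

def save_execution_date(path=STATE_FILE):
    """Persist 'now' after a successful load, so the next run picks up from here."""
    with open(path, "w") as f:
        f.write(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
```

On the very first run the state file is missing, the floor value is used, and every source row qualifies; afterwards only the delta does.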
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Re: Need to implement Incremental load logic

Post by ds29 »

sachin1 wrote:1. Store your last execution date either in a database or in a sequential file. During the next execution, read this value and select from your source table with the condition Datetime > last_execution_date; you will get only those records which were updated or newly inserted.

2. Alternatively, scan for the max value of the date in the target table, which avoids storing the date in a file or database. You can use this value to get your recent records in the same way as above.

Hi Sachin, thanks a lot for the reply.

Could you tell me how to generate last_execution_date and read it from a sequential file?
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Need to implement Incremental load logic

Post by sachin1 »

Sorry, by last_execution_date I meant a value derived from the current data.

With the same concept, interpret it like this:

Suppose you have 10 records with date 31-12-2006 in the source. You executed the chain and stored the 10 records with value 31-12-2006 in your target table.

The next time you run the chain, say on 02-02-2007, suppose a few of your records were updated or inserted in the meantime. Now you can scan the source for records with date_time > 31-12-2006 (the max value in the target database), which will give you the records most recently modified or inserted in the source.
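The second approach (deriving the cutoff from the target instead of storing it) boils down to taking the maximum date already loaded and filtering the source against it. A minimal sketch on in-memory sample data, using the dates from the example above; in a real job the `max(...)` would be a `MAX(date)` query on the target table:

```python
# Each row is (key, last_updated) with the timestamp in sortable YYYY-MM-DD form.
target_rows = [(1, "2006-12-31"), (2, "2006-12-31")]
source_rows = [(1, "2006-12-31"),   # unchanged since the first load
               (2, "2007-01-15"),   # updated after the first load
               (3, "2007-02-01")]   # newly inserted

def incremental_rows(source, target):
    """Rows in source whose timestamp is later than the max already in target."""
    cutoff = max(ts for _, ts in target)          # plays the role of MAX(date) on the target
    return [row for row in source if row[1] > cutoff]

delta = incremental_rows(source_rows, target_rows)
```

Only rows 2 and 3 survive the filter, which matches the "modified or inserted since 31-12-2006" delta described above.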
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Re: Need to implement Incremental load logic

Post by ds29 »

Can anybody tell me how to read data from a sequential file to implement this logic?


sachin1 wrote:Sorry, by last_execution_date I meant a value derived from the current data.

With the same concept, interpret it like this:

Suppose you have 10 records with date 31-12-2006 in the source. You executed the chain and stored the 10 records with value 31-12-2006 in your target table.

The next time you run the chain, say on 02-02-2007, suppose a few of your records were updated or inserted in the meantime. Now you can scan the source for records with date_time > 31-12-2006 (the max value in the target database), which will give you the records most recently modified or inserted in the source.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the source is a sequential file, even though it has the date field in it, you have three options:
  • to read through the whole file with a Sequential File stage and filter the records downstream of that
  • to use a filter command in the Sequential File stage to select based on that date field (you'll have to create the command yourself)
  • to use an ODBC stage and the driver for text files - this is surprisingly good with a simple WHERE clause
There may be more options; those three were what I thought of immediately.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Need to implement incremental load logic

Post by ds29 »

Hi Ray,


I want to implement an incremental load in DataStage.

My case is something like this:

Suppose my source table has 10 rows and the target is blank initially.

1) On the first run, the job should load all the rows from source to target.
2) From the second run onwards, it should fetch only those records which were updated after the last run of the job.
-- There is a Datetime field in the source which gets updated whenever the source data changes.

What I am thinking: store the last run datetime of the job in a sequential file, compare it in the source DRS stage against the datetime field of the source database, and fetch only those records which were updated after the last run datetime.

But I want to know how to read the record from the sequential file and compare it in the DRS stage.

Or is there any other way to implement this?

Please let me know ASAP.


Thanks in advance

ray.wurlod wrote:If source is a sequential file, even though it has the date field in it, you have three options:
  • to read through the file if you're using a Sequential File stage: you can filter the records do ...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If both sources are sequential files you can set the DRS stage to use ODBC and have a DSN set up using the text file driver. But a far more sensible (and fast) approach would be to read the file into a hashed file and do the lookups and comparisons in a Transformer stage.
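The hashed-file lookup described here amounts to keying the reference rows and probing them once per input row. As a rough sketch of the logic only (not DataStage code), a Python dict plays the role of the hashed file and the list comprehension plays the role of the Transformer's per-row comparison; the keys and timestamps are made-up sample data:

```python
# Reference (target) rows loaded into the "hashed file": key -> last timestamp seen.
hashed = {1: "2006-12-31", 2: "2006-12-31"}

source = [(1, "2006-12-31"),   # key known, timestamp unchanged -> drop
          (2, "2007-01-15"),   # key known, newer timestamp     -> keep
          (3, "2007-02-01")]   # key not in the lookup (new row) -> keep

# Transformer-style constraint: pass rows that are new or newer than the lookup.
delta = [(k, ts) for k, ts in source if k not in hashed or ts > hashed[k]]
```

The dict lookup is O(1) per row, which is essentially why the hashed-file approach is fast compared with re-scanning a flat file for every input row.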
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shaimil
Charter Member
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

Could you clarify whether the source is a db or a sequential file?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Me? No idea whatsoever!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Post by ds29 »

The source is a db.
shaimil wrote:Could you clarify whether the source is a db or sequential file
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Post by ds29 »

Hi, the source is an Oracle database and the target is a sequential file.

Our requirement is that whenever we run the job, it should fetch only the delta (changed records) based on the datetime field in the source.

shaimil wrote:Could you clarify whether the source is a db or sequential file
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Add a suitable WHERE clause. Get the most recent update time from the target and use that as a job parameter, which you add to the WHERE clause.
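One way to supply that job parameter from outside is the `dsjob` command line, which accepts `-run -param name=value`. A sketch that only builds the command (it does not execute it); the parameter name `LASTRUN`, project `myproject`, and job `IncrementalLoad` are assumptions, and inside the job's SQL the parameter would be referenced as `#LASTRUN#` in the WHERE clause:

```python
def dsjob_run_command(project, job, last_run):
    """Assemble the dsjob invocation that passes the cutoff as a job parameter.
    LASTRUN is a hypothetical parameter name defined on the job."""
    return ["dsjob", "-run",
            "-param", "LASTRUN=%s" % last_run,
            project, job]

# Example with made-up names; subprocess.call(cmd) would actually launch it.
cmd = dsjob_run_command("myproject", "IncrementalLoad", "2006-12-31 00:00:00")
```

The same handoff can of course be done entirely inside a job sequence, as described in the following posts.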
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds29
Participant
Posts: 9
Joined: Wed Sep 03, 2008 10:48 pm
Location: Bangalore

Post by ds29 »

ray.wurlod wrote:Add a suitable WHERE clause. Get the most recent update time from the target and use that as a job parameter, which you add to the WHERE clause. ...


Hi Ray,

I understand your suggestion, but my question is how to read the most recent update time from the target and apply it as a job parameter.

Could you please suggest how to do this?
Do I need to write a routine for this?


Thanks a lot in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You don't need a routine. You can use a DataStage job. Pick up the value and stick it in a file or (better) in that job's user status area. (OK, you will need a two-line routine for that, but the code exists on DSXchange.)

Pick up the value of the user status area when assigning the value to the job parameter in the Job activity from which you run the main job. Make sure that the first job is upstream of the second in your job sequence.
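The handoff pattern described here, with a file standing in for the user status area, reduces to: the first job publishes the cutoff, the second job reads it and parameterises its extract. A minimal sketch of that flow, with hypothetical file, table, and column names:

```python
STATUS_FILE = "job1_userstatus.txt"   # stand-in for the first job's user status area

def job1_capture_cutoff(max_time, path=STATUS_FILE):
    """First job: look up the most recent update time in the target and publish it."""
    with open(path, "w") as f:
        f.write(max_time)

def job2_build_extract(path=STATUS_FILE):
    """Second job: pick up the published value and use it as its job parameter."""
    with open(path) as f:
        last_run = f.read().strip()
    return "SELECT * FROM source_table WHERE update_ts > '%s'" % last_run

# The sequence: job 1 runs first, its output feeds job 2's parameter.
job1_capture_cutoff("2007-02-02 00:00:00")
sql = job2_build_extract()
```

The ordering constraint in the last two lines is exactly the "first job upstream of the second" rule for the job sequence.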
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.