Incremental Loading Without Date fields in source

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ramesh9394
Participant
Posts: 14
Joined: Thu Jun 09, 2011 7:24 pm
Location: USA

Incremental Loading Without Date fields in source

Post by ramesh9394 »

Hi
my source is Sql server and Target oracle
i am trying to design a job to load Delta records ..but the source doesn't have any date fields ..so whenever i am loading data source fetching all the records...is there any way that i can fetch latest records from source
in other words
today i loaded 100 records after initial load...when running the job tomorrow i need only records fof tomorrow i don't want to fetch yesterday's 100 records again
ram
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Really you have again neglected to provide us with sufficient information.

There may or may not be a mechanism for solving this problem. It really depends whether there is some way in the source data for identifying the delta.

For example if the source has an auto-incrementing key you can store the highest key retrieved today and tomorrow fetch just those records with a key value greater than this.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ramesh9394
Participant
Posts: 14
Joined: Thu Jun 09, 2011 7:24 pm
Location: USA

Post by ramesh9394 »

thanks for reply ray
i am using DS8.7 using Change Capture stage to capture the delta records
using odbc_connector as source and oracle_connector as target
ram
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That doesn't affect my earlier answer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Has the client provided a method for identifying the new rows other than comparing the data row-to-row? Are there any unique keys available?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ramesh9394
Participant
Posts: 14
Joined: Thu Jun 09, 2011 7:24 pm
Location: USA

Post by ramesh9394 »

not provided any thing(any field which has update time stamp/system gen time stamp) to identify new rows...there is unique key... looks like i have to fetch everything from source every time ..but still looking for alternatives

i tried left joining source with target but even it also do the same(fetching all data and looking up on target)
ram
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

This "unique key", is it a surrogate or any kind of field that increments in such a manner you can use it for your high water mark? Without that or a timestamp all you can do is pull everything and compare.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ramesh9394
Participant
Posts: 14
Joined: Thu Jun 09, 2011 7:24 pm
Location: USA

Post by ramesh9394 »

unique key is not surrogate key its natural primary key
ram
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can you create a work table in the source to hold the PK values you've extracted? Or are they willing to add a timestamp to the source? We're running out of options here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Two questions:

1) Is fetching the delta relevant for your process?

You have been talking about a hundred rows in your first post. For something below a hundred thousand rows I would not consider the issue relevant, because Change-Captured is going to identify changes in almost no time at all.

2) Is the only kind of change possible an insert of new key-values, or is it possible that data-rows for known keys change content on non-key-columns? In the latter case you have to fetch all rows from your source anyway.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ramesh9394
Participant
Posts: 14
Joined: Thu Jun 09, 2011 7:24 pm
Location: USA

Post by ramesh9394 »

Thanks for reply
100 records is just an example i told...records are in millions from source ..
i am fetching all delta records like new inserted records and updated one(no-key value changes for known keys)..and also deleted records
ram
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If there is no way to identify the delta at source then you will be obliged to fetch all the data from source every time and compare against target.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Depending on the version of SQL server you have , Change tracking can be enabled at the database level . Once you have done that . I think its an easy task to get only newly modified records( Adds/Updates)

Here is a good link for
Change Tracking in SQL Server 2008
Post Reply