Incremental Extract based on a timestamp

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: idea

Post by peternolan9 »

Just to post what we are finding....


When we run this SQL, which looks up the current batch, gets the timestamps for the start and stop of the extract window, and uses them in a where clause against a remote database, we get the following execution plan.

My DBA tells me the plan shows that the entire source table is being transmitted across the LAN and filtered on the DW machine... obviously not a good idea.

We may or may not be able to get a database link defined on the remote machine in production... we can ask. I think that if we get the remote link, only the small tables would have to be transmitted across the LAN.

Peter


Select Statement

select * from ap_bank_accounts_all@peters where last_update_date between
(select t1.last_extract_tstamp from hlp_batch_extract_range t1, dw_batch_control t2
where t1.pk_batch_extract_range = t2.pk_batch_number and t2.batch_complete_flag = 0)
and
(select t1.this_extract_tstamp from hlp_batch_extract_range t1, dw_batch_control t2
where t1.pk_batch_extract_range = t2.pk_batch_number and t2.batch_complete_flag = 0)



Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 REMOTE* PETERS
3 1 NESTED LOOPS
4 3 TABLE ACCESS (FULL) OF 'DW_BATCH_CONTROL'
5 3 TABLE ACCESS (BY INDEX ROWID) OF 'HLP_BATCH_EXTRACT_RANGE'
6 5 INDEX (UNIQUE SCAN) OF 'SYS_C004648' (UNIQUE)
7 1 NESTED LOOPS
8 7 TABLE ACCESS (FULL) OF 'DW_BATCH_CONTROL'
9 7 TABLE ACCESS (BY INDEX ROWID) OF 'HLP_BATCH_EXTRACT_RANGE'
10 9 INDEX (UNIQUE SCAN) OF 'SYS_C004648' (UNIQUE)


2 SERIAL_FROM_REMOTE SELECT "BANK_ACCOUNT_ID","BANK_ACCOUNT_NAME"
,"LAST_UPDATE_DATE","LAST_UPDATED_BY



Statistics
----------------------------------------------------------
7 recursive calls
1 db block gets
7 consistent gets
0 physical reads
308 redo size
3940 bytes sent via SQL*Net to client
213 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
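
For illustration only, here is a sketch of one way to keep the filter on the remote site: read the extract window once (e.g. into DataStage job parameters) and then pass the two values into the remote query as binds, so the BETWEEN predicate can be applied at the remote database before any rows cross the LAN. This two-step split is only a sketch of the idea, not what is actually running here.

-- Step 1 (local DW database): fetch the extract window once,
-- e.g. into DataStage job parameters
select t1.last_extract_tstamp, t1.this_extract_tstamp
  from hlp_batch_extract_range t1, dw_batch_control t2
 where t1.pk_batch_extract_range = t2.pk_batch_number
   and t2.batch_complete_flag = 0;

-- Step 2 (remote extract): pass the window in as bind values so the filter
-- can be evaluated on the remote site rather than on the DW machine
select *
  from ap_bank_accounts_all@peters
 where last_update_date between :last_extract_tstamp and :this_extract_tstamp;

Another option sometimes suggested for distributed queries is the /*+ DRIVING_SITE */ hint, but whether it helps in this case would have to be verified against the actual plan.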
Best Regards
Peter Nolan
www.peternolan.com
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: idea

Post by ogmios »

Oracle is the "odd duck" in database land; whatever applies to DB2 does not necessarily apply to Oracle (e.g. there are no uncommitted reads). In DB2, the system date would also be the time the row was inserted, not the commit time (if you don't commit on every row).

If you work with Oracle, the best thing is to read the "Oracle Concepts" guide at least once; it's on the Oracle documentation CD.

In short, Oracle uses multi-versioning for its consistency. When you open the cursor of your extract query, the result you get back is the result set as it was at that moment in time. Whatever happens to the data in the table afterwards, even if someone erases all the rows, you will still get the rows as they were at that time. This is the multi-versioning part: Oracle attaches an SCN (a number representing a moment in time) to your query, and if it sees that the data it has to retrieve is newer than the start of your query, it fetches the data as of that earlier time to keep consistency.
So when someone inserts a row into your table after you run the extract query, the timestamp on that new row may be lower than your highest read timestamp, and you will have missing rows in the next runs.
Another freaky point of Oracle (because of multi-versioning) is that readers are never blocked by writers; only writers can block each other.

So the result set does not depend on who commits at what time after you open the query, which for some applications is a bummer. If all delta changes are important (and you can't stop changing the table), you are more or less forced to use Oracle Streams to fetch deltas, something like DpropR in DB2.
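
To make the missed-row scenario concrete, here is a hypothetical timeline (the times and session labels are made up for illustration): a row is stamped with its insert time but committed only after the extract cursor has opened, so it is invisible to this extract and falls below the high-water mark of the next one.

-- Session A (source application)        Session B (extract job)
-- 10:00:01  insert row,
--           last_update_date = 10:00:01
--                                        10:00:10  open extract cursor;
--                                                  high-water mark = 10:00:10
-- 10:00:30  commit;
--                                        the open cursor never sees the row:
--                                        it was uncommitted at 10:00:10
--
-- The next run extracts last_update_date > 10:00:10,
-- so the row stamped 10:00:01 is never picked up.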

Ogmios
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: idea

Post by peternolan9 »

Hi All,
we have been working through this and have come up with the following....

1. We get a timestamp that is common for both machines. This timestamp is the 'thisextracttstamp'.
2. We wait N seconds.
3. We issue a quiesce against the source Oracle system to force the application of pending updates.
4. Then we select all rows with a timestamp between lastextracttstamp and thisextracttstamp.

And this should get us ALL the updates......
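
A rough sketch of steps 1 to 4 in SQL terms, assuming the timestamp is taken from the source system over the database link and stored in hlp_batch_extract_range; the wait and the quiesce of steps 2 and 3 would be handled outside these statements:

-- Step 1: take the timestamp from the source instance itself so both machines
-- agree on it (reading it over the database link is one possible way)
select sysdate as this_extract_tstamp from dual@peters;
-- store the value as this_extract_tstamp on hlp_batch_extract_range for the open batch

-- Steps 2 and 3: wait N seconds, then quiesce the source instance
-- (handled outside these statements)

-- Step 4: extract every row stamped inside the window
select *
  from ap_bank_accounts_all@peters
 where last_update_date between :last_extract_tstamp and :this_extract_tstamp;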

Anybody see a hole in this?


(PS. We have not forgotten deletes. We will write triggers for deletes, or we may even be responsible for issuing the deletes ourselves in the future. The client wants the DW to archive deleted data to improve performance of the operational system.)
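
For the delete triggers mentioned in the PS, something along these lines on the source system could work; the shadow table and trigger names are made up for illustration, and only ap_bank_accounts_all and bank_account_id come from the earlier query:

-- Hypothetical shadow table on the source system to record deleted keys
create table ap_bank_accounts_deleted_keys (
    bank_account_id  number,
    deleted_tstamp   date
);

-- Hypothetical trigger that logs each delete with the time it happened
create or replace trigger trg_ap_bank_accounts_del
after delete on ap_bank_accounts_all
for each row
begin
    insert into ap_bank_accounts_deleted_keys (bank_account_id, deleted_tstamp)
    values (:old.bank_account_id, sysdate);
end;
/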
Best Regards
Peter Nolan
www.peternolan.com
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: idea

Post by ogmios »

Depending on the type of processing on your Oracle database, you may have to wait a long time before the quiesce takes effect.

And this may also be a problem: "after quiescing all non-DBA logins after this statement is issued are queued by the Database Resource Manager, and are not allowed to proceed", so I guess your ETL processes have to run as DBA?
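
For reference, the quiesce itself is just the following pair of statements issued by a DBA on the source instance (this assumes Oracle 9i or later with the Database Resource Manager active, as the quoted documentation implies):

-- Issued as a DBA; non-DBA sessions started after this point are queued
alter system quiesce restricted;

-- ... capture the extract timestamp / let pending work drain here ...

-- Let normal sessions proceed again
alter system unquiesce;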

Ogmios