Synchronize two large Oracle tables daily

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Synchronize two large Oracle tables daily

Post by Triton46 »

Hi all!

Our warehouse has customers across the globe, so keeping the fact table online 24/7 is the new mandate; however, those pesky bitmap indexes make this difficult (we're on Oracle 9i). We've decided to load a temp table and then switch it out with the real one once the ETL is done, so the table is not down for hours while multiple ETL jobs populate it during the night.

This is what I have:

1) Nightly, load a temp table (a mirror of the fact table) with updates and inserts from multiple jobs.
2) At the end of the night, the last job adds the bitmap indexes and analyzes the temp table, then renames it to the real fact table.
3) Re-sync the real table and the temp table for the next night's run.


What is the fastest way to resync the tables at the end of the night? The table has 9.5M rows. Does DataStage do Upserts?

I'm torn between a truncate/reload and a MINUS to find the missing/updated rows and apply them.
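
For reference, here's roughly what steps 2) and 3) could look like at the SQL level, with made-up names (FACT_SALES is the live fact table, FACT_SALES_TMP the mirror, FACT_ID a stand-in for the key). I'm assuming Oracle 9i's MERGE would cover the upsert piece, though it wouldn't catch deletes if we ever had any:

-- 2) End-of-night swap: index/analyze the temp, then exchange names
ALTER TABLE fact_sales     RENAME TO fact_sales_old;
ALTER TABLE fact_sales_tmp RENAME TO fact_sales;
ALTER TABLE fact_sales_old RENAME TO fact_sales_tmp;

-- 3) Re-sync the now-background copy (the old live table) from the new live table
MERGE INTO fact_sales_tmp t
USING fact_sales s
   ON (t.fact_id = s.fact_id)
 WHEN MATCHED THEN UPDATE SET
      t.amount  = s.amount,
      t.load_dt = s.load_dt
 WHEN NOT MATCHED THEN INSERT (fact_id, amount, load_dt)
      VALUES (s.fact_id, s.amount, s.load_dt);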
abby77
Participant
Posts: 3
Joined: Wed Jun 28, 2006 2:22 pm

Re: Synchronize two large Oracle tables daily

Post by abby77 »

Hi,
I had the exact same requirement in my previous project. We started off with truncating and reloading the tables to keep them in sync, which was fine up to about 200K records, but once we hit 2 million records truncating and reloading was taking too long. So what we did was add 2 fields to each table: the first, load_number, holds a unique number for each batch loaded into the table, and the second, Action_Flag, holds either I or U depending on whether the record was inserted or updated. After each batch load we pick up all the records with the latest load_number and either insert or update them into the other table.
For handling deletes we maintained another table with the primary key of the table and the load_number. A separate job picks up the primary keys from this table and deletes them from the other table.
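Roughly, the apply step looked like this in SQL terms (table and column names here are just placeholders: TARGET_TBL is the table that was batch loaded, MIRROR_TBL the copy being brought in sync, and DELETE_LOG the extra key table):

-- Rows flagged 'I' in the latest batch become inserts on the other copy
INSERT INTO mirror_tbl (pk_id, some_col, load_number, action_flag)
SELECT pk_id, some_col, load_number, action_flag
  FROM target_tbl
 WHERE load_number = :latest_load_number
   AND action_flag = 'I';

-- Rows flagged 'U' become updates
UPDATE mirror_tbl m
   SET m.some_col = (SELECT t.some_col
                       FROM target_tbl t
                      WHERE t.pk_id = m.pk_id
                        AND t.load_number = :latest_load_number
                        AND t.action_flag = 'U')
 WHERE EXISTS (SELECT 1
                 FROM target_tbl t
                WHERE t.pk_id = m.pk_id
                  AND t.load_number = :latest_load_number
                  AND t.action_flag = 'U');

-- Deletes recorded for that batch come from the separate key table
DELETE FROM mirror_tbl
 WHERE pk_id IN (SELECT pk_id
                   FROM delete_log
                  WHERE load_number = :latest_load_number);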
Hope this makes sense.
-Abraham
Abraham
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

Thanks for this, Abraham. It sounds like I'll need to devise a list of inserts, updates and deletes to be submitted to a job.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

9 million is nothing like 900 million; you have some easier options. It's a shame you can't store load-ready files from your ETL: you would simply load the background table and, when appropriate, move it into the foreground. Then just take the same load files and apply them to the switched background table. Now your tables are in sync.

You have many options within the database to copy one table to another quickly. You could consider a truncate and then a DIRECT PATH insert from the foreground table. 9.5 million is very small.

You could also look at partitioning the table on a ranging value such as a transactional date. That would allow stagnant partitions to avoid being "copied" around; you could use partition-aware queries that just select and copy the partitions that had activity.
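
Roughly, the truncate and direct-path copy could look like this (FACT_FG and FACT_BG are placeholder names for the foreground and background tables):

TRUNCATE TABLE fact_bg;

-- The APPEND hint requests a direct-path insert
-- (set the table NOLOGGING beforehand if you want minimal redo)
INSERT /*+ APPEND */ INTO fact_bg
SELECT * FROM fact_fg;

COMMIT;

-- Rebuild indexes and refresh stats afterwards as needed, e.g.:
-- ALTER INDEX fact_bg_bmx1 REBUILD NOLOGGING;
-- ANALYZE TABLE fact_bg ESTIMATE STATISTICS SAMPLE 10 PERCENT;

With partitioning in place you would restrict the SELECT (or use ALTER TABLE ... EXCHANGE PARTITION) to just the partitions that had activity.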
Last edited by kcbland on Thu Jul 06, 2006 11:45 am, edited 1 time in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

I'm interested in the background table option. Currently, I have 13 jobs that update this one table each night. Preferably, I'd like to store the transactions from those 13 jobs somewhere once, then apply all the transactions to both tables. What are my options?

Our ETL is:

1000's of source Oracle tables --> 13 Oracle ODS --> 1 Oracle DDW
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your options are really just to change the ETL that loads the tables so it prepares insert and update load files. Triggers writing to audit tables are out if you're DIRECT-path loading; plus, you'd have to write sophisticated triggers that only fire when the table is in background "mode" and not being synchronized. Materialized views won't help, because that is just about the same as trunc-and-load. Replication services are probably too tricky as well.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

OK, so load 13 files, then in the last job loop through the files and insert/update against the table.

Is one file type better suited than another (hashed, sequential, FTP plug-in, complex flat file, etc.)?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sequential files, delimited, by far. You'll have to process chronologically so that inserts precede updates, but that should be no big deal. In fact, you could take all of the insert files, concatenate them together and load them in one operation, then loop through the updates.
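
Purely as an illustration of the apply order (your DataStage jobs would do the actual loading), if the concatenated delimited files were exposed to Oracle as 9i external tables it might look something like this, with every name made up:

-- Hypothetical external table over the concatenated inserts file;
-- ETL_DIR is a DIRECTORY object you would have to create first
CREATE TABLE fact_inserts_ext (
  fact_id NUMBER,
  cust_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('fact_inserts.dat')
);

-- All inserts in one direct-path operation
INSERT /*+ APPEND */ INTO fact_bg (fact_id, cust_id, amount)
SELECT fact_id, cust_id, amount FROM fact_inserts_ext;
COMMIT;

-- Updates applied afterwards, here from a second external table
-- (fact_updates_ext, defined the same way over the updates file)
MERGE INTO fact_bg b
USING fact_updates_ext u
   ON (b.fact_id = u.fact_id)
 WHEN MATCHED THEN UPDATE SET b.cust_id = u.cust_id, b.amount = u.amount
 WHEN NOT MATCHED THEN INSERT (fact_id, cust_id, amount)
      VALUES (u.fact_id, u.cust_id, u.amount);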
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

If your ETL process builds a sequential file for the complete reload of the table, you can use the Unix comm command to return the rows unique to your current and your previous load files. Load these unique rows into two hashed files. Read from the first and look up in the second to determine inserts and updates, then read from the second and look up in the first to determine deletes.

If most of your data is static, then this technique can significantly reduce the number of rows you must touch.
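
If it helps to picture the same delta logic in SQL terms, imagine the current and previous load files sitting in two staging tables (CURR_LOAD and PREV_LOAD are made-up names, PK_ID a stand-in for the key); the comm / hashed file route does the equivalent work outside the database:

-- Rows that are new or changed since the previous load:
-- new key -> insert, existing key but different row -> update
SELECT c.*,
       CASE WHEN p.pk_id IS NULL THEN 'I' ELSE 'U' END AS action_flag
  FROM (SELECT * FROM curr_load
        MINUS
        SELECT * FROM prev_load) c
  LEFT JOIN prev_load p
    ON p.pk_id = c.pk_id;

-- Keys present previously but missing now -> deletes
SELECT pk_id FROM prev_load
MINUS
SELECT pk_id FROM curr_load;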