Synchronize two large Oracle tables daily

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Synchronize two large Oracle tables daily

Post by Triton46 »

Hi all!

Our warehouse has customers across the globe, so keeping the fact table online 24/7 is the new mandate; however, those pesky bitmap indexes make this difficult (we're on Oracle 9i). We've decided to load a temp table and then switch it out with the real one once the ETL is done, so the table is not down for hours while multiple ETL jobs populate it during the night.

This is what I have:

1) Nightly, load a temp table (a mirror of the fact table) with updates and inserts from multiple jobs.
2) At the end of the night, the last job adds the bitmap indexes and analyzes the temp table, then renames it to the real fact table.
3) Re-sync the real table and the temp table for the next night's run.


What is the fastest way to resync the tables at the end of the night? The table has 9.5M rows. Does DataStage do Upserts?

I'm torn between a truncate/reload and a MINUS to find the missing/updated rows and apply them.
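
For reference, here's roughly what steps 2) and 3) could look like at the SQL level, with made-up names (FACT_SALES is the live fact table, FACT_SALES_TMP the mirror, FACT_ID a stand-in for the key). I'm assuming Oracle 9i's MERGE would cover the upsert piece, though it wouldn't catch deletes if we ever had any:

-- 2) End-of-night swap: index/analyze the temp, then exchange names
ALTER TABLE fact_sales     RENAME TO fact_sales_old;
ALTER TABLE fact_sales_tmp RENAME TO fact_sales;
ALTER TABLE fact_sales_old RENAME TO fact_sales_tmp;

-- 3) Re-sync the now-background copy (the old live table) from the new live table
MERGE INTO fact_sales_tmp t
USING fact_sales s
   ON (t.fact_id = s.fact_id)
 WHEN MATCHED THEN UPDATE SET
      t.amount  = s.amount,
      t.load_dt = s.load_dt
 WHEN NOT MATCHED THEN INSERT (fact_id, amount, load_dt)
      VALUES (s.fact_id, s.amount, s.load_dt);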
abby77
Participant
Posts: 3
Joined: Wed Jun 28, 2006 2:22 pm

Re: Synchronize two large Oracle tables daily

Post by abby77 »

Hi,
I had the exact same requirement in my previous project. We started off with truncating and reloading the tables to keep them in sync, which was fine up to about 200K records, but once we hit 2 million records truncating and reloading was taking too long. So what we did was add 2 fields to each table: the first, load_number, holds a unique number for each batch loaded into the table, and the second, Action_Flag, holds either I or U depending on whether the record was inserted or updated. After each batch load we pick up all the records with the latest load_number and either insert or update them into the other table.
For handling deletes we maintained another table with the primary key of the table and the load_number. A separate job picks up the primary keys from this table and deletes them from the other table.
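Roughly, the apply step looked like this in SQL terms (table and column names here are just placeholders: TARGET_TBL is the table that was batch loaded, MIRROR_TBL the copy being brought in sync, and DELETE_LOG the extra key table):

-- Rows flagged 'I' in the latest batch become inserts on the other copy
INSERT INTO mirror_tbl (pk_id, some_col, load_number, action_flag)
SELECT pk_id, some_col, load_number, action_flag
  FROM target_tbl
 WHERE load_number = :latest_load_number
   AND action_flag = 'I';

-- Rows flagged 'U' become updates
UPDATE mirror_tbl m
   SET m.some_col = (SELECT t.some_col
                       FROM target_tbl t
                      WHERE t.pk_id = m.pk_id
                        AND t.load_number = :latest_load_number
                        AND t.action_flag = 'U')
 WHERE EXISTS (SELECT 1
                 FROM target_tbl t
                WHERE t.pk_id = m.pk_id
                  AND t.load_number = :latest_load_number
                  AND t.action_flag = 'U');

-- Deletes recorded for that batch come from the separate key table
DELETE FROM mirror_tbl
 WHERE pk_id IN (SELECT pk_id
                   FROM delete_log
                  WHERE load_number = :latest_load_number);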
Hope this makes sense.
-Abraham
Abraham
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

Thanks for this, Abraham. It sounds like I'll need to devise a list of inserts, updates and deletes to be submitted to a job.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

9 million is nothing like 900 million; you have some easier options. It's a shame you can't store load-ready files from your ETL: you would simply load the background table and, when appropriate, move it into the foreground. Then just take the same load files and apply them to the switched background table. Now your tables are in sync.

You have many options within the database to copy one table to another quickly. You could consider a truncate and then a DIRECT PATH insert from the foreground table. 9.5 million is very small.

You could also look at partitioning the table on a ranging value such as a transactional date. That would allow stagnant partitions to avoid being "copied" around; you could use partition-aware queries that just select and copy the partitions that had activity.
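
Roughly, the truncate and direct-path copy could look like this (FACT_FG and FACT_BG are placeholder names for the foreground and background tables):

TRUNCATE TABLE fact_bg;

-- The APPEND hint requests a direct-path insert
-- (set the table NOLOGGING beforehand if you want minimal redo)
INSERT /*+ APPEND */ INTO fact_bg
SELECT * FROM fact_fg;

COMMIT;

-- Rebuild indexes and refresh stats afterwards as needed, e.g.:
-- ALTER INDEX fact_bg_bmx1 REBUILD NOLOGGING;
-- ANALYZE TABLE fact_bg ESTIMATE STATISTICS SAMPLE 10 PERCENT;

With partitioning in place you would restrict the SELECT (or use ALTER TABLE ... EXCHANGE PARTITION) to just the partitions that had activity.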
Last edited by kcbland on Thu Jul 06, 2006 11:45 am, edited 1 time in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

I'm interested in the background table option. Currently, I have 13 jobs that update this one table each night. Preferably, I'd like to store the transactions from those 13 jobs somewhere once, then apply all the transactions to both tables. What are my options?

Our ETL is:

1000's of source Oracle tables --> 13 Oracle ODS --> 1 Oracle DDW
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your options are really just to change the ETL that loads the tables so it prepares insert and update load files. Triggers writing to audit tables are out if you're DIRECT-path loading; plus, you'd have to write sophisticated triggers that only fire when the table is in background "mode" and not being synchronized. Materialized views won't help, because that is just about the same as trunc-and-load. Replication services are probably too tricky as well.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Triton46
Charter Member
Posts: 83
Joined: Fri Feb 07, 2003 8:30 am

Post by Triton46 »

OK, so load 13 files, then in the last job loop through the files and insert/update against the table.

Is one file type better suited than another (hashed, sequential, FTP plug-in, complex flat file, etc.)?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sequential files, delimited, by far. You'll have to process chronologically so that inserts precede updates, but that should be no big deal. In fact, you could take all of the insert files, concatenate them together and load them in one operation, then loop through the updates.
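
Purely as an illustration of the apply order (your DataStage jobs would do the actual loading), if the concatenated delimited files were exposed to Oracle as 9i external tables it might look something like this, with every name made up:

-- Hypothetical external table over the concatenated inserts file;
-- ETL_DIR is a DIRECTORY object you would have to create first
CREATE TABLE fact_inserts_ext (
  fact_id NUMBER,
  cust_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('fact_inserts.dat')
);

-- All inserts in one direct-path operation
INSERT /*+ APPEND */ INTO fact_bg (fact_id, cust_id, amount)
SELECT fact_id, cust_id, amount FROM fact_inserts_ext;
COMMIT;

-- Updates applied afterwards, here from a second external table
-- (fact_updates_ext, defined the same way over the updates file)
MERGE INTO fact_bg b
USING fact_updates_ext u
   ON (b.fact_id = u.fact_id)
 WHEN MATCHED THEN UPDATE SET b.cust_id = u.cust_id, b.amount = u.amount
 WHEN NOT MATCHED THEN INSERT (fact_id, cust_id, amount)
      VALUES (u.fact_id, u.cust_id, u.amount);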
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

If your ETL process builds a sequential file for the complete reload of the table, you can use the Unix comm command to return the rows unique to your current and your previous load files. Load these unique rows into two hashed files. Read from the first and look up in the second to determine inserts and updates, then read from the second and look up in the first to determine deletes.

If most of your data is static, then this technique can significantly reduce the number of rows you must touch.
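
If it helps to picture the same delta logic in SQL terms, imagine the current and previous load files sitting in two staging tables (CURR_LOAD and PREV_LOAD are made-up names, PK_ID a stand-in for the key); the comm / hashed file route does the equivalent work outside the database:

-- Rows that are new or changed since the previous load:
-- new key -> insert, existing key but different row -> update
SELECT c.*,
       CASE WHEN p.pk_id IS NULL THEN 'I' ELSE 'U' END AS action_flag
  FROM (SELECT * FROM curr_load
        MINUS
        SELECT * FROM prev_load) c
  LEFT JOIN prev_load p
    ON p.pk_id = c.pk_id;

-- Keys present previously but missing now -> deletes
SELECT pk_id FROM prev_load
MINUS
SELECT pk_id FROM curr_load;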