Slow Loading Performance - Oracle 10g

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You're moving data from one table to another within the same instance. The fastest way to do this is to avoid DataStage altogether and let a parallel DML SQL statement do the data transfer.
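As a minimal sketch of that in-database approach (the table names and degree of parallelism here are made up, adjust for your own schema and hardware):

-- hypothetical tables; the data never leaves the instance
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(tgt, 4) */ INTO target_table tgt
SELECT /*+ PARALLEL(src, 4) */ *
FROM   source_table src;

COMMIT;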

DataStage generates a program which pulls the data out of the database and transfers it to its external process, which in turn opens a cursor and sends the data through the OCI connection. In short, the data leaves the database, passes through an outside process, and then re-enters the database. There is no parallelism whatsoever, unless your SELECT query has to build an elaborate result that may use parallel query; even then, the data is ultimately collected into a single result set in a temporary storage area for single-threaded spooling to the single-threaded OCI connection process.

The fastest method for moving data out of a database is parallel extraction: multiple processes, each taking a portion of the rows and streaming it out. By "partitioning" (separating the required data into subsets) and then streaming each subset out independently to a receiving process, you have done what is called "partitioned parallelism".

The fastest method for moving data into a database is bulk loading. For Oracle, sqlldr with the DIRECT path option is your choice.
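As a rough sketch only (the file, table, and column names are invented), a control file for a direct path load could look like this, invoked with something along the lines of sqlldr userid=... control=load_target.ctl:

-- hypothetical names; DIRECT=TRUE requests the direct path load
OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE 'extract_all.dat'
APPEND
INTO TABLE target_table
FIELDS TERMINATED BY '|'
(cust_id, cust_name, load_date DATE "YYYY-MM-DD")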

So, design a job that looks like this: OCI --> XFM --> SEQ (filename has job parameter "N" in it). In the OCI stage, pick an integer column that you can use to somewhat evenly divide the data into subsets. Now, decide how many parallel occurrences of this job can execute simultaneously without overburdening your database, say 7. Place the expression MOD(yourcolumn, 7) = N - 1 in the WHERE clause in the OCI stage. Now run 7 instances of this job simultaneously (a Sequence would be good here, with 7 job activity stage icons) and set parameter N to 1 through 7.
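For instance N of the job, the extraction SQL in the OCI stage would look something like this (table and column names are placeholders; #N# assumes the usual job parameter substitution in the stage's user-defined SQL):

-- #N# is the job parameter for this instance, 1 through 7
SELECT cust_id, cust_name, load_date
FROM   source_table
WHERE  MOD(cust_id, 7) = #N# - 1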

When you fire off the Sequence job, you'll see 7 parallel copies of the same job, each extracting a subset of the data. When all 7 instances complete, concatenate the 7 files into a single file and then bulk load it into the target table.

Of course, intra-instance DML will do all of this within the database for you. But this technique teaches you how we get massive volumes out in incremental pieces, dividing the labor and spreading I/O across multiple CPUs. You'll actually see your DataStage server resource usage climb, as the 7 processes use more resources in total.

By the way, this same technique is extremely powerful for jobs like this: SEQ --> XFM w/hash references --> XFM w/hash references --> XFM w/hash references --> SEQ. Your single-threaded job is now multi-processing (parallel transformation pipelines). By designing this way, we achieve massive parallelism.

In the PX world, a lot of this parallelism is done automagically for you without using multiple job instances.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I designed something like this for one of my clients on a server engine. The source was a flat file. I had multiple instances of the same job, each getting records from the same file but in sets: if I had 5 instances of the job, the first instance would select records 1, 6, 11, and so on; the second would select 2, 7, 12, and so on.
This way parallelism was achieved and performance increased 5 times. I know what you are talking about, Ken. It's really powerful.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
rodre
Premium Member
Posts: 218
Joined: Wed Mar 01, 2006 1:28 pm
Location: Tennessee

Post by rodre »

Kenneth or Albert:
I like your idea....using multiple instances and sequencers... :idea:
Do you think it would work on this type of scenario:
I have a database table with 16 million records and 14 columns. Out of this table I need to extract certain records through queries. I tried extracting the data from the database via queries of different types, including union queries, and it just takes too long. The best timing I have been able to obtain is 4+ hours.
Do you think I could get all 16 million records into sequential files or hashed files and do my selection process in DataStage instead of using queries?
(I hope this makes sense)
Thanks in advance for your input!!!
:)
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do your unions and other transformations in DataStage. Since your source is a database, go with Ken's suggestion: using MOD, select data in chunks and pass it through your transformations simultaneously, using multiple instances. This is the "ET" part of ETL. For the "L", use a bulk loader for all inserts. For updates, consider doing "bulk updates". Ken was talking about correlated updates today in one of his posts. Search for it, or wait for Ken; he will be able to shed some more light on it.
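For anyone searching, a correlated update from a bulk-loaded work table generally looks something like this (a sketch only, with all table, column, and key names invented):

-- hypothetical work table and key; only rows present in the work table are touched
UPDATE target_table t
SET    (cust_name, status) = (SELECT w.cust_name, w.status
                              FROM   work_updates w
                              WHERE  w.cust_id = t.cust_id)
WHERE  EXISTS (SELECT 1
               FROM   work_updates w
               WHERE  w.cust_id = t.cust_id);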
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You need to be looking at CPU and disk utilization on both the database and ETL servers. Unless you're using all resources, you're not moving data as fast as you possibly can. Isolating job designs to limit mixed functionality is one way to optimize a portion of the process. Then, multiple instances of that job will leverage more hardware.

Think of it this way: optimize the design of a straw until you can get as much slurpee flowing through it as possible without structural collapse. It will top out at a maximum liters/second. Now, add more straws to increase the net flow per second of that sweet nectar.

Optimize your table scan to get rows out as quickly as possible, then run more of those until you top out the database or the DS server. Sticking references and transformation logic into the extraction job means the database may wait for a set of rows to finish transforming before it sends the next packet. Since it's bored, you'll lose priority and process focus. Just dump out the data as fast as you can. There are too many variables: database, network, query queue, rollback, etc. Eliminate as many as you can, starting with receiving-process inattention.

It's the same rule for transformation. Get rid of database stages and just use sequential and hashed files. You've removed the network and the database from the performance equation, so it's just CPU, memory, and disk. Optimize and then parallelize your pipeline.

For loading, the rule is: if you're not bulk loading, you're not data warehousing. Find ways to maximize your load. It's often faster to identify and separate inserts and updates, because then you can mass load the inserts and mass apply the updates. I've spent a lot of time talking about bulk loading updates into work tables and then leveraging a parallel database's capabilities to do a parallel DML update. A handy set of generic scripts to do these things is often your best friend. A metadata-driven script can generate dynamic load scripts, such that some carefully applied standards will allow you to achieve phenomenal performance. Even with PX, I'd still use these alternative loading methods, as you can get better update results.
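One way to picture the work-table approach (object names here are hypothetical): direct-path load the update rows into a work table with sqlldr, then apply them in a single parallel DML statement, for example a parallel MERGE:

-- hypothetical names; apply bulk-loaded updates in one parallel statement
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(t, 4) */ INTO target_table t
USING work_updates w
ON (t.cust_id = w.cust_id)
WHEN MATCHED THEN
  UPDATE SET t.cust_name = w.cust_name,
             t.status    = w.status;

COMMIT;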
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do you have the sample mass update scripts? Maybe you can put them up as an FAQ. This will help all of us a lot, Ken.

PS: Loved the slurpee example :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

I had EXACTLY the same problem when we first started building DS jobs. We were getting <100 rows/sec; we now get ~5000 rows/sec. We did the following:

- Use Bulk Loader, not OCI.
- Ensure FKs are disabled, triggers are disabled, and PK/UK constraints that are enforced by non-unique indexes are disabled. This allows the use of the Direct Path load (see the SQL sketch after this list).
- Place an IPC Stage between the source OCI and the Transformer.
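The sort of statements involved look like this; the constraint, trigger, and table names below are placeholders only:

-- hypothetical constraint/trigger/table names
ALTER TABLE target_table DISABLE CONSTRAINT fk_target_customer;
ALTER TABLE target_table DISABLE CONSTRAINT pk_target;
ALTER TRIGGER trg_target_audit DISABLE;
-- run the direct path load here, then re-enable:
ALTER TRIGGER trg_target_audit ENABLE;
ALTER TABLE target_table ENABLE CONSTRAINT pk_target;
ALTER TABLE target_table ENABLE CONSTRAINT fk_target_customer;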

I wrote this thread, which explains it all.

This was all good; it doubled rows/sec or better, but it was still unimpressive. I finally found the problem: Routines. We had created all sorts of library routines like NVL(), GREATEST(), etc., that we could call either from column derivations or from other routines. We did a bunch of complex validation - all in routines - and they all called the library routines, which called each other. It was a many-deep nested routine call stack.

We replaced all of the routines with TRANSFORMS. The complex validation routines were replaced by a series of stage variables. We ended up performing exactly the same functionality without routines, but it ran 5 TIMES FASTER.
Ross Leishman
Post Reply