Slow Loading Performance - Oracle 10g

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Slow Loading Performance - Oracle 10g

Post by johm73 »

Source: 10gR2
Target: 10gR2

Pulling about 210K rows
54K get rejected to seq file
156K get loaded to target

Total load time is about 10 minutes; rows/sec is around 270. I've tried multiple array size settings and tuned the query on the source, but I cannot get any better performance than that. I've worked with other ETL tools and generally get around 3,000-5,000 rows/sec.

I'm curious about others' thoughts on my current performance and what my expectations should be. I realize it is very system dependent, but does anyone know of DataStage Server tuning parameters on AIX that could be looked into?

I find it hard to believe that such a simple job is taking this long to run.

Thanks.

Jeff
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Jeff,

Hmm... Are they just inserts, just updates, or upserts?

Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

One reason is rejects. Try to get rid of your rejects. Correction: try to get rid of rejects that create warnings in your log. Are you getting warnings? Also, if you're doing upserts, they are the worst. Try splitting your streams and doing the inserts and updates separately.
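
Something along these lines on two separate target OCI links, rather than one upsert link (the table and column names here are only placeholders):

[code]
-- Insert-only link: rows the lookup says are new
INSERT INTO tgt_schema.tgt_table (cust_id, col_a, col_b)
VALUES (:1, :2, :3);

-- Update-only link: rows that already exist in the target
UPDATE tgt_schema.tgt_table
   SET col_a = :2,
       col_b = :3
 WHERE cust_id = :1;
[/code]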
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Also, how are you handling the rejects in the transformer?
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

1. What is your source database resource load when your process is running?
2. What is your target database resource load when your process is running?
3. What is your DataStage resource load when your process is running?
4. What is your DataStage job process CPU utilization when running?
5. What is the nature of your source data? 100M rows of 1 integer column will perform differently than 1M rows of 1000 integer columns when your unit of measure is ROWS/SEC.
6. What is the nature of the target table? Is it a partitioned table? Does it have a lot of indexes? Are there triggers on the table? (A few quick dictionary checks for these are sketched just after this list.)
7. What is the DML you're using? INSERT ONLY will generate a lot of rejects on duplicated keys. INSERT else UPDATE tries the insert first, then does the update. UPDATE else INSERT tries the update first, then does the insert.
8. If most of your data is updates, that's significantly slower than doing inserts. If most of your data is inserts, what's the table growth factors?
9. What is the SQL pulling out of the source table? Is it a straight table dump or a complicated query?
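
For #6, dictionary queries along these lines will tell you (the schema and table names are placeholders):

[code]
-- Indexes, triggers, and partitioning on the target table (placeholder names)
SELECT index_name, uniqueness
  FROM all_indexes
 WHERE table_owner = 'TGT_SCHEMA' AND table_name = 'TGT_TABLE';

SELECT trigger_name, status
  FROM all_triggers
 WHERE table_owner = 'TGT_SCHEMA' AND table_name = 'TGT_TABLE';

SELECT partitioning_type
  FROM all_part_tables
 WHERE owner = 'TGT_SCHEMA' AND table_name = 'TGT_TABLE';
[/code]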


This is just off the top of my head. Your job design probably looks like OCI --> XFM --> OCI, which is the simplest job to design and almost guaranteed to be the slowest job in performance. You're a premium member, you have complete access to read the methods and design characteristics whereby we move millions of rows at significant speeds using Server.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Post by johm73 »

[quote="DSguru2B"]One reason is rejects. Try to get rid of your rejects. Correction: Try to get rid of rejects that create warnings in your log. Are you getting warnings? Also if your doing upserts, they are the worst. Try splitting your streams and doing inserts and updates seperately.[/quote]


They are only inserts, no updates at all. There is only one warning of type Reject in the log; it gives me the total number of rejected rows. I push the lookups down to the database within the source query rather than using hashed files (they were making my job even slower). So I know the rejected (failed-lookup) rows right away by checking for NULL in two specific mapped columns.
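
The source query is shaped roughly like this (the table and column names are placeholders, not the real ones):

[code]
-- Left outer joins push the lookups into the database;
-- a NULL in r1.code or r2.code marks a failed lookup (reject).
SELECT s.cust_id,
       s.amount,
       r1.code AS region_code,
       r2.code AS product_code
  FROM src_schema.src_table s
  LEFT OUTER JOIN ref_schema.region_ref  r1 ON r1.region_id  = s.region_id
  LEFT OUTER JOIN ref_schema.product_ref r2 ON r2.product_id = s.product_id;
[/code]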

Do you guys have general thoughts on the throughput (rows/sec) a well-tuned job should be getting? Are my initial expectations OK, or out of bounds for DataStage?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Plus, ditch the rejects. :( The error mechanism is too verbose and generates significant network traffic, and all of the job logging drops you to a crawl. If you're simply using the reject mechanism to determine new rows versus existing ones, you're better off with a hashed lookup.
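
A minimal sketch of that approach, with placeholder names: one small job dumps the target's keys into a hashed file, and the main job's transformer does its reference lookup against that file, so the stream itself decides new versus existing instead of letting Oracle reject duplicates.

[code]
-- Key extract that would populate the hashed file (placeholder names)
SELECT cust_id
  FROM tgt_schema.tgt_table;
[/code]

The insert link's constraint is then just the reference link's NOTFOUND flag, and no reject handling is needed for that purpose.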
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You don't measure in ROWS/SEC. Performance is measured by CPU utilization. If your DS process thread can run at its maximum speed, which for a single-threaded process is a fully utilized CPU, then you have a tuned job. If the process is waiting on network traffic, as in a query sending data across the network, coupled with the time the database takes to collect and send the data, then the DS process will probably be using less than a whole CPU.

You need to concentrate on isolating processing tasks to remove as many variables as you can. Jobs should not mix source data extraction and target table loading within the same stream. You cannot optimize or parallelize in those cases. If you need to invoke multiple job instances and partition your processing stream, creating parallel transformation pipelines, you can't, because the source and target databases are intermixed in the same stream.
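
For example, a staged layout along these lines (the exact stages are illustrative, not a prescription):

Extract:   OCI --> XFM --> Seq File
Transform: Seq File --> XFM (reference lookups) --> Seq File   (multi-instance, partitioned by key)
Load:      Seq File --> OCI (or a bulk loader such as sqlldr)

Each piece can then be timed and tuned on its own.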

You've also limited yourself by not using hashed files, the primary means for excellent reference processing.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I am loading Oracle 10g with a ~1.5 KB record size at a throughput of 2.5K rows/sec, and that is on a 2-CPU server engine, all inserts. It also depends upon your record size, table indexes, etc. Out of curiosity, what transaction size/array size combo have you tried?
Last edited by DSguru2B on Fri Jan 12, 2007 4:13 pm, edited 1 time in total.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Post by johm73 »

[quote="kcbland"]1. What is your source database resource load when your process is running?
2. What is your target database resource load when your process is running?
3. What is your DataStage resource load when your process is running?
4. What is your DataStage job process CPU utilization when running?
5. What is the nature of your source data? 100M rows of 1 integer column will perform differently than 1M rows of 1000 integer columns when your unit of measure is ROWS/SEC.
6. What is the nature of the target table? Is it a partitioned table? Does it have a lot of indexes? Are there triggers on the table?
7. What is the DML you're using? INSERT ONLY will generate a lot of rejects on duplicated keys. INSERT else UPDATE tries the insert first, then does the update. UPDATE else INSERT tries the update first, then does the insert.
8. If most of your data is updates, that's significantly slower than doing inserts. If most of your data is inserts, what's the table growth factors?
9. What is the SQL pulling out of the source table? Is it a straight table dump or a complicated query?


This is just off the top of my head. Your job design probably looks like OCI --> XFM --> OCI, which is the simplest job to design and almost guaranteed to be the slowest job in performance. You're a premium member, you have complete access to read the methods and design characteristics whereby we move millions of rows at significant speeds using Server.[/quote]


The source database and target database are the same, just different schemas.

I'm the only user on the database and the DataStage server; nothing else is running on them.

Only doing inserts.

The query isn't too complex, but the tables are somewhat large. I've got it tuned as well as I can get it for now; as always, there's room for improvement, I'm sure.

Can you tell me how I access these methods and designs to get maximum performance? Post links or give me search text to use, please.
johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Post by johm73 »

[quote="kcbland"]You don't measure in ROWS/SEC. Performance is measured by CPU utilization. If your DS process thread can run at its maximum speed, which for a single-threaded process is a fully utilized CPU, then you have a tuned job. If the process is waiting on network traffic, as in a query sending data across the network, coupled with the time the database takes to collect and send the data, then the DS process will probably be using less than a whole CPU.

You need to concentrate on isolating processing tasks to remove as many variables as you can. Jobs should not mix source data extraction and target table loading within the same stream. You cannot optimize or parallelize in those cases. If you need to invoke multiple job instances and partition your processing stream, creating parallel transformation pipelines, you can't, because the source and target databases are intermixed in the same stream.

You've also limited yourself by not using hashed files, the primary means for excellent reference processing.[/quote]

I only removed the hashed files because they were slowing my jobs down. I get better performance now that I've pushed those lookups into the database. Besides, building the hashed files in other jobs and accessing them in another job was extending the overall processing time of my entire load process. Hashed files are good, no doubt, but not the be-all and end-all.
johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Post by johm73 »

[quote="DSguru2B"]I am loading Oracle 10g with ~1.5KB record size with a throughput of 2.5Krows/sec. I am talking about 2 cpu server engine.All inserts. It also depends upon your record size, table indices etc. Out of curiosity. What transaction size/array size combo you tried ?[/quote]

My row size is 104 bytes, so I used 1024/104 = 9.8 as the array size. The transaction size I set to 0.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Try an array size of 600, and try higher numbers; 10 won't bring you much. Also try a transaction size of 0, then 10K or 20K. If you absolutely have to roll back when an error occurs, then specify a transaction size of 0; otherwise experiment with other numbers.
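
As a rough illustration with your 104-byte row:

10 rows x 104 bytes is roughly 1 KB per network round trip
600 rows x 104 bytes is roughly 61 KB per network round trip

Either payload is trivial for Oracle; the larger array simply cuts the number of round trips by a factor of 60.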
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
johm73
Charter Member
Posts: 57
Joined: Tue Jan 11, 2005 3:47 pm

Post by johm73 »

[quote="DSguru2B"]Try Array size 600. Try higher numbers. 10 wont bring you much. Also try transaction size with 0 then 10k or 20k. If you absolutely have to rollback if error occurs then specify transaction size as 0 else try with other numbers.[/quote]

OK, I'll give some of that a shot. I've tried other numbers like 32767 and 100000, and they gave me the same result. The calculation I gave came from some searching I did, and I thought I'd try it.

Any thoughts on what the pre-fetch memory should be?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can set that to 500 as well, but that's not the bottleneck. It's your target database that has a backlog effect on the source. If you put @FALSE in the constraint of your output link, you will see that the input link's rows/sec will soar. An array size of 10 is hardly going to make any difference. Where did you get that calculation from?
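
In other words, temporarily set the output link constraint to just @FALSE so the job reads and transforms but writes nothing; the input link rate you see then is roughly your read-side ceiling, and the gap between that and your current ~270 rows/sec is what the target load is costing you.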
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.