Performance question with ORAOCI8 stages
Hi All
Is there any reason why having multiple input streams to a single ORAOCI8 stage would cause performance problems?
For some reason the job will be running along nicely until it reaches about 10,000 records, then the rows/sec drops off severely — to the point where I am having to split the file into smaller chunks before processing.
There are 2 other lookups using ORA8 stages on the same stream prior to loading. I know that using these stages carries a performance overhead, but we are using them to determine whether the record already exists and to gather other information. We could use hash files, but there are millions of rows in each table.
The main bottleneck appears to be the insert. There are 2 insert input streams running into 2 different tables on the one ORA8 stage. The transaction size is set to 0 and the array size to 1000.
Is there any reason why performance would taper off after a certain period?
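The Array Size and Transaction Size settings mentioned above roughly correspond to "rows sent per round trip" and "rows per COMMIT". A minimal sketch of that batching pattern, using Python's sqlite3 as a stand-in for Oracle (the table and function names here are illustrative, not part of DataStage or the OCI stage):

```python
import sqlite3

def load(rows, batch_size, commit_every):
    """Insert rows in batches, committing every commit_every rows.
    batch_size ~ ORAOCI8 Array Size, commit_every ~ Transaction Size
    (commit_every=0 ~ commit only once, at the end of the run)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
        if commit_every and (start + len(batch)) % commit_every == 0:
            conn.commit()
    conn.commit()  # final commit picks up any remainder
    count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return count

rows = [(i, "x" * 20) for i in range(10_000)]
print(load(rows, batch_size=1000, commit_every=5000))  # → 10000
```

With batch_size=1 every row is its own round trip, which is the situation that surfaces later in this thread.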
WoMaWil wrote: Generally, an increasing number of streams should be an advantage and not a disadvantage. Give me more details on your streams.
Ok, maybe I have misled everybody a little.
The job looks something like
Seq1 ---> Trf1 ---> Trf2 ---> Trf3 --\
            \          /              \
             \        /                \
             OraOCI8a               OraOCI8b
             /        \                /
            /          \              /
Seq2 ---> Trf4 ---> Trf5 ---> Trf6 --/
I have done tests where I replace OraOCI8b with a sequential file and the speed increases greatly. But I am at a loss to explain why it slows down so much.
WoMaWil wrote: Maybe via an interim sequential stage you can increase performance.
I might try landing it and loading it from there and see what happens.
It will still slow down... and it's all about Oracle. Are you doing pure inserts or one of those silly insert-else-update type actions? Have you had your DBAs monitor what is happening with regards to the target table while you are loading?
Factors can be a number of things like the indexes on the target or how rollback is being handled in the database, to name a couple. One quick experiment would be to change the Transaction Size to something other than zero and see if that 'fixes' things. Keep it a multiple of your Array Size but not a huge value... so 5,000 perhaps.
If that doesn't help, as much detail about your target environment as you can stand to give would probably help.
ps. And rethink your stand on OCI lookups. Who cares how many rows are in the tables you would end up hashing? It's all about only putting there just the rows that you need for each run. But that's a topic for another post, let's solve this problem first.
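The "keep it a multiple of your Array Size" rule above can be expressed as a tiny helper (a hypothetical illustration, not a DataStage function):

```python
def pick_transaction_size(array_size, target=5000):
    """Round target up to the nearest multiple of array_size,
    so every COMMIT lands on a full array boundary rather than
    splitting an array across two transactions."""
    if array_size <= 0:
        raise ValueError("array_size must be positive")
    multiples = -(-target // array_size)  # ceiling division
    return max(1, multiples) * array_size

print(pick_transaction_size(1000))  # → 5000
print(pick_transaction_size(750))   # → 5250
```

So with an Array Size of 1000, a Transaction Size of 5000 commits every five full arrays.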
-craig
"You can never have too many knives" -- Logan Nine Fingers
Shane,
you may profit (if the rows in the tables behind ORAOCI8a are few relative to Seq1 and Seq2) from filling the content of ORAOCI8a into hash files.
I suppose that none of the tables in ORAOCI8a is filled in ORAOCI8b; otherwise you should use a transaction size of 1 and no hash files.
Maybe the way you made your job is completely correct, but as it is transferred into code something may go wrong.
Try to divide the job into two jobs:
(1) seq1-trf1(with Oraoci8a1 as lookup)-trf2(with Oraoci8a2 as lookup)-trf3-oraoci8b
(2) seq2-trf4(with Oraoci8a1 as lookup)-trf5(with Oraoci8a2 as lookup)-trf6-oraoci8b
It is not logical, but I would bet it will increase performance.
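The hash-file idea above — load only the reference rows this run actually needs into a keyed in-memory structure, instead of firing one SELECT per input row — can be sketched like this (sqlite3 again stands in for Oracle, and all names are illustrative):

```python
import sqlite3

def build_lookup(conn, keys):
    """Pull only the reference rows matching this run's keys into a dict,
    the way a pre-loaded hash file avoids one round trip per input row."""
    placeholders = ",".join("?" * len(keys))
    sql = f"SELECT id, status FROM ref_table WHERE id IN ({placeholders})"
    return {row[0]: row[1] for row in conn.execute(sql, list(keys))}

# Reference table with a million-row feel; only a slice is ever needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO ref_table VALUES (?, ?)",
                 [(i, "active") for i in range(1_000_000, 1_000_100)])

input_keys = {1_000_001, 1_000_050, 42}   # keys seen in this run's input
lookup = build_lookup(conn, input_keys)
print(1_000_001 in lookup, 42 in lookup)  # → True False
```

The point craig makes later in the thread is the same: the size of the reference table matters less than the size of the slice you actually need per run.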
Wolfgang Hürter
Amsterdam
ShaneMuir wrote: Is there any reason why performance would taper off after a certain period?
Yes — the reason is that you're equating rows/sec with performance.
The row count stops when all rows have been sent to the server, but there they're all queued waiting for the COMMIT. The clock keeps running while the database then inserts the rows.
Eschew rows/sec as a measure of performance. It's meaningless on so many levels. Prefer MB/min.
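The point about the two metrics can be made concrete with some illustrative numbers: two runs moving the same volume of data can report wildly different rows/sec once row width changes.

```python
def throughput(rows, avg_row_bytes, elapsed_seconds):
    """Report the same run two ways: rows/sec and MB/min."""
    rows_per_sec = rows / elapsed_seconds
    mb_per_min = rows * avg_row_bytes / (1024 * 1024) * 60 / elapsed_seconds
    return round(rows_per_sec, 1), round(mb_per_min, 2)

# Same data volume over the same 2 minutes, different row widths:
print(throughput(100_000, 200, 120))   # narrow rows → (833.3, 9.54)
print(throughput(10_000, 2_000, 120))  # wide rows   → (83.3, 9.54)
```

Identical MB/min, a tenfold difference in rows/sec — which is why rows/sec alone says little about the load.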
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Darn, I should really start reading the posts completely rather than skimming through them.
One reason why this might be happening is that the table is huge (the input is huge) and you have the transaction size set to 0, so nothing commits until the end. Maybe the rollback/temp space is filling up and hence slowing down the process. Try setting the commit level to every 10k rows and see if that helps.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Ok, this is a little embarrassing, but it seems that I have been misleading everybody a little.
Turns out that the Array Size is set to 1, not 1000. (It's true what they say: 'When you assume, you make an ASS out of U and ME'.) Pretty sure that this could have a detrimental effect.
The reason that it is set to 1 is for error capturing (another thing that I left off my drawing above).
Will test whether it's still worth setting a transaction size, however.