Performance question with ORAOCI8 stages
Hi All
Is there any reason why having multiple input streams to a single ORAOCI8 stage would cause performance problems?
For some reason the job will be running along nicely until it reaches about 10,000 records, then the rows/sec drops off severely — to the point where I am having to split the file into smaller chunks before processing.
There are 2 other lookups using ORA8 stages on the same stream prior to loading. I know that using these stages carries a performance overhead, but we are using them to determine whether the record already exists and to gather other information. We could use hash files, but there are millions of rows in each table.
The main bottleneck appears to be the insert. There are 2 insert input streams running into 2 different tables on the one ORA8 stage. The transaction size is set to 0 and the array size to 1000.
Is there any reason why performance would taper off after a certain period?
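The Array Size and Transaction Size settings mentioned above roughly correspond to "rows sent per round trip" and "rows per COMMIT". A minimal sketch of that batching pattern, using Python's sqlite3 as a stand-in for Oracle (the table and function names here are illustrative, not part of DataStage or the OCI stage):

```python
import sqlite3

def load(rows, batch_size, commit_every):
    """Insert rows in batches, committing every commit_every rows.
    batch_size ~ ORAOCI8 Array Size, commit_every ~ Transaction Size
    (commit_every=0 ~ commit only once, at the end of the run)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
        if commit_every and (start + len(batch)) % commit_every == 0:
            conn.commit()
    conn.commit()  # final commit picks up any remainder
    count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return count

rows = [(i, "x" * 20) for i in range(10_000)]
print(load(rows, batch_size=1000, commit_every=5000))  # → 10000
```

With batch_size=1 every row is its own round trip, which is the situation that surfaces later in this thread.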
WoMaWil wrote: Generally, an increasing number of streams should be an advantage and not a disadvantage. Give me more details on your streams.
Ok, maybe I have misled everybody a little.
The job looks something like
Seq1 ---> Trf1 ---> Trf2 ---> Trf3 --\
            \          /              \
             \        /                \
             OraOCI8a               OraOCI8b
             /        \                /
            /          \              /
Seq2 ---> Trf4 ---> Trf5 ---> Trf6 --/
I have done tests where I replace OraOCI8b with a sequential file and the speed increases greatly. But I am at a loss to explain why it slows down so much.
WoMaWil wrote: Maybe via an interim sequential stage you can increase performance.
I might try landing it and loading it from there and see what happens.
It will still slow down... and it's all about Oracle. Are you doing pure inserts or one of those silly insert-else-update type actions? Have you had your DBAs monitor what is happening with regards to the target table while you are loading?
Factors can be a number of things like the indexes on the target or how rollback is being handled in the database, to name a couple. One quick experiment would be to change the Transaction Size to something other than zero and see if that 'fixes' things. Keep it a multiple of your Array Size but not a huge value... so 5,000 perhaps.
If that doesn't help, as much detail about your target environment as you can stand to give would probably help.
ps. And rethink your stand on OCI lookups. Who cares how many rows are in the tables you would end up hashing? It's all about only putting there just the rows that you need for each run. But that's a topic for another post, let's solve this problem first.
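The "keep it a multiple of your Array Size" rule above can be expressed as a tiny helper (a hypothetical illustration, not a DataStage function):

```python
def pick_transaction_size(array_size, target=5000):
    """Round target up to the nearest multiple of array_size,
    so every COMMIT lands on a full array boundary rather than
    splitting an array across two transactions."""
    if array_size <= 0:
        raise ValueError("array_size must be positive")
    multiples = -(-target // array_size)  # ceiling division
    return max(1, multiples) * array_size

print(pick_transaction_size(1000))  # → 5000
print(pick_transaction_size(750))   # → 5250
```

So with an Array Size of 1000, a Transaction Size of 5000 commits every five full arrays.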
-craig
"You can never have too many knives" -- Logan Nine Fingers
Shane,
you may profit (if the rows in the tables behind ORAOCI8a are few relative to Seq1 and Seq2) from filling the content of ORAOCI8a into hash files.
I suppose that none of the tables in ORAOCI8a is filled in ORAOCI8b; otherwise you should use a transaction size of 1 and no hash files.
Maybe the way you made your job is completely correct, but as it is transferred into code something may go wrong.
Try to divide the job into two jobs:
(1) seq1-trf1(with Oraoci8a1 as lookup)-trf2(with Oraoci8a2 as lookup)-trf3-oraoci8b
(2) seq2-trf4(with Oraoci8a1 as lookup)-trf5(with Oraoci8a2 as lookup)-trf6-oraoci8b
It is not logical, but I would bet it will increase performance.
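The hash-file idea above — load only the reference rows this run actually needs into a keyed in-memory structure, instead of firing one SELECT per input row — can be sketched like this (sqlite3 again stands in for Oracle, and all names are illustrative):

```python
import sqlite3

def build_lookup(conn, keys):
    """Pull only the reference rows matching this run's keys into a dict,
    the way a pre-loaded hash file avoids one round trip per input row."""
    placeholders = ",".join("?" * len(keys))
    sql = f"SELECT id, status FROM ref_table WHERE id IN ({placeholders})"
    return {row[0]: row[1] for row in conn.execute(sql, list(keys))}

# Reference table with a million-row feel; only a slice is ever needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO ref_table VALUES (?, ?)",
                 [(i, "active") for i in range(1_000_000, 1_000_100)])

input_keys = {1_000_001, 1_000_050, 42}   # keys seen in this run's input
lookup = build_lookup(conn, input_keys)
print(1_000_001 in lookup, 42 in lookup)  # → True False
```

The point craig makes later in the thread is the same: the size of the reference table matters less than the size of the slice you actually need per run.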
Wolfgang Hürter
Amsterdam
ShaneMuir wrote: Is there any reason why performance would taper off after a certain period?
Yes — the reason is that you're equating rows/sec with performance.
The row count stops when all rows have been sent to the server, but there they're all queued waiting for the COMMIT. The clock keeps running while the database then inserts the rows.
Eschew rows/sec as a measure of performance. It's meaningless on so many levels. Prefer MB/min.
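The point about the two metrics can be made concrete with some illustrative numbers: two runs moving the same volume of data can report wildly different rows/sec once row width changes.

```python
def throughput(rows, avg_row_bytes, elapsed_seconds):
    """Report the same run two ways: rows/sec and MB/min."""
    rows_per_sec = rows / elapsed_seconds
    mb_per_min = rows * avg_row_bytes / (1024 * 1024) * 60 / elapsed_seconds
    return round(rows_per_sec, 1), round(mb_per_min, 2)

# Same data volume over the same 2 minutes, different row widths:
print(throughput(100_000, 200, 120))   # narrow rows → (833.3, 9.54)
print(throughput(10_000, 2_000, 120))  # wide rows   → (83.3, 9.54)
```

Identical MB/min, a tenfold difference in rows/sec — which is why rows/sec alone says little about the load.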
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Darn, I should really start reading the posts completely rather than skimming through them.
One reason why this might be happening is that the table is huge (the input is huge) and you have the transaction size set to 0, so nothing commits until the end. Maybe the rollback/temp space is filling up and hence slowing down the process. Try setting the commit level to every 10k rows and see if that helps.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Ok, this is a little embarrassing, but it seems that I have been misleading everybody a little.
Turns out that the Array Size is set to 1, not 1000. (It's true what they say: 'When you assume, you make an ASS out of U and ME'.) Pretty sure that this could have a detrimental effect.
The reason that it is set to 1 is for error capturing (another thing that I left off my drawing above).
Will test whether it's still worth setting a transaction size, however.