Improving Job performance
Moderators: chulett, rschirm, roy
Hello All,
Following is a pictorial view of the underperforming job I have:
Green-> Hashed Files
Orange-> Transformer
Red -> OCI Stage
Info about the job:
---------------------------------------------------------------------------
OCI:
Source record count < 400K
Array Size in OCI = 1000, Transaction size = 0, transaction handling = 10K
---------------------------------------------------------------------------
HASHED FILES:
All hashed files are Type 18
'Allow stage write cache' is enabled; Modulus is 1, Separation is 3
Hashed file key columns vary from 1 to 3
Private caching is enabled, cache size 256 MB
Data in the hashed files is not growing; the maximum row count in any hashed file is < 2K
Hashed files contain only the required columns, and column widths are defined as required.
---------------------------------------------------------------------------
TRANSFORMER:
Four custom routines are used for data-quality checks on ~50 source columns
---------------------------------------------------------------------------
SERVER:
Memory Size: 29696 MB
Good Memory Size: 29696 MB
Number Of Processors: 4
Processor Clock Speed: 1648 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
---------------------------------------------------------------------------
Job performance is 50 rows/sec. What can I do to improve it?
Let me know if you guys need any other info.
-
LD
First question in my mind is: what Update Actions are you using in your target OCI stages? You could also determine whether the lookups are a bottleneck by putting an @FALSE constraint in the last transformer for all of the output links, so all the prep work is done but nothing is written to the target tables. If it still seems slow, then your hashed file lookups are at least one of the culprits; otherwise I'd suspect the target work.
-craig
"You can never have too many knives" -- Logan Nine Fingers
All target OCI stages? If that's the case then your upserts don't seem to be the issue, the hashed lookups are. You said that 'cached writes' were enabled, which doesn't make sense with the design you've posted - did you mean cached reads?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Is there any complex query with conditions at the source OCI?
How fast is the data being retrieved from the source database?
You are doing insert and update at the same time. Is your target table properly indexed on the key columns, particularly for the update process? I would prefer a separate OCI stage, or even separate jobs, for insert and update.
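For illustration, an index supporting the update's key columns might look like this (the table and column names are hypothetical, not taken from this job):

```
-- Hypothetical: composite index on the key columns used in the update's WHERE clause
CREATE INDEX tgt_key_idx ON target_tbl (key_col1, key_col2);
```

Without such an index, every update statement can force a full table scan on the target, which alone can hold a job to double-digit rows/sec.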
In order to determine what can be done to "tune" this job, you need to establish some reference values first. Start off with just your source OCI stage and write straight to a sequential file at /dev/null; i.e. a copy of this job with just 2 stages. What is your Rows/Second speed for this job? This will be your absolute maximum attainable speed. If it isn't as fast as you expected, you will have to tune your query or speed up your database or transfer less data or a combination of those three changes.
Once that is done, implement the changes in the original job. What is the throughput now? Has it changed at all? If the OCI read speed was acceptable, then you need to find out if your OCI write speed is acceptable. Make a copy of the job and write to a flat file instead of OCI. What is the throughput? Now write a job that reads the sequential file and writes to OCI. What is that throughput?
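The baseline measurement described above can be sketched in shell. The sqlplus connection and script name are hypothetical (shown commented out); only the rows/sec arithmetic actually runs:

```shell
# Step 1 (hypothetical): time a plain extract to /dev/null, outside DataStage:
#   time sqlplus -s scott/tiger@orcl @extract.sql > /dev/null
# Step 2: convert the observed row count and elapsed seconds into a rows/sec figure
rows=400000      # source row count (the thread says < 400K)
elapsed=80       # example elapsed wall-clock seconds reported by `time`
echo "$(( rows / elapsed )) rows/sec"
```

That figure is the ceiling: every stage the job adds on top of the raw read can only lower it, which is what makes it a useful reference value.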
@gateleys Thanks for your suggestion; the speed increased from 90 to 140 rows/sec using inter-process buffering. The 140 is with sequential files as both source and target. I'll post the numbers after replacing the source and targets with OCI stages.
@Swapnil I'll try your suggestion.
@Ray Yes, inter-process row buffering is enabled. I'll try with more transformers and will post the results here in some time.
@Arun As this is an upsert job, I drop the indexes before the job starts and recreate them afterwards.
@ArndW I have tried removing all the OCI stages and replacing them with sequential files; the speed increased from 50 to 90 rows/sec.
Using inter-process buffering increased the speed to 140 rows/sec.
I'll get the maximum attainable speed as you have suggested.
-
Shailesh
Hi All,
I tried a few suggestions and here are the outcomes:
# The maximum attainable speed using a select and dumping the data to a sequential file is 5.5K rows/sec.
# The maximum attainable speed using sqlldr 'direct load' is > 10K rows/sec.
I have a question on this: why is there an almost 4K rows/sec drop when using DataStage? Can DataStage achieve sqlldr-like throughput?
@Swapnil I tried to include most of the lookups in the select query, but the select was becoming too complex. I did include 2 of the lookups as SQL joins, but there was no significant gain in performance.
# Next I'll try adding more transformers so that the number of lookups per transformer can be reduced.
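For reference, a direct-path sqlldr run like the one benchmarked above is typically driven by a control file along these lines (the file, table, and column names here are assumptions, not from this job):

```
-- load.ctl (hypothetical names)
LOAD DATA
INFILE 'source.dat'
APPEND
INTO TABLE target_tbl
FIELDS TERMINATED BY ','
(col1, col2, col3)
```

Invoked as something like `sqlldr userid=user/pass@db control=load.ctl direct=true`. With `direct=true`, SQL*Loader formats data blocks and writes them above the table's high-water mark instead of issuing SQL INSERTs, which is largely why it outruns OCI array inserts.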
-
Shailesh