Terrible Performance due to log file
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Terrible Performance due to log file
Hi there,
I'm having major performance issues with a DataStage job in our production environment.
Two caveats: first, I only started looking at DataStage last week, so please excuse my greenness; second, I've inherited everything on this!
Right, here's my problem.
I've an Oracle 9i table, it has 2 columns, both of them form the primary key.
I've got a data file with approx. 50,000 records in it.
My job has an FTP stage, creates a sequential file, then is transformed, with records going to an Oracle 9i stage. The job is running on a server remote to oracle.
The settings for the Oracle stage are
Update Action: Insert rows without clearing
Transaction Isolation: Read Committed
Columns: 2 both are decimal (though the numbers on the table are integers!)
Rows per transaction: 1000
This job takes a long time to run. At best I'm getting 1 insert a second, but it slowly degrades.
It appears as though the size of the DataStage log file is dramatically impacting the performance of the job. Before I purged it, it had reached 250 MB and inserts were taking 1 minute each. I cleared the log and started to get 4-5 inserts a second (not exactly fantastic); slowly over time it degrades to 3-4 seconds per insert.
Now, my timings are based on the warnings that I get in Director.
I am getting thousands of warnings due to ORA-00001 (unique constraint violated). This is because, out of a 50,000-record file, only 100 or so records will be genuine inserts; the rest will already exist.
So as far as I can make out, I've got 2 issues.
1 - change the process so that all these entries aren't written to the log file, slowing it down
-> How can I do this? Change the way warnings are reported, or find a way to eliminate the duplicates from the file?
2 - look at the performance of the job to improve it.
-> I've viewed some of the other posts on performance, and based on this I thought I could try changing the Oracle 9i stage to an Oracle OCI Load, but when I hover over it the tooltip says "Oracle 8.1 bulk loader" - will that be okay for Oracle 9i? I saw a suggestion to write out inserts to a separate sequential file - does that mean literal INSERT statements? I could write a control file and try to use SQL*Loader...
Anyone with some advice on what I could try - please bear in mind that I have very little DS experience.
thanks for reading this much!!
J.
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Re: Terrible Performance due to log file
Hi James!
Tough way to break into DataStage. What you need to do is figure out what you want to do with the other 49,900 rows that already exist in your table. If you just want to override the values, change your Oracle stage to perform "Update existing rows or insert new rows" instead of "Insert rows without clearing".
Ask yourself this question: is there going to be a row in the target table that does not exist in your source? If the answer is yes, do you want to keep that row in the target even without a match in the source? If you answered no to the first question, or yes to both, the update-then-insert option I mentioned above will work fine for you. If you answered yes and no respectively, you should clear the table then insert.
Your performance degradation has more to do with the Oracle primary key violations than it does with DataStage. Figure out what you want to do with the data, then change the job to work accordingly.
Good luck!
Keith Williams
keith@peacefieldinc.com
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Re: Terrible Performance due to log file
Thanks for the reply - yes, in at the deep end as usual.
Well, I'm not sure what my options are with this table.
I can't use "Update existing rows or insert new rows" because both columns form the primary key; I tried this and got an Oracle error (sorry, I can't remember which one - it just mentioned the primary key).
I can't use replace existing, as the columns are used as foreign keys.
So basically I can't update or delete from this table very easily..
I would LOVE to redesign the tables but this is not an option unfortunately..
Perform a lookup against a hash file that contains the target table primary key columns spooled from the table. Use a constraint to only insert rows not in the hash lookup. That will run a heck of a lot faster than using database exception processing to let you know the row is already there.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Thanks for the reply... could you give me a bit more detail, please? Sorry, I'm new to DataStage. Could you say which stages I would use to do the above - put a bit of meat on the bones, so to speak? I'll be able to look it up in the docs then.
kcbland wrote: Perform a lookup against a hash file that contains the target table primary key columns spooled from the table. Use a constraint to only insert rows not in the hash lookup. That will run a heck of a lot faster than using database exception processing to let you know the row is already there.
Job #1:
OCI (target table) --> Transformer --> HASH file
Job #2:
FTP (source data) --> Transformer --> SEQ file
Job #3:
SEQ file --> Transformer w/reference to HASH file, constraint only allows output rows where unique identifier in source not in HASH reference --> SEQ file
Job #4:
SEQ file --> Transformer w/reject link --> OCI (insert only SQL, commit 0)
Job #5:
Sequencer or Batch job control --> Run Jobs 1 & 2 simultaneously, then run 3 and then run 4. Use restart enabled Sequencer configuration.
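The constraint in Job #3 is essentially a set-membership test against the keys extracted in Job #1. As a rough sketch of that logic outside DataStage (file layout, delimiter, and all names here are illustrative assumptions, not from the job):

```python
# Sketch of the Job #3 constraint: keep only source rows whose two-column
# key is absent from the target-table key extract. Comma-delimited layout
# and all names are assumptions for illustration only.

def load_existing_keys(extract_lines):
    """Build the lookup set from the target-table key extract (Job #1's role)."""
    keys = set()
    for line in extract_lines:
        col1, col2 = line.strip().split(",")
        keys.add((col1, col2))
    return keys

def filter_new_rows(source_lines, existing_keys):
    """Yield only the rows that are genuine inserts (the ~100 of 50,000)."""
    for line in source_lines:
        col1, col2 = line.strip().split(",")
        if (col1, col2) not in existing_keys:
            yield line.strip()

existing = load_existing_keys(["1,10", "2,20", "3,30"])
print(list(filter_new_rows(["1,10", "4,40", "2,20"], existing)))  # → ['4,40']
```

The point of the design is the same as the hashed-file lookup: one in-memory membership check per row instead of a round trip to Oracle that fails with ORA-00001 and logs three warnings.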
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Sorry for the dumb question: when you say OCI, do you mean the Oracle 9i stage or the bulk loader stage?
kcbland wrote: Job #1:
OCI (target table) --> Transformer --> HASH file
Job #2:
FTP (source data) --> Transformer --> SEQ file
Job #3:
SEQ file --> Transformer w/reference to HASH file, constraint only allows output rows where unique identifier in source not in HASH reference --> SEQ file
Job #4:
SEQ file --> Transformer w/reject link --> OCI (insert only SQL, commit 0)
Job #5:
Sequencer or Batch job control --> Run Jobs 1 & 2 simultaneously, then run 3 and then run 4. Use restart enabled Sequencer configuration.
thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Did you even look at the DataStage log (via Director)? My first question would be why so much stuff is being logged; in an ideal world there should be only a few tens of entries per run, and no warnings at all.
Is auto-purge disabled, so that the log grows indefinitely? If so, manually purge the log then enable auto-purge.
Simply having a very large DataStage log will degrade performance.
Once that's fixed, then investigate what you can do with Oracle. Don't use the ORABULK stage for bulk loading - its performance isn't that good either. Instead, write your own optimized control file and have DataStage produce the data file(s) using a Sequential File stage.
If you have updates, separate these and use an OCI stage to effect them.
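For a two-column key table like the one described, a hand-written SQL*Loader control file can be quite small. A minimal sketch, assuming invented table, column, and file names (check the SQL*Loader reference for the exact options that suit your setup):

```
-- load.ctl: hypothetical control file; all names are placeholders
LOAD DATA
INFILE 'new_rows.dat'
APPEND
INTO TABLE target_keys
FIELDS TERMINATED BY ','
(key_col1, key_col2)
```

APPEND is used rather than REPLACE or TRUNCATE because the target table already holds rows; it would be run with something like `sqlldr userid=user/password control=load.ctl log=load.log`.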
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Yes, too many warnings are being written to the log file due to the duplicate-key Oracle error. Each error produces 3 warnings for the log; multiply this by 50,000 and the log gets pretty full.
ray.wurlod wrote: Did you even look at the DataStage log (via Director)? My first question would be why so much stuff is being logged; in an ideal world there should be only a few tens of entries per run, and no warnings at all.
Is auto-purge disabled, so that the log grows indefinitely? If so, manually purge the log then enable auto-purge.
Simply having a very large DataStage log will degrade performance.
Once that's fixed, then investigate what you can do with Oracle. Don't use the ORABULK stage for bulk loading - it's performance isn't that good either. Instead, write your own optimized control file and have DataStage produce the data file(s) using a Sequential File stage.
If you have updates, separate these and use an OCI stage to effect them.
I've changed the log settings to auto-purge after 1 revision, which should help, but even so performance is lousy - we're talking about 28 hours to load 50k records.
I like the sound of truncating then inserting into a new table, then inserting into the existing table where not exists in the new table, as it will save me having to completely revisit the job and spend a couple of weeks getting up to speed with hash file stages etc. Now, if I can get hold of my friendly DBA...
thanks
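The truncate-and-stage approach described in that last post might look roughly like this in SQL; every object name below is a placeholder invented for illustration, not something from the thread:

```
-- Step 1: empty the staging table, then bulk-load the full 50,000-row
-- file into it (e.g. via SQL*Loader). All names here are hypothetical.
TRUNCATE TABLE stage_keys;

-- Step 2: promote only the keys that are genuinely new.
INSERT INTO target_keys (key_col1, key_col2)
SELECT s.key_col1, s.key_col2
  FROM stage_keys s
 WHERE NOT EXISTS (SELECT 1
                     FROM target_keys t
                    WHERE t.key_col1 = s.key_col1
                      AND t.key_col2 = s.key_col2);
COMMIT;
```

This moves the duplicate detection into a single set-based statement on the database side, so no ORA-00001 errors (and no warning rows in the DataStage log) are generated at all.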
It's got nothing to do with database 'Archive' errors. And keep in mind the fact that Auto Purge of a job's log will only kick in if the job finishes without warnings.
James - you need to take the time to get up to speed on hashed file stages and precheck lookups to determine insert v. update (etc) as those kinds of things are the heart and soul of DataStage Server.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 13
- Joined: Fri Jan 06, 2006 9:20 am
Yeah - hence when I started looking at the job, it hadn't completed for several runs and the log was massive.
chulett wrote: It's got nothing to do with database 'Archive' errors. And keep in mind the fact that Auto Purge of a job's log will only kick in if the job finishes without warnings.
James - you need to take the time to get up to speed on hashed file stages and precheck lookups to determine insert v. update (etc) as those kinds of things are the heart and soul of DataStage Server.
I appreciate your comment about taking the time to learn DataStage; unfortunately it's a production fix, they (management) are thinking about removing DataStage later this year, AND my inbox is starting to overflow =D
I think it looks like a really useful tool, though.