Terrible Performance due to log file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Terrible Performance due to log file

Post by jamesrender »

Hi there,

I'm having major performance issues with a DataStage job in our production environment.

Two caveats: first, I only started looking at DataStage last week, so please excuse my greenness; secondly, I've inherited everything on this!

Right, here's my problem.

I've got an Oracle 9i table with two columns, both of which form the primary key.

I've got a data file with approx. 50,000 records in it.

My job has an FTP stage which creates a sequential file; the data is then transformed, with records going to an Oracle 9i stage. The job runs on a server remote from Oracle.

The settings for the Oracle stage are

Update Action: Insert rows without clearing
Transaction Isolation: Read Committed
Columns: 2, both decimal (though the numbers on the table are integers!)
Rows per transaction: 1000


This job takes a long time to run... at best I'm getting 1 insert a second, and it slowly degrades.

It appears as though the size of the DataStage log file is dramatically impacting the performance of the job. Before I purged it, it had reached 250 MB and inserts were taking 1 minute each. After I cleared the log I started to get 4-5 inserts a second (not exactly fantastic), but over time it degrades to 3-4 seconds per insert.

Now, my timings are based on the warnings that I get in Director.

I am getting thousands of warnings due to ORA-00001 (unique constraint violated). This is because, out of a 50,000-record file, only 100 or so records will be genuine inserts; the rest will already exist.

So as far as I can make out, I've got two issues:

1 - change the process so that all these entries aren't written to the log file, slowing it down

-> how can I do this? Change the way warnings are reported, or find a way to eliminate the duplicates from the file?

2 - look at the performance of the job to improve it.

-> I've viewed some of the other posts on performance, and based on these I thought I could try changing the Oracle 9i stage to an Oracle OCI Load, but when I hover over that stage it says "Oracle 8.1 bulk loader" - will that be okay for Oracle 9i? I also saw a suggestion to write out inserts to a separate sequential file - does that mean literally INSERT statements? I could write a control file and try to use SQL*Loader...


Anyone with some advice on what I could try? Please bear in mind that I have very little DS experience.

thanks for reading this much!!

J.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Terrible Performance due to log file

Post by kwwilliams »

Hi James!

Tough way to break into DataStage. What you need to do is figure out what you want to do with the other 49,900 rows that already exist in your table. If you just want to override the values, change your Oracle stage to perform an 'Update existing rows or insert new rows' action instead of inserting without clearing.

Ask yourself this question: is there going to be a row in the target table that does not exist in your source? If the answer is yes, do you want to keep that row in the target even though it has no match in the source? If the answer to the first question is no, or you answered yes to both, the update/insert option I mentioned above will work fine for you. If you answered yes and no respectively, you should clear the table then insert.

Your performance degradation has more to do with the Oracle primary key violation than it does with DataStage. Figure out what you want to do with the data, then change the job to work accordingly.

Good luck!
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Re: Terrible Performance due to log file

Post by jamesrender »

Thanks for the reply - yes, in at the deep end as usual.

Well, I'm not sure what my options are with this table.

I can't use 'Update existing rows or insert new rows' because both columns form the primary key; I tried this and got an Oracle error (sorry, can't remember which one - it just said something about the primary key).

I can't use 'replace existing' as the columns are used as foreign keys.

So basically I can't update or delete from this table very easily..

I would LOVE to redesign the tables but this is not an option unfortunately..
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Perform a lookup against a hash file that contains the target table primary key columns spooled from the table. Use a constraint to only insert rows not in the hash lookup. That will run a heck of a lot faster than using database exception processing to let you know the row is already there.
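
The spool can be as simple as selecting the two key columns from the target table into the hashed file - something along these lines, where TARGET_TABLE and the column names are just placeholders for your own:

SELECT KEY_COL_1, KEY_COL_2   -- the two primary key columns
FROM   TARGET_TABLE;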
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Post by jamesrender »

kcbland wrote:Perform a lookup against a hash file that contains the target table primary key columns spooled from the table. Use a constraint to only insert rows not in the hash lookup. That will run a heck of a lot faster than using database exception processing to let you know the row is already there.
Thanks for the reply... could you give me a bit more detail please? Sorry, I'm new to DataStage. Could you say what stages I would use to do the above - put a bit of meat on the bones, so to speak - then I'll be able to look it up in the docs.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Job #1:
OCI (target table) --> Transformer --> HASH file

Job #2:
FTP (source data) --> Transformer --> SEQ file

Job #3:
SEQ file --> Transformer w/reference to HASH file, constraint only allows output rows where unique identifier in source not in HASH reference --> SEQ file

Job #4:
SEQ file --> Transformer w/reject link --> OCI (insert only SQL, commit 0)

Job #5:
Sequencer or Batch job control --> Run Jobs 1 & 2 simultaneously, then run 3 and then run 4. Use restart enabled Sequencer configuration.
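
For Job #4, the OCI stage needs nothing more than plain insert-only SQL. Assuming user-defined SQL with the two link columns bound as :1 and :2 (table and column names are placeholders), it would look something like:

INSERT INTO TARGET_TABLE (KEY_COL_1, KEY_COL_2)
VALUES (:1, :2);

With the constraint in Job #3 filtering out keys already in the hash file, this should run without flooding the log with ORA-00001 warnings.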
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Post by jamesrender »

kcbland wrote:Job #1:
OCI (target table) --> Transformer --> HASH file

Job #2:
FTP (source data) --> Transformer --> SEQ file

Job #3:
SEQ file --> Transformer w/reference to HASH file, constraint only allows output rows where unique identifier in source not in HASH reference --> SEQ file

Job #4:
SEQ file --> Transformer w/reject link --> OCI (insert only SQL, commit 0)

Job #5:
Sequencer or Batch job control --> Run Jobs 1 & 2 simultaneously, then run 3 and then run 4. Use restart enabled Sequencer configuration.
Sorry for the dumb question - when you say OCI, do you mean the Oracle 9i stage or the bulk loader stage?

thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

ORAOCI or ODBC, whichever you choose. The bulk loader is the ORABULK stage.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

Another option is to create a temporary table with the two columns in it.

Change the existing job to truncate then load into the table.

Then use an after-SQL statement to issue an insert into the target table using a NOT EXISTS in the WHERE clause.
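
The after-SQL statement would be something along these lines, where TEMP_TABLE, TARGET_TABLE and the column names are just placeholders for your own:

-- insert only the rows from the work table that are not already in the target
INSERT INTO TARGET_TABLE (KEY_COL_1, KEY_COL_2)
SELECT T.KEY_COL_1, T.KEY_COL_2
FROM   TEMP_TABLE T
WHERE  NOT EXISTS (SELECT 1
                   FROM   TARGET_TABLE X
                   WHERE  X.KEY_COL_1 = T.KEY_COL_1
                   AND    X.KEY_COL_2 = T.KEY_COL_2);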
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Post by jamesrender »

kcbland wrote:ORAOCI or ODBC, whichever you choose. The bulk loader is the ORABULK stage.
Okay, I'll give this a go - presumably there is some tutorial or guide in the documentation to help me get started with this...

thanks! wish me luck!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you even look at the DataStage log (via Director)? My first question would be why so much stuff is being logged; in an ideal world there should be only a few tens of entries per run, and no warnings at all.

Is auto-purge disabled, so that the log grows indefinitely? If so, manually purge the log then enable auto-purge.

Simply having a very large DataStage log will degrade performance.

Once that's fixed, then investigate what you can do with Oracle. Don't use the ORABULK stage for bulk loading - its performance isn't that good either. Instead, write your own optimized control file and have DataStage produce the data file(s) using a Sequential File stage.

If you have updates, separate these and use an OCI stage to effect them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Post by jamesrender »

ray.wurlod wrote:Did you even look at the DataStage log (via Director)? My first question would be why so much stuff is being logged; in an ideal world there should be only a few tens of entries per run, and no warnings at all.

Is auto-purge disabled, so that the log grows indefinitely? If so, manually purge the log then enable auto-purge.

Simply having a very large DataStage log will degrade performance.

Once that's fixed, then investigate what you can do with Oracle. Don't use the ORABULK stage for bulk loading - its performance isn't that good either. Instead, write your own optimized control file and have DataStage produce the data file(s) using a Sequential File stage.

If you have updates, separate these and use an OCI stage to effect them.
Yes, too many warnings are being written to the log file due to the duplicate key Oracle error. Each error produces 3 warnings in the log; multiply this by 50,000 and the log gets pretty full.

I've changed the log settings to auto-purge after 1 revision, which should help, but even so performance is lousy - we're talking about 28 hrs to load 50k records.

I like the sound of truncating then loading into a new table, then inserting into the existing table where the rows don't already exist, as it will save me having to completely revisit the job and spend a couple of weeks getting up to speed with hash file stages etc. Now, if I can get hold of my friendly DBA...

thanks
rafidwh
Participant
Posts: 179
Joined: Mon Oct 10, 2005 11:30 pm

Post by rafidwh »

You are right - contact your DBA. I think you are facing an archive error; the DBA should purge all the logs in the database.
Check the records also; handle the null values, if any, in the primary key columns before populating the target database.
Good luck
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's got nothing to do with database 'Archive' errors. And keep in mind the fact that Auto Purge of a job's log will only kick in if the job finishes without warnings.

James - you need to take the time to get up to speed on hashed file stages and precheck lookups to determine insert v. update (etc) as those kinds of things are the heart and soul of DataStage Server.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jamesrender
Participant
Posts: 13
Joined: Fri Jan 06, 2006 9:20 am

Post by jamesrender »

chulett wrote:It's got nothing to do with database 'Archive' errors. And keep in mind the fact that Auto Purge of a job's log will only kick in if the job finishes without warnings.

James - you need to take the time to get up to speed on hashed file stages and precheck lookups to determine insert v. update (etc) as those kinds of things are the heart and soul of DataStage Server.
Yeah - when I started looking at the job, it hadn't completed for several runs and the log was massive.

I appreciate your comment about taking the time to learn DataStage; unfortunately it's a production fix, they (management) are thinking about removing DataStage later this year, AND my inbox is starting to overflow =D

I think it looks like a really useful tool.