Hi ,
I am pulling data from Oracle using transformer to change i/p and load it in to a hash file. Transformer has many IF THEN ELSE condition used. I tried to optimize the job using stage variables and inter process row buffering. But still not satisfied. Plz let me know what else i need to change in Hash file. I am using default type 30 dynamic file.
Thanks
Job optimization
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 73
- Joined: Wed Jun 28, 2006 3:27 pm
- Location: NJ
Welcome aboard. It's absolutely impossible to tell if your Oracle query is slow, the hashed file is improperly sized, or your transformation logic is slow. We also don't know how many rows of data you are processing.
Breakup your job and write your Oracle results just to a text file with no transformation and then you'll see how fast the data will flow from Oracle.
Transform the text file and write to another text file and you'll see how fast your transformation logic can fly.
Then, take the resulting text file and just load it into a hashed file and you'll see how fast DataStage can put your data into a hashed file.
Whichever of these three jobs is the slowest is the one that has the bottleneck. You'll probably see that it's the Oracle piece, but if not, then it's probably the transform piece.
Since your design is basically single threaded, you'll find you have an opportunity to take the middle transformation job and run multiple job instances and use every single cpu on your server. These multi-processing concepts are discussed here so have fun reading.
Breakup your job and write your Oracle results just to a text file with no transformation and then you'll see how fast the data will flow from Oracle.
Transform the text file and write to another text file and you'll see how fast your transformation logic can fly.
Then, take the resulting text file and just load it into a hashed file and you'll see how fast DataStage can put your data into a hashed file.
Whichever of these three jobs is the slowest is the one that has the bottleneck. You'll probably see that it's the Oracle piece, but if not, then it's probably the transform piece.
Since your design is basically single threaded, you'll find you have an opportunity to take the middle transformation job and run multiple job instances and use every single cpu on your server. These multi-processing concepts are discussed here so have fun reading.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 73
- Joined: Wed Jun 28, 2006 3:27 pm
- Location: NJ
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I very much doubt that it's the Oracle stage that's causing the problem; it's far more likely that it's how you've constructed the SQL query. Are you using generated SQL, user-defined SQL or a stored procedure to extract your rows? Can you post the SQL so we can suggest how it might be improved?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 73
- Joined: Wed Jun 28, 2006 3:27 pm
- Location: NJ
The WHERE causes the database to discriminate or search for data, which may or may not use indexes or partition elimination to more readily find the result rows.
The order by causes the entire query to completely execute and gather all result rows before sending anything to the requester, in this case a DS job. At that time, it's simply a matter of DS catching the rows as they traverse the database and network on its way over to the DS process.
Why sort the data? Once it goes into the hashed file it's row order is lost. Are you doing something that requires ordering?
The order by causes the entire query to completely execute and gather all result rows before sending anything to the requester, in this case a DS job. At that time, it's simply a matter of DS catching the rows as they traverse the database and network on its way over to the DS process.
Why sort the data? Once it goes into the hashed file it's row order is lost. Are you doing something that requires ordering?
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle