Job optimization

urshit_1983 · Post by **urshit_1983** » Wed Jun 28, 2006 3:38 pm

Hi ,

I am pulling data from Oracle using transformer to change i/p and load it in to a hash file. Transformer has many IF THEN ELSE condition used. I tried to optimize the job using stage variables and inter process row buffering. But still not satisfied. Plz let me know what else i need to change in Hash file. I am using default type 30 dynamic file.

Thanks

kcbland · Post by **kcbland** » Wed Jun 28, 2006 3:48 pm

Welcome aboard. It's absolutely impossible to tell if your Oracle query is slow, the hashed file is improperly sized, or your transformation logic is slow. We also don't know how many rows of data you are processing.

Breakup your job and write your Oracle results just to a text file with no transformation and then you'll see how fast the data will flow from Oracle.

Transform the text file and write to another text file and you'll see how fast your transformation logic can fly.

Then, take the resulting text file and just load it into a hashed file and you'll see how fast DataStage can put your data into a hashed file.

Whichever of these three jobs is the slowest is the one that has the bottleneck. You'll probably see that it's the Oracle piece, but if not, then it's probably the transform piece.

Since your design is basically single threaded, you'll find you have an opportunity to take the middle transformation job and run multiple job instances and use every single cpu on your server. These multi-processing concepts are discussed here so have fun reading.

urshit_1983 · Post by **urshit_1983** » Wed Jun 28, 2006 4:16 pm

Hey Kenn,

yes i found out its Oracle stage causing prob. What shall I do now? could you guide me out.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 28, 2006 4:19 pm

I very much doubt that it's the Oracle stage that's causing the problem; it's far more likely that it's how you've constructed the SQL query. Are you using generated SQL, user-defined SQL or a stored procedure to extract your rows? Can you post the SQL so we can suggest how it might be improved?

urshit_1983 · Post by **urshit_1983** » Wed Jun 28, 2006 4:23 pm

Hi Ray,

I am using generated SQL and user defined as well the where clause and order by clause to sort data before changing in tx.

kcbland · Post by **kcbland** » Wed Jun 28, 2006 4:37 pm

The WHERE causes the database to discriminate or search for data, which may or may not use indexes or partition elimination to more readily find the result rows.

The order by causes the entire query to completely execute and gather all result rows before sending anything to the requester, in this case a DS job. At that time, it's simply a matter of DS catching the rows as they traverse the database and network on its way over to the DS process.

Why sort the data? Once it goes into the hashed file it's row order is lost. Are you doing something that requires ordering?