jreddy wrote:
how long should a simple server job that reads from a view (view joins 4 tables.. but when i run this view in Toad, it gives results in 7 mins)
This is not a valid benchmark. Just running a query in Toad is not the same as spooling the data. If you do a "select count(*) from view" query, the database will optimize and cheat: it will use index scans and similar shortcuts to avoid working through the data just to give you a count. In addition, you did not actually spool the data across a network connection, which introduces its own runtime lag. To benchmark this correctly, have a DataStage job spool the data straight to a sequential text file with no reference lookups or transformations. That will tell you accurately how long it takes DataStage to pull the data from the view.
In addition, without knowing your view SQL, you could simply be looking at the first fetched batch of results. If there are parts of the result set you never scrolled down to, you won't see the view's performance problems. NEVER base a benchmark on the return of the first row; it's the LAST row that counts.
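If you want a quick sanity check outside of DataStage as well, you can time a full spool from the command line. This is only a sketch: the connect string, view name, and spool path are placeholders, and it assumes sqlplus is available on the server.

#!/bin/sh
# Time pulling EVERY row of the view to a flat file,
# not just the first screenful that Toad fetches.
time sqlplus -s scott/tiger@mydb <<EOF
set heading off feedback off pagesize 0
spool /tmp/view_benchmark.txt
select * from my_view;
spool off
exit
EOF

The elapsed time of that spool, not the time to the first row in Toad, is the number your job has to live with.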
jreddy wrote:
returns about 2.8 million rows, goes thru a transformer (where i also assign a surrogate key to it by using an oracle sequence) and populates the target table.
This is unfortunate. You can probably eliminate that OCI lookup for every single row by simply getting the maximum surrogate key already assigned and passing it in as a job parameter. Then just keep incrementing a stage variable that initializes to that job parameter, and you take that load off the job. You should also generate a pure insert file and then bulk load it.
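A rough sketch of the idea, assuming a Unix server and made-up names (target table target_tbl, key column skey, project DWPROJ, job LoadTarget) that you would replace with your own:

#!/bin/sh
# Grab the highest surrogate key already assigned...
MAXKEY=`sqlplus -s scott/tiger@mydb <<EOF
set heading off feedback off pagesize 0
select nvl(max(skey), 0) from target_tbl;
exit
EOF`
# ...and hand it to the job as a parameter instead of hitting Oracle for every row.
dsjob -run -param MaxKeyParam=$MAXKEY DWPROJ LoadTarget

Inside the transformer, a stage variable that initializes to MaxKeyParam and has a derivation of simply svKey + 1 then generates the keys without ever touching the Oracle sequence.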
jreddy wrote:
Its taking about 8 hrs to run it. I have done the following: set the oracle 'rows per transaction' as 50000, array size at 1000. What else could i do to improve its performance (in reading or writing to database)
Fiddling with knobs won't cut it. You have a fundamental design choice to make: do you stay with your current design, or do you break the job into smaller, tunable, modular components and use bulk loading?
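As a sketch of the bulk-load half (the table, column, and file names are made up for illustration): have the job land a pipe-delimited insert file, then let SQL*Loader do the writing with a direct path load.

#!/bin/sh
# Control file describing the insert file the DataStage job produced.
cat > target.ctl <<EOF
LOAD DATA
INFILE 'target_inserts.dat'
APPEND
INTO TABLE target_tbl
FIELDS TERMINATED BY '|'
(skey, cust_name, order_amt)
EOF
# Direct path load: no array sizes or rows-per-transaction to fiddle with.
sqlldr userid=scott/tiger@mydb control=target.ctl log=target.log direct=true

The point of the split is that the extract/transform job and the load are now separate, restartable pieces you can tune and time independently.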
jreddy wrote:
I also had another job that had 23GB flat file that had to be uploaded.. took about 18hrs.. is this normal..
Normal is subjective. Unless you take advantage of job instantiation to divide and conquer massive files, unless you use bulk loading and design your jobs around it, unless you modularize and focus each job design on mitigating outside influences so every task can run flat out, you'll never get your massive volumes loaded quickly and efficiently, with audit trails and restart points.
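To give one hedged illustration of the divide-and-conquer part for that 23GB file (the file name, chunk size, and control file are placeholders, and it assumes your table, indexes, and platform tolerate parallel direct loads):

#!/bin/sh
# Carve the big file into manageable pieces...
split -l 5000000 bigfile.dat piece_
# ...and run a direct path load against each piece in parallel,
# reusing the target.ctl control file from the earlier sketch.
for f in piece_*
do
  sqlldr userid=scott/tiger@mydb control=target.ctl data=$f log=$f.log direct=true parallel=true &
done
wait

The same divide-and-conquer works inside DataStage with a multi-instance job: run one instance per piece, passing the piece's file name in as a job parameter.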