reading millions of records

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

reading millions of records

Post by datastagedw »

Hello all,

I have about 100 million records in an Oracle table, which I am reading with the Oracle Enterprise stage. My job is quite simple: read from Oracle, then a Transformer and a Surrogate Key Generator stage, then load back into Oracle, again through an Oracle Enterprise stage. I am running the job on 8 nodes, and it takes around 18 hours to complete. I want to know whether I can speed this up with any environment variables, or whether we would have to run multiple instances of the job.
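
That works out to roughly 100,000,000 rows / 64,800 seconds, or about 1,540 rows per second across all 8 nodes, which seems very low for a parallel job.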

I believe there must be a way to avoid such a long run time. I am using a WHERE clause while reading the data, but the query is quite simple, with only one condition. Are there other database stages, apart from the Oracle Enterprise stage, that I could use to read the data? Or is the load into Oracle what is slowing the process down? I am quite confused. Please help me out.

Thanks in advance.
ETL DEVELOPER
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Are you loading the target by bulk load or by the slower upsert? Are there indexes on the target table? Have you tried different array and transaction sizes in the source and target settings? Is there an Oracle DBA who can monitor the target data load? The target stage is likely to be the bottleneck in your job.
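
For instance, a DBA could watch the load while the job runs with something along these lines (a rough sketch; assumes access to the v$session_longops view):

SELECT sid, opname, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done
FROM   v$session_longops
WHERE  totalwork > 0
AND    sofar < totalwork;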
datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

Hello,

Thanks for the ideas. The target has indexes on some of the columns; the source does not have any. I am using the "load" write method with "append" mode for the target. This is a one-time load, and the target table currently has no records at all. I have tried the rebuild-index option on the target, but it does not help. Since you mentioned them, could you throw more light on array size and transaction size? I have read and heard a lot about them, but I am not sure what they do, how they can increase performance, or what values I should set them to.


thanks once again.
ETL DEVELOPER
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If it's a one-time load, can you drop the indexes and recreate them after loading?
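
For example, something along these lines in the target stage's before/after SQL (a sketch only; the schema, index, and column names here are made up and must match your own table):

-- before the load:
DROP INDEX my_schema.my_fact_ix;

-- after the load:
CREATE INDEX my_schema.my_fact_ix ON my_schema.my_fact (load_date)
NOLOGGING PARALLEL 8;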
John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

Do you have RI (referential integrity) enabled on the table you are loading? Foreign keys are notorious for slowing down loads.
If there are any, you might have to drop the FKs, do a direct load (check your SQL*Loader reference), and re-enable the FKs once the load is done.
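
A minimal sketch of the disable/re-enable step (the table and constraint names here are hypothetical):

ALTER TABLE my_schema.my_fact DISABLE CONSTRAINT fk_my_fact_dim;
-- ... run the load ...
ALTER TABLE my_schema.my_fact ENABLE CONSTRAINT fk_my_fact_dim;

ENABLE NOVALIDATE is an option if you trust the incoming data and want to skip re-checking existing rows.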
munch9
Premium Member
Posts: 34
Joined: Fri Sep 15, 2006 7:26 am

Post by munch9 »

Another thing to check is the value of $APT_ORACLE_LOAD_OPTIONS.

If DIRECT=FALSE, then Oracle will still use conventional INSERT logic.

Try setting DIRECT=TRUE and, if it is acceptable in your situation, specify UNRECOVERABLE to minimise the amount of redo logging performed,
e.g. OPTIONS(DIRECT=TRUE) UNRECOVERABLE

If you cannot use direct path, then try setting the ROWS value to a higher number; this is the number of rows loaded between each commit,
e.g. OPTIONS(DIRECT=FALSE, ROWS=nnnn)
You may need to try a few different values for nnnn to get the best results.
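
For reference, a complete value set against the job (or at project level) might look something like this; worth validating against your DataStage version, as I am quoting from memory:

$APT_ORACLE_LOAD_OPTIONS = OPTIONS(DIRECT=TRUE, PARALLEL=TRUE) UNRECOVERABLE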
datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

Hello,

Thanks once again. Yes, I did try $APT_ORACLE_LOAD_OPTIONS; I set ROWS=50000 or so, but it really does not help. I have also tried DIRECT=TRUE and PARALLEL=TRUE. Anyway, let me try the UNRECOVERABLE option you suggested. One thing I would like to mention: there is an index on the composite primary key, and the target table is range-partitioned on a date column. The partitioning is not even, though; all data prior to 2008 sits in one partition, and from January 2008 onward the data is partitioned monthly. Does that affect performance?
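
For what it's worth, the per-partition row distribution can be checked from the data dictionary (a sketch; assumes table statistics are reasonably current, and MY_TABLE is a placeholder for the real table name):

SELECT partition_name, num_rows
FROM   all_tab_partitions
WHERE  table_name = 'MY_TABLE'
ORDER  BY partition_position;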

Also, I have checked the performance analysis; it shows the target stage taking almost 50% of the overall elapsed time. Maybe this information is helpful.

thanks once again
ETL DEVELOPER
datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

Hello all,

I am still looking for help on this topic; can anyone throw some more light on it? Could I do it in two jobs, i.e. land the data into a dataset first and then, in a separate job, load from the dataset into Oracle? Would that help? The DBA is not happy about dropping the indexes, and the options available in the Oracle stage have not helped. Please help me with better performance tuning for this.

thanks
ETL DEVELOPER
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Wait, you have an Oracle DBA? Then why aren't they telling you why the load is slow? It's their job to optimize the performance of that database, and they should be able to monitor the table while DataStage is running to find the bottleneck. You can disable the index from the DataStage before-SQL and after-SQL tabs, so the index is only down while the load is happening, but then you need to do some additional referential integrity checking in DataStage to avoid creating a situation where the index won't come back on.
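
A sketch of that pattern (the index name is hypothetical; run it past your DBA first):

-- before-SQL:
ALTER INDEX my_schema.my_fact_ix UNUSABLE;

-- after-SQL:
ALTER INDEX my_schema.my_fact_ix REBUILD NOLOGGING PARALLEL 8;

Bear in mind that a rebuild on a unique index will fail if the load introduced duplicates, which is why the extra integrity checking matters.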
CLOPES
Participant
Posts: 52
Joined: Tue Jul 22, 2003 8:05 am
Location: France

Post by CLOPES »

Hello,

We had some problems too with OCI and a lot of rows.
We decided to read the table using "Read Method = Table" and "Partition Table = your_table".
You can also test adding ArraySize=#PAR_ORA_ARRAY_SIZE# to the additional connection options, where #PAR_ORA_ARRAY_SIZE# is a job parameter whose value you have to find by testing (try 15000 first).
For the update path, see your DBA about the table's INITRANS value. Set it to 8, since you have 8 nodes.
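
A sketch of the INITRANS change (the table name is hypothetical, and the new value only applies to newly formatted blocks):

ALTER TABLE my_schema.my_fact INITRANS 8;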

HTH