reading millions of records
-
- Participant
- Posts: 53
- Joined: Fri Mar 07, 2008 1:17 am
hello all,
I have about 100 million records in an Oracle table and I am trying to read it using the Oracle Enterprise stage. My job is quite simple: read from Oracle, then a Transformer, then a Surrogate Key Generator stage, then load back into Oracle, again through an Oracle Enterprise stage. I am running the job on 8 nodes and it takes around 18 hours to complete. Can I speed this up with any environment variables, or would I have to run multiple instances of the job?
I believe there must be some way to avoid such a long run time. I read the data with a WHERE clause, but the query is quite simple, with only one condition. Are there any other database stages I could use to read the data, apart from the Oracle Enterprise stage? Is the load into Oracle slowing the process down? I am quite confused. Please help me out.
thanks once again.
ETL DEVELOPER
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Are you loading the target by bulk load or by the slower upsert? Are there indexes on the target table? Have you tried different array and transaction sizes in the source and target settings? Is there an Oracle DBA who can monitor the target data load? The target stage is likely to be the bottleneck in your job.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
-
- Participant
- Posts: 53
- Joined: Fri Mar 07, 2008 1:17 am
hello,
Thanks for the ideas. The target has indexes on some of the columns; the source has none. I am using the "load" method with "append" mode for the target. This is a one-time load into a table that has no records at all. I have tried the rebuild-index mode option on the target, but it does not help. As you mentioned array size and transaction size, can you throw more light on them? I have read and heard a lot but am not sure what they do, how they can increase performance, or what value I should set.
thanks once again.
ETL DEVELOPER
-
- Charter Member
- Posts: 193
- Joined: Tue Sep 05, 2006 8:01 pm
- Location: Australia
Another thing to check is the value of $APT_ORACLE_LOAD_OPTIONS.
If DIRECT=FALSE then Oracle will still use conventional INSERT logic.
Try setting DIRECT=TRUE and, if it is acceptable in your situation, specify UNRECOVERABLE to minimise the amount of redo logging performed.
e.g. OPTIONS(DIRECT=TRUE) UNRECOVERABLE
If you cannot use direct path, then try setting the ROWS value to a higher number; this is the number of rows loaded between each commit.
e.g. OPTIONS(DIRECT=FALSE, ROWS=nnnn)
You may need to try a few different values for nnnn to get the best results.
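To make the two variants above concrete, here is a minimal sketch of how the environment variable value might look (assuming it is set as a job-level parameter; the ROWS value is only an illustrative starting point, not a recommendation):

```
# Direct-path load with minimal redo logging.
# UNRECOVERABLE is only safe when the load can simply be re-run after a failure,
# as is the case for a one-time load into an empty table.
APT_ORACLE_LOAD_OPTIONS = OPTIONS(DIRECT=TRUE) UNRECOVERABLE

# Conventional-path load, committing every 50000 rows (tune empirically):
APT_ORACLE_LOAD_OPTIONS = OPTIONS(DIRECT=FALSE, ROWS=50000)
```

These are passed through to SQL*Loader, so the accepted keywords are the SQL*Loader ones.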
-
- Participant
- Posts: 53
- Joined: Fri Mar 07, 2008 1:17 am
hello,
Thanks once again. Yes, I did try $APT_ORACLE_LOAD_OPTIONS. I set ROWS=50000 or so, but it really does not help. I have tried DIRECT=TRUE and PARALLEL=TRUE as well. Anyway, let me check the UNRECOVERABLE option that you suggested. One thing I should mention: there is an index on the composite primary key, and the target table is range-partitioned on a date column. The partitioning is not even; all the data prior to 2008 is in one partition, and from January 2008 onwards the data is partitioned monthly. Does that affect the performance?
Also, I have checked the performance analysis, and it shows that the target stage is taking almost 50% of the overall elapsed time. Maybe that information is helpful.
thanks once again
ETL DEVELOPER
-
- Participant
- Posts: 53
- Joined: Fri Mar 07, 2008 1:17 am
hello all,
I am still looking for help on this topic; can anyone throw some more light on it? Could I do it in two jobs, loading into a Data Set first and then, in a separate job, loading from the Data Set into Oracle? Would that help? The DBA is not happy about dropping the indexes, and the options available in the Oracle stage have not helped. Please help me with better performance tuning for this.
thanks
ETL DEVELOPER
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Wait, you have an Oracle DBA? Then why aren't they telling you why the load is slow? It's their job to optimize the performance of that database, and they should be able to monitor the table while DataStage is running to find the bottleneck. You can disable the index from the DataStage before-SQL and after-SQL tabs so that the index is only down while the load is happening, but then you need to do some additional referential integrity checking in DataStage to avoid creating a situation where the index won't come back on.
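As a rough sketch of what those before-SQL and after-SQL statements might look like (the index name is made up; agree the exact approach with your DBA, since for a unique primary-key index you may need to disable the constraint itself rather than mark the index unusable):

```sql
-- Before-SQL: take the index out of play for the duration of the load
ALTER INDEX my_target_pk_idx UNUSABLE;

-- After-SQL: rebuild it once the load has finished.
-- This step fails if the load introduced duplicate keys, which is why
-- the duplicate checking has to happen in DataStage first.
ALTER INDEX my_target_pk_idx REBUILD;
```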
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Hello,
We had some problems too with OCI and a lot of rows.
We decided to read the table using "Read Method = Table" and "Partition Table = your_table".
Also, in the additional connection options you can try adding ArraySize=#PAR_ORA_ARRAY_SIZE#, where #PAR_ORA_ARRAY_SIZE# is a job parameter whose value you have to tune by testing (try 15000 first).
For the update path, see your DBA about the table's INITRANS value. Set it to 8, because you have 8 nodes.
HTH
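A minimal sketch of that last suggestion (table name made up; note that INITRANS set this way only applies to newly formatted blocks, so the DBA may prefer to rebuild the table to apply it everywhere):

```sql
-- Reserve 8 transaction slots per block, one per DataStage node,
-- to reduce ITL waits when all nodes update the table concurrently.
ALTER TABLE my_target_table INITRANS 8;

-- Variant that rewrites existing blocks with the new setting
-- (requires downtime and an index rebuild afterwards):
ALTER TABLE my_target_table MOVE INITRANS 8;
```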