Performance issue reading data from Oracle connector
Moderators: chulett, rschirm, roy
Performance issue reading data from Oracle connector
Hi All,
I'm select few columns from a table with around 70mil records (all records are being selected, no where clause).
And joining the data from the above with a data set of around 7 mil records.
When 6 fields are selected from oracle stage, the job finishes in less than 5 mins whereas the job takes around 30 mins to finish when 30 fields are selected.
Is this something to do with indexes?
When 6 are selected most fields have indexes defined except for 1, when 30 fields are selected, all except 4 fields have indexes defined on them.
Thanks for any help!
I'm select few columns from a table with around 70mil records (all records are being selected, no where clause).
And joining the data from the above with a data set of around 7 mil records.
When 6 fields are selected from oracle stage, the job finishes in less than 5 mins whereas the job takes around 30 mins to finish when 30 fields are selected.
Is this something to do with indexes?
When 6 are selected most fields have indexes defined except for 1, when 30 fields are selected, all except 4 fields have indexes defined on them.
Thanks for any help!
Can you make a copy of the job with no join, just read your Oracle source and write to a PEEK stage, measuring the time for 6 and for 30 fields. Also, can you estimate the size in bytes of the average record length for 6 and 30 field options?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Thank you for your reply.
I have run jobs as suggested and here are the results:
1. Job with 6 columns runs 3 min 3 sec [148720 rows/sec] --- Avg record length in bytes = 57
2. Job with 30 columns runs 7 min 26 secs [368061 rows/sec] --- Avg record length in bytes = 143
Is the difference due to indexes?
I have run jobs as suggested and here are the results:
1. Job with 6 columns runs 3 min 3 sec [148720 rows/sec] --- Avg record length in bytes = 57
2. Job with 30 columns runs 7 min 26 secs [368061 rows/sec] --- Avg record length in bytes = 143
Is the difference due to indexes?
Re: Performance issue reading data from Oracle connector
FYI - The time differential comes from the join and the amount of data it is throwing around, not any perceived 'performance issue' with the select from the Oracle connector.dsuser7 wrote:When 6 fields are selected from oracle stage, the job finishes in less than 5 mins whereas the job takes around 30 mins to finish when 30 fields are selected.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Re: Performance issue reading data from Oracle connector
Thanks Craig, just thought of the same!
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Re: Performance issue reading data from Oracle connector
Have you tried enabling partitioned reads? If the table is a partitioned table you can have it read from each Oracle partition. If the table is not partitioned you can still read in parallel:
1. Enable partitioned reads = Yes
2. Partitioned reads method = Rowid round robin
3. table name for partitioned reads = The table name you are extracting data from
Doing this will create an Oracle process for each node in your configuration file, which should increase the number of bytes being read per second to help you reduce the overall runtime of the job.
1. Enable partitioned reads = Yes
2. Partitioned reads method = Rowid round robin
3. table name for partitioned reads = The table name you are extracting data from
Doing this will create an Oracle process for each node in your configuration file, which should increase the number of bytes being read per second to help you reduce the overall runtime of the job.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Re: Performance issue reading data from Oracle connector
Hi,
I have tested the parallel read with this logic but it seems data is duplicated in other partitions.
I have tested the parallel read with this logic but it seems data is duplicated in other partitions.
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: