Performance issue reading data from Oracle connector

Post questions here related to DataStage Enterprise/PX Edition, for such areas as Parallel job design, Parallel data sets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Performance issue reading data from Oracle connector

Post by dsuser7 »

Hi All,

I'm selecting a few columns from a table with around 70 million records (all records are selected, no WHERE clause).

I'm then joining that data with a data set of around 7 million records.

When 6 fields are selected from the Oracle stage, the job finishes in less than 5 minutes, whereas it takes around 30 minutes when 30 fields are selected.

Is this something to do with indexes?

When 6 fields are selected, all but 1 have indexes defined; when 30 fields are selected, all but 4 have indexes defined on them.

Thanks for any help!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Can you make a copy of the job with no join, just reading your Oracle source and writing to a Peek stage, and measure the time for 6 and for 30 fields? Also, can you estimate the average record length in bytes for the 6-field and 30-field options?
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Post by dsuser7 »

Thank you for your reply.

I have run jobs as suggested and here are the results:

1. Job with 6 columns runs in 3 min 3 sec [148,720 rows/sec] --- avg record length in bytes = 57
2. Job with 30 columns runs in 7 min 26 sec [368,061 rows/sec] --- avg record length in bytes = 143

Is the difference due to indexes?
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Post by dsuser7 »

Would creating indexes on the other columns help performance of the job?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

With no where clause? Nope.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Post by dsuser7 »

There is no WHERE clause in the select query. So the increase in run time is just because more columns are selected (more bytes)?
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Post by dsuser7 »

Yikes, I just realized I inverted the rows/sec stats for the two jobs. It should be:

1. Job with 6 columns runs in 3 min 3 sec [368,061 rows/sec] --- avg record length in bytes = 57
2. Job with 30 columns runs in 7 min 26 sec [148,720 rows/sec] --- avg record length in bytes = 143
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes. At a very high level, more bytes = more time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Post by dsuser7 »

This really breaks my heart. The difference between roughly 3 minutes and 7.5 minutes may not seem like much, but the actual job I'm concerned with joins the 70 million records with the 7 million record data set, and it takes around 40 minutes.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In response to your stats: rows per second is a fairly meaningless metric. You're better off comparing bytes per second, using the total actual bytes rather than the average record length.
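To illustrate with the corrected figures posted earlier in the thread (a quick sketch; the average record lengths are the posted estimates, so treat the result as approximate):

```python
# Compare the two runs by bytes/sec instead of rows/sec, using the
# corrected stats from the thread (avg record length in bytes).
runs = {
    "6 columns":  {"rows_per_sec": 368_061, "arl_bytes": 57},
    "30 columns": {"rows_per_sec": 148_720, "arl_bytes": 143},
}

for name, r in runs.items():
    mb_per_sec = r["rows_per_sec"] * r["arl_bytes"] / 1_000_000
    print(f"{name}: {mb_per_sec:.1f} MB/sec")
```

Both runs come out at roughly 21 MB/sec, which is exactly the point: the read moves bytes at a near-constant rate, so more bytes per row means proportionally more elapsed time, not a slower connector.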
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Performance issue reading data from Oracle connector

Post by chulett »

dsuser7 wrote:When 6 fields are selected from oracle stage, the job finishes in less than 5 mins whereas the job takes around 30 mins to finish when 30 fields are selected.
FYI - The time differential comes from the join and the amount of data it is throwing around, not any perceived 'performance issue' with the select from the Oracle connector.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser7
Participant
Posts: 41
Joined: Sat Nov 29, 2008 2:21 am

Re: Performance issue reading data from Oracle connector

Post by dsuser7 »

Thanks Craig, I'd just come to the same conclusion!
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Performance issue reading data from Oracle connector

Post by kwwilliams »

Have you tried enabling partitioned reads? If the table is partitioned, the connector can read from each Oracle partition. If the table is not partitioned, you can still read in parallel:

1. Enable partitioned reads = Yes
2. Partitioned reads method = Rowid round robin
3. Table name for partitioned reads = the table you are extracting data from

Doing this will create an Oracle process for each node in your configuration file, which should increase the number of bytes being read per second to help you reduce the overall runtime of the job.
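For the unpartitioned case, the rowid round robin method conceptually gives each reader a predicate along the lines of `MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID), <node count>) = <node index>` (the exact SQL the connector generates may differ). A small Python sketch of the idea, with made-up block numbers:

```python
# Round-robin assignment of rowid blocks to reader nodes: each node
# reads the blocks whose number modulo the node count equals its index.
# Block numbers here are invented purely for illustration.
node_count = 4
blocks = list(range(100))  # pretend the table occupies blocks 0..99

partitions = {
    node: [b for b in blocks if b % node_count == node]
    for node in range(node_count)
}

# The partitions are disjoint and together cover every block, so each
# source row is read exactly once; any duplicates seen downstream must
# already exist in the source table.
all_assigned = sorted(b for part in partitions.values() for b in part)
assert all_assigned == blocks
```

This is why the duplicate-rows report later in the thread points back at the source data rather than at the partitioning method.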
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Re: Performance issue reading data from Oracle connector

Post by prasson_ibm »

Hi,
I have tested the parallel read with this logic, but it seems the data is duplicated across partitions. :(
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If that is true with rowid round robin partitioning, then you have duplicate rows in your source.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.