Running Oracle connector in parallel mode

akarsh · Post by **akarsh** » Wed Feb 05, 2014 4:03 am

Hi,

I am trying to extract data from oracle table using oracle connector.

execution mode is default parallel.

When i am setting Enable partitioned reads option yes and giving partitioning read method as rowid hash, sql giving more no of records rather than if i keep Enable partitioned reads option as no. (Also i queried the DB directly and its matching with the record count if Enable partitioned reads option is No)

My intention is to run this stage parallel for better performance.

Please suggest how can i do this.

boxtoby · Post by **boxtoby** » Wed Feb 05, 2014 10:21 am

Hi

I tried using the parallel reads option and it actually reduced the rows per second from 11k to 8k. I suspect this is because DS inserts an additional where clause to do the partitioning which simply slows down the processing.

In addition, because I was using my own sql statement with table aliases, I had to remove the aliases as it conflicted with the inserted where clause.

No problem with auto generated sql statements.

I also got double the number of rows on a look up link but I'm not sure why!

Anyway, I would be interected to know how you get on with performance.

Hope this helps,
Bob.

akarsh · Post by **akarsh** » Fri Feb 14, 2014 6:41 am

Hi All,

Still help is needed.

Problem is i have 90M record in my oracle table. While extracting it is reading appx 10K record per sec and taking 2.5 hrs. Please suggest how can i increase extraction rate.

boxtoby · Post by **boxtoby** » Fri Feb 14, 2014 7:14 am

Hi,

I think I would start with oracle itself.

If you run your sql statement in an oracle IDE does it still take 2.5 hours to complete?

Is the table partioned? (A table with 90m rows is a good candidate for partioning!)

Have the stats been run on the table recently? (Your dba will know) Poor stats will slow things done significantly.

Cheers,
Bob.

chulett · Post by **chulett** » Fri Feb 14, 2014 7:40 am

First you need to determine the bottleneck which may very well be in the job or the target rather than 'the extraction'. Use an @FALSE constraint so that nothing is done other than the reads, how long does it take then?

akarsh · Post by **akarsh** » Mon Feb 17, 2014 1:55 am

Hi Chullet,

I am just using the below query
select
Field1
from Table

And taking distinct value using sort stage as its huge data.
Job is first reading data from Oracle stage and then passing to sort stage and not using pipelining.
Also please tell me where to use @FALSE constraint.

boxtoby · Post by **boxtoby** » Mon Feb 17, 2014 4:16 am

Hi,

Try changing your sql to:

select DISTINCT
Field1
from Table

This will considerably reduce the amount of data coming from the database server which alone should speed things up.

This should also remove the need for the sort stage which again should speed things up.

Hope this help,
Bob.

akarsh · Post by **akarsh** » Mon Feb 17, 2014 5:36 am

Hi BOB,

Earlier i was using DISTINCT in query itself but was taking more time. So i am doing it in DS itself.

akarsh · Post by **akarsh** » Mon Feb 17, 2014 7:05 am

Hi BOB,

I added DISTINCT in query and also changed some settings as below
Array Size - 40000
PreFetch Row Count - 40000
PreFetch Buffer - 4096

Now job is taking just 4.5 mins, but results are not matching
In DS
select DISTINCT
Field1
from Table
giving o/p as 573684 records
In Oracle DB
select DISTINCT
Field1
from Table
giving o/p as
1022779

Not sure why such huge difference.

boxtoby · Post by **boxtoby** » Mon Feb 17, 2014 7:11 am

Hi,

That is strange.

Check the director log for any warnings which might indicate why records are being lost.

Are DS and oracle looking at the same schema.table?

Remove all parallel settings from the stage, ie. keep it as simple as possible.

Cheers,
Bob.

akarsh · Post by **akarsh** » Mon Feb 17, 2014 7:43 am

Thanks BOB :D
In oracle was checking in diff schema :D
Finally Job took only 3.35 Mins, Below are the settings

Transaction Settings

Record Count - 40000
Mark End of wave - Yes

Session Settings

Array Size - 40000
PreFetch Row Count - 40000
PreFetch Buffer - 4096

chulett · Post by **chulett** » Mon Feb 17, 2014 8:20 am

akarsh wrote:please tell me where to use @FALSE constraint.

Constraints are an element of Transformer output links. For the record, you could have added one before the Sort stage and set that value there to gauge how long the source portion of the job consumed. The remainder would have been either the sorting or the target writes and (if needed) you could continue to play the same game to isolate each.

ssnegi · Post by **ssnegi** » Mon Feb 17, 2014 9:58 pm

Use parallel hint. It speeds up processing considerably.

select /*+ parallel(table,4) */ DISTINCT
Field1
from Table

This is for 4 processors. you can optimize for any number.

chulett · Post by **chulett** » Mon Feb 17, 2014 11:46 pm

It may speed up processing, maybe even considerably. But as always, it... depends. On many things. SMP or MPP architecture, I/O bandwidth, available CPUs and memory for sorting / hashing / I/O buffers for example are needed. And generally it doesn't help a simple select like this one. Never mind that Oracle will happily ignore the request if that number of parallel server processes are not available.

And just to clarify, the hint is for the degree of parallelism, four parallel server processses are created which has nothing to do with the number of processors.

DSXchange