Running Oracle connector in parallel mode

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Running Oracle connector in parallel mode

Post by akarsh »

Hi,

I am trying to extract data from oracle table using oracle connector.

execution mode is default parallel.

When i am setting Enable partitioned reads option yes and giving partitioning read method as rowid hash, sql giving more no of records rather than if i keep Enable partitioned reads option as no. (Also i queried the DB directly and its matching with the record count if Enable partitioned reads option is No)

My intention is to run this stage parallel for better performance.

Please suggest how can i do this.
Thanks,
Akarsh Kapoor
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi

I tried using the parallel reads option and it actually reduced the rows per second from 11k to 8k. I suspect this is because DS inserts an additional where clause to do the partitioning which simply slows down the processing.

In addition, because I was using my own sql statement with table aliases, I had to remove the aliases as it conflicted with the inserted where clause.

No problem with auto generated sql statements.

I also got double the number of rows on a look up link but I'm not sure why!

Anyway, I would be interected to know how you get on with performance.

Hope this helps,
Bob.
Bob Oxtoby
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Running Oracle connector in parallel mode

Post by akarsh »

Hi All,

Still help is needed.

Problem is i have 90M record in my oracle table. While extracting it is reading appx 10K record per sec and taking 2.5 hrs. Please suggest how can i increase extraction rate.
Thanks,
Akarsh Kapoor
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi,

I think I would start with oracle itself.

If you run your sql statement in an oracle IDE does it still take 2.5 hours to complete?

Is the table partioned? (A table with 90m rows is a good candidate for partioning!)

Have the stats been run on the table recently? (Your dba will know) Poor stats will slow things done significantly.

Cheers,
Bob.
Bob Oxtoby
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First you need to determine the bottleneck which may very well be in the job or the target rather than 'the extraction'. Use an @FALSE constraint so that nothing is done other than the reads, how long does it take then?
-craig

"You can never have too many knives" -- Logan Nine Fingers
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Running Oracle connector in parallel mode

Post by akarsh »

Hi Chullet,

I am just using the below query
select
Field1
from Table

And taking distinct value using sort stage as its huge data.
Job is first reading data from Oracle stage and then passing to sort stage and not using pipelining.
Also please tell me where to use @FALSE constraint.
Thanks,
Akarsh Kapoor
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi,

Try changing your sql to:

select DISTINCT
Field1
from Table

This will considerably reduce the amount of data coming from the database server which alone should speed things up.

This should also remove the need for the sort stage which again should speed things up.

Hope this help,
Bob.
Bob Oxtoby
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Running Oracle connector in parallel mode

Post by akarsh »

Hi BOB,

Earlier i was using DISTINCT in query itself but was taking more time. So i am doing it in DS itself.
Thanks,
Akarsh Kapoor
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Re: Running Oracle connector in parallel mode

Post by akarsh »

Hi BOB,

I added DISTINCT in query and also changed some settings as below
Array Size - 40000
PreFetch Row Count - 40000
PreFetch Buffer - 4096

Now job is taking just 4.5 mins, but results are not matching
In DS
select DISTINCT
Field1
from Table
giving o/p as 573684 records
In Oracle DB
select DISTINCT
Field1
from Table
giving o/p as
1022779

Not sure why such huge difference.
Thanks,
Akarsh Kapoor
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi,

That is strange.

Check the director log for any warnings which might indicate why records are being lost.

Are DS and oracle looking at the same schema.table?

Remove all parallel settings from the stage, ie. keep it as simple as possible.

Cheers,
Bob.
Bob Oxtoby
akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Post by akarsh »

Thanks BOB :D
In oracle was checking in diff schema :D
Finally Job took only 3.35 Mins, Below are the settings

Transaction Settings

Record Count - 40000
Mark End of wave - Yes

Session Settings

Array Size - 40000
PreFetch Row Count - 40000
PreFetch Buffer - 4096
Thanks,
Akarsh Kapoor
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Running Oracle connector in parallel mode

Post by chulett »

akarsh wrote:please tell me where to use @FALSE constraint.
Constraints are an element of Transformer output links. For the record, you could have added one before the Sort stage and set that value there to gauge how long the source portion of the job consumed. The remainder would have been either the sorting or the target writes and (if needed) you could continue to play the same game to isolate each.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Reply

Post by ssnegi »

Use parallel hint. It speeds up processing considerably.

select /*+ parallel(table,4) */ DISTINCT
Field1
from Table

This is for 4 processors. you can optimize for any number.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It may speed up processing, maybe even considerably. But as always, it... depends. On many things. SMP or MPP architecture, I/O bandwidth, available CPUs and memory for sorting / hashing / I/O buffers for example are needed. And generally it doesn't help a simple select like this one. Never mind that Oracle will happily ignore the request if that number of parallel server processes are not available.

And just to clarify, the hint is for the degree of parallelism, four parallel server processses are created which has nothing to do with the number of processors.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply