Oracle Connector - Read in Parallel

nagarjuna · Post by **nagarjuna** » Tue Jan 03, 2012 2:30 pm

Hi ,

I am using the connectors in Datastage 8.5 version . I am just checking different options in the connectors . I am using Oracle connector to read a full table in parallel & write to dataset . The table contains nearly 100 million records . I checked the documentation & found that we have
following options to read data in parallel ..
1) Rowid Hash uses rowid , ora_hash , Mod functions

2) Rowid Range uses rowid

3) Oracle partitions uses oracle partitions on the table

4) Modulus uses a column & mod function

5) Min Max range Uses a columns & finds the range

6) Rowid Round Robin uses Mod & Row_id_row_number functions

Here the table from which I am unloading is not partitioned & I am unloading all columns . Trying to figure out , what is the best method to use ?

According my requirement , 3rd & 4th are not the options . Can somebody explain the best method to read in this scenario ? I have limited access to check the functionality of all the above options .

Please do let me know if the requirement is not clear .

Thanks in advance .

ray.wurlod · Post by **ray.wurlod** » Tue Jan 03, 2012 2:54 pm

The requirement is not clear.

You can read about these Oracle partitioning mechanisms in Oracle manuals or on the internet.

nagarjuna · Post by **nagarjuna** » Tue Jan 03, 2012 3:08 pm

Thanks Ray for your response .

In short , My question is : What is the preferred method ( Rowhash vs RowRange ) for extracting data in parallel ?Which one gives a good performance ? I don't have indexes on the table .It's a simple query ( Select all columns from table) & table is not partitioned .

nagarjuna · Post by **nagarjuna** » Tue Jan 03, 2012 7:39 pm

I searched in internet & I got very limited info about these . I am not so sure if these ( Row Range , Row Hash ) are related to oracle partitioning methods .
Any inputs would greatly appreciated .

ray.wurlod · Post by **ray.wurlod** » Tue Jan 03, 2012 8:35 pm

They're definitely not DataStage partitioning algorithms.

chulett · Post by **chulett** » Tue Jan 03, 2012 9:12 pm

They're definitely Oracle partitioning algorithms.

qt_ky · Post by **qt_ky** » Tue Jan 03, 2012 10:24 pm

The 6 options listed are choices in the drop-down menu of the Oracle Connector stage property "Partitioned reads method" which is under the "Enable partitioned reads" Yes/No option.

These are documented in the Connectivity Guide for Oracle Databases PDF that should be on your client. If not, it's on the IBM web site.

Support for partitioned read methods:
The connector supports these partitioned read methods: Rowid range, Rowid round robin, Rowid hash, Modulus, Minimum and maximum range, and Oracle partitions.

For all partitioned read methods except the Oracle partitions method, the connector modifies the WHERE clause in the specified SELECT statement. If the WHERE clause is not included in the specified SELECT statement, the connector adds a WHERE clause. For the Oracle partitions method, the connector modifies the specified SELECT statement by adding a PARTITON(partition_name) clause. When the specified SELECT statement contains subqueries, the connector modifies the first SELECT...FROM subquery in the SELECT statement.

It goes on to describe each method...

I doubt any of this applies in your case because your table is not partitioned in Oracle to begin with.

chulett · Post by **chulett** » Wed Jan 04, 2012 6:40 am

Ding ding ding!

chandra.shekhar@tcs.com · Wed Jan 04, 2012 7:00 am

Keep your settings as follows:
Partition read method :Rowid Range
Increase your array size and row count 20000 and 40000.

and check it.

as per the server idle ratio, increase the count of array size and row count.
do not enter any value in below two checks.

1>Table name for partition reads.
2>Partition or sub partition name for partition read

Thanks

nagarjuna · Post by **nagarjuna** » Wed Jan 04, 2012 7:50 am

Thanks Eric for the response . I read the documentation .Howver , I am not able to figure it out which option is better RowHash or RowRange .The table may not be partitioned but those options are still applicable in my case . I tried with Row Hash & it is firing 4 parallel queries as I am running on 4 nodes .

qt_ky · Post by **qt_ky** » Wed Jan 04, 2012 12:46 pm

You're welcome!

How you partition before landing data in the dataset should depend on what you need to do with the data in subsequent jobs.