Loading a Oracle table with Partitions

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jreddy
Premium Member
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Loading a Oracle table with Partitions

Post by jreddy »

Hello,
I need to load 23Mil rows from a view (from another DB on another box) to a table, that is partitioned on 2 columns and lies on yet another box.

Am using Oracle bulk load (Write method LOAD on oracle enterprise stage) to truncate the table and insert data from the source view.
And the performance is very bad, each million takes about a min to load even if this is the only process running on the box and there is minimum activity over the network.

I am assuming that since the source is a view, reading data is sequential, but is there a way to speed up the read and write process. There are no other transformations to data in between, its just a staging process to get data from source into staging area.

thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You first need to identify the bottleneck in your processing. If you write a job that just reads the view into a peek stage what speeds do you achieve - are they the same as when you are writing to Oracle or slower?
jreddy
Premium Member
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Thank You ArndW,
I ran a test as you suggested and observed that when i read from a view to a peek stage, the speeds are a little over 5k rows/sec more than what i had when i was writing to the oracle table.

Is there anyway, we can improve the speeds at which we can read data or is it always sequential read when reading from an Oracle source (assuming our source does not have any partitions)

I tried using the Partition table option (on source oracle enterprise stage), but i got the error
APT_OracleRowid8PartTable: failed to get the block count.

and i sort of expected it, as its a view.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I haven't used that option for PX Oracle but would think that you should be able to get it working. But it might be the case that the read speed is being limited by network throughput and not the Oracle database (unlikely in this case, but still possible). You could try doing what we used to do with server jobs - put in a 2nd oracle stage identical to the first and change your SELECT statements to the view so each gets only half the rows (i.e. MOD(CUSTNO,2)=1 or the like) and merge the two streams in the job and still write to a PEEK stage. Has the rows/second speed changed?
jreddy
Premium Member
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Thank you again ArndW,
I think we are in the right track, as the performance increased, i'd say i saw double the speeds that i had initially when i split up the source into 2 oracle stages. But i need to increase it further, as its now loading 23 mil in 14 min or so.

But i am not sure i understand how splitting the volume is helping increase the performance.
You mentioned about network throughput, I am the only one running jobs on the DS server or executing queries against this DB, so how is network throughput the culprit in my case.. pls let me know, can they be modified at unix/db level

thanks in advance
jreddy
Premium Member
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

I forgot to add that i also added a hint to the source SQL, /*+ PARALLEL (tablename) */ and i think that also helped faster reads, but i would definitely like to work more on the lines of increasing network throughput to gain some more performance.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No, by splitting it into 2 reads you have shown that the bottleneck is not the network. You should look into the detail docs for Oracle enterprise and get parallel reads for this view working, then you can remove the 2nd stage and use the parallelism from the APT_CONFIG file to drive your throughput.
Post Reply