[Resolved] Partial Unload of records with Oracle Enterprise

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel


Post by roy »

Hi All,
I'm facing a mind-boggling situation and I'd love some help resolving it.

Enterprise Edition 7.5.0.1
Oracle 8.1.7
(both on same machine Sun os)

I have a table of over 200 million rows and 5 columns.
I built both a server job and a PX job (using the Oracle Enterprise stage in PX); both report the 200+ million row figure.
The table is partitioned into 29 partitions.


I designed 2 Px jobs:
1. Using Oracle Enterprise stage > modify > seq file, with a peek to log for rejects.
2. The same job, but using DRS instead of Oracle Enterprise.

Job 1 returns almost 130 million rows written to the seq file (around a 7GB file).

Job 2 returns all 200+ million rows written to the seq file, as expected (around a 12GB file).

Both jobs finished with no warnings or rejects.
Both write to the same file system.

Any more info I forgot to supply will be posted by request.

Unloading it using DRS at 4,500 rows/second, taking over 12 hours, is not an option.

Naturally, a case with support is already open, and the solution, when found, will be updated here.
(simply because not resolving this is not an option)

Any ideas, people? :?:

(was hoping for a :P on my 1600th post :( )

Thanks in advance,
Last edited by roy on Sun Sep 25, 2005 8:33 am, edited 1 time in total.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
cmmurari
Participant
Posts: 34
Joined: Sun Jan 02, 2005 9:55 am
Location: Singapore

Post by cmmurari »

Hi Roy,

I am quite confused: how come you have a difference in row counts?

Job 1 returns almost 130 million rows written to seq file (around 7GB file).

Job 2 returns all over 200 million rows written to seq file as expected (around 12GB file).

Please help by giving some more information.

Thanks,
Krishna

Post by roy »

Well, if I knew that, I would have posted a FAQ post with the solution :)
I need to find out why this happens :(
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How many processing nodes? If more than one, what partitioning did you use? Was the Oracle Enterprise stage trying to SELECT in parallel? Any warnings logged? Have you tried forcing the Oracle Enterprise stage to execute in sequential mode (as a diagnostic technique only!)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by roy »

Hi Ray :),
I tried 4, 6 (which showed 5 processes in the Director monitor) and 29 nodes (only 10 processes showed on the monitor when I ran with 29 nodes, though, and the CPU was around 30% idle across its 4 dual processors).

The funny (as in fishy) thing is that the first node always showed no rows processed in the Director monitor.
The maximum throughput I got was around 80K rows/second, and after 26 minutes the job finished, having read almost 130 million rows.
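As an aside, those two figures are internally consistent with each other; a quick back-of-the-envelope check (purely illustrative):

```python
# Sanity check: observed throughput times runtime should roughly match
# the ~130 million rows read (all figures from the run described above).
rate_rows_per_sec = 80_000   # peak throughput seen in the Director monitor
runtime_minutes = 26         # elapsed time until the job finished
rows_read = rate_rows_per_sec * runtime_minutes * 60
print(rows_read)  # 124800000, i.e. just under the ~130 million reported
```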

As I mentioned, no warnings were logged and no rejects.

I'll try to coordinate with a DBA to monitor one of my runs personally.

I used both Table and Auto-generated as the read method, and set the partition table property the same as the table property (fully qualified); other than that, there is no partitioning definition to set.
I'll try running it with a 1-node configuration as well (that will take around 12+ hours, as DRS did, so I'll only have results on Sunday :()

Post by roy »

Hi,
I've tested with the Oracle Enterprise stage on 1 node; it also gives only 130 million rows :(.
I ran select sum(cnt) from (select count(*) cnt from table partition(part_name1) union all select count(*) cnt from table partition(part_name2) ...);
over the entire set of partitions from SQL Navigator, and it gives the 200+ million row figure.
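For anyone repeating this check, the verification query can be generated rather than typed out by hand. A minimal sketch; the table and partition names (my_table, PART_1 ... PART_29) are placeholders, substitute your own:

```python
# Build the "sum of per-partition counts" verification query for an
# Oracle table, one SELECT per partition.
def partition_count_sql(table, partitions):
    selects = [
        f"select count(*) cnt from {table} partition ({p})"
        for p in partitions
    ]
    # UNION ALL rather than UNION: plain UNION would silently collapse
    # partitions that happen to hold the same number of rows.
    body = "\n  union all\n  ".join(selects)
    return f"select sum(cnt) from (\n  {body}\n)"

print(partition_count_sql("my_table", [f"PART_{i}" for i in range(1, 30)]))
```

The total from this query can then be compared with the row count each job reports.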

I'm going to drop the partition table clause in the Oracle Enterprise stage and see how many rows are returned.

Meanwhile, no real progress down the support channels yet :(.

Ideas are welcome :),

Post by roy »

Hi,
Well, it seems that dropping the partition table clause from the Oracle Enterprise stage does the trick.
So I'm left with one process reading the table, with partitioning taking place afterwards.
So far it seems to be reading reasonably fast, around 50-60K records/second, downloading to a seq file through a modify to handle nulls.
I guess I'm lucky it only has 5 columns.

I wonder why using the partition table option messes everything up :( :?:

Post by cmmurari »

I have extracted 172 million records from the source (Oracle 10g) using the Oracle Enterprise stage, writing to datasets. It's working fine, no issues. :lol:

Roy, please check whether your job has an execution mode limitation per partition, or any special restriction in the job runtime options.

Cheers,
Krish

Post by roy »

Krish,
If you can direct me to specific things to check I'll be glad to try it.
To be on the safe side, I rebuilt the job from scratch with no constraints, just a straightforward unload as my test case, so I don't see anything that should have constrained it.
Are you using the partition table clause in your Oracle stage?

Post by cmmurari »

Hi Roy,

I just chose Table as the Oracle source read method, and I did not use the "partition table" source property.
Regarding execution properties: in Job Properties there is an Execution tab, with a textbox for advanced runtime options.

Cheers,
krish,

Post by roy »

Well, having seen your post after I marked this issue as resolved in the topic title and posted the reason it happened, I assumed you were posting because you had it working with the partition table option; otherwise there was no need to post once the issue was resolved.
No offence, and don't get discouraged from posting; I've been in this situation myself now and then :wink:

Post by roy »

Hi,
Well, I finally managed to hunt down this topic :twisted:
The issue was finally resolved by applying the px75a-ecase-62553 patch supplied by the vendor.
Now it seems to work with the partition table option as well.
We're still testing it, but if no new post appears here soon, assume it was resolved :)