Extraction is very slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Extraction is very slow

Post by opdas »

Hi,
I'm extracting data from Oracle database and using Oracle Enterprise stage and the db is partitioned but the problem is that the data fetching rate is very slow < 400 rows/sec.
I'm using Read Method as User defined query and using columns in the where clause which is indexed.
Nobody else is using that table.

The query I'm using is:

Code: Select all

select count(*) from cdr_data where point_tax_code_target is null and 
rate_dt>to_date(to_char(to_date(sysdate-2))||'23:59:00','DD-MON-YYHH24:MI:SS') and 
rate_dt < to_date(to_char(to_date(sysdate-1))||'23:59:00','DD-MON-YYHH24:MI:SS')
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What speed does your job run when you only have your Oracle stage and write to a dataset? What rate does your select statement run at from your favorite client access tool?
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

With Toad as client I get results in 15 mins flat but with datastage its taking 1hr 30 mins approx.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

What does your job does other than extraction?
If output of oracle is directly dumped in to dataset, what is the differnce?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

Yes , The extracted results are directly dumped into a dataset
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

With TOAD did you actually list all the data, or did the selection part just start returning data after 15 minutes? Another test you could try is to do the same user defined SQL in a server job which writes to /dev/null or passes through a transform stage that has a false constraint.

How many nodes have you defined in your APT CONFIG file? What, if any, stage attributes have you set in your Oracle Enterprise stage? How many processes do actually get fired off at runtime (use APT_DUMPSCORE)
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

In TOAD the total time taken to run the whole query is 15 mins.
Do i write in a sequential file in serverjob at /dev/null?

we are using 8 nodes in our APT CONFIG file

how do i find the number of process using APT_DUMPSCORE?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Put the dump score attribute in your job parameters, set the value to "true" and look at the dump score in your log file.
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

I'm geting this in the log file:

Code: Select all

main_program: This step has 2 datasets:
ds0: {op0[8p] (parallel CDR_DATA_CUX)
      eAny=>eCollectAny
      op1[8p] (parallel APT_TransformOperatorImplV0S1_M_EXTRACT_CDR_DATA_S1_Transformer_1 in Transformer_1)}
ds1: {op1[8p] (parallel APT_TransformOperatorImplV0S1_M_EXTRACT_CDR_DATA_S1_Transformer_1 in Transformer_1)
      =>
      /etlvolume2/Datasets/CDR_DATA_SRC1.ds}
It has 2 operators:
op0[8p] {(parallel CDR_DATA_CUX)
    on nodes (
      node1[op0,p0]
      node2[op0,p1]
      node3[op0,p2]
      node4[op0,p3]
      node5[op0,p4]
      node6[op0,p5]
      node7[op0,p6]
      node8[op0,p7]
    )}
op1[8p] {(parallel APT_TransformOperatorImplV0S1_M_EXTRACT_CDR_DATA_S1_Transformer_1 in Transformer_1)
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
      node3[op1,p2]
      node4[op1,p3]
      node5[op1,p4]
      node6[op1,p5]
      node7[op1,p6]
      node8[op1,p7]
    )}
It runs 16 processes on 8 nodes.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

That looks OK. I just looked at your query - are you sure that is the correct one, since it only returns 1 row of data?
Post Reply