DSXchange

Posted: **Tue Dec 20, 2011 4:29 pm**

Hi,

I have a simple one to one mapping job is taking long time to run. My job design is OraOci -> Trf -> OraOCI.

To Process 4440 records it took 30mins and the other day it took 58 mins to procsess just 125 records in Production. We have a user defined source extract query to pick the data based on dates. The query is

Code: Select all

select * from (select LZ.*,row_number() over (partition by incident_id order by incident_id) rn from lz_incident lz where lz.LAST_UPDATED_DATE >=  to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')  and 
lz.LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')  and  lz.last_updated_date in
 (select max(last_updated_date) from lz_incident group by incident_id)) where rn=1
and
LAST_UPDATED_DATE >=  to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS') 
and LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')

The target Write Method is 'Load'. I really don't understand the behaviour of this job. Can anybody help me where do I look into the issue.

Thanks in advance.
RJ

Posted: **Tue Dec 20, 2011 4:38 pm**

Split the job into 2 jobs (extracts to dataset) and (dataset to load) and see if it improves.

Posted: **Tue Dec 20, 2011 6:33 pm**

I ran into that on 7.5 some years back with same job design taking 11 hours. After splitting the job exactly as samdsx suggested here, the total time went to 30 minutes on millions of rows. I never found the root cause for performance issue within a single job.

Posted: **Tue Dec 20, 2011 7:06 pm**

OraOci -> peek-->Trf ->peek-->OraOCI

Try this.

DS User

Posted: **Tue Dec 20, 2011 11:12 pm**

SURA wrote:OraOci -> peek-->Trf ->peek-->OraOCI

Try this.

DS User

Why? What do you believe it will accomplish in terms of the original question?

Posted: **Tue Dec 20, 2011 11:23 pm**

How long it takes, if you run the below query in toad or any other clients?

Code: Select all

select * from (select LZ.*,row_number() over (partition by incident_id order by incident_id) rn from lz_incident lz where lz.LAST_UPDATED_DATE >=  to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')  and
lz.LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')  and  lz.last_updated_date in
 (select max(last_updated_date) from lz_incident group by incident_id)) where rn=1
and
LAST_UPDATED_DATE >=  to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
and LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')

Posted: **Tue Dec 20, 2011 11:34 pm**

Ray

1) If an issue in the source and it took time(!) to pull the data from the table.

In that case the option what i said will not help.

Whereas something wrong while process in or after tfm on that case if peek is used, you can find the data flow without hanging somewhere is the whole aim.

In other way to narrow down...

DS User

Posted: **Wed Dec 21, 2011 12:31 am**

Is this a parallel or server job? You have OracleOCI listed as the source stage....

Is the table lz_incident indexed on LAST_UPDATED_DATE?

Posted: **Wed Dec 21, 2011 1:58 am**

Hi,
Could You please give the stats of this extract query in both Datastage and Toad ?
Please index columns for Joins
I am not sure about this table and it key column
so i blindly giving this query..
WITH VAL AS ( KEYCOLUMNS,rn )
(SELECT KEYCOLUMNS ,row_number() over (partition by incident_id order by
incident_id) rn from
lz_incident lz
where
lz.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
and lz.LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS') )

SELECT LZ.*,A.rn FROM lz_incident LZ ,VAL A
WHERE
L2 KEYCOLUMNS = A.KEYCOLUMNS
AND rn= 1 AND LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
and LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')

after this extract use sort stage and remove dups stage to get max(LAST_UPDATED_DATE) by its grouping column

Please let me know if i am wrong

Posted: **Wed Dec 21, 2011 1:59 am**

Hi,
Could You please give the stats of this extract query in both Datastage and Toad ?
Please index columns for Joins
I am not sure about this table and it key column
so i blindly giving this query..
WITH VAL AS ( KEYCOLUMNS,rn )
(SELECT KEYCOLUMNS ,row_number() over (partition by incident_id order by
incident_id) rn from
lz_incident lz
where
lz.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
and lz.LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS') )

SELECT LZ.*,A.rn FROM lz_incident LZ ,VAL A
WHERE
L2 KEYCOLUMNS = A.KEYCOLUMNS
AND rn= 1 AND LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
and LAST_UPDATED_DATE < to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')

after this extract use sort stage and remove dups stage to get max(LAST_UPDATED_DATE) by its grouping column

Please let me know if i am wrong

DSXchange

Job is running very slow

Job is running very slow

Re: Job is running very slow

Re: Job is running very slow

Re: Job is running very slow

Re: Job is running very slow