Job is running very slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Job is running very slow

Post by ravij »

Hi,

I have a simple one-to-one mapping job that is taking a long time to run. The job design is OraOCI -> Transformer -> OraOCI.

To process 4,440 records it took 30 minutes, and the other day it took 58 minutes to process just 125 records in Production. We have a user-defined source extract query that picks the data based on dates. The query is:

Code:

select *
from (select LZ.*,
             row_number() over (partition by incident_id order by incident_id) rn
      from lz_incident lz
      where lz.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
        and lz.LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
        and lz.last_updated_date in (select max(last_updated_date)
                                     from lz_incident
                                     group by incident_id))
where rn = 1
  and LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
  and LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
The target Write Method is 'Load'. I really don't understand the behaviour of this job. Can anybody help me figure out where to look for the issue?

Thanks in advance.
RJ
Ravi
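
For what it's worth, the extract above does the "latest row per incident_id" work twice: once with row_number() and once with the uncorrelated IN (SELECT MAX(last_updated_date) ... GROUP BY incident_id) subquery, which rescans lz_incident. Below is only a sketch of a single-pass alternative; it assumes the intent is "latest row per incident_id within the load window", which is not exactly what the posted query returns in every edge case, so it would need to be verified against real data before use.

Code:

-- Sketch only: keep the most recent row per incident_id inside the date
-- window, without the second scan of lz_incident for the MAX() subquery.
select *
from (select lz.*,
             row_number() over (partition by incident_id
                                order by last_updated_date desc) rn
      from lz_incident lz
      where lz.last_updated_date >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
        and lz.last_updated_date <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS'))
where rn = 1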
samdsx
Premium Member
Posts: 19
Joined: Wed Aug 18, 2010 8:48 pm

Re: Job is running very slow

Post by samdsx »

Split the job into two jobs (extract to a dataset, then dataset to load) and see if it improves.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I ran into that on 7.5 some years back, with the same job design taking 11 hours. After splitting the job exactly as samdsx suggested here, the total time went down to 30 minutes on millions of rows. I never found the root cause of the performance issue within a single job.
Choose a job you love, and you will never have to work a day in your life. - Confucius
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Job is running very slow

Post by SURA »

OraOCI -> Peek -> Trf -> Peek -> OraOCI

Try this.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Job is running very slow

Post by ray.wurlod »

SURA wrote: OraOCI -> Peek -> Trf -> Peek -> OraOCI

Try this.

DS User
Why? What do you believe it will accomplish in terms of the original question?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

How long does it take if you run the below query in Toad or any other client?

Code:

select *
from (select LZ.*,
             row_number() over (partition by incident_id order by incident_id) rn
      from lz_incident lz
      where lz.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
        and lz.LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
        and lz.last_updated_date in (select max(last_updated_date)
                                     from lz_incident
                                     group by incident_id))
where rn = 1
  and LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
  and LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
pandeeswaran
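
If the query itself turns out to be the slow part, a rough way to confirm that outside DataStage (assuming SQL*Plus access to the source schema, with literal dates substituted for #pFROM_DATE# / #pTO_DATE#) is sketched below.

Code:

-- Standard SQL*Plus settings: time the statement and report its plan and
-- run-time statistics without spooling every row back to the client
-- (AUTOTRACE may require the PLUSTRACE role).
SET TIMING ON
SET AUTOTRACE TRACEONLY

-- Run the extract query from the first post here and compare the elapsed
-- time with what the DataStage job reports for the same window.

-- To see just the optimizer's plan, substitute the real extract query for
-- the simple stand-in statement below:
EXPLAIN PLAN FOR
select * from lz_incident where last_updated_date >= sysdate - 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);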
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Job is running very slow

Post by SURA »

Ray

1) If the issue is in the source and it simply takes time(!) to pull the data from the table, then the option I suggested will not help.

2) But if something goes wrong during or after the Transformer, then with Peek stages in place you can see whether the data is flowing through or hanging somewhere - that is the whole aim.

In other words, it is a way to narrow the problem down...

DS User
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is this a parallel or server job? You have OracleOCI listed as the source stage....

Is the table lz_incident indexed on LAST_UPDATED_DATE?
- james wiles


All generalizations are false, including this one - Mark Twain.
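
On the index question: a quick way to check (assuming the usual Oracle dictionary views are accessible and LZ_INCIDENT is owned by the connected schema) is something like the sketch below.

Code:

-- List any indexes on LZ_INCIDENT and the columns they cover; look for
-- LAST_UPDATED_DATE (use ALL_IND_COLUMNS plus an OWNER filter if the table
-- belongs to another schema).
SELECT index_name, column_name, column_position
FROM   user_ind_columns
WHERE  table_name = 'LZ_INCIDENT'
ORDER  BY index_name, column_position;

-- If nothing covers the date column, an index along these lines could help
-- the range filter (the index name here is made up for illustration):
-- CREATE INDEX lz_incident_lud_ix ON lz_incident (last_updated_date);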
manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Post by manoj_23sakthi »

Hi,

Could you please give the stats of this extract query in both DataStage and Toad? Please index the columns used in the joins.

I am not sure about this table and its key column, so I am giving this query blindly (KEYCOLUMNS stands for the actual key columns):

WITH VAL (KEYCOLUMNS, rn) AS
 (SELECT KEYCOLUMNS,
         row_number() over (partition by incident_id order by incident_id) rn
  FROM lz_incident lz
  WHERE lz.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
    AND lz.LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS'))
SELECT LZ.*, A.rn
FROM lz_incident LZ, VAL A
WHERE LZ.KEYCOLUMNS = A.KEYCOLUMNS
  AND A.rn = 1
  AND LZ.LAST_UPDATED_DATE >= to_date('#pFROM_DATE#', 'YYYY-MM-DD-HH24:MI:SS')
  AND LZ.LAST_UPDATED_DATE <  to_date('#pTO_DATE#', 'YYYY-MM-DD-HH24:MI:SS')

After this extract, use a Sort stage and a Remove Duplicates stage to get the max(LAST_UPDATED_DATE) for each grouping column.

Please let me know if I am wrong.