Boosting the Performance of the job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

varshanswamy
Participant
Posts: 48
Joined: Thu Mar 11, 2004 10:32 pm

Boosting the Performance of the job

Post by varshanswamy »

Hi,

I have a job with an input of 380,000 records.
The input SQL query is as follows:

SELECT
A.ACCESS_KEY,
A.START_DT_TM,
A.QRY_STRNG_TEXT
FROM
ACCES_EVENT A
WHERE
A.START_DT_TM >= TO_DATE(#LOGDTTM#,'YYYYMMDD')
AND
A.START_DT_TM < (TO_DATE(#LOGDTTM#,'YYYYMMDD')+1)
AND EXISTS
(
SELECT
WEB_SITE_ID
FROM
WEB_SITE
WHERE
SVCE_ID IN ('GIB', 'HIBM')
AND
WEB_SITE_ID = A.WEB_SITE_ID
)


Later I use a Pivot stage to transform the incoming access_key and query string text into access_key/value pair combinations, as follows:

input data
1, '9999-12-31','a=b&c=d&e=f'
2,'9999-12-31','a=b&l=m&q=d'

output data
1,'9999-12-31',a=b
1,'9999-12-31',c=d
1,'9999-12-31',e=f
2,'9999-12-31',a=b
2,'9999-12-31',l=m
2,'9999-12-31',q=d

Later I use a transformer to transform this data as

1, '9999-12-31',a, b
1,'9999-12-31',c,d
1,'9999-12-31', e,f
2,'9999-12-31',a,b
2,'9999-12-31',l,m
2,'9999-12-31',q,d
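For reference, the two-step reshaping described above (pivot on '&', then split each pair on '=') can be sketched outside DataStage. This is a minimal Python illustration of the logic only, not the DataStage implementation; the row layout follows the sample data in the post:

```python
def pivot_and_split(row):
    """Expand one (key, date, query_string) row into one row per
    name=value pair, splitting the pair into separate columns."""
    key, dt, qry = row
    out = []
    for pair in qry.split("&"):               # Pivot: one output row per pair
        name, _, value = pair.partition("=")  # Transformer: split name/value
        out.append((key, dt, name, value))
    return out

rows = [(1, "9999-12-31", "a=b&c=d&e=f"),
        (2, "9999-12-31", "a=b&l=m&q=d")]
result = [r for row in rows for r in pivot_and_split(row)]
```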

The problem is that the entire operation is taking 1 hour 53 minutes for around 380,000 records.

Could anyone give me tips on how I could improve the performance?

regards,
varsha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your SQL query includes a correlated subquery; that's probably the main culprit. Correlated subqueries must be executed for every row of the outer query; you are executing 380001 queries in the database.

Create a job that runs this query into a Peek stage, and time just the query. Seek ways to write the query without using a correlated subquery.
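To illustrate the rewrite, here is a minimal sketch using SQLite as a stand-in for Oracle (the TO_DATE date filter is omitted and the sample rows are made up): the correlated EXISTS is replaced by a join against the distinct qualifying WEB_SITE_IDs, and both forms return the same rows.

```python
import sqlite3

# Stand-in schema in SQLite; table and column names follow the original post,
# the sample data is hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ACCES_EVENT (ACCESS_KEY INT, START_DT_TM TEXT,
                          QRY_STRNG_TEXT TEXT, WEB_SITE_ID INT);
CREATE TABLE WEB_SITE   (WEB_SITE_ID INT, SVCE_ID TEXT);
INSERT INTO ACCES_EVENT VALUES (1,'9999-12-31','a=b&c=d&e=f',10),
                               (2,'9999-12-31','a=b&l=m&q=d',20),
                               (3,'9999-12-31','x=y',30);
INSERT INTO WEB_SITE VALUES (10,'GIB'), (20,'HIBM'), (30,'XYZ');
""")

# Original shape: correlated subquery probed once per outer row.
exists_sql = """
SELECT A.ACCESS_KEY, A.START_DT_TM, A.QRY_STRNG_TEXT
FROM ACCES_EVENT A
WHERE EXISTS (SELECT WEB_SITE_ID FROM WEB_SITE
              WHERE SVCE_ID IN ('GIB','HIBM')
                AND WEB_SITE_ID = A.WEB_SITE_ID)
"""

# Rewrite: join against the de-duplicated qualifying web sites instead.
join_sql = """
SELECT A.ACCESS_KEY, A.START_DT_TM, A.QRY_STRNG_TEXT
FROM ACCES_EVENT A
JOIN (SELECT DISTINCT WEB_SITE_ID FROM WEB_SITE
      WHERE SVCE_ID IN ('GIB','HIBM')) W
  ON W.WEB_SITE_ID = A.WEB_SITE_ID
"""

rows_exists = sorted(con.execute(exists_sql).fetchall())
rows_join = sorted(con.execute(join_sql).fetchall())
```

The DISTINCT in the derived table guards against duplicate rows if a WEB_SITE_ID can appear more than once with a qualifying SVCE_ID.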

The Pivot and Transformer stages are probably the appropriate mechanisms for the remaining processing. Create a job that takes the result of the query from a persistent Data Set (created by a variant of the earlier test job) and runs it through the Pivot and Transformer logic. That will give you a measure of the cost of that part of the processing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

As suggested by Ray, avoid the subquery and it will improve the performance.

Why do you want to use this subquery? This can be achieved in a simple join itself.

Regards
Saravanan
satish_valavala
Participant
Posts: 123
Joined: Wed May 18, 2005 7:41 am
Location: USA

Post by satish_valavala »

Yes, avoid that subquery and use a simple join.
When writing user-defined SQL in DataStage jobs, try to avoid applying functions (e.g. DATE()) to columns in the WHERE clause.
Regards
VS
varshanswamy
Participant
Posts: 48
Joined: Thu Mar 11, 2004 10:32 pm

Post by varshanswamy »

elavenil wrote:As suggested by Ray, avoid the subquery and it will improve the performance.

Why do you want to use this subquery? This can be achieved in a simple join itself.

Regards
Saravanan
I have split the entire process: one job selects the data
and puts it into a dataset, and another job reads the data from the dataset and does the further processing. The first job takes less than a second, while the job doing the processing with the Pivot and Transformer takes a total of 1 hour 50 minutes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
varshanswamy
Participant
Posts: 48
Joined: Thu Mar 11, 2004 10:32 pm

Post by varshanswamy »

ray.wurlod wrote:When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
In order to do this, do I need tracing options enabled in Administrator? I am unable to locate the Tracing tab in the Job Run Options dialog when running the job from Director. Also, since this is a production environment I may not be allowed to enable tracing globally; could I enable it for just this job?
lshort
Premium Member
Posts: 139
Joined: Tue Oct 29, 2002 11:40 am
Location: Toronto

Post by lshort »

This is one of those situations where, if you don't absolutely need the job to be "Parallel", a simple BASIC program could do this quite quickly and easily.

I know!... I know! Dorothy, we ain't in "Server" anymore. 8)
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
lshort
Premium Member
Posts: 139
Joined: Tue Oct 29, 2002 11:40 am
Location: Toronto

Post by lshort »

Create a job

Seq -->Transformer-->Seq

Read each row as a single field (use a delimiter that does not exist in the file, e.g. '|').
Then use a routine similar to the one below to transform the row for the target sequential file. This should work, or something very close to it.
I don't know if it is faster... but it was sure fun to write. :-)

"Coding is FUNdamental"

Input = "1, 9999-12-31,a=b&c=d&e=f"

VariableValue = Field(Input, ",", 3)   ;* the query-string portion
N = Count(VariableValue, "&") + 1      ;* number of name=value pairs
RowValue = ""                          ;* initialise the output buffer
For X = 1 To N
   StaticValue = Field(Input, ",", 1) : "," : Field(Input, ",", 2)
   StaticValue := "," : Field(VariableValue, "&", X) : Char(13) : Char(10)
   RowValue := StaticValue
Next X
Convert "=" To "," In RowValue         ;* split each name=value into two columns
Ans = RowValue
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."