Boosting the Performance of the job
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
Boosting the Performance of the job
Hi,
I have a job which has an input of 380000 records,
consists of the input sql query as follows
SELECT
A.ACCESS_KEY,
A.START_DT_TM,
A.QRY_STRNG_TEXT
FROM
ACCES_EVENT A
WHERE
A.START_DT_TM >= TO_DATE(#LOGDTTM#,'YYYYMMDD')
AND
A.START_DT_TM < (TO_DATE(#LOGDTTM#,'YYYYMMDD')+1)
AND EXISTS
(
SELECT
WEB_SITE_ID
FROM
WEB_SITE
WHERE
SVCE_ID IN ('GIB', 'HIBM')
AND
WEB_SITE_ID = A.WEB_SITE_ID
)
later on I use a pivot stage and trnasform the incoming access_key, qry string text as access_key, value pair combintaion as follows
input data
1, '9999-12-31','a=b&c=d&e=f'
2,'9999-12-31','a=b&l=m&q=d'
output data
1,'9999-12-31',a=b
1,'9999-12-31',c=d
1,'9999-12-31',e=f
2,'9999-12-31',a=b
2,'9999-12-31',l=m
2,'9999-12-31',q=d
Later I use a transformer to transform this data as
1, '9999-12-31',a, b
1,'9999-12-31',c,d
1,'9999-12-31', e,f
2,'9999-12-31',a,b
2,'9999-12-31',l,m
2,'9999-12-31',q,d
But th problem is the entire operation is takeing 1 hour 53 minutes for around 380000 records.
Could anyone give me tips on how i could improve the performance.
regards,
varsha
I have a job which has an input of 380000 records,
consists of the input sql query as follows
SELECT
A.ACCESS_KEY,
A.START_DT_TM,
A.QRY_STRNG_TEXT
FROM
ACCES_EVENT A
WHERE
A.START_DT_TM >= TO_DATE(#LOGDTTM#,'YYYYMMDD')
AND
A.START_DT_TM < (TO_DATE(#LOGDTTM#,'YYYYMMDD')+1)
AND EXISTS
(
SELECT
WEB_SITE_ID
FROM
WEB_SITE
WHERE
SVCE_ID IN ('GIB', 'HIBM')
AND
WEB_SITE_ID = A.WEB_SITE_ID
)
later on I use a pivot stage and trnasform the incoming access_key, qry string text as access_key, value pair combintaion as follows
input data
1, '9999-12-31','a=b&c=d&e=f'
2,'9999-12-31','a=b&l=m&q=d'
output data
1,'9999-12-31',a=b
1,'9999-12-31',c=d
1,'9999-12-31',e=f
2,'9999-12-31',a=b
2,'9999-12-31',l=m
2,'9999-12-31',q=d
Later I use a transformer to transform this data as
1, '9999-12-31',a, b
1,'9999-12-31',c,d
1,'9999-12-31', e,f
2,'9999-12-31',a,b
2,'9999-12-31',l,m
2,'9999-12-31',q,d
But th problem is the entire operation is takeing 1 hour 53 minutes for around 380000 records.
Could anyone give me tips on how i could improve the performance.
regards,
varsha
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Your SQL query includes a correlated subquery; that's probably the main culprit. Correlated subqueries must be executed for every row of the outer query; you are executing 380001 queries in the database.
Create a job that runs this query into a Peek stage, and time just the query. Seek ways to write the query without using a correlated subquery.
The Pivot and Transformer stages are probably the appropriate mechanisms for the remaining processing. Create a job that takes the result of the query from a persistent Data Set (created by a variant of the earlier test job) and runs it through the Pivot and Transformer logic. That will give you a measure of the cost of that part of the processing.
Create a job that runs this query into a Peek stage, and time just the query. Seek ways to write the query without using a correlated subquery.
The Pivot and Transformer stages are probably the appropriate mechanisms for the remaining processing. Create a job that takes the result of the query from a persistent Data Set (created by a variant of the earlier test job) and runs it through the Pivot and Transformer logic. That will give you a measure of the cost of that part of the processing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 123
- Joined: Wed May 18, 2005 7:41 am
- Location: USA
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
I have split the entire process the process which is selecting the data iselavenil wrote:As suggessted by Ray, avoid using subquery and it will improve the performance.
Why do you want to use this subquery? This can be achieved in a simple join itself.
Regards
Saravanan
and putting it into a dataset. Later another job reads data from the dataset and does the further processing. Still the first job takes less than a second while the job which is doing the processing using pivot and trnasformer is taking a total of 1 hour 50 minutes.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
In order to do this do I need to have the tracing options enabled in my administrator, because I am unable to locate the trace tab for Job Run Options dialog when you run a job from Director, and since it is production environment I may not be allowed to do so, could I enable the trace tab only for this job.ray.wurlod wrote:When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
This is one of those situations where if you dont absolutly need the job to be "Parallel" a simple basic program could do this quite quickly and easily.
I know!...I know! Dorothy, We aint in "Server" anymore.
I know!...I know! Dorothy, We aint in "Server" anymore.
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
Create a job
Seq -->Transformer-->Seq
Each row read as single field (use delim that does not exist in file ie. '|')
The a use routine simlar to that below to transform the row to the target sequential file. This should work. Or Something very close to it.
I dont know if it is faster....but it was sure fun to write.
"Coding is FUNdamental"
Input = "1, 9999-12-31,a=b&c=d&e=f"
VariableValue = Field(Input,",",3)
N = Count(VariableValue,'&') + 1
For X = 1 to N
StaticValue = Field(Input,",",1):",":Field(Input,",",2)
StaticValue := ",":Field(VariableValue,"&",X):char(13):char(10)
RowValue := StaticValue
Next X
Convert "=" to "," in RowValue
Ans = RowValue
Seq -->Transformer-->Seq
Each row read as single field (use delim that does not exist in file ie. '|')
The a use routine simlar to that below to transform the row to the target sequential file. This should work. Or Something very close to it.
I dont know if it is faster....but it was sure fun to write.
"Coding is FUNdamental"
Input = "1, 9999-12-31,a=b&c=d&e=f"
VariableValue = Field(Input,",",3)
N = Count(VariableValue,'&') + 1
For X = 1 to N
StaticValue = Field(Input,",",1):",":Field(Input,",",2)
StaticValue := ",":Field(VariableValue,"&",X):char(13):char(10)
RowValue := StaticValue
Next X
Convert "=" to "," in RowValue
Ans = RowValue
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."