
Boosting the Performance of the job

Posted: Fri Jun 03, 2005 4:34 pm
by varshanswamy
Hi,

I have a job with an input of 380,000 records,
and its input SQL query is as follows:

SELECT
    A.ACCESS_KEY,
    A.START_DT_TM,
    A.QRY_STRNG_TEXT
FROM
    ACCES_EVENT A
WHERE
    A.START_DT_TM >= TO_DATE(#LOGDTTM#, 'YYYYMMDD')
    AND A.START_DT_TM < (TO_DATE(#LOGDTTM#, 'YYYYMMDD') + 1)
    AND EXISTS
    (
        SELECT WEB_SITE_ID
        FROM WEB_SITE
        WHERE SVCE_ID IN ('GIB', 'HIBM')
          AND WEB_SITE_ID = A.WEB_SITE_ID
    )


Later on I use a Pivot stage to transform the incoming access_key and query string text into access_key/value pair combinations, as follows:

input data
1,'9999-12-31','a=b&c=d&e=f'
2,'9999-12-31','a=b&l=m&q=d'

output data
1,'9999-12-31',a=b
1,'9999-12-31',c=d
1,'9999-12-31',e=f
2,'9999-12-31',a=b
2,'9999-12-31',l=m
2,'9999-12-31',q=d

Later I use a Transformer stage to transform this data into:

1,'9999-12-31',a,b
1,'9999-12-31',c,d
1,'9999-12-31',e,f
2,'9999-12-31',a,b
2,'9999-12-31',l,m
2,'9999-12-31',q,d

But the problem is that the entire operation is taking 1 hour 53 minutes for around 380,000 records.

Could anyone give me tips on how I could improve the performance?

regards,
varsha

Posted: Fri Jun 03, 2005 5:02 pm
by ray.wurlod
Your SQL query includes a correlated subquery; that's probably the main culprit. A correlated subquery must be executed for every row of the outer query, so you are executing 380,001 queries in the database (the outer query plus one subquery per row).

Create a job that runs this query into a Peek stage, and time just the query. Seek ways to write the query without using a correlated subquery.
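
One possibility (a sketch only; it assumes WEB_SITE_ID is unique in WEB_SITE, so the join cannot duplicate rows) is to replace the EXISTS with an inner join:

SELECT
    A.ACCESS_KEY,
    A.START_DT_TM,
    A.QRY_STRNG_TEXT
FROM
    ACCES_EVENT A
    INNER JOIN WEB_SITE W
        ON W.WEB_SITE_ID = A.WEB_SITE_ID
WHERE
    A.START_DT_TM >= TO_DATE(#LOGDTTM#, 'YYYYMMDD')
    AND A.START_DT_TM < (TO_DATE(#LOGDTTM#, 'YYYYMMDD') + 1)
    AND W.SVCE_ID IN ('GIB', 'HIBM')

That lets the optimizer pick a join method (a hash join, for example) instead of probing WEB_SITE once per ACCES_EVENT row.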

The Pivot and Transformer stages are probably the appropriate mechanisms for the remaining processing. Create a job that takes the result of the query from a persistent Data Set (created by a variant of the earlier test job) and runs it through the Pivot and Transformer logic. That will give you a measure of the cost of that part of the processing.

Posted: Sat Jun 04, 2005 9:29 am
by elavenil
As suggested by Ray, avoid the subquery and it will improve the performance.

Why do you want to use this subquery? This can be achieved with a simple join.

Regards
Saravanan

Posted: Sat Jun 04, 2005 5:33 pm
by satish_valavala
Yes, avoid that subquery and use a simple join instead.
When writing user-defined SQL in DataStage jobs, try to avoid applying functions (e.g. DATE()) to columns in the WHERE clause; that can prevent the database from using an index.
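
For instance (an illustrative sketch; the date literal is made up), compare:

-- A function on the column can prevent index use:
WHERE TRUNC(A.START_DT_TM) = TO_DATE('20050603', 'YYYYMMDD')

-- An index-friendly equivalent, like the range predicate already in your query:
WHERE A.START_DT_TM >= TO_DATE('20050603', 'YYYYMMDD')
  AND A.START_DT_TM < TO_DATE('20050603', 'YYYYMMDD') + 1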

Posted: Mon Jun 06, 2005 10:09 am
by varshanswamy
elavenil wrote: As suggested by Ray, avoid the subquery and it will improve the performance.

Why do you want to use this subquery? This can be achieved with a simple join.

Regards
Saravanan
I have split the process into two jobs: the first selects the data and writes it to a Data Set, and a second job reads from the Data Set and does the further processing. The first job takes less than a second, while the job doing the processing with the Pivot and Transformer still takes a total of 1 hour 50 minutes.

Posted: Mon Jun 06, 2005 2:52 pm
by ray.wurlod
When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
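
Something like this at the end of the SELECT (assuming ACCESS_KEY and START_DT_TM are your pivot keys):

ORDER BY A.ACCESS_KEY, A.START_DT_TM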

Posted: Mon Jun 06, 2005 5:53 pm
by varshanswamy
ray.wurlod wrote:When you select, ORDER BY the pivot keys. This will help a little.
The only other suggestion I can make is to capture performance statistics for the Pivot and Transformer stages; this will highlight any "hot spots". For example, how complex are your Transformations?
Statistics gathering is enabled on the Tracing tab of the Job Run Options dialog when you run a job from Director.
In order to do this, do I need to have tracing enabled in the Administrator? I am unable to locate the Tracing tab in the Job Run Options dialog when I run the job from Director. Since this is a production environment I may not be allowed to enable it system-wide; could I enable tracing for this job only?

Posted: Tue Jun 07, 2005 7:15 am
by lshort
This is one of those situations where, if you don't absolutely need the job to be "Parallel", a simple BASIC program could do this quite quickly and easily.

I know!... I know! Dorothy, we ain't in "Server" anymore. 8)

Posted: Tue Jun 07, 2005 7:51 am
by lshort
Create a job

Seq --> Transformer --> Seq

Read each row as a single field (use a delimiter that does not exist in the file, e.g. '|').
Then use a routine similar to the one below to transform each row for the target sequential file. This should work, or something very close to it.
I don't know if it is faster... but it was sure fun to write. :-)

"Coding is FUNdamental"

Input = "1,9999-12-31,a=b&c=d&e=f"

VariableValue = Field(Input, ",", 3)   ;* the query string, e.g. "a=b&c=d&e=f"
N = Count(VariableValue, "&") + 1      ;* number of name=value pairs
RowValue = ""                          ;* initialise before appending with :=
For X = 1 To N
   StaticValue = Field(Input, ",", 1) : "," : Field(Input, ",", 2)
   StaticValue := "," : Field(VariableValue, "&", X) : Char(13) : Char(10)
   RowValue := StaticValue             ;* append one output line per pair
Next X
Convert "=" To "," In RowValue         ;* split each name=value pair into two columns
Ans = RowValue
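
Call it from an output column derivation in the Transformer, passing the whole input row in as Input; since each appended pair ends with CR/LF, writing Ans to the target sequential file gives you one line per name/value pair.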