Improve performance
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Improve performance
Hi All,
I am designing a fact table that references 17 dimension tables. The job looks up these 17 reference tables and retrieves data based on certain conditions, but execution is taking a long time for a large number of records. Please let me know how I can improve the execution time.
Thanks in advance!!
IBM Websphere Datastage Certified Professional
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Have you loaded the dimension tables into hashed files? Have you enabled read cache for these?
(You should have loaded the dimension tables into hashed files as part of the processing of dimension data, so there is probably no extra overhead.)
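A typical server-job pattern for this (a sketch of the advice above, not the poster's actual design) is one small load job per dimension, run before the fact job:

```
OCI/ODBC (dimension table) ---> Transformer ---> Hashed File stage
                                                 (key = dimension natural key)
```

In the fact job, each hashed file then becomes a reference input to the Transformer, with 'Pre-load file to memory' (read cache) enabled on the Hashed File stage.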
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Improve performance
Hi,
To describe my job in detail:
My source and target are on a remote Oracle server. I am passing records to the output fact table rh_fact_asset. The primary table is asset (about 10,000 records), and there are 17 dimension tables for lookups, each containing on average about 1,000 records. The following is one of my stage variable conditions:
IF ((Asset.ITEMSETID = rh_dim_sets.SETID) Or (Asset.ITEMSETID = @NULL And rh_dim_sets.SETID = "NOSETID"))
And ((Asset.CLASSSTRUCTUREID = rh_dim_classstructure.CLASSSTRUCTUREID) Or (Asset.CLASSSTRUCTUREID = @NULL And rh_dim_classstructure.CLASSSTRUCTUREID = "NOCSTR"))
And ((Asset.FAILURECODE = rh_dim_failurecode.FAILURECODE) Or (Asset.FAILURECODE = @NULL And rh_dim_failurecode.FAILURECODE = "NOFAILURECODE"))
And ((Asset.GROUPNAME = rh_dim_metergroup.GROUPNAME) Or (Asset.GROUPNAME = @NULL And rh_dim_metergroup.GROUPNAME = "NOGROUP"))
And ((Asset.GLACCOUNT = rh_dim_chartofaccounts.GLACCOUNT) Or (Asset.GLACCOUNT = @NULL And rh_dim_chartofaccounts.GLACCOUNT = "NOGLACCOUNT"))
And ((Oconv(Asset.INSTALLDATE, "D4/YMD") = Oconv(rh_dim_installdate.CDATE, "D4/YMD")) Or (Asset.INSTALLDATE = @NULL And Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01"))
And ((Asset.ITEMNUM = rh_dim_item.ITEMNUM) Or (Asset.ITEMNUM = @NULL And rh_dim_item.ITEMNUM = "NOITEMNUM"))
And ((Asset.LOCATION = rh_dim_locations.LOCATION) Or (Asset.LOCATION = @NULL And rh_dim_locations.LOCATION = "NOLOCATION"))
And ((Asset.MANUFACTURER = rh_dim_manufacturer.COMPANY) Or (Asset.MANUFACTURER = @NULL And rh_dim_manufacturer.COMPANY = "NOCOMPANY"))
And ((Asset.ORGID = rh_dim_org.ORGID) Or (Asset.ORGID = @NULL And rh_dim_org.ORGID = "NOORGID"))
And ((Asset.PRIORITY = rh_dim_workpriority.PRIORITY) Or (Asset.PRIORITY = @NULL And rh_dim_workpriority.PRIORITY = "NOWOPRTY"))
And ((Asset.SHIFTNUM = rh_dim_shift.SHIFTNUM) Or (Asset.SHIFTNUM = @NULL And rh_dim_shift.SHIFTNUM = "NOSHIFT"))
And ((Oconv(Asset.STATUSDATE, "D4/YMD") = Oconv(rh_dim_statudate.CDATE, "D4/YMD")) Or (Asset.STATUSDATE = @NULL And Oconv(rh_dim_statudate.CDATE, "D4/YMD") = "0001/01/01"))
And ((Asset.VENDOR = rh_dim_vendor.COMPANY) Or (Asset.VENDOR = @NULL And rh_dim_vendor.COMPANY = "NOCOMPANY"))
And ((Asset.SITEID = rh_dim_site.SITEID) Or (Asset.SITEID = @NULL And rh_dim_site.SITEID = "NOSITE"))
And ((Oconv(Asset.WARRANTYEXPDATE, "D4/YMD") = Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD")) Or (Asset.WARRANTYEXPDATE = @NULL And Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD") = "0001/01/01"))
Then 'Y' Else 'N'
Based on this condition the record passes to the fact table. For some dimensions it fetches the appropriate record quickly, but for others it takes time.
Please let me know how to use hashed files for this problem.
I can't use a parallel job for this, as our requirement is to process on server jobs only.
IBM Websphere Datastage Certified Professional
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
What is the current performance? i.e. rows / second, CPU consumption, I/O usage etc.
This will give you an idea of what you can optimize.
Can you provide the job design so others can comment?
Some of the points you can look into in the code supplied are:
1.) Avoid Oconv on both sides of the comparison operator;
i.e. rather than Oconv(dt1, "D4/YMD") = Oconv(dt2, "D4/YMD"), it can simply be dt1 = dt2.
2.) Use IsNull(field) rather than field = @NULL.
3.) Place the condition that will fail most frequently at the front.
4.) Use the Found or NotFound link variables instead of matching the key with dimension values.
Also, what is the use of "0001/01/01"? If you can set it to some other constant, you can apply point 1 there as well.
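To illustrate points 1 and 2 on one clause of the expression above (a sketch only, reusing the column and link names from the post; the stage variable names SetMatch and InstallMatch are illustrative):

```
* Null test done with IsNull() instead of field = @NULL, and no Oconv on
* either side of the equality when both columns hold internal date values.
SetMatch = If IsNull(Asset.ITEMSETID) Then rh_dim_sets.SETID = "NOSETID" Else Asset.ITEMSETID = rh_dim_sets.SETID

* For the date clause, only the null-default comparison still needs a conversion.
InstallMatch = If IsNull(Asset.INSTALLDATE) Then Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01" Else Asset.INSTALLDATE = rh_dim_installdate.CDATE
```

The final expression then becomes `SetMatch And InstallMatch And ...`, which is much easier to read and avoids two Oconv calls per row per date column.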
Yikes! All of that for CDD? You need to investigate the use of a NullToValue() transform and the CRC32 function. But again you need to make sure you identify the true bottleneck in your processing - is it one particular lookup? The stage variables? The database itself? You really need to break things down and isolate the issue before you run off and solve the 'wrong' problem.
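As a sketch of the CRC32 idea (the column and link names here are illustrative, not taken from the poster's job): concatenate the non-key columns, defaulting nulls first, and compare the checksum against one stored in the hashed file:

```
* Build one delimited string from the non-key columns, defaulting nulls
* with the sdk NullToValue transform (or an If IsNull(...) expression).
RowData = NullToValue(in.LOCATION, "NOLOCATION") : "|" : NullToValue(in.SITEID, "NOSITE")

* CRC32 of the incoming row, compared to the checksum kept in the hashed file.
RowCRC = CRC32(RowData)
Changed = hlkp.NOTFOUND Or (RowCRC <> hlkp.ROW_CRC)
```

One stored checksum replaces a long chain of per-column comparisons, which is why it suits change data detection across many columns.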
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Oh goodness, no wonder. Hashed files are the heart and soul of Server jobs, especially as reference lookups and you really need to learn how to leverage those. Direct database lookups are the slowest 'solution' you can use here.
Do you have anyone there working with you who can help set you on the right path?
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
You need one pre-job that populates the hashed files.
Changing this job is then a simple matter of replacing the ODBC reference lookups by hashed files and setting their properties.
If by "not using key" you mean "not using the key defined in the database", using hashed files will help you overcome that.
Especially with 10k fact rows against 1k-row dimensions, you will be off to a flying start.
If the references for InstallDate, ChangeDate, WarrantyExpDate and StatusDate are simple date/calendar dimensions, I would recommend computing the key on the fly rather than referencing the table.
Check whether similar items (such as manufacturer and vendor, which both look up COMPANY, or accounts and chart of accounts) can be combined into one reference.
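For example, if the calendar dimension's key is simply the date rendered as YYYYMMDD (an assumption here; adjust to whatever the actual key format is), the key can be derived in the Transformer instead of performing a lookup:

```
* Derive a YYYYMMDD date key directly from the internal date value,
* using the same "0001/01/01" default the job already uses for nulls.
InstallDateKey = If IsNull(Asset.INSTALLDATE) Then "00010101" Else Convert("/", "", Oconv(Asset.INSTALLDATE, "D4/YMD"))
```

That removes four of the seventeen reference lookups entirely, at the cost of one Oconv per date column per row.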
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Hi Chulett,
Yes, all my dimension tables return more than one record, and I have used hashed files in my job as inputs.
My primary table is Asset. For example, consider two lookups (location and site); each lookup table contains a surrogate key value. My target table is Fact_Asset, which contains records from Asset, but two of its columns (location_srkey and site_srkey) should be replaced with the surrogate key values from the lookup tables. However, when I use the hashed files, the lookups are not returning any records for my input. I want to match every input row against the hashed files to load the corresponding surrogate key into my fact table.
Also, please let me know the purpose of the key column in hashed files.
Thanks!!
IBM Websphere Datastage Certified Professional