Improve performance
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Improve performance
Hi All,
I am designing a fact table that references 17 dimension tables. The job looks up these 17 reference tables and retrieves data based on certain conditions, but execution is taking a long time for a large number of records. Please let me know how I can improve the execution time.
Thanks in advance!!
IBM Websphere Datastage Certified Professional
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Have you loaded the dimension tables into hashed files? Have you enabled read cache for these?
(You should have loaded the dimension tables into hashed files as part of the processing of dimension data, so there is probably no extra overhead.)
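A typical server-job pattern for this (a sketch of the advice above, not the poster's actual design) is one small load job per dimension, run before the fact job:

```
OCI/ODBC (dimension table) ---> Transformer ---> Hashed File stage
                                                 (key = dimension natural key)
```

In the fact job, each hashed file then becomes a reference input to the Transformer, with 'Pre-load file to memory' (read cache) enabled on the Hashed File stage.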
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Improve performance
Hi,
To describe my job in detail:
My source and target are on a remote Oracle server. I am passing records to the output fact table rh_fact_asset. The primary table is asset (about 10,000 records), and there are 17 dimension tables for lookups, each containing on average about 1,000 records. The following is one of my stage variable conditions:
IF ((Asset.ITEMSETID = rh_dim_sets.SETID) Or (Asset.ITEMSETID = @NULL And rh_dim_sets.SETID = "NOSETID"))
And ((Asset.CLASSSTRUCTUREID = rh_dim_classstructure.CLASSSTRUCTUREID) Or (Asset.CLASSSTRUCTUREID = @NULL And rh_dim_classstructure.CLASSSTRUCTUREID = "NOCSTR"))
And ((Asset.FAILURECODE = rh_dim_failurecode.FAILURECODE) Or (Asset.FAILURECODE = @NULL And rh_dim_failurecode.FAILURECODE = "NOFAILURECODE"))
And ((Asset.GROUPNAME = rh_dim_metergroup.GROUPNAME) Or (Asset.GROUPNAME = @NULL And rh_dim_metergroup.GROUPNAME = "NOGROUP"))
And ((Asset.GLACCOUNT = rh_dim_chartofaccounts.GLACCOUNT) Or (Asset.GLACCOUNT = @NULL And rh_dim_chartofaccounts.GLACCOUNT = "NOGLACCOUNT"))
And ((Oconv(Asset.INSTALLDATE, "D4/YMD") = Oconv(rh_dim_installdate.CDATE, "D4/YMD")) Or (Asset.INSTALLDATE = @NULL And Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01"))
And ((Asset.ITEMNUM = rh_dim_item.ITEMNUM) Or (Asset.ITEMNUM = @NULL And rh_dim_item.ITEMNUM = "NOITEMNUM"))
And ((Asset.LOCATION = rh_dim_locations.LOCATION) Or (Asset.LOCATION = @NULL And rh_dim_locations.LOCATION = "NOLOCATION"))
And ((Asset.MANUFACTURER = rh_dim_manufacturer.COMPANY) Or (Asset.MANUFACTURER = @NULL And rh_dim_manufacturer.COMPANY = "NOCOMPANY"))
And ((Asset.ORGID = rh_dim_org.ORGID) Or (Asset.ORGID = @NULL And rh_dim_org.ORGID = "NOORGID"))
And ((Asset.PRIORITY = rh_dim_workpriority.PRIORITY) Or (Asset.PRIORITY = @NULL And rh_dim_workpriority.PRIORITY = "NOWOPRTY"))
And ((Asset.SHIFTNUM = rh_dim_shift.SHIFTNUM) Or (Asset.SHIFTNUM = @NULL And rh_dim_shift.SHIFTNUM = "NOSHIFT"))
And ((Oconv(Asset.STATUSDATE, "D4/YMD") = Oconv(rh_dim_statudate.CDATE, "D4/YMD")) Or (Asset.STATUSDATE = @NULL And Oconv(rh_dim_statudate.CDATE, "D4/YMD") = "0001/01/01"))
And ((Asset.VENDOR = rh_dim_vendor.COMPANY) Or (Asset.VENDOR = @NULL And rh_dim_vendor.COMPANY = "NOCOMPANY"))
And ((Asset.SITEID = rh_dim_site.SITEID) Or (Asset.SITEID = @NULL And rh_dim_site.SITEID = "NOSITE"))
And ((Oconv(Asset.WARRANTYEXPDATE, "D4/YMD") = Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD")) Or (Asset.WARRANTYEXPDATE = @NULL And Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD") = "0001/01/01"))
Then 'Y' Else 'N'
Based on this condition the record passes to the fact table. For some dimensions it fetches the appropriate record quickly, but for others it takes time.
Please let me know how to use hashed files for this problem.
I can't use a parallel job for this, as our requirement is to process on server jobs only.
IBM Websphere Datastage Certified Professional
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
What is the current performance? i.e. rows / second, CPU consumption, I/O usage etc.
This will give you an idea of what you can optimize.
Can you provide the job design so others can comment?
Some of the points you can look into in the code supplied are:
1.) Avoid Oconv on both sides of the comparison operator;
i.e. rather than Oconv(dt1, "D4/YMD") = Oconv(dt2, "D4/YMD"), it can simply be dt1 = dt2.
2.) Use IsNull(field) rather than field = @NULL.
3.) Place the condition that will fail most frequently at the front.
4.) Use the Found or NotFound link variables instead of matching the key with dimension values.
Also, what is the use of "0001/01/01"? If you can set it to some other constant, you can apply point 1 there as well.
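To illustrate points 1 and 2 on one clause of the expression above (a sketch only, reusing the column and link names from the post; the stage variable names SetMatch and InstallMatch are illustrative):

```
* Null test done with IsNull() instead of field = @NULL, and no Oconv on
* either side of the equality when both columns hold internal date values.
SetMatch = If IsNull(Asset.ITEMSETID) Then rh_dim_sets.SETID = "NOSETID" Else Asset.ITEMSETID = rh_dim_sets.SETID

* For the date clause, only the null-default comparison still needs a conversion.
InstallMatch = If IsNull(Asset.INSTALLDATE) Then Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01" Else Asset.INSTALLDATE = rh_dim_installdate.CDATE
```

The final expression then becomes `SetMatch And InstallMatch And ...`, which is much easier to read and avoids two Oconv calls per row per date column.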
Yikes! All of that for CDD? You need to investigate the use of a NullToValue() transform and the CRC32 function. But again you need to make sure you identify the true bottleneck in your processing - is it one particular lookup? The stage variables? The database itself? You really need to break things down and isolate the issue before you run off and solve the 'wrong' problem.
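As a sketch of the CRC32 idea (the column and link names here are illustrative, not taken from the poster's job): concatenate the non-key columns, defaulting nulls first, and compare the checksum against one stored in the hashed file:

```
* Build one delimited string from the non-key columns, defaulting nulls
* with the sdk NullToValue transform (or an If IsNull(...) expression).
RowData = NullToValue(in.LOCATION, "NOLOCATION") : "|" : NullToValue(in.SITEID, "NOSITE")

* CRC32 of the incoming row, compared to the checksum kept in the hashed file.
RowCRC = CRC32(RowData)
Changed = hlkp.NOTFOUND Or (RowCRC <> hlkp.ROW_CRC)
```

One stored checksum replaces a long chain of per-column comparisons, which is why it suits change data detection across many columns.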
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Oh goodness, no wonder. Hashed files are the heart and soul of Server jobs, especially as reference lookups and you really need to learn how to leverage those. Direct database lookups are the slowest 'solution' you can use here.
Do you have anyone there working with you who can help set you on the right path?
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
You need one pre-job that populates the hashed files.
Changing this job is then a simple matter of replacing the ODBC reference lookups by hashed files and setting their properties.
If by "not using key" you mean "not using the key defined in the database", using hashed files will help you overcome that.
Especially with 10k fact rows against 1k-row dimensions, you will be off to a flying start.
If the references for InstallDate, ChangeDate, WarrantyExpDate and StatusDate are simple date/calendar dimensions, I would recommend computing the key on the fly rather than referencing the table.
Check whether similar items (such as manufacturer and vendor, which both look up COMPANY, or accounts and chart of accounts) can be combined into one reference.
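For example, if the calendar dimension's key is simply the date rendered as YYYYMMDD (an assumption here; adjust to whatever the actual key format is), the key can be derived in the Transformer instead of performing a lookup:

```
* Derive a YYYYMMDD date key directly from the internal date value,
* using the same "0001/01/01" default the job already uses for nulls.
InstallDateKey = If IsNull(Asset.INSTALLDATE) Then "00010101" Else Convert("/", "", Oconv(Asset.INSTALLDATE, "D4/YMD"))
```

That removes four of the seventeen reference lookups entirely, at the cost of one Oconv per date column per row.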
-
- Participant
- Posts: 38
- Joined: Wed Oct 01, 2008 10:30 am
Hi Chulett,
Yes, all my dimension tables return more than one record, and I have used hashed files in my job as inputs.
My primary table is Asset. For example, consider two lookups (location and site); each lookup table contains a surrogate key value. My target table is Fact_Asset, which contains records from Asset, but two of its columns (location_srkey and site_srkey) should be replaced with the surrogate key values from the lookup tables. However, when I use the hashed files, the lookups are not returning any records for my input. I want to match every input row against the hashed files to load the corresponding surrogate key into my fact table.
Also, please let me know the purpose of the key column in hashed files.
Thanks!!
IBM Websphere Datastage Certified Professional