Improve performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Improve performance

Post by Vishal1982 »

Hi All,
I am designing a job that loads a fact table using 17 dimension (reference) tables. The job looks up these 17 tables and retrieves data based on certain conditions, but execution is taking a long time for the large number of records, so please let me know how I can improve the execution time.

Thanks in Advance!!
IBM Websphere Datastage Certified Professional
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What is your design? i.e. source, target, machine spec, volume, etc.

Which activity is consuming the most time and resources?

Without any information it is not possible for others to respond.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Have you loaded the dimension tables into hashed files? Have you enabled read cache for these?

(You should have loaded the dimension tables into hashed files as part of the processing of dimension data, so there is probably no extra overhead.)
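For a sense of why hashed files help: every reference is a single keyed read, optionally served from memory when read cache is enabled. In DataStage BASIC the equivalent access looks something like this (a minimal sketch; the file name DIM_SETS and the key value are invented for illustration):

    * Open a hashed file in the project account and read one row by key.
    Open 'DIM_SETS' To DimSets Else
       Call DSLogFatal('Cannot open DIM_SETS', 'LookupSketch')
    End
    Read DimRow From DimSets, 'NOSETID' Then
       SurrogateKey = DimRow<1>    ;* first attribute of the stored row
    End Else
       SurrogateKey = 0            ;* key not present in the file
    End

That single keyed access is why a hashed file lookup is so much faster than a database round trip per row.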
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If possible, partition the data and process the partitions in parallel.
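One common server-job approach (a sketch; ASSETID, PartCount and PartNum are invented for illustration): enable "Allow multiple instance" on the job, pass each instance two job parameters, and constrain each instance to its share of the rows in the Transformer:

    * Transformer constraint for one instance: keep only the rows whose
    * key falls in this instance's partition (PartNum runs from 0 to PartCount - 1).
    Mod(Asset.ASSETID, PartCount) = PartNum

Running the instances concurrently (e.g. from a sequence) processes the partitions in parallel.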
Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Improve performance

Post by Vishal1982 »

Hi,

To describe my job in more detail:

My source and target are on a remote Oracle server, and I am passing the records to the output fact table rh_fact_asset. The primary table is asset (containing 10,000 records) and the other 17 dimension tables used for lookups each contain about 1,000 records on average. The following is my one stage variable condition:
If (
      (Asset.ITEMSETID = rh_dim_sets.SETID Or (Asset.ITEMSETID = @NULL And rh_dim_sets.SETID = 'NOSETID'))
  And (Asset.CLASSSTRUCTUREID = rh_dim_classstructure.CLASSSTRUCTUREID Or (Asset.CLASSSTRUCTUREID = @NULL And rh_dim_classstructure.CLASSSTRUCTUREID = "NOCSTR"))
  And (Asset.FAILURECODE = rh_dim_failurecode.FAILURECODE Or (Asset.FAILURECODE = @NULL And rh_dim_failurecode.FAILURECODE = "NOFAILURECODE"))
  And (Asset.GROUPNAME = rh_dim_metergroup.GROUPNAME Or (Asset.GROUPNAME = @NULL And rh_dim_metergroup.GROUPNAME = "NOGROUP"))
  And (Asset.GLACCOUNT = rh_dim_chartofaccounts.GLACCOUNT Or (Asset.GLACCOUNT = @NULL And rh_dim_chartofaccounts.GLACCOUNT = "NOGLACCOUNT"))
  And (Oconv(Asset.INSTALLDATE, "D4/YMD") = Oconv(rh_dim_installdate.CDATE, "D4/YMD") Or (Asset.INSTALLDATE = @NULL And Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01"))
  And (Asset.ITEMNUM = rh_dim_item.ITEMNUM Or (Asset.ITEMNUM = @NULL And rh_dim_item.ITEMNUM = "NOITEMNUM"))
  And (Asset.LOCATION = rh_dim_locations.LOCATION Or (Asset.LOCATION = @NULL And rh_dim_locations.LOCATION = "NOLOCATION"))
  And (Asset.MANUFACTURER = rh_dim_manufacturer.COMPANY Or (Asset.MANUFACTURER = @NULL And rh_dim_manufacturer.COMPANY = "NOCOMPANY"))
  And (Asset.ORGID = rh_dim_org.ORGID Or (Asset.ORGID = @NULL And rh_dim_org.ORGID = "NOORGID"))
  And (Asset.PRIORITY = rh_dim_workpriority.PRIORITY Or (Asset.PRIORITY = @NULL And rh_dim_workpriority.PRIORITY = "NOWOPRTY"))
  And (Asset.SHIFTNUM = rh_dim_shift.SHIFTNUM Or (Asset.SHIFTNUM = @NULL And rh_dim_shift.SHIFTNUM = "NOSHIFT"))
  And (Oconv(Asset.STATUSDATE, "D4/YMD") = Oconv(rh_dim_statudate.CDATE, "D4/YMD") Or (Asset.STATUSDATE = @NULL And Oconv(rh_dim_statudate.CDATE, "D4/YMD") = "0001/01/01"))
  And (Asset.VENDOR = rh_dim_vendor.COMPANY Or (Asset.VENDOR = @NULL And rh_dim_vendor.COMPANY = "NOCOMPANY"))
  And (Asset.SITEID = rh_dim_site.SITEID Or (Asset.SITEID = @NULL And rh_dim_site.SITEID = "NOSITE"))
  And (Oconv(Asset.WARRANTYEXPDATE, "D4/YMD") = Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD") Or (Asset.WARRANTYEXPDATE = @NULL And Oconv(rh_dim_warrantyexpdate.CDATE, "D4/YMD") = "0001/01/01"))
) Then 'Y' Else 'N'


Based on this condition the record passes to the fact table. For some dimensions the lookup fetches the appropriate record quickly, but for other dimensions it is taking time.

Please let me know how to use hashed files for this problem.

I can't use parallel jobs for this, as our requirement is to process with server jobs only.
IBM Websphere Datastage Certified Professional
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What is the current performance? i.e. rows / second, CPU consumption, I/O usage etc.

This will give you an idea of what you can optimize.

Can you provide the job design so others can comment?

Some points you can look into in the code supplied are:

1.) Avoid OConv on both sides of the comparison operator.
i.e. rather than Oconv(dt1, "D4/YMD") = Oconv(dt2, "D4/YMD"), it can simply be dt1 = dt2

2.) Use IsNull(field) rather than field = @NULL

3.) Place the condition that will fail most frequently at the front.

4.) Use the reference link's Found or NotFound variable instead of matching the key against dimension values.

Also, what is the use of "0001/01/01"? If you can set it to some other constant, you can apply point 1 there as well.
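To make points 1 and 2 concrete, here is the InstallDate clause from the posted condition rewritten (an untested sketch; DummyDateKey stands for whatever constant the dummy date is stored as internally):

    * Before: Oconv on both sides, and "= @NULL", which is not a reliable null test:
    *   Oconv(Asset.INSTALLDATE, "D4/YMD") = Oconv(rh_dim_installdate.CDATE, "D4/YMD")
    *     Or (Asset.INSTALLDATE = @NULL And Oconv(rh_dim_installdate.CDATE, "D4/YMD") = "0001/01/01")

    * After: compare the internal values directly and test null with IsNull():
    Asset.INSTALLDATE = rh_dim_installdate.CDATE Or (IsNull(Asset.INSTALLDATE) And rh_dim_installdate.CDATE = DummyDateKey)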
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yikes! All of that for CDD? You need to investigate the use of a NullToValue() transform and the CRC32 function. But again you need to make sure you identify the true bottleneck in your processing - is it one particular lookup? The stage variables? The database itself? You really need to break things down and isolate the issue before you run off and solve the 'wrong' problem.
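A sketch of that pattern (the column names and the '|' delimiter are invented; pick a delimiter that cannot occur in the data):

    * Stage variables: concatenate the compared columns into one delimited
    * string, substituting constants for nulls, then checksum the string.
    svRow = NullToValue(In.LOCATION, 'NOLOCATION') : '|' : NullToValue(In.VENDOR, 'NOCOMPANY') : '|' : NullToValue(In.SITEID, 'NOSITE')
    svNewCRC = CRC32(svRow)
    * Compare svNewCRC with the CRC stored on the target row; if they are
    * equal, nothing changed and the row can be skipped.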
-craig

"You can never have too many knives" -- Logan Nine Fingers
Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Post by Vishal1982 »

The following is my job design, for reference:

Image
IBM Websphere Datastage Certified Professional
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oh goodness, no wonder. Hashed files are the heart and soul of Server jobs, especially as reference lookups, and you really need to learn how to leverage them. Direct database lookups are the slowest 'solution' you can use here.

Do you have anyone there working with you who can help set you on the right path?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Post by Vishal1982 »

Hi chulett ,
I know that hashed files provide fast access to data, but since I am not using any key columns for data retrieval, may I know how to use hashed files in my job in place of the ODBC stages?
IBM Websphere Datastage Certified Professional
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Explain what "not using key columns" means, are you saying there are no "where" clauses in your lookups? :?

Also, do any of them return more than a single record? If so, that would throw a kink into replacing them with hashed files.
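The "kink" is that a hashed file holds exactly one row per key: a later write with the same key silently replaces the earlier one. For example:

    DimRow = 'first version'
    Write DimRow On DimFile, 'K1'
    DimRow = 'second version'
    Write DimRow On DimFile, 'K1'   ;* the first row is now gone

So a lookup that genuinely returns several rows per key can't be moved to a hashed file as-is; it needs a more specific key or a different design.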
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You need to have one preceding job which will populate the hashed files.

Changes to this job will then be a simple matter of replacing each ODBC lookup with a Hashed File stage and setting its properties.

If by "not using key" you mean "not using the key defined in the database", using hashed files will help you overcome that.

Especially for 17k fact rows with 1k dimension rows, you will be off to a flying start.

If the references for InstallDate, ChangeDate, WarrantyExpDate and StatusDate are simple date / calendar dimensions, I would recommend doing the computation on the fly rather than a reference to the table, along the lines sketched below.
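Something like this (a sketch; it assumes the calendar dimension's key is simply the formatted date):

    * Stage variable: derive the date key directly instead of via a hashed-file lookup.
    svInstallDateKey = If IsNull(Asset.INSTALLDATE) Then "0001/01/01" Else Oconv(Asset.INSTALLDATE, "D4/YMD")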

Check whether similar items (such as manufacturer and site, accounts and chart of accounts) can be combined into one reference.
Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Post by Vishal1982 »

Hi,

I have now loaded all the dimension table data into hashed files, but I am not able to retrieve that data in another job. Please help me with how I can read those files in another job.

Is it possible to have hashed files without key columns?
IBM Websphere Datastage Certified Professional
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, not possible. Please answer my previous questions... and now also explain what "not able to retrieve" means here as well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Vishal1982
Participant
Posts: 38
Joined: Wed Oct 01, 2008 10:30 am

Post by Vishal1982 »

Hi Chulett,
Yes, all my dimension tables return more than one record, and I have used all the hashed files in my job as reference inputs.
My primary table is Asset. For example, consider two of the lookups (location and site); each lookup table contains a surrogate key value. My target table is Fact_Asset, which contains records from Asset, but two of its columns (i.e. location_srkey and site_srkey) must be replaced with the surrogate key values from the lookup tables. However, when I use hashed files, the lookups do not return any records for the input rows. I want to match each input row against the values in the hashed files to load the corresponding surrogate keys into my fact table.
Also, please let me know the purpose of the key columns used in hashed files.
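For reference, the key columns are what the Hashed File stage matches each incoming row against: the key derivation on the reference link supplies the value to look up, and the stage returns the rest of that row's columns. A sketch for the location lookup, reusing the 'no value' defaults already loaded into the dimension (the surrogate key column name is invented):

    * Key expression on the reference link to the location hashed file:
    If IsNull(Asset.LOCATION) Then 'NOLOCATION' Else Asset.LOCATION

    * The lookup then returns rh_dim_locations.LOCATION_SRKEY (hypothetical
    * name) for the matching row, which feeds Fact_Asset.location_srkey.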

Thanks !!
IBM Websphere Datastage Certified Professional