Limiting the number of lookups per stage.

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

subramanya
Participant
Posts: 22
Joined: Fri Oct 15, 2004 11:53 pm
Location: Bangalore, India
Contact:

Post by subramanya »

Hi,
I have 20 lookups which can be done in a single Transformer stage. Will dividing them across 4 Transformers (say, 5 lookups each) increase performance?

Thnx
Subs
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Give it a shot - let us know! :wink:

There are too many variables dependent on your particular system and exactly what you are doing in the job for us to give you a definitive answer.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can consider grouping lookups so they can be done over a single connection. This will reduce the amount of I/O and improve performance.
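Outside of DataStage, the single-connection idea can be sketched in plain Python with SQLite (the table and column names here are made up): several per-row lookups against the same database collapse into one joined query, cutting the round trips per row.

```python
# Illustration only, not DataStage code: replacing several per-row
# lookups against the same database with one joined query.
# Table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE region   (cust_id INTEGER PRIMARY KEY, region    TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO region   VALUES (1, 'EMEA'), (2, 'APAC');
""")

def two_lookups(cust_id):
    # Two round trips per input row: one per reference table.
    name = con.execute("SELECT cust_name FROM customer WHERE cust_id=?",
                       (cust_id,)).fetchone()[0]
    reg = con.execute("SELECT region FROM region WHERE cust_id=?",
                      (cust_id,)).fetchone()[0]
    return (name, reg)

def one_lookup(cust_id):
    # One round trip: both reference tables joined in a single query.
    return con.execute(
        "SELECT c.cust_name, r.region FROM customer c "
        "JOIN region r ON r.cust_id = c.cust_id WHERE c.cust_id=?",
        (cust_id,)).fetchone()

print(two_lookups(1))  # ('Acme', 'EMEA')
print(one_lookup(1))   # ('Acme', 'EMEA')
```

Same answer either way; the joined version just does the work in one trip instead of many.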

Also, spreading the lookups over more than one Transformer will give you clarity.
subramanya
Participant
Posts: 22
Joined: Fri Oct 15, 2004 11:53 pm
Location: Bangalore, India
Contact:

Post by subramanya »

Not many calculations. :) It's a staging job where I am pulling fields from different tables.

Subs
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Remember that as you add additional transformers the performance of your job will decrease. In a perfect world it's better to have as few transformers as possible, but in your case I would suspect that this is not reasonable. As Craig states - there are many variables here that will impact the performance of the job with one main variable being the size of the reference hash files.

Make sure to enable cache for each reference link. The width of the incoming stream is also important, as is the number of fields within your reference hash files. Many times I see hash usage where only the customer number is needed from the customer reference lookup, yet instead of choosing only those fields required, the developer loads all of the columns when in reality only one or two are needed. Make sure the metadata for the hash is correct (field positions, key columns, etc.) so that you can choose to use only those columns required.
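As a rough illustration of why column selection matters (plain Python, hypothetical field names), compare the memory footprint of a reference lookup that keeps whole records against one that keeps only the field the job actually needs:

```python
# Illustration only, not DataStage code: keeping only the columns you
# need in an in-memory reference lookup. Field names are hypothetical.
import sys

# Pretend this is the full customer reference file: key -> whole record.
full_rows = {
    cust_id: {"cust_id": cust_id, "name": f"name{cust_id}",
              "addr": "x" * 80, "phone": "y" * 20, "notes": "z" * 200}
    for cust_id in range(1000)
}

# The job only needs 'name', so load just that column into the lookup.
trimmed = {cust_id: row["name"] for cust_id, row in full_rows.items()}

# getsizeof is shallow (it ignores the strings inside each dict),
# which only understates the full rows' true size here.
full_bytes = sum(sys.getsizeof(v) for v in full_rows.values())
trim_bytes = sum(sys.getsizeof(v) for v in trimmed.values())
print(trim_bytes < full_bytes)  # True: the trimmed lookup is far smaller
```

A smaller reference structure means more of it fits in cache and less data moves per row.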

Note: don't forget to ensure that on the table definition record for a hash file you have checked the box that says "Metadata supports Multi-valued fields". If you have a job that created the file and then imported this file definition, you will see that it is positional, which helps you choose only the fields necessary.

Regards.
amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

It all depends on what you are trying to achieve. I would think that if you can divide the lookups over two or three Transformers and then use inter-process row buffering, it will speed up your process.
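The inter-process idea, splitting the work so the pieces run concurrently with a row buffer between them, can be sketched generically in plain Python, with threads and queues standing in for DataStage's row buffering:

```python
# Illustration only, not DataStage code: the producer/consumer idea
# behind inter-process row buffering. Each "stage" handles part of the
# work and runs concurrently, connected by a buffer (queue).
import queue
import threading

def make_stage(func, inq, outq):
    def run():
        for row in iter(inq.get, None):  # None marks end-of-data
            outq.put(func(row))
        outq.put(None)                   # propagate end-of-data
    return threading.Thread(target=run)

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
stages = [make_stage(lambda r: r * 2, q1, q2),   # first group of lookups
          make_stage(lambda r: r + 1, q2, q3)]   # second group
for s in stages:
    s.start()
for row in [1, 2, 3]:
    q1.put(row)
q1.put(None)
result = [r for r in iter(q3.get, None)]
print(result)  # [3, 5, 7]
for s in stages:
    s.join()
```

The point is overlap: while the second stage works on row 1, the first stage is already processing row 2, instead of one monolithic stage doing everything serially.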

Also, regarding read cache: please do not enable it if you are not going to get the same input record again.

This is something you will have to play around with; based on the resources available, you will see a difference in speed.

Regards
Dsnew
Participant
Posts: 90
Joined: Tue Apr 19, 2005 2:08 pm

Post by Dsnew »

mhester

Can you please explain what "Metadata supports Multi-valued fields" is?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

This feature is found on the table def record in the repository.

UniVerse files are positional: the fields are offsets from 0, with 0 being the key. When you create the file (via a DS job), a dictionary is created that contains the metadata for the file. The key is position 0 and subsequent fields are 1, 2, 3, 4, etc...

When you first import this table definition into the repository and open it for viewing, you will notice on the Columns page that no positions are listed. After you check this box, you will see that there are now Position and Type columns. The key will display as 0 and the remaining fields as 1, 2, 3, etc...

When loading the metadata for a reference lookup (using a hash) you can choose only those columns that are important and you will notice that the position (once loaded) will be correct.

I also believe that pre-loading the file to memory, regardless of whether the incoming row will be seen again, is valuable as it avoids disk reads.

Hope this helps,
Dsnew
Participant
Posts: 90
Joined: Tue Apr 19, 2005 2:08 pm

Post by Dsnew »

I imported from a table, and the Columns tab did show column numbers 1, 2, 3...
I checked the option "Metadata supports Multi-valued fields" but it did not make any difference to the sequence; it was still 1, 2, 3...
Am I doing the right thing, or did you mean something else?
:?
Dsnew
Participant
Posts: 90
Joined: Tue Apr 19, 2005 2:08 pm

Post by Dsnew »

Can you also tell me the benefit of checking this option, please?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

You need to make sure you are importing the metadata via the Manager using Import > Table Definitions > UniVerse Files.

I believe you are referring to the column numbers on the left side. After you check "Metadata supports Multi-valued fields" you should see two new columns on the Columns page: Position and Type. These are specific to UniVerse file types - this is what I am referring to.
Dsnew
Participant
Posts: 90
Joined: Tue Apr 19, 2005 2:08 pm

Post by Dsnew »

Oh, I got it wrong; I was looking into this while exporting data from Oracle.

Is this option "Metadata supports Multi-valued fields" specific to UniVerse files only?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UniVerse and UniData only.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.