Hash file: Selecting all/few columns: Performance Impact

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Hash file: Selecting all/few columns: Performance Impact

Post by dwscblr »

We have a hash file containing 5 columns. Job 1 and Job 2 use this hash file for reference lookups.

Building two hash files instead of one common hash file is not an option available to us.

Job 1 uses 4 columns of the hash file and Job 2 uses 2 columns of the hash file.

At present, for both jobs, we have imported the hash file definition with all 5 columns, including the key.

My question is: can I use only the required columns from the hash file in each job, i.e. only the 4 required columns in Job 1 and only the 2 required columns in Job 2?

Does this improve performance, degrade performance, or have no impact on performance?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

In both scenarios you will need to load the metadata that defines the key. If some of the other columns are not used in a particular job or lookup, you do not need to reference or load that metadata.

You will want to ensure that the metadata is positionally correct in the hashed file, i.e. the key columns are in position 0, column 1 is in position 1, and so on. DataStage handles this when it is allowed to manage the creation and deletion of the hashed file.
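To make the positional layout concrete, here is a minimal sketch in DataStage (UniVerse) BASIC. The hashed file name CUST_LKP, the key CUST_ID and the column names are hypothetical, purely for illustration:

      * Open the hashed file in the project (name is hypothetical)
      OPEN "CUST_LKP" TO CustFile ELSE
         Call DSLogFatal("Cannot open hashed file CUST_LKP", "LookupSketch")
      END

      * The key (CUST_ID) is the record ID; it is what gets hashed to locate the group
      Key = "1001"

      READ Rec FROM CustFile, Key THEN
         * Non-key columns are addressed purely by field position,
         * so the job's column definitions must match this order
         CustName  = Rec<1>
         CustCity  = Rec<2>
         CustState = Rec<3>
         CustPhone = Rec<4>
      END ELSE
         * Key not found in the hashed file
         CustName = ""
      END

The point is that the hashed file itself carries no column names, only field positions, which is why the metadata loaded in the job must line up with the order in which the file was written.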

Regards,
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

DataStage stores all the columns of a record in sequence, and since your records seem to be very small, it should make no difference to performance. Whether you retrieve selected columns or all the columns of the record, DataStage has to apply the same hashing algorithm to your keys.
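A small sketch of what "stored in sequence" means, again in DataStage BASIC and again with a hypothetical CUST_LKP file and made-up column names: each row is a single dynamic array keyed on CUST_ID, so it is written and read as a whole no matter how many columns a job later references.

      OPEN "CUST_LKP" TO CustFile ELSE STOP

      * Build one record as a dynamic array; fields 1-4 are the non-key columns
      Rec = ""
      Rec<1> = "Acme Pty Ltd"      ;* CUST_NAME
      Rec<2> = "Sydney"            ;* CUST_CITY
      Rec<3> = "NSW"               ;* CUST_STATE
      Rec<4> = "+61 2 9999 0000"   ;* CUST_PHONE

      * The record ID (the key) is what gets hashed, not the data fields
      WRITE Rec ON CustFile, "1001"

      * A later lookup reads the whole record back; using only two columns
      * simply means the remaining fields are never referenced after the read
      READ Rec FROM CustFile, "1001" THEN
         CustName = Rec<1>
         CustCity = Rec<2>
      END ELSE
         CustName = ""
      END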
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi

The answer to your question is "yes": it does improve performance if you reduce the number of columns in your hash file. Normally, while creating the hash file (lookup), you should include only the columns you will actually use in your jobs. Don't load unnecessary columns that take up resources.

In your case it depends on what Job 1 and Job 2 need as lookups. Do both jobs need the same set of records in the lookup?

Job 1 uses 4 out of the 5 columns in your hash file. Does Job 2 use the same key fields, and are its 2 columns among those used by Job 1? Based on these questions you can decide what to do.

Thanks
Siva
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Also investigate the "shared" hashed file capability, where the two (or more) jobs can share the same cached hashed file, instead of setting up two (or more) cached copies of the same hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.