Hash file: Selecting all/few columns: Performance Impact

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Hash file: Selecting all/few columns: Performance Impact

Post by dwscblr »

We have a hash file containing 5 columns. Job 1 and Job 2 use this hash file for reference lookups.

Building two hash files instead of one common hash file is not an option available to us.

Job 1 uses 4 columns of the hash file and Job 2 uses 2 columns of the hash file.

At present, for both jobs, we have imported the hash file definition with all 5 columns, including the key.

My question is: can I use only the required columns from the hash file in each job, i.e. only the 4 required columns in Job 1 and only the 2 required columns in Job 2?

Does this improve performance, degrade performance, or have no impact on performance?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

In both scenarios you will need to load the metadata that defines the key. If some of the other columns are not used in a particular job or lookup, you do not need to reference or load that metadata.

You will want to ensure that the metadata is positionally correct in the hashed file, i.e. the key columns are in position 0, column 1 is in position 1, and so on. DataStage handles this when it is allowed to manage the creation and deletion of the hashed file.
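To make the positional layout concrete, here is a minimal sketch in DataStage (UniVerse) BASIC. The hashed file name CUST_LKP, the key CUST_ID and the column names are hypothetical, purely for illustration:

      * Open the hashed file in the project (name is hypothetical)
      OPEN "CUST_LKP" TO CustFile ELSE
         Call DSLogFatal("Cannot open hashed file CUST_LKP", "LookupSketch")
      END

      * The key (CUST_ID) is the record ID; it is what gets hashed to locate the group
      Key = "1001"

      READ Rec FROM CustFile, Key THEN
         * Non-key columns are addressed purely by field position,
         * so the job's column definitions must match this order
         CustName  = Rec<1>
         CustCity  = Rec<2>
         CustState = Rec<3>
         CustPhone = Rec<4>
      END ELSE
         * Key not found in the hashed file
         CustName = ""
      END

The point is that the hashed file itself carries no column names, only field positions, which is why the metadata loaded in the job must line up with the order in which the file was written.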

Regards,
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

DataStage stores all the columns of a record in sequence, and since your records seem to be very small, it should make no difference to performance. Whether you retrieve selected columns or all the columns of the record, DataStage has to apply the same hashing algorithm to your keys.
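A small sketch of what "stored in sequence" means, again in DataStage BASIC and again with a hypothetical CUST_LKP file and made-up column names: each row is a single dynamic array keyed on CUST_ID, so it is written and read as a whole no matter how many columns a job later references.

      OPEN "CUST_LKP" TO CustFile ELSE STOP

      * Build one record as a dynamic array; fields 1-4 are the non-key columns
      Rec = ""
      Rec<1> = "Acme Pty Ltd"      ;* CUST_NAME
      Rec<2> = "Sydney"            ;* CUST_CITY
      Rec<3> = "NSW"               ;* CUST_STATE
      Rec<4> = "+61 2 9999 0000"   ;* CUST_PHONE

      * The record ID (the key) is what gets hashed, not the data fields
      WRITE Rec ON CustFile, "1001"

      * A later lookup reads the whole record back; using only two columns
      * simply means the remaining fields are never referenced after the read
      READ Rec FROM CustFile, "1001" THEN
         CustName = Rec<1>
         CustCity = Rec<2>
      END ELSE
         CustName = ""
      END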
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi

The answer to your question is "yes": it does improve performance if you reduce the number of columns in your hash file. Normally, while creating the hash file (lookup), you should include only the columns you will actually use in your jobs. Don't load unnecessary columns that take up resources.

In your case it depends on what Job 1 and Job 2 need as lookups. Do both jobs need the same set of records in the lookup?

Job 1 uses 4 out of the 5 columns in your hash file. Does Job 2 use the same key fields, and are its 2 columns among those used by Job 1? Based on these questions you can decide what to do.

Thanks
Siva
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Also investigate the "shared" hashed file capability, where the two (or more) jobs can share the same cached hashed file, instead of setting up two (or more) cached copies of the same hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.