We have a hashed file containing 5 columns. Job 1 and Job 2 use this hashed file for reference lookups.
Building two hashed files instead of one common hashed file is not an option available to us.
Job 1 uses 4 columns of the hashed file and Job 2 uses 2 columns.
As of now, for both jobs we have imported the definitions of all 5 columns, including the key, when importing the hashed file metadata.
My question is: can I load only the required columns from the hashed file in each job, i.e. only the 4 columns Job 1 needs and only the 2 columns Job 2 needs?
Does this improve performance, degrade performance, or have no impact on performance?
Hash file: Selecting all/few columns: Performance Impact
In both scenarios you will need to load the metadata that defines the key. If a particular job or lookup does not use some of the other columns, you do not need to reference or load their metadata.
You will want to ensure that the metadata is positionally correct in the hashed file, i.e. the key column is in position 0, column 1 is in position 1, etc. DataStage handles this when it is allowed to manage the creation and deletion of the hashed file.
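To illustrate what "positionally correct" means, here is a minimal Python sketch (an illustration only, not DataStage internals; the column names and record values are made up). Each job may declare only a subset of the columns, but the positions it declares must match the positions the hashed file was built with:

```python
# A hashed-file record is stored positionally: key in position 0,
# then the non-key columns in positions 1..4.
RECORD = ["K1", "v1", "v2", "v3", "v4"]

# Job 1's metadata declares the key plus three non-key columns.
# Job 2's metadata declares the key plus one non-key column.
# Both are valid as long as each name maps to the correct position.
JOB1_META = {"KEY": 0, "COL1": 1, "COL2": 2, "COL3": 3}
JOB2_META = {"KEY": 0, "COL1": 1}

def read_columns(record, meta):
    """Return only the columns a job's metadata declares."""
    return {name: record[pos] for name, pos in meta.items()}

print(read_columns(RECORD, JOB1_META))
# {'KEY': 'K1', 'COL1': 'v1', 'COL2': 'v2', 'COL3': 'v3'}
print(read_columns(RECORD, JOB2_META))
# {'KEY': 'K1', 'COL1': 'v1'}
```

If Job 2 instead declared COL1 at the wrong position (say position 2), it would silently read COL2's value, which is why letting DataStage manage the hashed file, so positions stay consistent, matters.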
Regards,
Mike Hester
mhester@petra-ps.com
Hi
The answer to your question is yes: reducing the number of columns in your hash file does improve performance. Normally, when creating the hash file (lookup), you should load only the columns your jobs will actually use; don't load unnecessary columns that take up resources.
In your case it depends on what Job 1 and Job 2 each need from the lookup. Do both jobs need the same set of records?
Job 1 uses 4 of the 5 columns in your hash file. Does Job 2 use the same key fields, and are its 2 columns a subset of Job 1's? Based on the answers to these questions you can decide what to do.
Thanks
Siva
Also investigate the "shared" hashed file capability, where the two (or more) jobs can share the same cached hashed file, instead of setting up two (or more) cached copies of the same hashed file.
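The benefit of a shared cache can be pictured with a small Python sketch (an analogy only, not DataStage's implementation): per-job caching duplicates the lookup data in memory, while sharing gives each job a reference to a single copy.

```python
# One lookup table, analogous to a cached hashed file (made-up data).
lookup = {"K1": ("v1", "v2"), "K2": ("v3", "v4")}

# Per-job caching: each job loads its own copy of the data.
job1_copy = dict(lookup)
job2_copy = dict(lookup)
assert job1_copy is not job2_copy  # two separate objects in memory

# Shared caching: both jobs reference the same cached copy.
job1_ref = lookup
job2_ref = lookup
assert job1_ref is job2_ref        # one object in memory, referenced twice
```

With the shared approach, memory use stays roughly constant as more jobs attach to the same lookup, instead of growing with each job.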
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.