Hi All:
We have a couple of hash lookups in our server jobs; however, there are millions and millions of rows, so they take forever. We need to improve performance.
My question: is it better to join files or do a Merge on key columns instead of hash lookups?
Whichever is better, can you give an example of how to do it?
Thanks,
MJ
Performance issues with Hash files.
Please describe your job design and row counts; your question is too vague. Hash files can be extremely efficient if used correctly. In addition, multi-CPU servers can be underutilized if you do not build a job in a fashion that supports multiple CPUs. Either through IPC or job instances, you can divide and conquer the data.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
hi kcbland:
At the moment we are only using one CPU; we haven't gone parallel yet using Server. We're doing two hash lookups based on two keys from different tables, and both lookups share a common lookup key.
Job design:
Two hash files going into a Transformer, a sequential file (input) going into the Transformer, and a sequential file (output) coming out of the Transformer.
The rows go up to 189 million and then some.
Thanks,
Mamta
Do the hash files contain more columns than you absolutely need? That just wastes CPU time reading every character in a row of data. Investigate eliminating columns from your hash lookups.
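The column-trimming advice above can be sketched conceptually in Python (this is not DataStage code, and the table/column names are hypothetical): the lookup only needs to carry the columns the transformer derivation actually uses, not the whole reference row.

```python
# Hypothetical reference data; "notes" stands in for wide columns
# that the job never uses in a derivation.
reference_rows = [
    {"cust_id": 1, "region": "EU", "notes": "long unused text..."},
    {"cust_id": 2, "region": "US", "notes": "more unused text..."},
]

# Full-row lookup: every byte of every column is stored and read back.
full_lookup = {r["cust_id"]: r for r in reference_rows}

# Trimmed lookup: only the column the derivation needs, so each probe
# moves far less data.
trimmed_lookup = {r["cust_id"]: r["region"] for r in reference_rows}

print(trimmed_lookup[2])  # -> US
```

The same idea applies to the hash file itself: define it with only the key plus the columns you reference downstream.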
Did you look at the data and overflow files for the hash lookups? A lot of data in the overflow portion reduces the optimized lookup effect of a hash file. Consider setting a minimum modulus when creating the hash files so that the file is not undersized once populated.
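As a rough analogy for why the minimum modulus matters (this is plain Python, not the actual hashed-file engine): a hashed file spreads rows across `modulus` groups, and rows that don't fit their group spill into overflow, which must be scanned linearly. An undersized modulus means long groups; pre-sizing keeps them short.

```python
def bucket_sizes(keys, modulus):
    """Count how many rows land in each of `modulus` hash groups."""
    sizes = [0] * modulus
    for k in keys:
        sizes[hash(k) % modulus] += 1
    return sizes

keys = range(100_000)
small = bucket_sizes(keys, 11)       # undersized: very long groups (overflow)
large = bucket_sizes(keys, 10_007)   # pre-sized: short groups, fast probes
print(max(small), max(large))
```

The exact numbers are illustrative, but the effect is the same one the overflow file shows you: an undersized file turns O(1) lookups into linear scans.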
Your design is perfect for multiple instances of the job dividing the source data. Running 4 instances of the job, each using a constraint to look at 1/4 of the data (MOD(@INROWNUM, 4) = 0, 1, 2, or 3), will let you use all 4 CPUs. This is better than using IPC in your situation.
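The MOD(@INROWNUM, 4) constraint above can be sketched in Python (a conceptual model, not DataStage): each instance keeps only the rows whose 1-based row number modulo 4 equals its instance number, so four instances cover the data exactly once with no overlap.

```python
rows = [f"row-{n}" for n in range(1, 13)]  # @INROWNUM is 1-based

def partition(rows, instance, instances=4):
    """Rows this instance keeps: MOD(@INROWNUM, instances) = instance."""
    return [r for n, r in enumerate(rows, start=1) if n % instances == instance]

parts = [partition(rows, i) for i in range(4)]
assert sum(len(p) for p in parts) == len(rows)  # no row lost or duplicated
print(parts[1])  # -> ['row-1', 'row-5', 'row-9']
```

Each instance then does its own hash lookups on its quarter of the stream, keeping all four CPUs busy.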
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Are you telling us that there are millions of rows of data to be processed, or millions of rows in the hashed files? If the latter, some gains may be possible by tuning the hashed files.
This is a highly skilled task, and I would recommend hiring a competent consultant to undertake it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.