We are having problems creating hashed files.
We need to perform lookups against large data sets: either 3 million rows with 50 columns, or 20 million rows with a few columns.
When creating the file, at some point performance slows sharply and the file seems to stop growing. We've checked: the file is well below the 2 GB limit.
What are the optimal settings for creating hashed files with more than 100,000 rows?
Optimal settings to create a large Hashed file
Periodic resizing will give the appearance that the file is no longer growing, because the Monitor shows no more rows flowing to the file; but if you look in the physical directory you will find that the actual DATA and OVER files are still changing. You also need to look at the performance of the process writing the file (prstat, topas, glance, top, etc.) and see whether it is using a full CPU. If the job is SEQ --> XFM --> HASH, the job should either be fully using a CPU or waiting for the disks to catch up with growing/receiving the data.
The best thing you can do is pre-size the file with a minimum modulus setting based on the high-watermark expectation.
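To make the pre-sizing advice concrete, here is a minimal sketch of how a minimum modulus could be estimated for a dynamic (type 30) hashed file. The group size of 2048 bytes (GROUP.SIZE 1), the ~12 bytes of per-record overhead, and the 25% headroom factor are assumptions for illustration, not figures from this thread; check your own file's characteristics before relying on them.

```python
import math

def minimum_modulus(row_count, avg_row_bytes, group_size=1, headroom=1.25):
    """Estimate a MINIMUM.MODULUS for a dynamic hashed file.

    Assumptions (not from the original post): each group holds
    group_size * 2048 bytes, each record carries roughly 12 bytes
    of overhead, and `headroom` leaves spare space so groups do not
    start overflowing as soon as the load finishes.
    """
    group_bytes = group_size * 2048
    total_bytes = row_count * (avg_row_bytes + 12) * headroom
    return math.ceil(total_bytes / group_bytes)

# e.g. 3 million rows averaging ~200 bytes each (hypothetical figures)
mod = minimum_modulus(3_000_000, 200)
print(f"CREATE.FILE MyHashedFile DYNAMIC MINIMUM.MODULUS {mod}")
```

Creating the file with a modulus near its high-watermark up front avoids the repeated split/resize cycles that make the load appear to stall.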
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Forget about memory; its footprint here is very light. Worry about what I posted. First verify the problem: if your issue is disk contention, talking about anything else doesn't matter. Analyze your system and see what's actually going on.
Kenneth Bland
DataStage reports rows flowing when they are flowing. If they are flowing into the cache, fine, you get good rates. But the clock keeps running when the rows are being flushed to disk, even though no more rows are flowing. So the rate appears to diminish.
Optimal depends primarily on the combination of row size, number of rows and internal storage overheads. That's why Hashed File Calculator exists.
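As a rough illustration of the kind of arithmetic the Hashed File Calculator performs, the sketch below sizes the two data sets from the original question. The per-row byte counts (~400 bytes for the wide file, ~60 bytes for the narrow one), the 2048-byte group size, and the 12-byte record overhead are all assumed figures, not measurements from this thread.

```python
import math

GROUP_BYTES = 2048      # assumed group size (GROUP.SIZE 1)
RECORD_OVERHEAD = 12    # assumed per-record header bytes

# Hypothetical average row widths for the two cases in the question
scenarios = {
    "3M rows x 50 cols (~400 B/row)": (3_000_000, 400),
    "20M rows x few cols (~60 B/row)": (20_000_000, 60),
}

for name, (rows, row_bytes) in scenarios.items():
    data_bytes = rows * (row_bytes + RECORD_OVERHEAD)
    modulus = math.ceil(data_bytes / GROUP_BYTES)
    print(f"{name}: ~{data_bytes / 2**30:.1f} GiB, modulus ~ {modulus}")
```

Under these assumptions both files land in the 1.1-1.4 GiB range, comfortably under the 2 GB limit mentioned in the question, which is consistent with the slowdown being resizing or disk contention rather than the file hitting its size ceiling.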
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.