Hash File Performance Degrading ??
Hi All,
I have a server job in which data is ported from a sequential file to a hash file. The record count is 2.4 million. Although this is a straight move to the hash file, I see that the number of rows read per second sometimes starts at 3456 but gradually comes down to 335. I found this when I monitored the job. The number of columns is 67.
Regards,
Vinod
It's hashed file, not hash file.
Performance is fine. It's the meaningless figures that are the issue here; ignore them. Data flows rapidly into the write cache. Then, while no more rows are flowing but the cache is being dumped to disk, the clock keeps running, so the rows/sec figure keeps falling. Ignore it.
And even if you are not using write cache, as the size of the hashed file on disk grows and you are writing to random groups, the seek time between the random pages in the structure gets longer and longer, reducing the total productive time available for writing data.
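To put rough numbers on it (these figures are illustrative only, not taken from this job): suppose 2,400,000 rows land in the write cache in the first 10 minutes, and the cache then takes another 50 minutes to flush to disk with no new rows flowing.
    2,400,000 rows / 600 seconds   = 4,000 rows/sec shown early in the run
    2,400,000 rows / 3,600 seconds =   667 rows/sec shown by the end
Nothing has slowed down; the elapsed time in the denominator keeps growing while the row count stands still.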
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Hash File Performance Degrading ??
Vinodanand wrote: the number of rows read per second sometimes starts at 3456 but gradually comes down to 335. I found this when I monitored the job. The number of columns is 67.
67 columns in a hashed file? Why and how do you plan to use that hashed file?
There is a fair chance that your file size exceeded 2 GB (on a 32-bit DS server) and the throughput dropped right off as it hit the limit.
There are very good articles on the best use of hashed files in this portal somewhere. Search for them.
There are also several posts on this issue; I found one myself, see if it helps you:
http://dsxchange.com/viewtopic.php?t=10 ... 006ccfe017
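You can also check this from the UNIX side. A type 30 (dynamic) hashed file is a directory containing DATA.30 and OVER.30, and on a 32-bit engine each of those must stay under 2 GB (2,147,483,648 bytes). The path below is only a placeholder for wherever your hashed file was created:
    ls -l /path/to/Hsh_YourFile/DATA.30 /path/to/Hsh_YourFile/OVER.30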
Best regards,
~Kris
Hi,
I am reading from a flat file and writing it directly to a hash file. As the row count reaches seven hundred thousand I can see the write becoming really slow. The first seven hundred thousand rows get written in 8 minutes, and the remaining data out of the total of 2.4 million takes 3 hours. I also tried increasing the minimum modulus, but it is still killing performance. Do I need to change the file type, or enable the caching attributes (write deferred)?
Thanks,
Vinod
You haven't told us what you have set the file type to, so it is impossible to recommend changing it. The caching/buffering attributes won't make a performance difference in this case.
If you have a dynamic file, let the job run to completion, then check the modulus of the file and reset the MINIMUM.MODULUS to that value; that way you will avoid the dynamic resizing when writing to the file.
You haven't been reading what Ray is writing about the rows/second display in the job. Due to the way buffering works, the numbers displayed there are skewed. If you really wish to see the actual write speed, try doing a COUNT {filename} every couple of minutes.
Effective over the whole run, how many KB/second are you writing?
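For example, from the Administrator client's Command window for the project, or a dssh/uvsh session started in the project directory on the server (the file name below is a placeholder, and this assumes the hashed file has a VOC entry in the project):
    ANALYZE.FILE Hsh_YourFile
    COUNT Hsh_YourFile
ANALYZE.FILE reports the file type, current modulus and separation; running COUNT every few minutes and comparing the counts gives you the real write rate.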
HFC (the Hashed File Calculator) is indeed on the installation CD (DataStage clients), in the Utilities\Unsupported folder.
However, as Arnd says, don't concern yourself with this yet. Get the default hashed file populated, then see what tuning might be possible.
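As a rough illustration of the sizing HFC automates (the 4 KB group size and 80% loading below are assumptions, not figures taken from this job):
    2,400,000 rows x ~700 bytes       ~ 1.68 GB of data
    4,096-byte groups at 80% loading  ~ 3,277 usable bytes per group
    1.68 GB / 3,277 bytes             ~ 513,000 groups
so a minimum modulus in that region is the right order of magnitude, and pre-sizing it avoids most of the on-the-fly splitting while the job loads.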
And please start calling them hashed files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Exceeded 2GB limit
Hi All,
I just ran into another problem. My hash file size exceeded 2 GB, so I created a 64-bit hash file, and now it is running slower than before. The following is the command I used to create it:
/dsadm/Ascential/DataStage/DSEngine/bin/mkdbfile /dso/dsoweb/clients/jpmg/eligibility/logs/Hsh_PDSPersCovgPrevHist_coal_13235 30 678494 8 20 20 80 1628 -64bit
It's a Type 30 file, and when I summed up the DATA.30 and OVER.30 it was 2.8 GB. As I have 67 columns, I think the number of columns is what is killing it. The approximate record size is 700 bytes.
-rw-rw-r-- 1 dsosit chs 289136640 Aug 22 17:13 OVER.30
-rw-rw-r-- 1 dsosit chs 2779115520 Aug 22 17:13 DATA.30
The job design is
FlatFile --> Transformer --> Hash File. No transformations in the transformer.
Also, when I look up against this file it takes 6 hours, as my incoming source is 2 million rows.
It would be great if anyone can help me out.
Regards,
Vinod
Don't enter the '{' and '}' characters. Every VOC has an 'ANALYZE.FILE' command, but I often mistype it as ANALZYE.FILE and get the same error you did.
64-bit files are going to be slower (and have more storage overhead) than the original files.
Please do the ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058 and COUNT Hsh_PDSPersCovgPrevHist_coal_13058 on your {original 32 bit} file and post the results. If those commands don't work you have serious issues in your account, and performance is the least of them.
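One thing to check first (an assumption on my part, since I don't know how the file is catalogued in your project): a hashed file created only by path with mkdbfile has no VOC entry, so TCL commands such as COUNT and ANALYZE.FILE cannot find it by name until a pointer is created. The path below is a placeholder for wherever the 32-bit file actually lives:
    SETFILE /path/to/Hsh_PDSPersCovgPrevHist_coal_13058 Hsh_PDSPersCovgPrevHist_coal_13058 OVERWRITING
    ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058
    COUNT Hsh_PDSPersCovgPrevHist_coal_13058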
Hi Arnd,
I get the following error when I keyed in the command:
COUNT Hsh_PDSPersCovgPrevHist_coal_13058
syntax error . unexpected sentence without file name.Token was "".Scanned Command was COUNT 'Hsh_PDSPersCovgPrevHist_coal_13058'
and for command
ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058
Illegal option in command line.
Please help.
My 32-bit file had the same settings as the 64-bit one; I changed the file to 64-bit as my hash file grew over 2.2 GB.