Creating hashed file takes too long
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 397
- Joined: Wed Apr 12, 2006 2:28 pm
- Location: Tennessee
Hi all, I am running a job that does a vertical pivot, creating 150 columns in the output hashed file. The input sequential file has 10 million rows. The job takes 11 hours to create the hashed file. I am using a Link Partitioner and Link Collector, but that is not helping. Can anyone please suggest how I can improve the performance?
Thanks in advance
hi sam here
-
- Premium Member
- Posts: 397
- Joined: Wed Apr 12, 2006 2:28 pm
- Location: Tennessee
Chulett, I am using stage variables in a Transformer instead of the Pivot stage to do the vertical pivot. My job design is:
Code:
Seq file --- link partitioner----4 streams of trnfrm ----- link collector ----- hashed file
hi sam here
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
From the initial looks of it, your hashed file may be corrupted. To check, open the hashed file folder through Unix. If it is corrupted, you will see a large number of files in there instead of the three expected ones (you may not even be able to open the folder).
Rename the hashed file and run the job again.
To test whether the hashed file is the real culprit, split your job into two.
First job: write your output to a sequential file.
Second job: read from the sequential file and write to the hashed file.
This way you can identify whether your job design or the hashed file is the bottleneck.
Narasimha Kade
Finding answers is simple, all you need to do is come up with the correct questions.
Further to the above: if your hashed file IS the culprit, you may well need to tune it, as the default settings in terms of modulus, separation, and type (i.e. wholly numeric key, numeric variance to the right, etc.) are often not best suited to your data.
A good place to start is the Hashed File Calculator (it should be in the Utilities folder of the client installation CD) or the HASH.HELP / HASH.ANALYZE commands that can be used from the Administrator tool.
Pre-allocating the hashed file MINIMUM.MODULUS might make a difference. After the job completes, run "HASH.HELP {file}" and it will tell you the current modulus; then run "RESIZE {file} MINIMUM.MODULUS {size}" and re-run the job to see if the speed changes.
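To make that resize arithmetic concrete, here is a rough sketch (in Python, since the thread shares no code) of the kind of estimate the Hashed File Calculator produces. The 2048-byte group size and 80% target load factor are assumptions typical of a dynamic (Type 30) file, not figures taken from this thread:

```python
# Rough estimate of a MINIMUM.MODULUS to pre-allocate for a dynamic
# hashed file. Assumptions (illustrative only): 2048-byte groups
# (GROUP.SIZE 1) and an 80% target load factor.

def suggest_minimum_modulus(row_count, avg_row_bytes,
                            group_bytes=2048, load_factor=0.8):
    """Return a pre-allocation modulus so the file rarely needs to split."""
    data_bytes = row_count * avg_row_bytes
    groups_needed = data_bytes / (group_bytes * load_factor)
    return int(groups_needed) + 1

# 10 million rows; the ~600 bytes per 150-column row is a guess --
# measure a sample of real rows to get an accurate figure.
print(suggest_minimum_modulus(10_000_000, 600))
```

Feeding the result to RESIZE up front avoids millions of incremental group splits while the job loads the file.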
Another thing to consider is that a hashed file has an upper size limit (2 GB by default, unless it is created as 64-bit). If you are processing 10 million rows into a hashed file with 150 columns per row, then depending on the size of the data you could be shooting yourself in the foot.
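A quick back-of-the-envelope check against that limit. The 2 GB ceiling for a default 32-bit hashed file is standard, but the per-row size and overhead factor below are assumptions; measure real rows for an accurate figure:

```python
# Check the projected hashed file size against the 2 GB limit of a
# default (32-bit) hashed file. The average row size and the ~25%
# group/key overhead are assumptions, not measured values.

LIMIT_32BIT = 2 * 1024**3  # 2 GB

def projected_size_bytes(row_count, avg_row_bytes, overhead=1.25):
    """Raw data volume plus an assumed allowance for group/key overhead."""
    return int(row_count * avg_row_bytes * overhead)

size = projected_size_bytes(10_000_000, 600)
print(size, "over 2 GB limit:", size > LIMIT_32BIT)
```

If the projection lands anywhere near the limit, create the file 64-bit (or rethink the design) before loading, rather than discovering the ceiling eleven hours in.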
Are you using a dynamic hashed file specification? If so, make sure you start with a monstrous modulus for it so it doesn't have to reallocate too often. Also, you might want to consider pivoting the file at the OS level instead of doing it in a job. I posted Perl code that will do it for you, though it wasn't for this many fields; still, it should be pretty efficient once fleshed out. You can find my Perl utilities at viewtopic.php?t=113429&highlight=perl+scripts. It is the next-to-last script in the post. If you need assistance with it, let me know and I'll see if I can get you going.
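The poster's Perl script isn't reproduced in this thread, so as an illustration only, here is a hypothetical Python sketch of an OS-level vertical pivot: it groups (key, value) rows and spreads each key's values across a fixed number of columns. The input layout and column count are assumptions, not the poster's actual script:

```python
# Hypothetical sketch of a vertical pivot done at the OS level: rows of
# (key, value) become one row per key, with that key's values spread
# across a fixed number of columns (padded with blanks).
import csv
from collections import OrderedDict

def vertical_pivot(in_path, out_path, max_cols=150):
    rows = OrderedDict()                      # key -> list of values
    with open(in_path, newline="") as f:
        for key, value in csv.reader(f):
            rows.setdefault(key, []).append(value)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for key, values in rows.items():
            padded = (values + [""] * max_cols)[:max_cols]  # fixed width
            writer.writerow([key] + padded)
```

For 10 million input rows this holds the grouped data in memory, so on a constrained box a sort-then-stream approach (sort the file by key first, then emit each group as it completes) would be the safer variant.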
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
If I understand you correctly, the minimum mod is 1 with a max of 999999. I don't see why you would need the admin tool to see this.
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The values with which the hashed file is created are on the Options tab of the Hashed File or UniVerse stage. Subsequently you can use an ANALYZE.FILE command to determine the current settings.
Use the Hashed File Calculator to recommend an appropriate setting.
This may be a case where a static hashed file (for example Type 2 or Type 18) performs better than a dynamic hashed file, particularly if the record sizes are relatively homogeneous.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.