Creating hashed file takes too long

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Creating hashed file takes too long

Post by samsuf2002 »

Hi all, I am running a job that does a vertical pivot, creating 150 columns in the output hashed file. The input sequential file has 10 million rows. The job takes 11 hours to create the hashed file. I am using a Link Partitioner and Link Collector but it is not helping. Can anyone please suggest how I can improve the performance?

Thanks in advance
hi sam here
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Job design? Bottleneck? The pivot is a notoriously slow stage (especially given your volumes); your issue probably has nothing to do with the hashed file itself.
-craig

"You can never have too many knives" -- Logan Nine Fingers
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

Chulett, I am using stage variables in a Transformer instead of the Pivot stage to do the vertical pivot. My job design is:

Code:

Seq File ---> Link Partitioner ---> 4 streams of Transformer ---> Link Collector ---> Hashed File
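
The general shape of that stage-variable pivot is sketched below. It is only a sketch: the link, column and variable names (IN.KeyCol, IN.Value, svCol1 and so on) are made up, the real job carries 150 columns rather than two, and it assumes the input arrives sorted on the key.

Code:

Stage variables, evaluated top to bottom (svPrevKey initial value ''):

  svNewKey  :  IN.KeyCol <> svPrevKey
  svPos     :  IF svNewKey THEN 1 ELSE svPos + 1
  svCol1    :  IF svNewKey THEN IN.Value ELSE svCol1
  svCol2    :  IF svNewKey THEN '' ELSE IF svPos = 2 THEN IN.Value ELSE svCol2
  ...          (same pattern for the remaining pivot columns)
  svPrevKey :  IN.KeyCol

Output columns (hashed file keyed on KeyCol):  KeyCol = IN.KeyCol, Col1 = svCol1, Col2 = svCol2, ...

In this sketch every input row is written, and because a hashed file destructively overwrites on its key, only the last row of each key group (the one with all of its columns populated) remains once the load finishes.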
hi sam here
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

From the initial look of it, your hashed file may be corrupted. To check, open the hashed file's directory in Unix. If it is corrupted you will see a large number of files in there instead of the expected three (you may not even be able to open the directory).

Rename the hashed file and run the job again.
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

To test whether the hashed file is the real culprit, split your job into two.
First job - write your output to a sequential file.
Second job - read from the sequential file and write to the hashed file.
This way you can identify whether your job design or the hashed file is the bottleneck.
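
In the same notation as the design posted earlier, the split might look like this (a Transformer is still needed in the second job because a server job needs an active stage between two passive ones):

Code:

Job 1 : Seq File ---> Link Partitioner ---> 4 streams of Transformer ---> Link Collector ---> Seq File
Job 2 : Seq File ---> Transformer ---> Hashed File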
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Post by fridge »

Further to the above - if your hashed file IS the culprit, you may well need to tune it, as the default settings in terms of modulus, separation and type (i.e. wholly numeric key, numeric variance to the right, etc.) are often not best suited to your data.

A good place to start is the Hashed File Calculator (it should be in the Utilities folder on the client installation CD) or the HASH.HELP / HASH.ANALYZE commands that can be run from the Administrator client.
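
For example, from the Administrator client's Command window against the project that holds the hashed file (the name below is only an example; the SETFILE line is only needed if the file was created under a directory path rather than in the account, to set up a VOC pointer first):

Code:

SETFILE /path/to/MyHashedFile MyHashedFile OVERWRITING
HASH.HELP MyHashedFile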
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

I replaced the output hashed file with a sequential file in the job design I showed above, and the job took 20 minutes to load all 10 million rows into the output sequential file. I am now using another job to load the sequential file into the hashed file, and it is very slow, around 600 rows/sec.
hi sam here
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Pre-allocating the hashed file's MINIMUM.MODULUS might make a difference. After the job completes, do a "HASH.HELP {file}" and it will tell you the current modulus, then do a "RESIZE {file} MINIMUM.MODULUS {size}" and re-run to see if the speed changes.
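
In concrete terms the RESIZE step might look like the line below, where the file name and size are purely illustrative and the asterisks leave the existing type, modulo and separation unchanged:

Code:

RESIZE MyHashedFile * * * MINIMUM.MODULUS 500000

The aim is to pre-allocate roughly as many groups as the fully loaded file will need, so the dynamic file is not repeatedly splitting groups while the 10 million rows are written.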
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Another thing to consider with the hashed file is that it has an upper size limit (2 GB unless it is created as 64-bit). If you are processing 10 million rows into a hashed file with 150 columns per row, then depending on the size of the data you could be shooting yourself in the foot.

Are you using a dynamic hashed file specification? If so, make sure you start with a very large minimum modulus so it doesn't have to reallocate too often. Also, you might want to consider pivoting the file at the OS level instead of doing it in a job. I posted Perl code that will do it for you, though it wasn't for this many fields; still, it should be pretty efficient once fleshed out. You can find my Perl utilities at viewtopic.php?t=113429&highlight=perl+scripts - it is the next-to-last script in the post. If you need assistance with it, let me know and I'll see if I can get you going.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
vsi
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

How can I see the minimum modulus with a command from the Administrator tool?
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

If I understand you correctly, the minimum mod is 1 with a max of 999999. I don't see why you would need the admin tool to see this.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The values with which the hashed file is created are on the Options tab of the Hashed File or UniVerse stage. Subsequently you can use an ANALYZE.FILE command to determine the current settings.

Use the Hashed File Calculator to recommend an appropriate setting.

This may be a case where a static hashed file (for example Type 2 or Type 18) performs better than a dynamic hashed file, particularly if the record sizes are relatively homogeneous.
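
By way of illustration only - the file name is made up, and the modulo and separation should come from the Hashed File Calculator rather than be guessed:

Code:

ANALYZE.FILE MyHashedFile
RESIZE MyHashedFile 18 {modulo} {separation}

A static type needs a modulo large enough that groups do not overflow once all 10 million rows are loaded, which is what the Hashed File Calculator works out from the expected row count and average record size.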
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.