hash file (100 % hard disk usage)

Post questions here related to DataStage Server Edition in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

hash file (100 % hard disk usage)

Post by knowledge »

Hi,
I am having a problem using a hash file.
I am processing about 4.5 million records, passing them through a hash file to remove duplicates, but the job takes all the I/O: the hard disk is 100% busy, the server slows down, and the wait queue climbs above 3.

Is it because the hash file storage limit is being exceeded? It is a type 30 dynamic file.
What could be the problem?

Earlier I had designed a job where I appended records to the same hash file, which was created by another job, but I had the same problem: whenever that job started it took all the I/O and the disk became 100% busy, so I changed the design. Now, in the job mentioned above, I am simply passing records through a hash file, yet it still slows down the whole server and the disk hits 100%.
Please suggest.
Thanks
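My understanding of why the hash file removes duplicates is that writing a record with an existing key destructively overwrites the earlier record, so only the last row per key survives. Roughly like this in DS Basic (the file name HASH.DEDUP and the values are made up, just to illustrate):

[code]
* Illustration only - HASH.DEDUP is a hypothetical hashed file that
* already exists in the project account.
OPEN "HASH.DEDUP" TO DedupFile ELSE STOP "Cannot open HASH.DEDUP"

Rec = "first":@FM:"row"
WRITE Rec ON DedupFile, "KEY1"    ;* first occurrence is stored

Rec = "second":@FM:"row"
WRITE Rec ON DedupFile, "KEY1"    ;* same key: destructive overwrite, no duplicate

CLOSE DedupFile
[/code]

So every one of the 4.5 million rows still gets written to the file even when its key is a duplicate, which I assume is why the disk stays so busy.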
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What do you expect? A hashed file is on disk - you are writing 4.5 million rows to disk. Not much else is happening, so it shows the disk subsystem to be 100% busy. That ought to be no surprise.

The correct terminology is hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

[quote="ray.wurlod"]What do you expect? A hashed file is on disk - you are writing 4.5 million rows to disk. Not much else is happening, so it shows the disk subsystem to be 100% busy. That ought to be no surprise.

...[/quote]
Sorry, I do not have premium membership. Can you please tell me what the solution is?
Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ray answered your question in the publicly visible portion. You don't have a problem, so there is no solution. If you wish to speed this up you will need to talk to your system administrator and discuss what options you have on your hardware platform and UNIX OS to tune I/O buffers.
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Thanks for the reply.

I talked to him; he said I have to tune the hash file.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The only tuning you can do in this case that would make any difference is to pre-set the MINIMUM.MODULUS to a higher number. You will still end up with 100% I/O activity.
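For example, assuming the file is created at TCL (e.g. from the Administrator client's Command window) rather than by the stage's own create-file option, something like the following would pre-size it. The file name and modulus here are only placeholders; derive the modulus from your expected data volume and group size:

[code]
CREATE.FILE HASH_DEDUP DYNAMIC MINIMUM.MODULUS 200003
[/code]

That avoids the repeated group splits a dynamic file otherwise performs as it grows, but all the rows still have to be written.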
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not quite true. If the 4.5 million rows fit within the hashed file write cache size, then the use of write cache could be explored, as could not loading unnecessary rows and columns into the hashed file in the first place. It might also be worth exploring the use of a static hashed file but, when push comes to shove, 4.5 million rows still have to be written to disk. Write cache will probably make a big difference by removing unproductive seek activity from the equation.
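If you do try a static hashed file, it is created with an explicit type, modulus and separation, for example (illustrative values only; the file name is a placeholder, and the type, modulus and separation need to be derived from your actual key distribution, record size and row count):

[code]
CREATE.FILE HASH_DEDUP 18 400013 4
[/code]

Write cache is enabled through the Hashed File stage's write-cache option, and the cache size itself is a project tunable in the Administrator client, so check that your 4.5 million rows actually fit within it.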
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For less than 30c per day you CAN be a premium member and have full access to all the benefits of premium membership. This money is 100% devoted to defraying the hosting and bandwidth costs incurred by DSXchange, so by obtaining premium membership you are helping to keep DSXchange alive.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.