Hashed File loading getting slow

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

helpmeout
Participant
Posts: 23
Joined: Thu Nov 02, 2006 2:31 pm

Hashed File loading getting slow

Post by helpmeout »

Hi All,

I have a problem loading data into a hashed file from an Oracle database.
It was working fine before, but since last week it has been taking much longer to load.

Job Design:

Oracle database (ORAOCI9) ---> Hashed File stage (the hashed file is used as a lookup later in the design)


What I have tried:
1) Read from Oracle, with @NULL set as a constraint in the transformer; it reads at normal speed (about 3,000 rows/sec).
2) Read from Oracle --> transformer --> sequential file (about 2,000 rows/sec).
3) Deleted the hashed file and created a new one.
4) Verified all buffering options at the project level; they are unchanged.

Note: there has been no change to the job design.

But it is still the same: loading at a speed of about 300 rows/sec.
Is there any way to tune the hashed file, or how can I debug this issue?

Any help would be appreciated.

Thanks.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Please provide more information:

1. Is it a static or dynamic hashed file?
2. How many records are to be loaded in total?
3. After approximately how many records does the rows/sec value dip (even if it dips gradually)?

Moreover, if it is just an @NULL constraint on a column, what difference would it make to put the constraint in the SQL and remove the transformer?
Stop trying to be perfect… let's evolve.
helpmeout
Participant
Posts: 23
Joined: Thu Nov 02, 2006 2:31 pm

Post by helpmeout »

It is a dynamic (type 30) hashed file.
There are about 0.3 million records.
The rate is low right from the start.
aakashahuja wrote: if it is just an @NULL constraint on a column, what difference would it make to put the constraint in the SQL and remove the transformer?
I did not understand that question.

Thanks.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Is there any data in the hashed file's overflow?
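
One way to check, as a minimal sketch (assuming a type 30 dynamic hashed file, which is stored on disk as a directory containing DATA.30 and OVER.30 files; the path below is hypothetical):

Code:

# Hypothetical check of a dynamic hashed file's overflow size on the server.
import os

hashed_file = "/projects/MyProject/MyLookupHash"   # hypothetical path

for part in ("DATA.30", "OVER.30"):
    path = os.path.join(hashed_file, part)
    if os.path.exists(path):
        print(f"{part}: {os.path.getsize(path):,} bytes")
    else:
        print(f"{part}: not found (not a dynamic hashed file?)")

# A large OVER.30 relative to DATA.30 means heavy overflow, which usually
# points at the minimum modulus / group size needing tuning.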
Stop trying to be perfect… let's evolve.
helpmeout
Participant
Posts: 23
Joined: Thu Nov 02, 2006 2:31 pm

Post by helpmeout »

Yes, it shows a size of 7,424.000 on the Unix box.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Right... this could be a tuning issue.

First of all, what exactly are you using the transformer for? (Sorry, but it is not clear from your post.)

Then:

What are the record sizes you are loading into your hashed file?
Did you create the hashed file with all the default parameters?

From your initial post, my understanding is that when you write to a sequential file you get about 2,000-3,000 rows/sec, but when you do the same thing to the hashed file the rate dips. Is my understanding of the order of execution you described correct?

Thanks
Stop trying to be perfect… let's evolve.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is expected behaviour if write cache is used.

Running data into the cache is very fast. Once all the rows are processed, the cache must be flushed to disk. However, the clock keeps running during this process, so the rows/sec figure (which I have already decried as meaningless) appears to fall.

The same condition obtains if the volume of data exceeds the size of the cache, and DataStage begins to write directly to disk.

Finally, you would expect the rate to fall even if the write cache is disabled. As the dynamic hashed file grows, writes are random to a larger population of groups, so there is more (and larger) seek activity in the file system as adjacent records are written to non-adjacent groups.
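
As a rough illustration of why the displayed figure drops (a back-of-the-envelope sketch only, using the approximate numbers mentioned in this thread, not measurements):

Code:

# Illustrative arithmetic only: ~300,000 rows, ~3,000 rows/sec while the
# write cache fills, ~300 rows/sec reported overall by the monitor.
rows          = 300_000
cache_rate    = 3_000      # rows/sec while filling the write cache
reported_rate = 300        # rows/sec shown for the whole run

elapsed_total = rows / reported_rate          # 1,000 s of wall-clock time
elapsed_cache = rows / cache_rate             # 100 s actually spent caching rows
flush_time    = elapsed_total - elapsed_cache # ~900 s flushing / writing groups

print(f"total elapsed {elapsed_total:.0f} s, caching {elapsed_cache:.0f} s, "
      f"flush/disk {flush_time:.0f} s")
# The rows themselves were processed quickly; the clock simply kept running
# while the cache was flushed (or while writes scattered across more groups),
# so the rows/sec figure looks far worse than the actual processing rate.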
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

You are a genius, Ray, aren't you... :D That answers part of my thread as well... (That doesn't mean I don't want more information... the sponge can still soak up more in my thread... :lol:)

Cheers
Stop trying to be perfect… let's evolve.
helpmeout
Participant
Posts: 23
Joined: Thu Nov 02, 2006 2:31 pm

Post by helpmeout »

aakashahuja wrote: Right... this could be a tuning issue.

First of all, what exactly are you using the transformer for? (Sorry, but it is not clear from your post.)

Then:

What are the record sizes you are loading into your hashed file?
Did you create the hashed file with all the default parameters?

From your initial post, my understanding is that when you write to a sequential file you get about 2,000-3,000 rows/sec, but when you do the same thing to the hashed file the rate dips. Is my understanding of the order of execution you described correct?

Thanks
I am using the transformer only for debugging purposes; I really don't need it in my job design. My hashed file was created with the default parameters, so can you describe what the solution would be? (Allow stage write cache is not enabled.)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The only "solution" involves managing your expectations (in terms of elapsed time) better and realizing that rows/sec is an almost entirely meaningless metric in this context.

Rows/sec doesn't matter if you're not watching the job. So don't watch it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

First, you need to work out the Minimum Modulus and Group Size you are going to need.

I will tell you my way.

I take the record size, divide 2048 by the record size, and then divide 4096 by the record size. Whichever gives the lower remainder, I choose that as my Group Size.

P.S.: The record size will not simply be the sum of your column data types; I assume you know how to calculate it.

Then you simply need to calculate the modulus using this formula:

Modulus = (Num of rows * Avg record length) / (Group size * 2048 * 0.8)

You can add another 10% or 20% to the calculated value if your data is expected to grow.
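
To make the arithmetic concrete, here is a minimal sketch of that method, with hypothetical inputs (the average record length is made up; the row count is the approximate figure from this thread):

Code:

# Sketch of the sizing method described above -- hypothetical inputs,
# substitute your own average record length and row count.
avg_record_len = 120        # bytes (hypothetical; not just the sum of column sizes)
num_rows       = 300_000    # approximate figure from this thread

# Group size: 1 (2048-byte groups) or 2 (4096-byte groups), whichever
# leaves the smaller remainder for this record length.
group_size = 1 if (2048 % avg_record_len) <= (4096 % avg_record_len) else 2

# Minimum modulus per the formula above, plus 20% headroom for growth.
modulus = (num_rows * avg_record_len) / (group_size * 2048 * 0.8)
modulus = int(modulus * 1.2) + 1

print(f"Group size = {group_size}, Minimum modulus = {modulus}")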

Hope it helps.

Cheers
Aakas
Stop trying to be perfect… let's evolve.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Oops, typo! The formula for the modulus is:

Modulus = (Num of rows * Avg record length) / (Group size * 2048 * 0.8)
Stop trying to be perfect… let's evolve.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

I don't know why it's happening... but that smiley is actually 0.8... sometimes I hate these smilies...
Stop trying to be perfect… let's evolve.
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: Islamabad

Post by hamzaqk »

Or you can use the Hashed File Calculator provided with DataStage to calculate the modulus... it's called HFC.exe... easy and always works great!