Hashed File loading getting slow

Posted: Tue Jul 03, 2007 8:49 am
by helpmeout
Hi All,

I have a problem loading data into a hashed file from an Oracle database.
It was working fine before, but since last week it has been taking much longer to load.

Job Design:

Oracle database (ORAOCI9) ---> Hashed File stage (the hashed file is used as a lookup later in the design)


So far I have tried:
1) Reading from Oracle with @NULL as the constraint in the transformer; it reads at normal speed (around 3000 rows/sec).
2) Reading from Oracle --> transformer --> sequential file (around 2000 rows/sec).
3) Deleting the hashed file and creating a new one.
4) Verifying all the buffering options at the project level; they are unchanged.

Note: there has been no change to the job design.

But the result is the same: it loads at about 300 rows/sec.
Is there any way to tune the hashed file, or how can I debug this issue?

Any help would be appreciated.

Thanks.

Posted: Tue Jul 03, 2007 9:25 am
by aakashahuja
Please provide more information:

1. Is it a static or dynamic hashed file?
2. How many records are to be loaded in total?
3. After loading approximately how many records does the rows/sec value dip (even if it dips gradually)?

Moreover, if it is just an @NULL constraint on a column, what difference would it make to put the constraint in the SQL and remove the transformer?

Posted: Tue Jul 03, 2007 9:36 am
by helpmeout
It's a dynamic (type 30) hashed file.
There are 0.3 million records in total.
The rate is down right from the start.
"if it is just an @NULL constraint on a column, what difference would it make to put the constraint in the SQL and remove the transformer?"
I did not understand your question; could you explain?

Thanks.

Posted: Tue Jul 03, 2007 9:39 am
by aakashahuja
Is there any data in the hashed file's overflow?

Posted: Tue Jul 03, 2007 9:47 am
by helpmeout
Yes, there is; the size shown on the Unix box is 7,424.000.
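
On the Unix side, a dynamic hashed file is a directory containing DATA.30 (the primary groups) and OVER.30 (the overflow), so the two sizes can be compared directly. A minimal sketch of such a check, using a hypothetical path in place of the real hashed file location:

import os

# Hypothetical path to the dynamic (type 30) hashed file directory.
hashed_file = "/data/project/HashedFiles/MyLookup"

data_size = os.path.getsize(os.path.join(hashed_file, "DATA.30"))  # primary groups
over_size = os.path.getsize(os.path.join(hashed_file, "OVER.30"))  # overflow

print("DATA.30:", data_size, "bytes")
print("OVER.30:", over_size, "bytes")
# A large OVER.30 relative to DATA.30 means many records sit in overflow,
# which slows down both writes and subsequent lookups.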

Posted: Tue Jul 03, 2007 9:50 am
by aakashahuja
Right... this could be a tuning issue.

First of all, what exactly are you using the transformer for? (Sorry, but it is not clear from your post.)

Then:

What record sizes are you loading into your hashed file?
Did you create the hashed file with all the default parameters?

From your initial post, my understanding is that when you write to a sequential file you get about 2000-3000 rows/sec, but when you do the same thing to a hashed file the rate dips. Is my understanding of the order of execution you described correct?

Thanks

Posted: Tue Jul 03, 2007 9:56 am
by ray.wurlod
This is expected behaviour if write cache is used.

Running data into the cache is very fast. Once all the rows are processed, the cache must be flushed to disk. However, the clock keeps running during this process, so the rows/sec figure (which I have already decried as meaningless) appears to fall.

The same condition obtains if the volume of data exceeds the size of the cache, and DataStage begins to write directly to disk.

Finally, you would expect the rate to fall even if the write cache is disabled. As the dynamic hashed file grows, writes are random to a larger population of groups, so there is more (and larger) seek activity in the file system as adjacent records are written to non-adjacent groups.
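
To put rough numbers on that (the split of time below is made up purely for illustration, not measured from this job): if the rows stream into the write cache quickly and the flush to disk takes much longer, the reported rate collapses even though nothing per-row has slowed down.

# Made-up figures to illustrate why the rows/sec display "falls" with a write cache.
rows = 300_000
seconds_filling_cache = 100    # rows streaming into the write cache
seconds_flushing_cache = 900   # cache being flushed to disk; the clock keeps running

print("apparent rate while caching:", rows / seconds_filling_cache, "rows/sec")                      # 3000.0
print("final reported rate:", rows / (seconds_filling_cache + seconds_flushing_cache), "rows/sec")   # 300.0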

Posted: Tue Jul 03, 2007 10:00 am
by aakashahuja
You are a genius, Ray, aren't you? :D That answers part of my thread as well... (that doesn't mean I don't want more information... the sponge can still soak up more in my thread... :lol: )

Cheers

Posted: Tue Jul 03, 2007 10:33 am
by helpmeout
aakashahuja wrote: Right... this could be a tuning issue.

First of all, what exactly are you using the transformer for? (Sorry, but it is not clear from your post.)

Then:

What record sizes are you loading into your hashed file?
Did you create the hashed file with all the default parameters?

From your initial post, my understanding is that when you write to a sequential file you get about 2000-3000 rows/sec, but when you do the same thing to a hashed file the rate dips. Is my understanding of the order of execution you described correct?

Thanks

I am using the transformer only for debugging purposes; I don't really need it in the job design. The hashed file was created with the default parameters, and the 'Allow stage write cache' option is not enabled. So can you describe what the solution would be?

Posted: Tue Jul 03, 2007 1:07 pm
by ray.wurlod
The only "solution" involves managing your expectations (in terms of elapsed time) better and realizing that rows/sec is an almost entirely meaningless metric in this context.

Rows/sec doesn't matter if you're not watching the job. So don't watch it.

Posted: Wed Jul 04, 2007 2:44 am
by aakashahuja
First, you need to work out what Minimum Modulus and Group Size you are going to need.

I will tell you my approach.

I take the record size, divide 2048 by it, and then divide 4096 by it. Whichever gives the smaller remainder, I choose the corresponding group size (1 for 2048-byte groups, 2 for 4096-byte groups).

P.S.: the record size will not simply be the sum of your column data types; I assume you know the details needed to calculate the actual stored record length.

Then you simply need to calculate the modulus using this formula:

Modulus = (Number of rows * Average record length) / (Group size * 2048 * 0.8)

To the calculated value you can add another 10% or 20% if your data is expected to grow, as in the sketch below.
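
For example, here is a rough sketch of that calculation: the 300,000 rows come from this thread, but the 120-byte average record length is an assumed figure, so substitute your own measured value.

import math

num_rows = 300_000
avg_record_length = 120   # assumption for illustration; measure it from your real data

# Group size: 1 means 2048-byte groups, 2 means 4096-byte groups; pick whichever
# buffer size leaves the smaller remainder when divided by the record length.
group_size = 1 if 2048 % avg_record_length <= 4096 % avg_record_length else 2

# Modulus = (Number of rows * Average record length) / (Group size * 2048 * 0.8)
minimum_modulus = math.ceil(num_rows * avg_record_length / (group_size * 2048 * 0.8))

# Add 10-20% headroom if the data is expected to grow.
minimum_modulus_with_growth = math.ceil(minimum_modulus * 1.2)

print("group size     :", group_size)                   # 1
print("minimum modulus:", minimum_modulus)              # 21973
print("with 20% growth:", minimum_modulus_with_growth)  # 26368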

Hope it helps.

Cheers
Aakas

Posted: Wed Jul 04, 2007 2:45 am
by aakashahuja
Oops, typo! The formula for the modulus is:

Modulus = (Number of rows * Average record length) / (Group size * 2048 * 0.8)

Posted: Wed Jul 04, 2007 2:46 am
by aakashahuja
I don't know why it's happening, but that smiley is actually 0.8... sometimes I hate these smilies...

Posted: Wed Jul 04, 2007 5:53 am
by hamzaqk
Or you can use the Hashed File Calculator provided with DataStage to calculate the modulus... it's called HFC.exe... easy, and it always works great!