Hash file performance issue

Post questions here related to DataStage Server Edition for areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Hash file performance issue

Post by azens »

We are migrating from DataStage Server 7.5 on Windows to 8.5 on UNIX (Solaris), and our DW is SQL Server 2005. Recently we found a performance issue with hash files. A job simply reads data from the DB via a DRS stage and writes it into a hash file. On 7.5/Windows, 44 million records take 30 minutes, but on 8.5/UNIX the same load takes 1 hour and 5 minutes.

At first we thought it was a disk configuration issue; however, when we changed the output to a sequential file instead of a hash file, it took less than 10 minutes. In the same environment, a parallel job using the same query and writing into a dataset takes less than 5 minutes. We have tried different configurations for the hash file, but gained no better performance. According to the disk monitoring stats there is no difference between sequential file, hash file, or dataset, and disk performance looks very good.

We are 99% sure that something goes wrong when DataStage flushes the hash file cache onto disk. The job runs fine while writing into the hash file write cache, and Rows/Sec stays high and steady. But Rows/Sec freezes when flushing from the cache onto the file (disk), and it can sit for more than 10 minutes without any change in the numbers. We have been stuck on this issue for a week with no further improvement. Has anyone seen a similar problem, or have any suggestions? Much appreciated.

regards,
Ming
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That 'freezing' is to be expected while records are flushing to disk. First suggestion - don't use write caching. Second suggestion - create the hashed file with a large minimum modulus. Use the Hashed File Calculator to get an idea of what the value should be.
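If you pre-create the file from the TCL/dssh command prompt rather than letting the stage build it with defaults, the UniVerse-style CREATE.FILE command lets you set the minimum modulus up front. A rough sketch only - 'MyHashedFile' and the 80000 are placeholders, size the modulus with the Hashed File Calculator for your actual row count and record size:

    CREATE.FILE MyHashedFile DYNAMIC MINIMUM.MODULUS 80000

The same idea applies if you let the Hashed File stage create the file: the create-file options let you enter the calculated minimum modulus there, so the file is pre-sized and doesn't have to keep splitting groups while 44 million rows are loaded.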
-craig

"You can never have too many knives" -- Logan Nine Fingers
azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Post by azens »

chulett wrote:That 'freezing' is to be expected while records are flushing to disk. First suggestion - don't use writing caching. Second suggestion - create the hashed file with a large minimum modulus. Use the ...


Thanks Chulett. I just registered for premium membership but I still cannot see your full reply. I think I have already tried your suggestions in testing, but the results do not make much difference. We also made some changes to the ZFS configuration, but not much improvement. Do you think the I/O size matters? Our administrator found that when writing the hash file, the I/O size is only 2K while the disk block size is 128K. Thanks.

regards,
Ming
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, your file system can make a huge difference. Keep working with your Administrator to improve that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Post by azens »

We use ZFS as the file system. Is that a good option for DataStage 8.5 Server?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Honestly, I don't know squat about ZFS and I really wouldn't worry about which filesystem you are using; rather, worry about it being configured correctly and optimized for the type of I/O you are seeing.
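That said, if the hashed files sit on their own ZFS dataset, one knob your administrator might look at (a guess on my part, not something I've benchmarked with 8.5) is the dataset's recordsize, which defaults to 128K. Dropping it closer to the 2K group size your admin observed, for example:

    zfs set recordsize=8K pool/datastage

(the pool/dataset name is just a placeholder) should stop every small group write from turning into a read-modify-write of a full 128K block. But again, that's one for the ZFS folks to confirm.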
-craig

"You can never have too many knives" -- Logan Nine Fingers
azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Post by azens »

I think we have solved this issue. The solution was changing from ZFS to UFS, but we still don't know why ZFS does not work well with hash files.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would think your official support provider could help answer that question, or perhaps even provide pointers on setting up ZFS appropriately.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Hashed files - and note that there's no such thing as a "hash file" - are accessed randomly, not sequentially. The unit of data transfer is the hashed file's group size (by default 2KB, tunable to other values). A particular "group" is identified by its offset from the beginning of the file. There is no benefit in caching the next few pages, or in moving larger volumes of data, because the next access request may be for a location nowhere near the previous one.

For hashed files that are not too large you can cheat and lock them into a single group, which means that the entire hashed file is read into memory at the first access, then (probably) read from memory for the remaining accesses. But it is a cheat, and not as fast as a true memory cache.
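Roughly speaking (a simplified picture, ignoring the file header and overflow buffers), each read or write of one record works like this:

    group  = hash(key) MOD modulus
    offset = header_size + (group * group_size)
    seek to offset, transfer one group (2KB by default)

Successive keys hash to unrelated groups, so successive I/Os land at unrelated offsets - which is why read-ahead and larger transfer sizes buy you nothing here.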
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Post by azens »

chulett wrote:I would think your official support provider could help answer that question, or perhaps even provide pointers on setting up ZFS appropriately.
Unfortunately, IBM support said it's out of their scope. Now we have another problem: the UNIX administrator may not allow us to mix ZFS and UFS. We can go with either one, but converting the existing ZFS to UFS on production will cost plenty of time.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If it were AIX, IBM would give you the solution, whereas Solaris comes from a competitor. That is probably why you got this answer from IBM!


Best of luck.

DS User
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, AIX is an IBM product so I can't see how it's "out of their scope". You just may need a different support organization there. Also, a System Administrator should be able to configure the filesystem to be optimal for random access of small sizes (as Ray notes) regardless of which one is being used. IMHO.
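As a starting point, your administrator could simply report what the dataset is currently set to, for example (the dataset name is a placeholder):

    zfs get recordsize,atime,logbias pool/datastage

and compare that against the small random 2K writes the hashed files are doing.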
-craig

"You can never have too many knives" -- Logan Nine Fingers