analyze.shm

Post questions here relating to DataStage Server Edition, in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

analyze.shm

Post by attu »

Hi,
I want to find out how my hashed file is tuned.
I am trying to run the analyze.shm command, but it is not in the VOC.
How do I add it to the VOC?

Thanks
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Sorry, that should have been ANALYZE.FILE, not analyze.shm.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Can someone tell me the syntax for this command?

Code: Select all

>ANALYZE.FILE
File name        =  "/dsadm/hash/myhashfile"
Must specify file name.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

First establish a pointer in the VOC by issuing the command

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile;
Next

Code: Select all

ANALYZE.FILE myhashfile;
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

narasimha wrote: First establish a pointer in the VOC by issuing the command

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile;
I get this message

what do you want to call it in your VOC file =
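
At that prompt you can simply type the VOC name you want (for example, myhashfile). To avoid the prompt altogether, supply both arguments on the command line. A sketch, assuming a UniVerse-style SETFILE; the OVERWRITING keyword, which replaces any existing VOC entry of that name, is worth verifying in your environment:

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile OVERWRITING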
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Thanks.

Here is the output.

File type .................. DYNAMIC
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 12003 current ( minimum 1 )
Large record size .......... 1628 bytes
Group size ................. 2048 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 32661504 bytes


Is it badly tuned?
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

attu,

That would depend on your requirements.
There is a small utility called HFC.exe available on your DataStage installation CD. It can help you tune your hashed file.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Lose the semi-colons.
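That is, using the same path and name as above:

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile
ANALYZE.FILE myhashfile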
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Thanks Narasimha.

The issue is that we are doing lookups from hashed files and the throughput is very low, around 17 rows/sec.

Is there an issue with the hashed file index, or do we need to recreate the hashed files?

Is there anything else I can do to improve the performance?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Without the STATISTICS keyword, ANALYZE.FILE reports only the tuning settings (the parameters that can be set when the hashed file is created). There is no way to tell from that whether the hashed file is well tuned.

Add this keyword to have sizing information reported:

Code: Select all

ANALYZE.FILE myhashedfile STATISTICS

Note, however, that a dynamic hashed file is a moving target; as the data volume to be stored in it changes, it will automatically alter its shape (in particular the number of groups, or modulus).

Therefore "tuned" is an ephemeral characteristic.
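If the concern is a file that has to split its way up from a small modulus after every clear and reload, the minimum modulus of a dynamic file can be raised so that it starts at a sensible size. A sketch, assuming the UniVerse CONFIGURE.FILE verb is available on your release (verify in your environment):

Code: Select all

CONFIGURE.FILE myhashedfile MINIMUM.MODULUS 12003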
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Thanks, Ray. I ran it as you suggested.

Here is the output.

File type .................. DYNAMIC
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 12003 current ( minimum 1, 0 empty,
3896 overflowed, 1 badly )
Number of records .......... 297932
Large record size .......... 1628 bytes
Number of large records .... 0
Group size ................. 2048 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 32661504 bytes
Total size of record data .. 18122595 bytes
Total size of record IDs ... 1787593 bytes
Unused space ............... 12747220 bytes
Total space for records .... 32657408 bytes
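
If I am reading this right, the actual load works out to (record data + record IDs) / (groups x group size) = (18122595 + 1787593) / (12003 x 2048), or about 81%, which lines up with the 80% actual load factor reported - assuming that is how the load is computed.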

Any advice?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes. Never react to a single sample. Monitor over time - maybe four weekly samples.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

attu wrote: No. of groups (modulus) .... 12003 current ( minimum 1, 0 empty,
3896 overflowed, 1 badly )
What does "badly" mean in the above context?
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"Badly" means more than one secondary (overflow) buffer in the group. One group out of 12003 is not a problem.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Thanks for the info, guys.
Our issue is still not resolved. We are doing a couple of lookups using hashed files and the throughput is very slow, around 17 rows/sec.
I tried tuning the performance by increasing row buffering to 1024 KB, but it still does not help.
What other options do I have? We are using a dynamic (type 30) hashed file; can I resize it? Memory on the box is also at 100% when I run nmon, and the hashed files are using the pre-load file to memory option; can I disable that?

My job design is

Code: Select all


I/P --\
       \
I/P --> Link Collector --> Transformer <-- Hashed File
                               |
                           Transformer <-- Hashed File
                               |
                  HF -->   Transformer <-- Hashed File
                               |
                  HF -->   Transformer
                               |
                            Seq File
Appreciate your input.
Thanks
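
If resizing or rebuilding turns out to be the way to go, the UniVerse verb for that is RESIZE. A sketch only - the type, modulus and separation below are purely illustrative, and the right values for your data would come from something like the HFC utility mentioned earlier:

Code: Select all

RESIZE myhashfile 18 9967 4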
Post Reply