
analyze.shm

Posted: Tue Jan 06, 2009 2:22 pm
by attu
Hi,
I want to find out how my hashed file is tuned.
I am trying to run the analyze.shm command, but it is not in the VOC.
How do I add it to the VOC?

Thanks

Posted: Tue Jan 06, 2009 2:28 pm
by attu
Sorry, it should have been analyze.file, not analyze.shm.

Posted: Tue Jan 06, 2009 2:36 pm
by attu
Can someone tell me the syntax for this command?

Code: Select all

>ANALYZE.FILE
File name        =  "/dsadm/hash/myhashfile"
Must specify file name.

Posted: Tue Jan 06, 2009 2:53 pm
by narasimha
First establish a pointer in the VOC by issuing the command

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile;
Next

Code: Select all

ANALYZE.FILE myhashfile;
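
For reference, my understanding of the general UniVerse syntax is below; the OVERWRITING keyword is optional and simply replaces an existing VOC entry of the same name without prompting.

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile OVERWRITING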

Posted: Tue Jan 06, 2009 3:06 pm
by attu
narasimha wrote: First establish a pointer in the VOC by issuing the command

Code: Select all

SETFILE /dsadm/hash/myhashfile myhashfile;
I get this message:

Code: Select all

what do you want to call it in your VOC file =

Posted: Tue Jan 06, 2009 3:09 pm
by attu
Thanks.

Here is the output:

File type .................. DYNAMIC
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 12003 current ( minimum 1 )
Large record size .......... 1628 bytes
Group size ................. 2048 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 32661504 bytes


Is it badly tuned?

Posted: Tue Jan 06, 2009 3:18 pm
by narasimha
attu,

That would depend on your requirements.
There is a small application called HFC.exe on your DataStage installation CD; it can help you tune your hashed file.

Posted: Tue Jan 06, 2009 3:21 pm
by ray.wurlod
Lose the semi-colons.

Posted: Tue Jan 06, 2009 3:24 pm
by attu
Thanks Narasimha.

The issue is that we are doing lookups against hashed files and the throughput is very low, around 17 rows/sec.

Is there an issue with the hashed file index, or do we need to recreate the hashed files?

Is there anything else I can do to improve the performance?

Thanks

Posted: Tue Jan 06, 2009 3:24 pm
by ray.wurlod
Without the STATISTICS keyword, ANALYZE.FILE reports only the tuning settings (the parameters that can be set when the hashed file is created). There is no way to tell from that whether the hashed file is well tuned.

Add this keyword to have sizing information reported.

Code: Select all

ANALYZE.FILE myhashedfile STATISTICS

Note, however, that a dynamic hashed file is a moving target; as the data volume to be stored in it changes, it will automatically alter its shape (in particular the number of groups, or modulus).

Therefore "tuned" is an ephemeral characteristic.
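
As a rough illustration with the 2048-byte group size from the output above (this is only an approximation of how the load is evaluated):

Code: Select all

2048 * 0.80 (split load) ≈ 1638 bytes of data and keys per group, on average, above which the file starts splitting groups
2048 * 0.50 (merge load) =  1024 bytes per group, on average, below which groups are merged back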

Posted: Tue Jan 06, 2009 3:27 pm
by attu
Thanks Ray. I ran it as you suggested.

Here is the output:

File type .................. DYNAMIC
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 12003 current ( minimum 1, 0 empty,
3896 overflowed, 1 badly )
Number of records .......... 297932
Large record size .......... 1628 bytes
Number of large records .... 0
Group size ................. 2048 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 32661504 bytes
Total size of record data .. 18122595 bytes
Total size of record IDs ... 1787593 bytes
Unused space ............... 12747220 bytes
Total space for records .... 32657408 bytes
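
(As a rough sanity check, assuming the actual load is record data plus record IDs divided by modulus times group size, these numbers are consistent with the reported 80%.)

Code: Select all

(18122595 + 1787593) / (12003 * 2048) = 19910188 / 24582144 ≈ 0.81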

Any advice?

Posted: Tue Jan 06, 2009 3:50 pm
by ray.wurlod
Yes. Never react to a single sample. Monitor over time - maybe four weekly samples.
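
One way to capture each sample for later comparison (just a sketch, assuming the standard UniVerse COMO spooling verb is available at the TCL prompt of the project account):

Code: Select all

COMO ON hashstats_20090106
ANALYZE.FILE myhashfile STATISTICS
COMO OFF

The captured output lands in the account's &COMO& directory, so the weekly snapshots can be compared side by side.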

Posted: Tue Jan 06, 2009 6:10 pm
by narasimha
attu wrote: No. of groups (modulus) .... 12003 current ( minimum 1, 0 empty,
3896 overflowed, 1 badly )
I am not sure what "badly" means in this context?

Posted: Tue Jan 06, 2009 6:20 pm
by ray.wurlod
It means more than one secondary (overflow) buffer in the group. One out of 12003 is not a problem.

Posted: Fri Jan 09, 2009 11:03 am
by attu
Thanks for the info, guys.
Our issue is still not resolved. We are doing a couple of lookups using hashed files and the throughput is very slow, around 17 rows/sec.
I tried tuning the performance by increasing row buffering to 1024 KB, but it still does not help.
What other options do I have? We are using a dynamic (Type 30) hashed file; can I resize it? Memory on the box is also at 100% when I run nmon, and the hashed files are using the "Pre-load file to memory" option; can I disable that?

My job design is

Code: Select all


I/P --> Link Collector --> Transformer <-- Hashed File
             ^                 |
             |             Transformer <-- Hashed File
            I/P                |
                 HF -->    Transformer <-- Hashed File
                               |
                 HF -->    Transformer
                               |
                            Seq File
Appreciate your input.
Thanks