A small number of blocks are large. Is it OK?

Posted: Thu Jan 28, 2010 12:40 am
by daxinz
Hi Experts,
I have a very large file A (>20 MG rows) to match against another large file B (about 10 GB rows). I am using the QualityStage 7.0 Designer.

In one of the passes, I had better block on last name, first name, and birth year. In file B, the blocks have a good number of records (<20). But in file A, about 10 blocks have many records (>20); the largest block has 1000 records.
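
For anyone who wants to check block sizes before running the match, here is a minimal Python sketch of the counting described above. It is not part of QualityStage; the field names `last_name`, `first_name`, `birth_year` and the helper names are assumptions made up for illustration, and the limit of 20 is the guideline mentioned in this thread.

```python
from collections import Counter

# Hypothetical field names; adjust to the actual blocking columns.
BLOCK_FIELDS = ("last_name", "first_name", "birth_year")

def block_sizes(records, key_fields=BLOCK_FIELDS):
    """Count records per block, one block per distinct blocking key."""
    return Counter(tuple(rec[f] for f in key_fields) for rec in records)

def oversized_blocks(records, limit=20, key_fields=BLOCK_FIELDS):
    """Return only the blocks whose record count exceeds the limit."""
    sizes = block_sizes(records, key_fields)
    return {key: n for key, n in sizes.items() if n > limit}

# Tiny made-up sample: one block of 25 records and one block of 3.
sample = (
    [{"last_name": "SMITH", "first_name": "JOHN", "birth_year": 1970}] * 25
    + [{"last_name": "DOE", "first_name": "JANE", "birth_year": 1980}] * 3
)
print(oversized_blocks(sample))
```

Running the same count on file A and file B separately shows which keys produce the large blocks on each side.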

I read in the user guide that the block size should not be large (around 20). If I have to use this blocking, how bad is it if a small number of blocks are large?

Thanks in advance.

Posted: Thu Jan 28, 2010 1:19 am
by ray.wurlod
Did the job report use of block overflow? If not, you're within reasonable sizes.

Posted: Thu Jan 28, 2010 1:12 pm
by JRodriguez
Daxinz,

It will depend on the match type and other factors, but in general the matcher will perform more comparisons for all those large blocks, so performance will take a hit. In your case the maximum number of comparisons will be 20,000 (20 x 1000).

If the large size causes overflow, then all the excess records should be handled in the next pass, and performance will take a hit anyway.
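
The arithmetic behind that figure, as a small Python sketch. It assumes every record in a block from file A is compared with every record in the same block from file B; the real number depends on the match type, and the function name is only illustrative.

```python
def max_comparisons(block_size_a, block_size_b):
    """Upper bound on record pairs compared within a single block."""
    return block_size_a * block_size_b

# Largest block in file A (1000 records) against a 20-record block in file B.
print(max_comparisons(1000, 20))  # 20000
```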


Regards

Posted: Sun Jan 31, 2010 11:46 pm
by daxinz
I got the result. In jobName.MTC.StepFREQLDA.log, I see
freqld(1823):Sat 30 Jan 2010 03:00:33 PM PST LOG: 937 number of overflow areas allocated

In jobName.MTC.StepFREQLDB.log, I see
freqld(17532):Sat 30 Jan 2010 03:24:06 PM PST LOG: 695 number of overflow areas allocated

This process has 7 passes. The other log files are
jobName.MTC.StepMCSORT_N.log
jobName.MTC.StepSORTA_N.log
jobName.MTC.StepSORTB_1.log
jobName.MTC.StepMTCH_N.log
where N = 1, 2, 3, 4, 5, 6, 7.
None of them has an overflow message.
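
In case it helps others checking the same thing, a rough Python sketch for scanning the step logs for overflow messages. It simply searches for the word "overflow" in each file; the glob pattern follows the file names listed above, and the helper names are hypothetical, not a QualityStage utility.

```python
import glob
import re

# Case-insensitive match on the word seen in the FREQLD log lines.
OVERFLOW = re.compile(r"overflow", re.IGNORECASE)

def overflow_lines(lines):
    """Return the lines that mention overflow, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if OVERFLOW.search(line)]

def scan_logs(pattern):
    """Map each log file matching the glob pattern to its overflow lines."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            found = overflow_lines(fh)
        if found:
            hits[path] = found
    return hits

if __name__ == "__main__":
    # Assumed pattern; replace jobName with the actual job name.
    for path, found in scan_logs("jobName.MTC.Step*.log").items():
        for line in found:
            print(f"{path}: {line}")
```

Note that this only finds the messages; whether a given message indicates a problem still has to be judged from the step that wrote it.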

In case there is overflow, which of the following value(s) need to be adjusted in Advanced Run Options for Match?
Frequency Analysis Buffer Count
Index Buffer Count
Match Presort Buffer Count
Report Buffer Count
Extract Buffer Count

Thanks