A small number of blocks are large. Is it OK?

Infosphere's Quality Product

Moderators: chulett, rschirm

daxinz
Premium Member
Posts: 27
Joined: Mon Aug 17, 2009 11:29 pm

A small number of blocks are large. Is it OK?

Post by daxinz »

Hi Experts,
I have a very large file A (>20 MG rows) that I need to match against another large file B (about 10 GB rows). I am using QualityStage 7.0 Designer.

In one of the passes, I had better block on last name, first name, and birth year. In file B, the blocks have a reasonable number of records (<20). But in file A, about 10 blocks have many records (>20); the largest block has 1000 records.

I read the user guide. It says the block size should not be large (around 20). If I have to use this blocking scheme, how bad is it if a small number of blocks are large?
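To see how many blocks exceed the recommended size before running the match, the block sizes can be profiled outside QualityStage. The following is only a minimal sketch in Python, not a QualityStage feature: the field names (`last_name`, `first_name`, `birth_year`), the sample records, and the limit of 20 are assumptions for illustration.

```python
from collections import Counter

def block_sizes(records, key_fields):
    """Count the records in each block (records sharing all key field values)."""
    return Counter(tuple(rec[f] for f in key_fields) for rec in records)

def oversized_blocks(sizes, limit=20):
    """Return (block key, size) pairs larger than the limit, largest first."""
    big = [(key, n) for key, n in sizes.items() if n > limit]
    return sorted(big, key=lambda kv: kv[1], reverse=True)

# Hypothetical sample data: one block of 25 records and one block of 3.
records = (
    [{"last_name": "SMITH", "first_name": "JOHN", "birth_year": 1960}] * 25
    + [{"last_name": "LEE", "first_name": "ANN", "birth_year": 1975}] * 3
)

sizes = block_sizes(records, ["last_name", "first_name", "birth_year"])
print(oversized_blocks(sizes))
```

Running the same count on file A and file B separately shows which block keys are large on each side.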

Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did the job report the use of block overflow? If not, you're within reasonable sizes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Daxinz,

It will depend on the match type and other factors, but in general the matcher will perform more comparisons for all those large blocks, so performance will take a hit. In your case the maximum number of comparisons will be 20,000 (20 x 1000).

If the large size causes overflow, then all the exceeding records should be handled in the next pass, and performance will take a hit anyway.
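The 20 x 1000 arithmetic can be checked with a short sketch. It assumes that, within a block, every record from file A is compared with every record from file B; the actual matcher may do fewer comparisons depending on the match type. The block keys and sizes below are hypothetical.

```python
def total_comparisons(blocks_a, blocks_b):
    """Sum size_A * size_B over the blocks that appear in file A.

    Assumes each A record in a block is compared with each B record
    in the same block; blocks missing from file B contribute nothing.
    """
    return sum(n_a * blocks_b.get(key, 0) for key, n_a in blocks_a.items())

# Hypothetical block sizes: one large block in A, one small one.
blocks_a = {"SMITH|JOHN|1960": 1000, "LEE|ANN|1975": 5}
blocks_b = {"SMITH|JOHN|1960": 20, "LEE|ANN|1975": 4}

print(total_comparisons(blocks_a, blocks_b))  # 1000*20 + 5*4 = 20020
```

Under this assumption the single large block accounts for almost all of the comparisons, which is why a few oversized blocks can dominate the run time of a pass.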


Regards
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
daxinz
Premium Member
Posts: 27
Joined: Mon Aug 17, 2009 11:29 pm

Post by daxinz »

I got the results. In jobName.MTC.StepFREQLDA.log, I see:
freqld(1823):Sat 30 Jan 2010 03:00:33 PM PST LOG: 937 number of overflow areas allocated

In jobName.MTC.StepFREQLDB.log, I see:
freqld(17532):Sat 30 Jan 2010 03:24:06 PM PST LOG: 695 number of overflow areas allocated

This process has 7 passes. The other log files are:
jobName.MTC.StepMCSORT_N.log
jobName.MTC.StepSORTA_N.log
jobName.MTC.StepSORTB_1.log
jobName.MTC.StepMTCH_N.log
where N = 1, 2, 3, 4, 5, 6, 7.
None of them has an overflow message.
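Checking many step logs by hand is tedious, so here is a minimal sketch that pulls the overflow counts out of log lines. The pattern is based only on the wording of the two freqld lines quoted above; other steps may word their overflow messages differently, so treat the regular expression as an assumption.

```python
import re

# Matches the wording seen in the freqld logs quoted in this thread.
OVERFLOW_RE = re.compile(r"LOG:\s+(\d+)\s+number of overflow areas allocated")

def overflow_counts(lines):
    """Return the overflow-area counts found in an iterable of log lines."""
    counts = []
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts

sample = [
    "freqld(1823):Sat 30 Jan 2010 03:00:33 PM PST LOG: 937 number of overflow areas allocated",
    "freqld(17532):Sat 30 Jan 2010 03:24:06 PM PST LOG: 695 number of overflow areas allocated",
    "some other log line without the message",
]

print(overflow_counts(sample))  # [937, 695]
```

To scan real files, pass an open file object to `overflow_counts`, for example while looping over `glob.glob("jobName.MTC.Step*.log")`.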

In case there is overflow, which of the following value(s) need to be adjusted in Advanced Run Options for Match?
Frequency Analysis Buffer Count
Index Buffer Count
Match Presort Buffer Count
Report Buffer Count
Extract Buffer Count

Thanks