Hi Experts,
I have a very large file A (>20 MG rows) to match against another large file B (about 10 GB rows). I use the QualityStage 7.0 Designer.
In one of the passes, I had better block on last name, first name, and birth year. In file B, the blocks have a good number of records (<20). But in file A, there are about 10 blocks that have many records (>20); the largest block has 1000 records.
I read the user guide. The size of a block should not be large (around 20). If I have to use this blocking, how bad is it if a small number of blocks are large?
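To see how many blocks exceed the recommended size before running the match, the records can be profiled outside QualityStage. The following is only a rough Python sketch: the field names, the sample records, and the limit of 20 are assumptions for illustration, and it does not use any QualityStage API.

```python
from collections import Counter

def block_sizes(records, key_fields):
    """Count how many records share each blocking key."""
    return Counter(tuple(rec[f] for f in key_fields) for rec in records)

def oversized_blocks(sizes, limit=20):
    """Return (key, count) pairs whose count exceeds the limit, largest first."""
    big = [(key, n) for key, n in sizes.items() if n > limit]
    return sorted(big, key=lambda item: item[1], reverse=True)

# Hypothetical sample data; names and values are illustrative only.
sample = (
    [{"last_name": "SMITH", "first_name": "JOHN", "birth_year": 1970}] * 25
    + [{"last_name": "DOE", "first_name": "JANE", "birth_year": 1982}] * 3
)

sizes = block_sizes(sample, ["last_name", "first_name", "birth_year"])
for key, count in oversized_blocks(sizes):
    print(key, count)
```

Running the same count on the real file A would list the roughly 10 large blocks and their sizes.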
Thanks in advance.
A small number of blocks are large. Is it OK?
Daxinz,
It will depend on the match type and other factors, but in general the matcher performs more comparisons for blocks with large sizes, so performance will take a hit. In your case, the maximum number of comparisons for the largest block will be 20,000 (20 x 1000).
If the large size causes an overflow, all the exceeding records should be handled in the next pass, and performance will take a hit anyway.
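The arithmetic behind that figure can be sketched in a few lines of Python. This assumes the worst case, where every record in a block from file A is compared with every record in the matching block from file B; the function name is made up for illustration.

```python
def max_comparisons(block_size_a, block_size_b):
    """Worst-case number of record pairs compared within one block."""
    return block_size_a * block_size_b

# Largest block in file A (1000 records) against a block of 20 in file B.
print(max_comparisons(1000, 20))

# A block at the recommended size on both sides, for comparison.
print(max_comparisons(20, 20))
```

So the largest block costs about 50 times as many comparisons as a block of 20 on each side (20,000 versus 400).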
Regards
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
I got the results. In jobName.MTC.StepFREQLDA.log, I see:
freqld(1823):Sat 30 Jan 2010 03:00:33 PM PST LOG: 937 number of overflow areas allocated
In jobName.MTC.StepFREQLDB.log, I see:
freqld(17532):Sat 30 Jan 2010 03:24:06 PM PST LOG: 695 number of overflow areas allocated
This process has 7 passes. The other log files are:
jobName.MTC.StepMCSORT_N.log
jobName.MTC.StepSORTA_N.log
jobName.MTC.StepSORTB_1.log
jobName.MTC.StepMTCH_N.log (N = 1, 2, 3, 4, 5, 6, 7)
None of them has an overflow message.
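Checking many log files by eye is error-prone, so here is a small Python sketch that pulls the overflow count out of a log line. The pattern is an assumption based only on the two freqld lines quoted above; other steps may word their messages differently.

```python
import re

# Assumed pattern, taken from the two freqld log lines quoted in this post.
OVERFLOW_RE = re.compile(r"LOG:\s+(\d+)\s+number of overflow areas allocated")

def overflow_count(line):
    """Return the overflow count reported on a log line, or None if absent."""
    match = OVERFLOW_RE.search(line)
    return int(match.group(1)) if match else None

sample_lines = [
    "freqld(1823):Sat 30 Jan 2010 03:00:33 PM PST LOG: 937 number of overflow areas allocated",
    "freqld(17532):Sat 30 Jan 2010 03:24:06 PM PST LOG: 695 number of overflow areas allocated",
]

for line in sample_lines:
    print(overflow_count(line))
```

The same function could be applied to each line of the MCSORT, SORTA, SORTB, and MTCH logs to confirm that none of them reports an overflow.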
In case there is overflow, which of the following value(s) need to be adjusted in Advanced Run Options for Match?
Frequency Analysis Buffer Count
Index Buffer Count
Match Presort Buffer Count
Report Buffer Count
Extract Buffer Count
Thanks