Problem with more Match passes

Infosphere's Quality Product

Moderators: chulett, rschirm

vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

Problem with more Match passes

Post by vairus »

Hi QS Folks,

Greetings.

I am struggling with my matching process when running a 4-pass match on 4 million records.

The process stops after creating the 3rd dataset; it has to create 5 datasets for 4 passes. I really don't know what the problem is.

However, the job runs successfully when I run a 3-pass match.

It would be nice if anyone could suggest what I should do to get the 4-pass match to run.

The hardware configuration is

3 GB RAM
2 processors (2.99 GHz)
100 GB Hard disk space

The error I get when running 4 passes is: (from source data type Uint64 to int64)

I do not get this error when running 3 passes.

Thanks in advance
vairamuthu
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Not quite sure what you mean by "stops". How do you know the match has stopped?

Is there an error message anywhere in the log files? Does the job actually abort? Do the files in the Data/Controls/Logs folders stop increasing in size?

The most likely cause is that pass 4 is creating huge blocks, which is churning memory.

What I would be tempted to do is create a source file for the match of, say, 1 million rows (or even fewer) and run the match again. Then look at the block sizes at the end of the report. If I am right, pass 4 should be significantly different from the others.
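To get a rough feel for block sizes before running the match, something like the sketch below could be run against a sample extract. It is plain Python, not anything QualityStage provides, and the field names (surname, postcode) and sample rows are made up for illustration. It counts how many records share each blocking key and how many pairwise comparisons each block implies, since a block of n records needs n * (n - 1) / 2 comparisons.

```python
from collections import Counter


def block_stats(records, key_fields):
    """Count records per blocking key and the pairwise comparisons each block implies."""
    sizes = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    # A block of n records needs n * (n - 1) / 2 pairwise comparisons.
    pairs = {key: n * (n - 1) // 2 for key, n in sizes.items()}
    return sizes, pairs


# Hypothetical sample rows; the field names are illustrative only.
sample = [
    {"surname": "SMITH", "postcode": "2000"},
    {"surname": "SMITH", "postcode": "2000"},
    {"surname": "SMITH", "postcode": "2001"},
    {"surname": "JONES", "postcode": "2000"},
]

# Block on the (assumed) pass 4 blocking column and report the largest block.
sizes, pairs = block_stats(sample, ["postcode"])
largest_key, largest_size = sizes.most_common(1)[0]
print(largest_key, largest_size, pairs[largest_key])
```

If one blocking key turns out to hold a large share of the rows, that pass is the likely memory hog, because the comparison count grows with the square of the block size.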


Hope this helps,
Bob.
Bob Oxtoby
vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

RE : Problem with more Match passes

Post by vairus »

Hi Bob,

Thanks for the response.

The 3-pass match specification runs if the confidence level is lower in the 3rd pass.

In the 3rd pass I have 3 columns using the CHAR comparison, and I want an exact word match. So I increased the m probability to .999 and the u probability to .998 in the 3rd pass. I monitored the job: it creates the dataset and scratch files for the last pass, but after a few minutes it rolls everything back. There is no error message, but it is no longer processing.

With the previous low-confidence settings, it created nearly 2 million matched records in the 3rd pass alone.

I don't know why this happens when I increase the confidence level.

Could you please tell me where I can find the block sizes that you mentioned?

I'll try with 1 million records.

Eagerly waiting for reply

Thanks in Advance
vairamuthu
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

The blocking information I was referring to can be found at the end of the match report (matchname.RPT) in the Data folder of the project.

If you are doing an exact match with the Char comparison algorithm, you do not need to be changing the m or u probability settings. I think I would leave these as the default until you get the match itself to run; then you can start to tune it.
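As a rough illustration of why the m and u values matter, here is a small Python sketch of the standard Fellegi-Sunter weight formulas. It assumes the match weights follow that textbook form (agreement weight log2(m/u), disagreement weight log2((1-m)/(1-u))); the function names are mine, and the comparison pair m = 0.9, u = 0.01 is only an illustrative choice, not a claim about the product defaults.

```python
import math


def agreement_weight(m, u):
    """Weight added when a column agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight added (negative) when a column disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


# Settings from the post above versus an illustrative, more typical-looking pair.
for m, u in [(0.999, 0.998), (0.9, 0.01)]:
    print(m, u, agreement_weight(m, u), disagreement_weight(m, u))
```

Under these assumptions, m = .999 with u = .998 gives an agreement weight very close to zero, because m and u are nearly equal. The column then contributes almost nothing when it agrees, which is the opposite of making the match stricter.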

Bob.
Bob Oxtoby