Page 1 of 1

Re: Matching Question

Posted: Sun Jun 14, 2009 11:00 pm
by dalecooper
Hi Julio

Thanks for the reply!
Unfortunately it does not work on my side.
The closest I get to the desired results is not to have
AddedColumn in the blocking but to include it in
the match specification, UNCERT 900, 0.9 and 0.1.
Critical Missing OK

For this example I have standardized the CompanyName.

So The columns look similar to this:

MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | | 01 | 01 |
IBM | Corp | 01 | 01 |
IBM | CC | 01 | |
IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |

Blocking on MatchName1 and DivisionNumber.
And I agree that blocking on MatchName1 and DivisionNumber and
matching on AddedColumn UNCERT 0.9 and 0.1 , 900 and Critical Missing OK
should group it correctly but it does not.
The above code is used in the seventh pass.

I get results like this:
Pass6:
MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | | 01 | 01 |
IBM | CC | 01 | |

Pass7:
MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | Corp | 01 | 01 |
IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |

In Pass7 there is 01 also included!

Any suggestions?

Thanks
Dale

Re: Matching Question

Posted: Mon Jun 15, 2009 6:58 am
by JRodriguez
Dale,

Which column in your input data is having a null value? I can see it .......
maybe if you use "Code" it will be properly tabulated ...

If you are using just AddedColumn as your Blocking column and one of the records is having a null value in this column then that record will be automatically rejected but it will available for other passes ...

If the AddedColumn is having a null value, I guess that you will need two passes, please use independent match:

1) Block on Added column
Nothing on Math commands!

Results should be like:

MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | | 01 | 01 |
IBM | Corp | 01 | 01 |


IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |

Residual record ( will be available for second pass)
IBM | CC | 01 | |


2) Block on DivisionNumber and First Character of MatchName2
Nothing on Match commands!

This will add the residual record from pass1 to top group of records

MatchName1|MatchName2|DivisionNumber|Addedcolumn|

IBM | | 01 | 01 |
IBM | Corp | 01 | 01 |
IBM | CC | 01 | |



IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |



dalecooper wrote:Hi Julio

Thanks for the reply!
Unfortunately it does not work on my side.
The closest I get to the desired results is not to have
AddedColumn in the blocking but to include it in
the match specification, UNCERT 900, 0.9 and 0.1.
Critical Missing OK

For this example I have standardized the CompanyName.

So The columns look similar to this:

MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | | 01 | 01 |
IBM | Corp | 01 | 01 |
IBM | CC | 01 | |
IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |

Blocking on MatchName1 and DivisionNumber.
And I agree that blocking on MatchName1 and DivisionNumber and
matching on AddedColumn UNCERT 0.9 and 0.1 , 900 and Critical Missing OK
should group it correctly but it does not.
The above code is used in the seventh pass.

I get results like this:
Pass6:
MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | | 01 | 01 |
IBM | CC | 01 | |

Pass7:
MatchName1|MatchName2|DivisionNumber|Addedcolumn|
IBM | Corp | 01 | 01 |
IBM | Pty | 01 | 07 |
IBM | | 01 | 07 |

In Pass7 there is 01 also included!

Any suggestions?

Thanks
Dale

Posted: Wed Jun 17, 2009 12:11 am
by dalecooper
Hi Julio

Thank you again for the help!

I have tried this and it works, however don't you think it
will also be possible to modify the "Weight Overrides" and add a value
to the "AddedColum" to ensure that if the data agrees/ do not agree then
it should put it in different blocks?

The "Weight Overrides" screen has the following options:

Replace/Add Radio buttons.
Agreement Weight[AW]
Disagreement Weight[DW]
Data Source Missing Weight[AM]
Reference Source Missing Weight[BM]
Both Missing Weight[XM]
Conditional Source Value[AV]
Conditional Reference Source Value[BV]

So, if I select "Add" and put -15 in the "Disagreement Weight" this might block the data into different blocks?

Thanks
Dale

Posted: Wed Jun 17, 2009 1:06 am
by ray.wurlod
Please be careful with terminology.

Blocking columns MUST acheive an exact match.

Data are grouped into "sets", each with a unique Set ID, to indicate potential duplicates based on your other criteria.

Increasing the disagreement weight is more likely to reduce the number of potential duplicates.

Posted: Wed Jun 17, 2009 5:33 am
by dalecooper
Thanks Ray for the information.

Where can I find what each of the following means and how to set it up?

Replace/Add Radio buttons.
Agreement Weight[AW]
Disagreement Weight[DW]
Data Source Missing Weight[AM]
Reference Source Missing Weight[BM]
Both Missing Weight[XM]
Conditional Source Value[AV]
Conditional Reference Source Value[BV]

I have tried numerous settings for the above but cannot get the correct results?
Is there maybe a more detailed description somewhere on what the above does?
Basically what I need is that when a certain column does not match it needs to be allocted to different buckets?

Example :
(This is the third pass)
AddedColumnToMatch:
Same Pass
Matched set, SetId 1 :
07
07
null

Matched set, SerId2 :
03
03

Thanks
Dale

Posted: Wed Jun 17, 2009 7:19 am
by JRodriguez
Dale,

You can find the meaning of those weight override variables in the WebSphere QualityStage User Guide pdf that comes with the product and the online help of the product itself. Learning how to set them up after getting the sense of how the tool used composite weights and cut off values to determine match, duplicates, and residual pairs, is just a matter of practicing different predetermined scenarios like you are doing


dalecooper wrote:Thanks Ray for the information.

Where can I find what each of the following means and how to set it up?

Replace/Add Radio buttons.
Agreement Weight[AW]
Disagreement Weight[DW]
Data Source Missing Weight[AM]
Reference Source Missing Weight[BM]
Both Missing Weight[XM]
Conditional Source Value[AV]
Conditional Reference Source Value[BV]

I have tried numerous settings for the above but cannot get the correct results?
Is there maybe a more detailed description somewhere on what the above does?
Basically what I need is that when a certain column does not match it needs to be allocted to different buckets?

Example :
(This is the third pass)
AddedColumnToMatch:
Same Pass
Matched set, SetId 1 :
07
07
null

Matched set, SerId2 :
03
03

Thanks
Dale

Posted: Wed Jun 17, 2009 10:29 am
by JRodriguez
Dale,

Yes is possible ... but adding,replacing and scaling weights only modify the calculated composite weight of the pair, the key factor if you take that route, will be the cut off value of the pass that will determine if the pair is a match or non match

If you want a record to be considered automatically a non match base on a column disagreement weight ( In your case "AddedColumn") then you should declare the column as CRITICAL [MISSINGOK] in the Special Variable Properties ( VARTYPE )
dalecooper wrote:Hi Julio

Thank you again for the help!

I have tried this and it works, however don't you think it
will also be possible to modify the "Weight Overrides" and add a value
to the "AddedColum" to ensure that if the data agrees/ do not agree then
it should put it in different blocks?

The "Weight Overrides" screen has the following options:

Replace/Add Radio buttons.
Agreement Weight[AW]
Disagreement Weight[DW]
Data Source Missing Weight[AM]
Reference Source Missing Weight[BM]
Both Missing Weight[XM]
Conditional Source Value[AV]
Conditional Reference Source Value[BV]

So, if I select "Add" and put -15 in the "Disagreement Weight" this might block the data into different blocks?

Thanks
Dale

Posted: Wed Jun 17, 2009 11:01 am
by chulett
Julio - not to be a 'noodge' but you are aware there's no need to quote the entire contents of every post you reply to, yes? There is a perfectly lovely Reply to topic button that starts you off with a clean slate.

Posted: Wed Jun 17, 2009 11:26 am
by JRodriguez
I did not notice this buttom until you point me to it, thanks!

Posted: Wed Jun 17, 2009 11:27 am
by JRodriguez
:(
I did not notice this buttom until you point me to it, thanks!

Posted: Wed Jun 17, 2009 11:32 am
by dalecooper
Well, thanks for all the information.
I'm sure from here I will able to manage the changes on my side!

Your help Julio/Ray is much appreciated.