Unduplicate Match Stage

rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

Unduplicate Match Stage

Post by rupesh.datastage »

Hi All,

I have a problem and don't know how to solve it. I have an input table with some x columns, and I have to find matched, duplicate and non-matched records based on seven columns (LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER, BIRTH_DATE, PHONE_NO, ADDRESS).

I have sent the output of the input table to a Standardize stage and added rule sets like this:

USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET :- ADDRESS

After standardization I used a Copy stage to get two flows: one into a Match Frequency stage and the other into the Unduplicate Match stage. The second input to Unduplicate Match comes from Match Frequency.


So I now have two input flows into Unduplicate Match (the output of Standardize and the output of Match Frequency). Please give me an idea of how to use Unduplicate Match and how to create a match specification for this requirement.

This is very important. Please help me.

Thanks,
Raja
rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

Re: Unduplicate Match Stage

Post by rupesh.datastage »

Oh my god, I know that DSXchange is the best forum for DataStage and QualityStage.

Still, I am not getting an answer from you experts?

Please give me an idea.

Thanks,
Raja
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... there are probably fewer people here who are familiar with QS than you might think, especially in version 8. I'll be the first to admit that I, for one, know jack about it. As in squat. Zip, zilch, nada. :wink:

You'll either need to (continue to) be patient and see if anyone can help or if it's really "very important" then get help from your official support provider.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... and possibly stop posting the same question multiple times and annoying some of us by asking it again via private message and/or email.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

Thanks

Post by rupesh.datastage »

chulett,

Thanks for your reply. I will wait for some time; otherwise I will contact IBM support.

Regards,
Raja
rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

Hey Ray

Post by rupesh.datastage »

ray.wurlod wrote:... and possibly stop posting the same question multiple times and annoying some of us by asking it again via private message and/or email. ...
**

Ray - Thanks for your reply, it's really very helpful to me.

I appreciate your answers.

Regards,
Raja
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so that you can re-join the partial records downstream.
rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

standardize prob

Post by rupesh.datastage »

ray.wurlod wrote:Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...
**

Ray - are you sure that a Standardize stage will handle only one rule set? If that is the case and I use four Standardize stages, then I need four Match Frequency stages. But we can add only two inputs (one from Standardize and one from Match Frequency) to the Unduplicate Match stage, isn't that so?

Please advise.
Raja
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Prepare data for Match stage as follows. Output links leading to frequency tables have been omitted for clarity: you will need them.


                    +----------+
                    |          |
                 +->|  Stan_1  |        +----------+
                 |  |          +------->|          |
                 |  +----------+        |          |
                 |                      |  Join_A  |
                 |  +----------+        |          |
                 |  |          +------->|          |
                 +->|  Stan_2  |        +----------+
+--------+       |  |          |             |
|        |       |  +----------+             |
|  Xfmr  +-------+                           V
|        |       |  +----------+        +----------+
+--------+       |  |          |        |          |
                 +->|  Stan_3  +------->|  Join_B  |
                 |  |          |        |          |
                 |  +----------+        +----------+
                 |                           |
                 |  +----------+             V
                 |  |          |         +----------+        +------------+
                 +->|  Stan_4  |         |          |        |            |
                    |          +-------->|  Join_C  +------->|  Data_Set  |
                    +----------+         |          |        |            |
                                         +----------+        +------------+
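
The layout above can be sketched in plain Python to show why the artificial key matters. This is only an illustration, not QualityStage code: `REC_KEY`, `stan_name`, `stan_phone` and the sample rows are hypothetical stand-ins, and only two of the four branches are shown to keep it short.

```python
# Hypothetical stand-ins for two of the Standardize stages. Each branch
# carries only the artificial key plus its own standardized output.
def stan_name(row):
    return {"REC_KEY": row["REC_KEY"],
            "NAME_STD": " ".join(row["NAME"].upper().split())}

def stan_phone(row):
    return {"REC_KEY": row["REC_KEY"],
            "PHONE_STD": "".join(ch for ch in row["PHONE_NO"] if ch.isdigit())}

def add_key(rows):
    """Xfmr step: attach an artificial key so the branches can be re-joined."""
    return [{"REC_KEY": i, **row} for i, row in enumerate(rows, start=1)]

def join_on_key(left, right):
    """Join step: inner join of two branches on REC_KEY."""
    right_by_key = {r["REC_KEY"]: r for r in right}
    return [{**l, **right_by_key[l["REC_KEY"]]}
            for l in left if l["REC_KEY"] in right_by_key]

rows = add_key([
    {"NAME": "john  a smith", "PHONE_NO": "(303) 555-0100"},
    {"NAME": "Mary Jones", "PHONE_NO": "303.555.0199"},
])
names = [stan_name(r) for r in rows]     # branch 1
phones = [stan_phone(r) for r in rows]   # branch 2
rejoined = join_on_key(names, phones)    # one record per original row
```

Without the key there would be nothing reliable to join the partial records on once each branch has rewritten its own columns.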
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

ray.wurlod wrote:Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...
Hey Ray,

I had a quick look at it (in the designer only: didn't run a job) and it looks like you can have multiple "processes" in a stage and each process can do a different STAN. The mappings tab in stage properties showed the multiple sets of ruleset output fields ok.

The main limitation is that you can't create more than 1 process per stage that uses the same ruleset, presumably because the hard-coded output field names (eg StreetName_AUADDR) would clash.
Designer complained when I tried to save a 2nd process using a ruleset I'd already picked for another process.
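
A rough Python sketch of the presumed clash, assuming output columns are named `<Field>_<RULESET>` as in the StreetName_AUADDR example. The field lists and the AUNAME entry are made up for illustration and are not the real rule set dictionaries.

```python
from collections import Counter

# Hypothetical output fields per rule set. Only StreetName_AUADDR comes from
# the post above; the other names are illustrative assumptions.
RULESET_FIELDS = {
    "AUADDR": ["HouseNumber", "StreetName"],
    "AUNAME": ["FirstName", "PrimaryName"],
}

def output_columns(processes):
    """Column names one stage would emit for its list of processes."""
    return [f"{field}_{ruleset}"
            for ruleset in processes
            for field in RULESET_FIELDS[ruleset]]

def clashing_columns(processes):
    """Names that would appear more than once on the output link."""
    counts = Counter(output_columns(processes))
    return sorted(name for name, n in counts.items() if n > 1)

ok = clashing_columns(["AUNAME", "AUADDR"])     # different rule sets
clash = clashing_columns(["AUADDR", "AUADDR"])  # same rule set twice
```

Different rule sets give disjoint column names, while reusing one rule set repeats every name, which would fit the error Designer raised.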
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Cool, I've learned something useful today. :D
rupesh.datastage
Participant
Posts: 33
Joined: Tue Oct 21, 2008 10:29 am

am i using correctly

Post by rupesh.datastage »

Thanks for your reply. It sounds like I am using the right rule sets, doesn't it?

USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET :- ADDRESS


Regards,

Rupesh
Raja
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Is that the order you are adding the fields into each STAN?

I work with the AU sets, but if the US one is similar, the order you specified wouldn't work too well. The rules for processing last name at the start tend to rely on the comma following the Last Name (which you wouldn't have).

Also, the Gender field where it is would only confuse the ruleset. You may find yourself with some very short surnames or middle initials you didn't expect. ;-)
If they are something standard like M/F then I'd leave them out completely and possibly compare later if you want to.
If they are actually salutations then you might want to order them like:

Gender (only if it holds salutations, i.e. Mr, Mrs, etc.; the rule set won't understand M/F)
First Name
Middle Name
Last Name
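
As a rough illustration of that ordering, here is a small Python sketch that builds the single name string a name rule set would receive. The `SALUTATIONS` set and the `build_name_input` helper are hypothetical; in the actual job the ordering is set by the order in which columns are added to the Standardize process.

```python
# Hypothetical list of values treated as salutations; anything else in the
# GENDER column (for example M/F) is left out of the name string.
SALUTATIONS = {"MR", "MRS", "MS", "MISS", "DR"}

def build_name_input(row):
    """Concatenate name parts in the order: salutation, first, middle, last."""
    parts = []
    gender = (row.get("GENDER") or "").strip()
    if gender.upper().rstrip(".") in SALUTATIONS:
        parts.append(gender)
    for col in ("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME"):
        value = (row.get(col) or "").strip()
        if value:  # skip empty or missing parts
            parts.append(value)
    return " ".join(parts)

coded = build_name_input(
    {"GENDER": "M", "FIRST_NAME": "John", "MIDDLE_NAME": "A", "LAST_NAME": "Smith"})
titled = build_name_input(
    {"GENDER": "Mrs.", "FIRST_NAME": "Mary", "MIDDLE_NAME": "", "LAST_NAME": "Jones"})
```

Here `coded` comes out as "John A Smith" (the M is dropped) and `titled` as "Mrs. Mary Jones".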


Cheers