Unduplicate Match Stage

rupesh.datastage · Post by **rupesh.datastage** » Fri Nov 14, 2008 8:17 am

Hi All,

I have a problem and don't know how to solve it. I have an input table with some x columns, and I have to find matched, duplicate and non-matched records based on these 7 columns (LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER, BIRTH_DATE, PHONE_NO, ADDRESS).

I have fed the input table into a Standardize stage and added rules like this:

USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET :- ADDRESS

After standardization I used a Copy stage to get two flows, one into Match Frequency and the other into Unduplicate Match. The second input to Unduplicate Match comes from Match Frequency.

So now I have two input flows (the output of Standardize and the output of Match Frequency) for Unduplicate Match. Please give me an idea of how to use Unduplicate Match and how to create a match specification for this requirement.

This is very important, please help me.

Thanks,

rupesh.datastage · Post by **rupesh.datastage** » Fri Nov 14, 2008 6:22 pm

Oh my god, I know that DSXchange is the best forum for DataStage and QualityStage.

Still, I am not getting an answer from you Xperts?

Please give me an idea.

Thanks,

chulett · Post by **chulett** » Fri Nov 14, 2008 7:17 pm

Well... there's probably fewer people here that are familiar with QS than you might think, especially in version 8. I'll be the first to admit that I, for one, know jack about it. As in squat. Zip, zilch, nada.

You'll either need to (continue to) be patient and see if anyone can help or if it's really "very important" then get help from your official support provider.

ray.wurlod · Post by **ray.wurlod** » Fri Nov 14, 2008 9:20 pm

... and possibly stop posting the same question multiple times and annoying some of us by asking it again via private message and/or email.

rupesh.datastage · Post by **rupesh.datastage** » Sat Nov 15, 2008 10:43 am

chulett,

Thanks for your reply. I will wait for some time; otherwise I will contact IBM support.

Regards,

rupesh.datastage · Post by **rupesh.datastage** » Sat Nov 15, 2008 10:58 am

Ray, thanks for your reply, it's really very helpful to me.

I appreciate your answers.

Regards,

ray.wurlod · Post by **ray.wurlod** » Sat Nov 15, 2008 7:45 pm

Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so that you can re-join the partial records downstream.

rupesh.datastage · Post by **rupesh.datastage** » Sat Nov 15, 2008 9:57 pm

Ray, are you sure that a Standardize stage will handle only one rule set? If that is the case and I use 4 Standardize stages, then I need 4 Match Frequency stages. But we can add only 2 inputs (one from Standardize and one from Match Frequency) to the Unduplicate Match stage, isn't that right?

Please advise.

ray.wurlod · Post by **ray.wurlod** » Sun Nov 16, 2008 8:10 am

Prepare data for Match stage as follows. Output links leading to frequency tables have been omitted for clarity: you will need them.

                    +----------+
                    |          |
                 +->|  Stan_1  |        +----------+
                 |  |          +------->|          |
                 |  +----------+        |          |
                 |                      |  Join_A  |
                 |  +----------+        |          |
                 |  |          +------->|          |
                 +->|  Stan_2  |        +----------+
+--------+       |  |          |             |
|        |       |  +----------+             |
|  Xfmr  +-------+                           V
|        |       |  +----------+        +----------+
+--------+       |  |          |        |          |
                 +->|  Stan_3  +------->|  Join_B  |
                 |  |          |        |          |
                 |  +----------+        +----------+
                 |                           |
                 |  +----------+             V
                 |  |          |         +----------+        +------------+
                 +->|  Stan_4  |         |          |        |            |
                    |          +-------->|  Join_C  +------->|  Data_Set  |
                    +----------+         |          |        |            |
                                         +----------+        +------------+
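
For readers who want to see the idea outside DataStage, here is a minimal Python sketch of the same flow: a transformer adds an artificial key, four stand-ins for the Standardize stages each handle their own columns, and three joins on the key rebuild the full record. The `rec_id` column, the helper names and the `<column>_<ruleset>` output naming are assumptions made for illustration only. A real Standardize stage parses the data with its rule set and produces its own output fields, whereas this stand-in only trims and upper-cases.

```python
from itertools import count


def add_key(rows):
    """Xfmr step: attach an artificial key so partial records can be re-joined."""
    key = count(1)
    return [{"rec_id": next(key), **row} for row in rows]


def standardize(rows, columns, suffix):
    """Stand-in for one Standardize stage (hypothetical, not the real rule logic).

    Keeps the key plus this stage's own columns, trimmed and upper-cased.
    """
    return [
        {"rec_id": r["rec_id"],
         **{f"{c}_{suffix}": str(r[c]).strip().upper() for c in columns}}
        for r in rows
    ]


def join_on_key(left, right):
    """Inner join of two lists of partial records on rec_id."""
    right_by_key = {r["rec_id"]: r for r in right}
    return [
        {**l, **right_by_key[l["rec_id"]]}
        for l in left
        if l["rec_id"] in right_by_key
    ]


# Made-up sample input, one record only.
rows = add_key([
    {"LAST_NAME": "smith", "FIRST_NAME": " john", "MIDDLE_NAME": "q",
     "BIRTH_DATE": "1970-01-02", "PHONE_NO": "555 0100", "ADDRESS": "1 main st"},
])

stan_1 = standardize(rows, ["FIRST_NAME", "MIDDLE_NAME", "LAST_NAME"], "USNAME")
stan_2 = standardize(rows, ["BIRTH_DATE"], "VDATE")
stan_3 = standardize(rows, ["PHONE_NO"], "VPHONE")
stan_4 = standardize(rows, ["ADDRESS"], "USADDR")

join_a = join_on_key(stan_1, stan_2)
join_b = join_on_key(join_a, stan_3)
data_set = join_on_key(join_b, stan_4)
```

Without the key there would be nothing reliable to join the four partial outputs on, which is why it is added before the flow is split.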

stuartjvnorton · Post by **stuartjvnorton** » Sun Nov 16, 2008 10:29 pm

ray.wurlod wrote:Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...

Hey Ray,

I had a quick look at it (in the designer only: didn't run a job) and it looks like you can have multiple "processes" in a stage and each process can do a different STAN. The mappings tab in stage properties showed the multiple sets of ruleset output fields ok.

The main limitation is that you can't create more than 1 process per stage that uses the same ruleset, presumably because the hard-coded output field names (eg StreetName_AUADDR) would clash.
Designer complained when I tried to save a 2nd process using a ruleset I'd already picked for another process.
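
To illustrate the presumed cause of that clash, here is a small Python sketch. It assumes output field names are built as `<FieldName>_<RuleSet>`, as in the `StreetName_AUADDR` example above. The `AUNAME` rule set name and the `FirstName` field are hypothetical, and this is not how Designer actually validates a stage.

```python
from collections import Counter


def output_fields(field_names, ruleset):
    """Assumed naming scheme: every output field is suffixed with its rule set."""
    return [f"{name}_{ruleset}" for name in field_names]


def clashing_fields(processes):
    """Return the output field names that more than one process would produce.

    processes is a list of (ruleset, field_names) pairs, one per process.
    """
    produced = Counter()
    for ruleset, names in processes:
        produced.update(output_fields(names, ruleset))
    return sorted(name for name, n in produced.items() if n > 1)


# Two different rule sets in one stage: no name is produced twice.
ok = clashing_fields([("AUADDR", ["StreetName"]), ("AUNAME", ["FirstName"])])

# The same rule set used by two processes: the field names collide.
bad = clashing_fields([("AUADDR", ["StreetName"]), ("AUADDR", ["StreetName"])])
```

Here `ok` is an empty list, while `bad` contains `StreetName_AUADDR`.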

ray.wurlod · Post by **ray.wurlod** » Sun Nov 16, 2008 10:43 pm

Cool, I've learned something useful today. :D

rupesh.datastage · Post by **rupesh.datastage** » Mon Nov 17, 2008 9:16 am

Thanks for your reply. It sounds like I am using the right rule sets, doesn't it?

USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET :- ADDRESS

Regards,

Rupesh

stuartjvnorton · Post by **stuartjvnorton** » Mon Nov 17, 2008 5:18 pm

Is that the order you are adding the fields into each STAN?

I work with the AU sets, but if the US one is similar, the order you specified wouldn't work too well. The rules for processing last name at the start tend to rely on the comma following the Last Name (which you wouldn't have).

Also, the Gender field where it is would only confuse the ruleset. You may find yourself with some very short surnames or middle initials you didn't expect.

If they are something standard like M/F then I'd leave them out completely and possibly compare later if you want to.
If they are actually salutations then you might want to order them like:

Gender (Only if it holds salutations ie: Mr, Mrs, etc. Won't understand M/F)
First Name
Middle Name
Last Name

Cheers
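
The ordering advice above can be sketched in Python as a small pre-processing step that builds the single name string passed to the name rule set. The `SALUTATIONS` set and the `build_name_input` helper are illustrative assumptions, not part of QualityStage, and the column names are the ones from the original question.

```python
# Illustrative list only; not taken from any rule set.
SALUTATIONS = {"MR", "MRS", "MS", "MISS", "DR"}


def build_name_input(record):
    """Build the name string in the order: salutation, first, middle, last.

    GENDER is included only when it holds a salutation (Mr, Mrs, ...).
    Codes such as M/F are left out so they are not mistaken for an
    initial or a very short surname.
    """
    parts = []
    gender = (record.get("GENDER") or "").strip()
    if gender.upper().rstrip(".") in SALUTATIONS:
        parts.append(gender)
    for col in ("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME"):
        value = (record.get(col) or "").strip()
        if value:
            parts.append(value)
    return " ".join(parts)


with_code = build_name_input(
    {"GENDER": "M", "FIRST_NAME": "John", "MIDDLE_NAME": "", "LAST_NAME": "Smith"}
)
with_salutation = build_name_input(
    {"GENDER": "Mr", "FIRST_NAME": "John", "MIDDLE_NAME": "Q", "LAST_NAME": "Smith"}
)
```

With these inputs `with_code` is `John Smith` and `with_salutation` is `Mr John Q Smith`. A plain M/F gender can still be carried alongside as its own column and compared later in the match.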
