Unduplicate Match Stage
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
Unduplicate Match Stage
Hi All,
I have a problem and I don't know how to solve it. I have an input table with some x columns, and I have to find matched, duplicate and non-matched records based on seven columns (LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER, BIRTH_DATE, PHONE_NO, ADDRESS).
I have sent the input table's output flow to a Standardize stage and added rules like this:
USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET :- ADDRESS
After standardization I used a Copy stage to get two flows: one into Match Frequency and the other for Unduplicate Match. The second input for Unduplicate Match comes from Match Frequency.
So I now have two input flows (the output of Standardize and the output of Match Frequency) for Unduplicate Match. Please give me an idea of how to use Unduplicate Match and how to create a match specification for this requirement.
This is very important, please help me.
Thanks,
Raja
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
Re: Unduplicate Match Stage
rupesh.datastage wrote: Hi All,
I have a problem now, i dont know how to do it. I have an input table with some x columns, i have to find matched, dups and non matched records based on the 7 columns (LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER, BIRTH_DATE, PHONE_NO, ADDRESS).
I have used input table out flow to standardize stage and added rules like this :-
USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET:- ADDRESS
after standardization i have used a copy stage to get two flows one into match frequency and the other one for Unduplicate Match, i got the second flow from Match Frequency for Unduplicate Match.
Now - i have two input flows (Out of standardize and out of Match Frequency) for Unduplicate Match - please give me an idea how to use unduplicate match, how to create a match specification for this requirement.
This is very important - please help me -
Thanks,
Oh my god, I know that DSXchange is the best forum for DataStage and QualityStage.
Still, I am not getting an answer from you experts?
Please give me an idea.
Thanks,
Raja
Well... there's probably fewer people here that are familiar with QS than you might think, especially in version 8. I'll be the first to admit that I, for one, know jack about it. As in squat. Zip, zilch, nada.
You'll either need to (continue to) be patient and see if anyone can help or if it's really "very important" then get help from your official support provider.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
Thanks
chulett wrote: Well... there's probably fewer people here that are familiar with QS than you might think, especially in version 8. I'll be the first to admit that I, for one, know jack about it. As in squat. Zip, zilch, nada.
You'll either need to (continue to) be patient and see if anyone can help or if it's really "very important" then get help from your official support provider.
chulett,
Thanks for your reply. I will wait for some time; otherwise I will contact IBM support.
Regards,
Raja
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
Hey Ray
ray.wurlod wrote: ... and possibly stop posting the same question multiple times and annoying some of us by asking it again via private message and/or email. ...
Ray, thanks for your reply, it's really very helpful to me.
I appreciate your answers.
Regards,
Raja
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so that you can re-join the partial records downstream.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
standardize prob
ray.wurlod wrote: Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...
Ray, are you sure that a Standardize stage can handle only one rule set? If that is the case and I use four Standardize stages, then I need four Match Frequency stages. But we can add only two inputs (one from Standardize and one from Match Frequency) to the Unduplicate Match stage, can't we?
Please advise.
Raja
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Prepare data for Match stage as follows. Output links leading to frequency tables have been omitted for clarity: you will need them.
                    +----------+
                    |          |
                 +->|  Stan_1  |        +----------+
                 |  |          +------->|          |
                 |  +----------+        |          |
                 |                      |  Join_A  |
                 |  +----------+        |          |
                 |  |          +------->|          |
                 +->|  Stan_2  |        +----------+
+--------+       |  |          |             |
|        |       |  +----------+             |
|  Xfmr  +-------+                           V
|        |       |  +----------+        +----------+
+--------+       |  |          |        |          |
                 +->|  Stan_3  +------->|  Join_B  |
                 |  |          |        |          |
                 |  +----------+        +----------+
                 |                           |
                 |  +----------+             V
                 |  |          |        +----------+        +------------+
                 +->|  Stan_4  |        |          |        |            |
                    |          +------->|  Join_C  +------->|  Data_Set  |
                    +----------+        |          |        |            |
                                        +----------+        +------------+
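For readers who find the flow easier to follow as code, here is a rough Python sketch of the same fan-out and re-join. It only illustrates the data flow (artificial key, four independent standardization branches, joins on the key); it is not QualityStage code. The key name REC_KEY, the output column names and the four rule functions are made-up placeholders standing in for the real rule sets.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

def add_artificial_key(records: List[Record]) -> List[Record]:
    """Xfmr step: tag every row with a surrogate key (REC_KEY is a hypothetical name)."""
    return [{**rec, "REC_KEY": i} for i, rec in enumerate(records, start=1)]

def run_branch(records: List[Record],
               rule: Callable[[Record], Record]) -> Dict[object, Record]:
    """One Stan_n branch: apply one rule and index its output by REC_KEY."""
    return {rec["REC_KEY"]: rule(rec) for rec in records}

def rejoin(records: List[Record],
           branches: List[Dict[object, Record]]) -> List[Record]:
    """Join_A to Join_C collapsed into one loop: merge branch outputs on REC_KEY."""
    merged_rows = []
    for rec in records:
        merged = dict(rec)
        for branch in branches:
            merged.update(branch[rec["REC_KEY"]])
        merged_rows.append(merged)
    return merged_rows

# Placeholder rules; the real work is done by the rule sets inside QualityStage.
def name_rule(rec: Record) -> Record:
    parts = (rec["FIRST_NAME"], rec["MIDDLE_NAME"], rec["LAST_NAME"])
    return {"NAME_STD": " ".join(str(p).strip().upper() for p in parts if p)}

def date_rule(rec: Record) -> Record:
    return {"BIRTH_DATE_STD": str(rec["BIRTH_DATE"]).replace("/", "-")}

def phone_rule(rec: Record) -> Record:
    return {"PHONE_STD": "".join(ch for ch in str(rec["PHONE_NO"]) if ch.isdigit())}

def addr_rule(rec: Record) -> Record:
    return {"ADDR_STD": " ".join(str(rec["ADDRESS"]).upper().split())}

rows = [{
    "LAST_NAME": "Smith", "FIRST_NAME": "John", "MIDDLE_NAME": "",
    "GENDER": "M", "BIRTH_DATE": "1980/01/02",
    "PHONE_NO": "(555) 010-1234", "ADDRESS": "1  Main st",
}]
keyed = add_artificial_key(rows)
branches = [run_branch(keyed, rule)
            for rule in (name_rule, date_rule, phone_rule, addr_rule)]
prepared = rejoin(keyed, branches)
```

The point of the key is that each branch can drop or reshape columns freely, because the re-join depends only on REC_KEY.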
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
ray.wurlod wrote: Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...
Hey Ray,
I had a quick look at it (in the designer only: didn't run a job) and it looks like you can have multiple "processes" in a stage and each process can do a different STAN. The mappings tab in stage properties showed the multiple sets of ruleset output fields ok.
The main limitation is that you can't create more than 1 process per stage that uses the same ruleset, presumably because the hard-coded output field names (eg StreetName_AUADDR) would clash.
Designer complained when I tried to save a 2nd process using a ruleset I'd already picked for another process.
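To illustrate the presumed reason for that limitation, here is a small Python sketch. It assumes output columns are named field_RULESET, as the StreetName_AUADDR example suggests; the function names and the HouseNumber field are made up for the illustration, and this is not how Designer is actually implemented.

```python
from typing import List, Set

def output_columns(ruleset: str, fields: List[str]) -> List[str]:
    """Build output column names as <field>_<ruleset>, e.g. StreetName_AUADDR."""
    return [f"{field}_{ruleset}" for field in fields]

def add_process(stage_columns: Set[str], ruleset: str, fields: List[str]) -> None:
    """Add one standardization process to a stage, refusing clashing column names."""
    new_cols = output_columns(ruleset, fields)
    clashes = [col for col in new_cols if col in stage_columns]
    if clashes:
        raise ValueError(f"duplicate output columns: {clashes}")
    stage_columns.update(new_cols)

stage: Set[str] = set()
add_process(stage, "AUADDR", ["StreetName", "HouseNumber"])  # first process: fine
add_process(stage, "AUNAME", ["FirstName"])                  # different ruleset: fine

try:
    add_process(stage, "AUADDR", ["StreetName", "HouseNumber"])  # same ruleset again
    second_use_allowed = True
except ValueError:
    second_use_allowed = False
```

Different rulesets give different suffixes, so their columns can sit side by side in one stage; a second process on the same ruleset would produce identical names.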
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
-
- Participant
- Posts: 33
- Joined: Tue Oct 21, 2008 10:29 am
am i using correctly
stuartjvnorton wrote: ray.wurlod wrote: Unless I have missed something, a Standardize stage can only deploy one Rule Set. On that basis you will need four Standardize stages, one for each Rule Set. You will also need an artificial key so ...
Hey Ray,
I had a quick look at it (in the designer only: didn't run a job) and it looks like you can have multiple "processes" in a stage and each process can do a different STAN. The mappings tab in stage properties showed the multiple sets of ruleset output fields ok.
The main limitation is that you can't create more than 1 process per stage that uses the same ruleset, presumably because the hard-coded output field names (eg StreetName_AUADDR) would clash.
Designer complained when I tried to save a 2nd process using a ruleset I'd already picked for another process.
Thanks for your reply. It sounds like I am using the right rulesets, doesn't it?
USNAME.SET :- LAST_NAME, FIRST_NAME, MIDDLE_NAME, GENDER
VDATE.SET :- BIRTH_DATE
VPHONE.SET :- PHONE_NO
USADDR.SET:- ADDRESS
Regards,
Rupesh
Raja
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
Is that the order you are adding the fields into each STAN?
I work with the AU sets, but if the US one is similar, the order you specified wouldn't work too well. The rules for processing last name at the start tend to rely on the comma following the Last Name (which you wouldn't have).
Also, the Gender field where it is would only confuse the ruleset. You may find yourself with some very short surnames or middle initials you didn't expect.
If they are something standard like M/F then I'd leave them out completely and possibly compare later if you want to.
If they are actually salutations then you might want to order them like:
Gender (only if it holds salutations, e.g. Mr, Mrs, etc.; the ruleset won't understand M/F)
First Name
Middle Name
Last Name
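As a rough sketch of that ordering, here is one way the name string could be assembled before it goes into the name ruleset. This is plain Python for illustration only; the function name and the salutation list are assumptions, and in a real job you would do the equivalent in whatever builds the Standardize input.

```python
# Hypothetical list of values treated as salutations; extend to suit your data.
SALUTATIONS = {"MR", "MRS", "MS", "MISS", "DR"}

def build_name_input(first: str, middle: str, last: str, gender: str = "") -> str:
    """Order the parts as salutation, first, middle, last.

    The gender value is kept only when it looks like a salutation;
    codes such as M/F are left out so they are not parsed as initials.
    """
    parts = []
    cleaned = gender.strip()
    if cleaned.rstrip(".").upper() in SALUTATIONS:
        parts.append(cleaned)
    parts.extend(p.strip() for p in (first, middle, last) if p and p.strip())
    return " ".join(parts)

with_code = build_name_input("John", "Q", "Smith", "M")
with_salutation = build_name_input("Jane", "", "Doe", "Mrs")
```

The M/F value is not lost; it stays in its own column and can be compared separately later.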
Cheers