What is the difference between ZQMIXNZQ, ZQMIXAZQ and ZQMIXRZQ?

Infosphere's Quality Product


reachsam11
Participant
Posts: 26
Joined: Wed Mar 17, 2010 11:05 am


Post by reachsam11 »

The IBM documentation PDF gives the same explanation for all three separators used in the Standardize stage (ZQMIXNZQ, ZQMIXAZQ and ZQMIXRZQ). Can anyone tell me the difference, and when to use which?

I have a requirement to cleanse US addresses. Some addresses have 4 lines, some have 5, and they mix name, address and area information in all sorts of combinations. I am planning to run USPREP first and then apply USNAME, USADDR and USAREA to the respective domains that USPREP produces.

If anyone has a sample job or any pointers, that would be great.
ReachSam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Which rule set uses these literals (I am not in the USA)? The names suggest "mixed numeric", "mixed alpha" and "mixed something beginning with R" - possibly very mixed tokens such as R2D2.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne


Post by stuartjvnorton »

I think you'll find the explanation of each USPREP delimiter in USPREP.CLS. IIRC, the different MIX delimiters all test against the three domains (name, address and area), but each one tries them in a different order.

You might have to run some further tests on the data to work out the best approach. Perhaps MIXA is the simplest option that still gives you good results.
Maybe you can narrow down the behaviour based on a couple of basic observations. For example, when there are 5 lines the first is mostly attention (attn) information, so try MIXN on it; the last line is almost always area information, so it could use MIXR; and so on.
It might make for a more complicated job, but it may give you much better results for a bit of extra work.
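To make the line-position idea concrete, here is a minimal Python sketch of the selection logic only. It is not DataStage code: in a real job the choice would be expressed as literals placed before the input columns in the Standardize stage. The function names are hypothetical, and so is the assumption that each line is sent to USPREP preceded by one delimiter literal, joined with spaces. The mapping (MIXN for the first line of a 5-line address, MIXR for the last line, MIXA otherwise) simply follows the suggestion above.

```python
from typing import List

# Hypothetical mapping, following the suggestion in this reply (not IBM docs):
DEFAULT_DELIM = "ZQMIXAZQ"        # general-purpose choice for most lines
FIRST_OF_FIVE_DELIM = "ZQMIXNZQ"  # first line of a 5-line address (mostly attn info)
LAST_LINE_DELIM = "ZQMIXRZQ"      # last line (almost always area info)


def choose_delimiters(lines: List[str]) -> List[str]:
    """Pick one USPREP delimiter literal per address line, by position."""
    n = len(lines)
    delims = []
    for i in range(n):
        if n == 5 and i == 0:
            delims.append(FIRST_OF_FIVE_DELIM)
        elif i == n - 1:
            delims.append(LAST_LINE_DELIM)
        else:
            delims.append(DEFAULT_DELIM)
    return delims


def build_prep_input(lines: List[str]) -> str:
    """Assumed input layout: each trimmed line preceded by its delimiter, space-joined."""
    parts = [f"{d} {line.strip()}" for d, line in zip(choose_delimiters(lines), lines)]
    return " ".join(parts)


if __name__ == "__main__":
    address = [
        "ATTN: ACCOUNTS PAYABLE",
        "ACME CORP",
        "123 MAIN ST",
        "SUITE 400",
        "SPRINGFIELD IL 62701",
    ]
    print(build_prep_input(address))
```

Keeping the rule in one small function makes it easy to add further observations about the data (for example, a line that starts with "ATTN") without touching the rest of the logic.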

As with most questions about STANing data, the better you understand the data, the better a job you can write to stan it.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

ray.wurlod wrote: Which rule set uses these literals (I am not in the USA)? The names suggest "mixed numeric", "mixed alpha" and "mixed something beginning with R" - possibly very mixed tokens such as R2D2. ...
LOL

"These are not the tokens you're looking for..." :oops: :lol:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Mixican data, perhaps?
:lol: