What is the difference between ZQMIXNZQ, ZQMIXAZQ and ZQMIXRZQ?

Posted: Fri Jun 11, 2010 9:42 am
by reachsam11
The IBM documentation PDF gives the same explanation for all three separators used in the Standardize stage: ZQMIXNZQ, ZQMIXAZQ and ZQMIXRZQ. Can anyone tell me the difference and when to use which?

I have a requirement to cleanse US addresses. Some addresses are 4 lines, some are 5 lines, and they contain all sorts of combinations of name, address and area information. I am planning to use USPREP and then USNAME/USADDR/USAREA on the respective prep domains.

If anyone has a sample job or any pointers, that would be great.

Posted: Fri Jun 11, 2010 6:46 pm
by ray.wurlod
Which rule set uses these literals (I am not in the USA)? The names suggest "mixed numeric", "mixed alpha" and "mixed something beginning with R" - possibly very mixed tokens such as R2D2.

Posted: Sun Jun 13, 2010 8:12 am
by stuartjvnorton
I think you'll find an explanation of each USPREP delimiter in USPREP.CLS. IIRC, each of the mix delimiters tests the tokens against all 3 domains (name, address, area), but they differ in the order in which the domains are tried.

You might have to run some further tests on the data to work out the best approach. Maybe MIXA alone is the easiest option that still gives you good results.
Or maybe you can narrow down the behaviour based on a couple of basic observations. For example: when there are 5 lines, the first is mostly attention info, so try that line with MIXN; or the last line is almost always area info, so the last line could use MIXR; and so on.
That might make for a more complicated job, but it may give you much better results for a bit of extra work.
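
To make the positional idea above concrete, here is a rough Python sketch of the logic only. It is not QualityStage code: in a real job the literals would be set in the Standardize stage or built in an upstream Transformer. The helper names (pick_delimiters, build_prep_input) are made up for illustration, and the rules (5 lines means the first line gets MIXN, the last line gets MIXR, everything else gets MIXA) are just the example observations from this post, not tested recommendations.

```python
# Hypothetical sketch: choose a USPREP mix delimiter per address line,
# then build the delimited string that would be passed to USPREP.
MIXN = "ZQMIXNZQ"
MIXA = "ZQMIXAZQ"
MIXR = "ZQMIXRZQ"


def pick_delimiters(lines):
    """Return one delimiter per line, based on line count and position."""
    delimiters = [MIXA] * len(lines)  # assumed general-purpose fallback
    if len(lines) == 5:
        delimiters[0] = MIXN  # observation: first of 5 lines is mostly attn info
    if lines:
        delimiters[-1] = MIXR  # observation: last line is almost always area info
    return delimiters


def build_prep_input(lines):
    """Drop empty lines, then prefix each remaining line with its delimiter."""
    cleaned = [ln.strip() for ln in lines if ln and ln.strip()]
    parts = []
    for delim, text in zip(pick_delimiters(cleaned), cleaned):
        parts.append(delim)
        parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    sample = ["ATTN JOHN SMITH", "ACME CORP", "123 MAIN ST", "SUITE 4",
              "SPRINGFIELD IL 62701"]
    print(build_prep_input(sample))
```

The real rules would have to come from profiling your own data, and each variant is worth comparing against a plain MIXA run before you commit to the extra complexity.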

As with most questions about standardizing data, the better you understand the data, the better the job you can write to standardize it.

Posted: Sun Jun 13, 2010 8:14 am
by stuartjvnorton
ray.wurlod wrote: Which rule set uses these literals (I am not in the USA)? The names suggest "mixed numeric", "mixed alpha" and "mixed something beginning with R" - possibly very mixed tokens such as R2D2. ...
LOL

"These are not the tokens you're looking for..." :oops: :lol:

Posted: Sun Jun 13, 2010 4:43 pm
by ray.wurlod
Mixican data, perhaps?
:lol: