Page 1 of 1

Cleansing Name

Posted: Tue Apr 02, 2013 7:55 am
by chandra.shekhar@tcs.com
Hi All,
We have received a requirement from our client where we have to cleanse the Customer Name Field through Datastage.
The Name(VARCHAR(120)) should be considered as Invalid and to be reported in the target if

1) It contains only numbers OR
2) It contains only junk values(or special characters) OR
3) It is null or Empty

The Customer Name is valid if it contains alphanumeric or alphabets or alphanumeric with junk values.
E.g.

Code: Select all

Datastage     -- Valid
Datastage 8.5 -- Valid
8.5           -- Invalid
              -- Invalid
Datasta%^ge   -- Valid
1234          -- Invalid
^&*(          -- Invalid
I have tried using Alpha & Alnum function but getting stuck afterwards.

Posted: Tue Apr 02, 2013 8:12 am
by chulett
OK... I'll start. "Junk" values? :?

Regardless, you should be able to use the double convert technique to remove all of the good characters and if anything is left then it is "invalid".

Posted: Tue Apr 02, 2013 8:46 am
by wwilliamson
Based on the rules you've given, if there's even a single letter present then the name is valid. If that's the case then I'd do what chulett mentioned, but remove non-alphabetic characters, and if the length is still > 0 then it's a valid name. Also, it might be a good idea to run NullToEmpty() and Trim() on your column data.

Posted: Tue Apr 02, 2013 9:04 am
by priyadarshikunal
You may need double convert however I would suggest calculating length of original string and the string after convert (all alphabets)and if the length is same then reject it.

So your derivation should look like

Stage Variable

svOrignalLen=Len(InLink.Name)
svConvertLen=Len(convert("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz","",InLink.Name))

Constraint

svOrignalLen > svConvertLen


The second option using double convert

Len(convert(convert("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz","",InLink.Name),"",InLink.Name))<>0

so this will remove all the junk characters and leave you with alphabets only. So if there is any alphabet, your string is valid

May be this is what Craig meant.

Posted: Wed Apr 03, 2013 2:38 am
by Rakesh311
For this requirement i feel its better to go with Quality stage if u have.

Write a pattern like *? and *^
*? Which will satisfy only if your input contain Alphabets.
*^ Which will satisfy only if your input contain Number.

Any by using this ruleset in Standardize stage you can achieve output.

Regards

Posted: Wed Apr 03, 2013 3:07 am
by chandra.shekhar@tcs.com
@Rakesh,
I know that Qualitystage is better tool for this requirement, but we have a constraint of doing it by Datastage only.

Posted: Wed Apr 03, 2013 7:14 am
by chulett
Silly requirement. Use the best tool for the job.

Perhaps you can do this in steps. Use Convert() to remove all of the digits and punctuation then check the size of the result. If it is zero then it was 'only numbers' and is invalid. Then from that result upcase it and use Convert() to remove A-Z and any other valid characters from it. If the size doesn't change it would fall into your 'only junk or special characters' category and is invalid. Otherwise... valid.

Off the top of my head, that should get you pretty close.

Posted: Wed Apr 03, 2013 7:40 am
by priyadarshikunal
chandra.shekhar@tcs.com wrote:Also I need to trim, do a nullablity check & remove tab/new line characters from the name too.
In 8.5 and above versions you may skip null checks but you need to do trimming and other operations in addition to this as per your requirement.

Posted: Thu Apr 04, 2013 1:23 am
by chandra.shekhar@tcs.com
@Priya
I'll try it and let you know whats my result.
For the time being I'll tell you what I have done.
I have used 1 stage variable

Code: Select all

svName --> Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','',NullToEmpty(DSLink2.NAME)),'',NullToEmpty(DSLink2.NAME))
Then I have used a constraint where I filter the data on the basis of

Code: Select all

(AlNum(svName) = 0 AND Alpha(svName) = 0)
(Both should be false)
Is this correct ?
In 8.5 and above versions you may skip null checks
Can you elaborate, I am using version 8.7..


@Craig,
Not able to view your whole content, :(
Can you please summarize a bit ?

Posted: Thu Apr 04, 2013 2:45 am
by Rakesh311
Okay
Lets take it in a different way,
so your requirement is if any alphabet present in your input column then it is valid.


lets do in 3 steps:
1)convert $ in the input empty.

2) then Covert all alphabets present in the output of 1st step to $,

one suggestion instead of writing all alphabet in both cases , covert input to one any of the case and use that case in your code.

3) then if count of $ in output of step 2 is greater than 0 means you have some alphabet in your input.

Regards,

Posted: Thu Apr 04, 2013 12:35 pm
by priyadarshikunal
I see you have marked the topic as workaround. Does this mean the solution worked?

Posted: Thu Apr 04, 2013 11:38 pm
by chandra.shekhar@tcs.com
@Priya
Yes, my solution what I have posted above and your solution are giving me the same output so I think the logic is working now.
Can you explain me when you said that we can skip null checks in 8.5 and above ?
How can we do it ?

Posted: Fri Apr 05, 2013 3:02 am
by jerome_rajan
priyadarshikunal wrote:In 8.5 and above versions you may skip null checks
That's Incorrect if you mean what I think you mean :?