Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.
Moderators: chulett , rschirm , roy
chandra.shekhar@tcs.com
Premium Member
Posts: 353 Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India
Post
by chandra.shekhar@tcs.com » Tue Apr 02, 2013 7:55 am
Hi All,
We have received a requirement from our client where we have to cleanse the Customer Name Field through Datastage.
The Name(VARCHAR(120)) should be considered as Invalid and to be reported in the target if
1) It contains only numbers OR
2) It contains only junk values(or special characters) OR
3) It is null or Empty
The Customer Name is valid if it contains alphanumeric or alphabets or alphanumeric with junk values.
E.g.
Code: Select all
Datastage -- Valid
Datastage 8.5 -- Valid
8.5 -- Invalid
-- Invalid
Datasta%^ge -- Valid
1234 -- Invalid
^&*( -- Invalid
I have tried using Alpha & Alnum function but getting stuck afterwards.
Thanx and Regards,
ETL User
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
Post
by chulett » Tue Apr 02, 2013 8:12 am
OK... I'll start. "Junk" values?
Regardless, you should be able to use the
double convert technique to remove all of the good characters and if anything is left then it is "invalid".
-craig
"You can never have too many knives" -- Logan Nine Fingers
wwilliamson
Participant
Posts: 21 Joined: Fri Oct 01, 2010 2:45 pm
Contact:
Post
by wwilliamson » Tue Apr 02, 2013 8:46 am
Based on the rules you've given, if there's even a single letter present then the name is valid. If that's the case then I'd do what chulett mentioned, but remove non-alphabetic characters, and if the length is still > 0 then it's a valid name. Also, it might be a good idea to run NullToEmpty() and Trim() on your column data.
priyadarshikunal
Premium Member
Posts: 1735 Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI
Post
by priyadarshikunal » Tue Apr 02, 2013 9:04 am
You may need double convert however I would suggest calculating length of original string and the string after convert (all alphabets)and if the length is same then reject it.
So your derivation should look like
Stage Variable
svOrignalLen=Len(InLink.Name)
svConvertLen=Len(convert("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz","",InLink.Name))
Constraint
svOrignalLen > svConvertLen
The second option using double convert
Len(convert(convert("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz","",InLink.Name),"",InLink.Name))<>0
so this will remove all the junk characters and leave you with alphabets only. So if there is any alphabet, your string is valid
May be this is what Craig meant.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Rakesh311
Participant
Posts: 36 Joined: Wed Aug 04, 2010 10:53 pm
Location: Banglore
Contact:
Post
by Rakesh311 » Wed Apr 03, 2013 2:38 am
For this requirement i feel its better to go with Quality stage if u have.
Write a pattern like *? and *^
*? Which will satisfy only if your input contain Alphabets.
*^ Which will satisfy only if your input contain Number.
Any by using this ruleset in Standardize stage you can achieve output.
Regards
rAKESH
chandra.shekhar@tcs.com
Premium Member
Posts: 353 Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India
Post
by chandra.shekhar@tcs.com » Wed Apr 03, 2013 3:07 am
@Rakesh,
I know that Qualitystage is better tool for this requirement, but we have a constraint of doing it by Datastage only.
Thanx and Regards,
ETL User
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
Post
by chulett » Wed Apr 03, 2013 7:14 am
Silly requirement. Use the best tool for the job.
Perhaps you can do this in steps. Use Convert() to remove all of the digits and punctuation then check the size of the result. If it is zero then it was 'only numbers' and is invalid. Then from that result upcase it and use Convert() to remove A-Z and any other valid characters from it. If the size doesn't change it would fall into your 'only junk or special characters' category and is invalid. Otherwise... valid.
Off the top of my head, that should get you pretty close.
-craig
"You can never have too many knives" -- Logan Nine Fingers
priyadarshikunal
Premium Member
Posts: 1735 Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI
Post
by priyadarshikunal » Wed Apr 03, 2013 7:40 am
chandra.shekhar@tcs.com wrote: Also I need to trim, do a nullablity check & remove tab/new line characters from the name too.
In 8.5 and above versions you may skip null checks but you need to do trimming and other operations in addition to this as per your requirement.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
chandra.shekhar@tcs.com
Premium Member
Posts: 353 Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India
Post
by chandra.shekhar@tcs.com » Thu Apr 04, 2013 1:23 am
@Priya
I'll try it and let you know whats my result.
For the time being I'll tell you what I have done.
I have used 1 stage variable
Code: Select all
svName --> Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','',NullToEmpty(DSLink2.NAME)),'',NullToEmpty(DSLink2.NAME))
Then I have used a constraint where I filter the data on the basis of
Code: Select all
(AlNum(svName) = 0 AND Alpha(svName) = 0)
(Both should be false)
Is this correct ?
In 8.5 and above versions you may skip null checks
Can you elaborate, I am using version 8.7..
@Craig,
Not able to view your whole content,
Can you please summarize a bit ?
Thanx and Regards,
ETL User
Rakesh311
Participant
Posts: 36 Joined: Wed Aug 04, 2010 10:53 pm
Location: Banglore
Contact:
Post
by Rakesh311 » Thu Apr 04, 2013 2:45 am
Okay
Lets take it in a different way,
so your requirement is if any alphabet present in your input column then it is valid.
lets do in 3 steps:
1)convert $ in the input empty.
2) then Covert all alphabets present in the output of 1st step to $,
one suggestion instead of writing all alphabet in both cases , covert input to one any of the case and use that case in your code.
3) then if count of $ in output of step 2 is greater than 0 means you have some alphabet in your input.
Regards,
rAKESH
priyadarshikunal
Premium Member
Posts: 1735 Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI
Post
by priyadarshikunal » Thu Apr 04, 2013 12:35 pm
I see you have marked the topic as workaround. Does this mean the solution worked?
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
chandra.shekhar@tcs.com
Premium Member
Posts: 353 Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India
Post
by chandra.shekhar@tcs.com » Thu Apr 04, 2013 11:38 pm
@Priya
Yes, my solution what I have posted above and your solution are giving me the same output so I think the logic is working now.
Can you explain me when you said that we can skip null checks in 8.5 and above ?
How can we do it ?
Thanx and Regards,
ETL User
jerome_rajan
Premium Member
Posts: 376 Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway
Post
by jerome_rajan » Fri Apr 05, 2013 3:02 am
priyadarshikunal wrote: In 8.5 and above versions you may skip null checks
That's Incorrect if you mean what I think you mean
Jerome
Data Integration Consultant at AWS
Connect With Me On
LinkedIn
Life is really simple, but we insist on making it complicated.