Splitting Single column value to three columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm

Splitting Single column value to three columns

Post by kumar3846 »

Hi I have a requirement like below

I have source data coming as

Column A
BOISE ID 11022
SUN VALLEY ID 12345

I need these single column value in to three different columns with City,State and ZIP columns. I tired with field function but if you see second record even city has space in between. Can you guys please help me and that would be great help.

Thanks in advance for your valuble time and suggestions.

Thanks,
Kumar.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

See the Column Import stage :)
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't see how that would help, you still have to declare a delimiter, and space would still get you the issue noted in the first post. Or am I missing something?

I think you'll need to do some intelligent parsing - find the number of fields in the string by counting the delimiters. Pull off the last field as the zip, then the next-to-last as the state. Then everything else woould be the city. Assuming state and zip are always present, of course. If not, then it gets a little more... interesting.

This would be pretty easy in a Server routine. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Oops, my faux pas. You're right Craig, yes, the two-worded cities example I didn't any attention to. As you say, it's going to need some upfront parsing work done to it, to get some kind of consistency first off.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm

Post by kumar3846 »

Even i tried using Left and Right functions that posted earlier in the forum but still no use, i am getting different values.
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Hopefully the state and Zip code are not two-worded. In that case you can get last two fields state & Zip and when you have two other fields you can derive the first field using length. hence the derivation will be:

Code: Select all

ZIP Code: svZIP=field(input column,' ',Dcount(Input Column))
State: svState=field(input column,' ',Dcount(Input Column)-1)
City: svCity=input column[1,len(input column)-(len(svZIP)+len(svState)+2)]
every thing with sv is a stage variable. And First 2 must be executed before the third one or if not using the stage variables the derivation of stage variable needs to be substituted where the stage variable is used.
Last edited by priyadarshikunal on Thu Jun 18, 2009 10:24 am, edited 1 time in total.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

To begin with - you'll probably need to do the steps below in stage variables in a transformer.

First - you'll need to use Trim on the field to get rid of any leading or trailing spaces and to reduce any interior spaces to one (ie: changes "City_____ST" to "City_ST"). Leading, trailing or duplicate interior spaces will throw you off. Use the trimmed value for all your substrings below.

Then use the Count function to determine how many spaces are in the string. If it is less than two or null then you need to reject the record as not having all entries.

Then use the field function three times, once to get the last space-separated field (zip), once to get the next to last field (city) and the third time to get the city.

To get the city it is a bit tricky because you'll have to do some math and use the optional last argument (number of substrings) of the Field function to get all but the last two "fields":

Code: Select all

FIELD (string, delimiter, occurrence [ ,num.substr] )
Last edited by asorrell on Thu Jun 18, 2009 10:25 am, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
JRodriguez
Premium Member
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

Is your data standard?

trim(InputCol)[1, (Len(InputCol) - 8) ] CITY

(trim(InputCol)[8])[1,2] STATE

trim(InputCol)[5] ZIPCODE

* The emotions icon is a number 8
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

First - you'll need to use Trim on the field to get rid of any leading or trailing spaces and to reduce any interior spaces to one (ie: changes "City_____ST" to "City_ST"). Leading, trailing or duplicate interior spaces will throw you off. Use the trimmed value for all your substrings below.

Then use the Count function to determine how many spaces are in the string. If it is less than two or null then you need to reject the record as not having all entries.
correct.

you can use field function for city also

Code: Select all

svCity=Field(input column,' ',1,Dcount(input column)-2)
Other one is also correct you can use either of them.

I forgot the mention preprocessing steps noted by asorell. :idea:
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm

Post by kumar3846 »

Thanks for all your help but ,
i have used
Field(Trim(DSLink37.DDAM_SEASONAL_CITY_STATE_ZIP),' ',Dcount(DSLink37.DDAM_SEASONAL_CITY_STATE_ZIP)) this function toget zip but it is showing in RED Then i use Field(Trim(DSLink37.DDAM_SEASONAL_CITY_STATE_ZIP),' ',Dcount(DSLink37.DDAM_SEASONAL_CITY_STATE_ZIP,' ')) i am not geting any results on the target.
Last edited by kumar3846 on Thu Jun 18, 2009 3:26 pm, edited 1 time in total.
kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm

Post by kumar3846 »

Thanks a lot guys and thanks for valuble suggestions

It worked same as priyadarshikunal said in his reply here you go

svZIP=Field(Trim(inputcolumn)," ",Dcount(Trim(inputcolumn)," "))

svState=Field(Trim(inputcolumn)," ",Dcount(Trim(inputcolumn)," ")-1)

inputcolumn[1,len(inputcolumn)-(len(svZIP)+len(svState)+2)]

Thanks
Kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

QualityStage
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobyon
Premium Member
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Ray of course beat me to it and I am no expert and can only point you in the right direction but this is a perfect use of Quality Stage address standardization.

QS will look for certain patterns in your data and then those patterns are used to "parse" the data into fields. It will recognize multi-part or multi worded street and city names and most often handle things correctly. And, if I am not mistaken, will reject records for further manual review if they don't match a standard/recognized pattern.

And be careful with Canadian postal codes (zip code). They can often have a space between 2 sets of 3 chars (e.g. "M4B 1B3")
Bob
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right... but realize the people that have it will use it and those that don't will post topics like this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Kind of hoping that the people who need it buy it (insert usual disclaimers about not getting finder's fee etc. here).

Sure it can be done with generic routines such as RightmostDelimitedSubstring (this one does not come out of the box), but it's tedious identifying all possible variants on the address and area. The creators of QualityStage have already done all that hard work, for lots of different geographies.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply