Substring replacement

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Substring replacement

Post by PhilHibbs »

I've just come across a requirement on my project to replace " -" or "- " with "-", so I looked up Replace, and found that it is Server only, not PX. This is astonishing! There really is no built-in string replace function in parallel DataStage? I will try to get the client to accept the pxEreplace function, but I think they have two policies that will be a problem: "No parallel routines" and "No BASIC transformers". I imagine if I added an External Filter stage with a Unix command to do the replacement, they would ban that as well. Am I out of options for a parallel DataStage out-of-the-box solution for substring replacement?
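For reference, the replacement itself is small if a parallel routine is allowed. The following is only a sketch in plain C++: `replace_all` and `tighten_hyphens` are hypothetical names, not DataStage built-ins and not the pxEreplace source, and a real parallel routine would still need to be wrapped in whatever function signature the routine definition in your project expects.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for illustration, not a DataStage API):
// replace every occurrence of `from` in `input` with `to`, scanning left to
// right and never rescanning text that has just been inserted.
std::string replace_all(const std::string& input,
                        const std::string& from,
                        const std::string& to) {
    assert(!from.empty());  // precondition: an empty search string is a caller error
    std::string out;
    out.reserve(input.size());
    std::string::size_type pos = 0;
    while (true) {
        std::string::size_type hit = input.find(from, pos);
        if (hit == std::string::npos) {
            out.append(input, pos, std::string::npos);  // copy the remaining tail
            break;
        }
        out.append(input, pos, hit - pos);  // copy text before the match
        out += to;                          // emit the replacement
        pos = hit + from.size();            // continue after the match
    }
    return out;
}

// The requirement from this post: " -" and "- " both become "-".
std::string tighten_hyphens(const std::string& input) {
    return replace_all(replace_all(input, " -", "-"), "- ", "-");
}
```

With this sketch, `tighten_hyphens("a - b")` yields `"a-b"`. It assumes at most one space on each side of the hyphen, so runs of spaces would have to be reduced beforehand.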
Phil Hibbs | Capgemini
Technical Consultant
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Do you mean no server routines or truly no parallel routines? If parallel derivations are used hundreds of times across jobs, a parallel routine seems like it would be easier to maintain.

What if your input contains " - " or "- " with multiple spaces?

"      - " or "-      "
Perhaps using a loop within a Transformer stage would allow you to output "-" in these cases.
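To make the multi-space case concrete, here is a sketch of the character-by-character loop in plain C++ rather than Transformer syntax. The function name `strip_spaces_around_hyphens` is made up for illustration, and the code assumes only ordinary space characters need removing, not tabs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for illustration): remove every run of
// spaces that directly precedes or follows a hyphen, in a single pass.
std::string strip_spaces_around_hyphens(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    std::string::size_type i = 0;
    while (i < input.size()) {
        if (input[i] == '-') {
            // Drop spaces already copied to the output just before the hyphen.
            while (!out.empty() && out.back() == ' ') {
                out.pop_back();
            }
            out.push_back('-');
            ++i;
            // Skip spaces in the input that immediately follow the hyphen.
            while (i < input.size() && input[i] == ' ') {
                ++i;
            }
        } else {
            out.push_back(input[i]);
            ++i;
        }
    }
    assert(out.size() <= input.size());  // the loop only ever removes characters
    return out;
}
```

For example, `strip_spaces_around_hyphens("a      - b")` returns `"a-b"`, and strings without a hyphen come back unchanged.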
Choose a job you love, and you will never have to work a day in your life. - Confucius
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

qt_ky wrote: Do you mean no server routines or truly no parallel routines?
They are OK with BASIC routines for use within Job Sequences. We have a number of routines that do various DB2 operations, get file row counts, etc.
qt_ky wrote: What if your input contains " - " or "- " with multiple spaces?
Trim() will have dealt with that.
qt_ky wrote: Perhaps using a loop within a Transformer stage would allow you to output "-" in these cases.
Transformer loops are for outputting multiple rows. I don't think it would be efficient to use one to step through a string character by character, and there are multiple strings to perform this replace operation on as well.

I think I am winning the argument on allowing Parallel Routines though which is good. We just need to work out some support and maintenance documentation.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I didn't think it would be efficient, but given all the client constraints, it seemed like they were eliminating all your options. I would lean towards the External Filter stage you mentioned, calling sed. It's supported out of the box, but like you said, pxEreplace is not a built-in function that comes "out of the box" either.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Use the Convert() function in a Transformer.
Thanx and Regards,
ETL User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If it's only " -" and "- " there's a tolerably ugly solution involving nested If..Then..Else and Index() functions. But one which would meet your clients' stated requirements.

<rant>Resist stupid requirements!</rant>
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ray.wurlod wrote: If it's only " -" and "- " there's a tolerably ugly solution involving nested If..Then..Else and Index() functions. But one which would meet your clients' stated requirements.
Only if you can set an upper limit on the number of occurrences.
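The limit can be shown with a rough C++ analogy (not Transformer syntax). It assumes each nested If..Then..Else with Index() amounts to one "replace the first occurrence, if any" step. The names `replace_first` and `replace_first_two` are invented for this sketch.

```cpp
#include <cassert>
#include <string>

// Analogue of a single "If Index(...) > 0 Then ... Else ..." step: replace
// only the first occurrence of `from`, or return the input unchanged.
std::string replace_first(const std::string& input,
                          const std::string& from,
                          const std::string& to) {
    assert(!from.empty());  // precondition: searching for nothing is a caller error
    std::string::size_type hit = input.find(from);
    if (hit == std::string::npos) {
        return input;
    }
    return input.substr(0, hit) + to + input.substr(hit + from.size());
}

// Nesting two such steps handles at most two occurrences; a third is left
// untouched, so the nesting depth has to match the worst case in the data.
std::string replace_first_two(const std::string& input,
                              const std::string& from,
                              const std::string& to) {
    return replace_first(replace_first(input, from, to), from, to);
}
```

Here `replace_first_two("a -b -c -d", " -", "-")` gives `"a-b-c -d"`: the third " -" survives because only two steps were nested.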
ray.wurlod wrote: <rant>Resist stupid requirements!</rant>
There is a valid point to be made: for an organization that is using DataStage for the first time, and which may not have anyone with C programming in their skill set, the maintenance that parallel routines add is something they might not want to take on. You have to compile the routine and put the .o file in the right place, remember to recompile any jobs that use it whenever it changes, and make sure you update the .o file on the UAT and production servers. If you want a different version in different projects on the same server, that is tricky as well.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

If they don't have C programming, it seems like they would more easily accept using a BASIC Transformer with the eReplace function.