Parallel Routine help and solution recommendation needed



samarvind
Participant
Posts: 29
Joined: Wed Jan 18, 2006 6:13 am
Location: Sutton, Surrey

Post by samarvind »

Hi Members,

I have a requirement to standardize the incoming telephone number data in a few columns by removing all letters and special characters and returning only the digits. For example, given data like "(404)- Ns@-(0123456789)", it should strip off all symbols, special characters and letters and return only the digits, like "4040123456789".

I have created a stored procedure to do this. However, my incoming data has many telephone number columns (Home, Work1, Work2, Mobile, Emergency, etc.), which makes my design very complex and inefficient, as I have to make a separate stored procedure call for each column.

Also, the client has required that we not maintain an exclusion list of all the characters to be removed, so I cannot use the CONVERT function with a hard-coded exclusion list in DataStage.

The only way I can think of to implement this is to take the logic from the stored procedure and put it in a parallel routine. That would make the design simpler and reusable, since I could call the routine in a Transformer for any number of columns and get the result.

Although our team has some experience writing C code to do this, I have very little experience of how to compile the code on IBM AIX and use it in DataStage.

Could you please help me with very basic steps for creating and compiling C code on Unix?

Also, can you think of any better way to implement this without a parallel routine?

Thanks in advance for all your responses.

Sam
Thanks & Regards
arvind sampath
Software Engineer
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There are a number of threads discussing a "double convert" which will do what you are looking for - simply, quickly and in a transform without having to program in C++ or leave DataStage:

Code:

Convert(Convert("0123456789", "", In.Column), "", In.Column)
The inner Convert removes all the characters you want to KEEP (the digits) from the string, leaving only the unwanted characters. The outer Convert then removes every one of those leftover characters from the original string, leaving you with only numeric digits.
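To see why the double Convert works, the two passes can be emulated outside DataStage. The following is only a sketch in C: `convert_remove` and `double_convert` are hypothetical helper names that mimic Convert with an empty replacement list, and they are not DataStage functions.

```c
#include <stdlib.h>
#include <string.h>

/* Mimics Convert(from_list, "", src): copies src into dst while dropping
   every character that appears in from_list.
   dst must have room for strlen(src) + 1 bytes. */
static void convert_remove(const char *from_list, const char *src, char *dst)
{
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (strchr(from_list, src[i]) == NULL) {
            dst[j++] = src[i];
        }
    }
    dst[j] = '\0';
}

/* Mimics Convert(Convert(keep, "", src), "", src).
   Returns 0 on success, -1 if the scratch buffer cannot be allocated.
   dst must have room for strlen(src) + 1 bytes. */
int double_convert(const char *keep, const char *src, char *dst)
{
    char *unwanted = (char *)malloc(strlen(src) + 1);
    if (unwanted == NULL) {
        return -1;
    }
    /* Inner pass: drop the characters to keep, leaving only the unwanted ones. */
    convert_remove(keep, src, unwanted);
    /* Outer pass: drop those unwanted characters from the original string. */
    convert_remove(unwanted, src, dst);
    free(unwanted);
    return 0;
}
```

Called with "0123456789" as the keep list and the sample value from the first post, the inner pass leaves "()- Ns@-()" and the final result is "4040123456789". No exclusion list is ever written down; it is derived from the data itself.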

Post by samarvind »

Thanks ArndW. You are a star :) - the double Convert worked.

I used the following simple steps:

stagevar1 = "0123456789"

stagevar2 = Convert(Convert(stagevar1,'',inputdata),'',inputdata)

It returned only the numeric characters. Great. I spent unnecessary effort researching parallel routines, and even learnt a little C coding along the way, only to realize that such a simple solution exists here :x

The best programmer is the one who knows how and where to find the code :D

Post by samarvind »

But I am still looking for a parallel routine to make it a bit more reusable. Any help on how to use one, please?

Post by ArndW »

There is no specific routine out there for DS to do this, but there are some examples of external calls or BuildOps. Are you looking for C++ code to effect this conversion, or for the method of integrating an external function call or BuildOp into your jobs?

Post by samarvind »

I am looking for the method to compile the written C code on Unix and generate the object file, so that I can reference it in a parallel routine, i.e.:
What is the compilation command?
Where will the object file be after compilation?

Post by ArndW »

The correct compile and link commands to use are found in your DSParams file - check the Administrator client for the environment's compiler settings.

Once you've created your library (the name will differ depending upon which OS you are on) you can then link to that entry point. I usually end up writing a BuildOp instead - it is pretty easy and all of the DS data types are defined.

I know that there are example programs and procedures out there for DataStage external functions; perhaps even "official" IBM ones.
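For orientation, an external function for this job could look roughly like the sketch below. Everything here is an assumption rather than an official example: `phone_digits_only` is a hypothetical entry point name, the `char *` in and out signature is assumed, and how DataStage expects the returned buffer to be owned and released is not covered in this thread. The `extern "C"` guard is there only so the symbol keeps its plain name if the file is built with a C++ compiler.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical entry point: returns a newly allocated string that contains
   only the decimal digits of `in`, in their original order.
   Returns NULL if `in` is NULL or the allocation fails.
   Assumption: the caller is responsible for freeing the result. */
char *phone_digits_only(const char *in)
{
    if (in == NULL) {
        return NULL;
    }
    size_t len = strlen(in);
    char *out = (char *)malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* isdigit decides what to keep, so no exclusion list is needed. */
        if (isdigit((unsigned char)in[i])) {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return out;
}

#ifdef __cplusplus
}
#endif
```

The compile and link flags are deliberately left out, since they depend on the platform and on the project's compiler settings mentioned above.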

Post by samarvind »

Thanks ArndW. I searched further and found this link, which explains the steps to compile the C code; however, I don't know where the object file gets created. I searched the directory where I kept the C code, but it is not there.

Also, I don't have much knowledge of BuildOps. However, even if we build custom stages for a requirement like this, with multiple telephone number columns in the source, wouldn't you end up with one custom stage per column and then have to assemble all the columns together using a Join or Aggregator, which again makes it complex?

Could you please share your thoughts on how you would achieve this using a BuildOp stage?

Thanks again
Arvind

Post by ArndW »

I think that you should start here with an official IBM example.
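On the worry about needing one custom stage per column: assuming the custom operator sees the whole record at once, a single per-record step can clean every telephone column in one pass, so nothing has to be split apart and joined back together. The sketch below only illustrates that idea in plain C. The `phone_record` layout, the field sizes and the function names are hypothetical, and this is not BuildOp macro code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record layout with several telephone columns. */
struct phone_record {
    char home[32];
    char work1[32];
    char work2[32];
    char mobile[32];
    char emergency[32];
};

/* Removes every non-digit character from s in place. */
static void digits_in_place(char *s)
{
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (isdigit((unsigned char)s[i])) {
            s[j++] = s[i];
        }
    }
    s[j] = '\0';
}

/* Per-record step: the same helper is applied to each telephone column. */
void clean_phone_record(struct phone_record *rec)
{
    digits_in_place(rec->home);
    digits_in_place(rec->work1);
    digits_in_place(rec->work2);
    digits_in_place(rec->mobile);
    digits_in_place(rec->emergency);
}
```

The same reasoning applies to a parallel routine called from a Transformer: one derivation per column, all within a single stage.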