Reserved Character Conversion

bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Hello,

I have a job which reads records from a dataset, formats them as XML and then writes the result to a sequential file. Some of the fields in the input dataset may contain special characters which have to be replaced with other characters.

I cannot do this replacement using an after-job unix script on the sequential file as the sequential file contains the complete XML.
I have to perform the replacement only at input dataset column level.

Is there any way I can run a UNIX script on dataset columns, or create some other subroutine which can then be called in a Transformer to replace the characters in the columns?
Regards,
Bakul
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Bakul,

A Wrapper stage can be used to execute UNIX scripts. Also, if you are comfortable with C/C++, parallel routines can be developed and used in the Transformer.

HTH
--Rich
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could also use an External Filter stage and run the data through awk or sed (or any other stream editing command).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Thanks Ray! Is there any reference for specifying External Filter commands? I did refer to the Parallel Job Developer's Guide, but it does not give any guidelines on how to use the input column names in the filter commands and so on.
Regards,
Bakul
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

Generally, the tr command will yield the best performance when translating character values on a UNIX system.

Carter
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

I will try to provide more details. I have 4 input columns say Col1, Col2, Col3 and Col4. All the 4 columns have to be written to output columns (same name, same definition).
Col1 and Col3 may contain certain reserved characters, which have to be replaced. However, I cannot use the Convert function since I have to replace one character with a string of more than one character. Col2 and Col4 have to be passed on as is.
So essentially what I have to do is

Input.Col1 -> reserved character conversion -> Output.Col1
Input.Col2 ---------------------------------------- > Output.Col2
Input.Col3 -> reserved character conversion -> Output.Col3
Input.Col4 ---------------------------------------- > Output.Col4

Can I use an external filter to do this? If yes, how do I perform the mapping from input columns to output columns?
Regards,
Bakul
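The column mapping described in this post can be sketched in C++ as follows. This is only an illustration of applying a conversion to Col1 and Col3 while passing Col2 and Col4 through; the `Row` struct and the `mapRow` function are hypothetical names, not DataStage types, and the conversion itself is supplied by the caller.

```cpp
#include <functional>
#include <string>

// Hypothetical record layout mirroring the four columns named in the post.
struct Row {
    std::string col1;
    std::string col2;
    std::string col3;
    std::string col4;
};

// Apply `convert` to Col1 and Col3 only; Col2 and Col4 are copied unchanged.
Row mapRow(const Row& in,
           const std::function<std::string(const std::string&)>& convert) {
    Row out;
    out.col1 = convert(in.col1);
    out.col2 = in.col2;
    out.col3 = convert(in.col3);
    out.col4 = in.col4;
    return out;
}
```

Keeping the conversion as a parameter means the same mapping works whether the replacement ends up being done by a routine, a filter or a derivation.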
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Thanks! Convert would be perfect for a one-to-one character replacement. However, I need to do a one-to-many character replacement, for example '&' to '&amp;'. Any suggestions on how to do this?
Regards,
Bakul
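For reference, a minimal C++ sketch of the one-to-many replacement asked about above. It assumes that "reserved characters" means the five characters with predefined XML entities; the thread itself only mentions the ampersand. The function name `escapeXml` is hypothetical, and how such a function would be exposed to a Transformer is not shown here.

```cpp
#include <string>

// Replace each XML reserved character with its predefined entity.
// Assumption: the full set of five is wanted; the thread only names '&'.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

The input is scanned once, character by character, so the '&' that begins each inserted entity is never escaped a second time. Running several independent search-and-replace passes would need the '&' pass to go first for the same reason.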
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, if that is what you are doing - is there no equivalent on the PX side to how you would approach this in a Server job? There you would simply change the Data element column value to XML and those kinds of 'conversions' would happen automagically.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Well, I haven't been able to find the equivalent for that as yet.
Regards,
Bakul
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The Server function for what you are trying to do is EREPLACE, and there is no PX equivalent function. If you only have a couple of conversions, then an IF-THEN-ELSE construct in a Transformer stage would work; if there are many, you would be best served by writing your own function to do this.
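As a rough idea of what "your own function" might look like, here is a small C++ sketch of a general substring replacement. It is not a reimplementation of EREPLACE (which takes further optional arguments); `replaceAll` is a hypothetical name, and the wrapping needed to call it from a parallel job is left out.

```cpp
#include <string>

// Replace every non-overlapping occurrence of `from` in `subject` with `to`,
// scanning left to right. An empty `from` returns the subject unchanged
// (an assumption made here to avoid an endless loop).
std::string replaceAll(const std::string& subject,
                       const std::string& from,
                       const std::string& to) {
    if (from.empty()) {
        return subject;
    }
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        const std::string::size_type hit = subject.find(from, pos);
        if (hit == std::string::npos) {
            out.append(subject, pos, std::string::npos);  // copy the tail
            break;
        }
        out.append(subject, pos, hit - pos);  // copy text before the match
        out += to;                            // emit the replacement
        pos = hit + from.size();              // continue after the match
    }
    return out;
}
```

Because the result is built in a separate string and the scan resumes after each match, the replacement text is never searched again, so a replacement that contains the search string is safe.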
bakul
Participant
Posts: 60
Joined: Wed Nov 10, 2004 2:12 am

Post by bakul »

Thanks! We have not written routines in C or C++ as yet. If we decide to use a C or C++ routine, what are the pre-requisites in terms of compilers, libraries and other requirements ?
Regards,
Bakul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whatever the installation guide and readme files for your platform specify!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You have all the software you need on your machine, since the compiler is already a prerequisite for installing PX.

What I would ask you to think about is the requirement for using PX for this. If you write a DataStage Server job to read from a sequential file, convert using the EREPLACE() function and write out to a sequential file, you will get very respectable speeds. Even on a single-CPU Windows server it will still be in excess of 10,000 rows per second, and most likely much faster. Even using a BASIC Transformer stage will give you pretty good performance for this type of operation.

So if you don't need to do this in PX it might be a lot cheaper to solve it with a server job or stage. The development time to put in your own operator when you have never done so before is going to be quite high compared to solving it with Server and BASIC - especially since using the search functionality in this forum will give you your answer already.