Reserved Character Conversion

Posted: Wed Nov 23, 2005 11:47 pm
by bakul
Hello,

I have a job which reads records from a dataset, formats them into XML and then writes them to a sequential file. Some of the fields in the input dataset may contain special characters which have to be replaced with other characters.

I cannot do this replacement using an after-job unix script on the sequential file as the sequential file contains the complete XML.
I have to perform the replacement only at input dataset column level.

Is there any way to run a unix script on dataset columns, or to create some other subroutine which can then be called in a transformer to replace the characters in the columns?

Posted: Thu Nov 24, 2005 12:48 am
by richdhan
Hi Bakul,

A wrapper stage can be used to execute unix scripts. Also, if you are comfortable with C/C++, parallel routines can be developed and used in the transformer.

HTH
--Rich

Posted: Thu Nov 24, 2005 2:32 am
by ray.wurlod
You could also use an External Filter stage and run the data through awk or sed (or any other stream editing command).

Posted: Mon Nov 28, 2005 8:01 am
by bakul
Thanks Ray! Is there any reference for specifying External Filter commands? I did refer to the Parallel Developers Guide, but it does not give any guidelines on how to use the input column names in the filter commands and so on.

Posted: Mon Nov 28, 2005 4:21 pm
by clshore
Generally, the tr command will yield the best performance when translating character values on a UNIX system.

Carter

Posted: Tue Nov 29, 2005 9:20 am
by bakul
I will try to provide more details. I have 4 input columns say Col1, Col2, Col3 and Col4. All the 4 columns have to be written to output columns (same name, same definition).
Col1 and Col3 may contain certain reserved characters, which have to be replaced. However, I cannot use the Convert function, since I have to replace one character with a string of more than one character. Col2 and Col4 have to be passed on as they are.
So essentially what I have to do is

Input.Col1 -> reserved character conversion -> Output.Col1
Input.Col2 ---------------------------------------- > Output.Col2
Input.Col3 -> reserved character conversion -> Output.Col3
Input.Col4 ---------------------------------------- > Output.Col4

Can I use an external filter to do this? If yes, how do I perform the mapping from input columns to output columns?
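
The mapping described above can be sketched in C++, independent of any DataStage API. This is only an illustration: the `Record` struct and the `convert_record` function are hypothetical names, and the conversion itself is passed in as a parameter, so the sketch shows only the column routing (Col1 and Col3 converted, Col2 and Col4 copied).

```cpp
#include <functional>
#include <string>

// Hypothetical record layout mirroring the four columns in the post.
struct Record {
    std::string col1;
    std::string col2;
    std::string col3;
    std::string col4;
};

// Apply `convert` to Col1 and Col3 only; Col2 and Col4 are copied unchanged.
Record convert_record(
    const Record& in,
    const std::function<std::string(const std::string&)>& convert) {
    Record out;
    out.col1 = convert(in.col1);  // reserved character conversion
    out.col2 = in.col2;           // passed through as is
    out.col3 = convert(in.col3);  // reserved character conversion
    out.col4 = in.col4;           // passed through as is
    return out;
}
```

Keeping the converter as a parameter means the same routing works whether the replacement is one-to-one or one-to-many.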

Posted: Tue Nov 29, 2005 10:09 pm
by bakul
Thanks! Convert would be perfect for a one-to-one char replacement. However, I need to do a one-to-many character replacement, for example '&' to '&amp;'. Any suggestions on how to do this?
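
A one-to-many replacement of this kind can be sketched as a small C++ function. The thread only names '&'; the other four characters handled here are the remaining XML predefined entities and are an assumption about what the job needs. The name `escape_xml` is hypothetical, and how such a function would be wrapped as a parallel routine is not shown.

```cpp
#include <string>

// Replace each XML reserved character with its predefined entity.
// Assumption: all five predefined entities are wanted; the post only names '&'.
std::string escape_xml(const std::string& in) {
    std::string out;
    out.reserve(in.size());  // output is at least as long as the input
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Because the input is scanned once and the output is built separately, the '&' characters introduced by the replacements are never escaped a second time.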

Posted: Tue Nov 29, 2005 11:05 pm
by chulett
Well, if that is what you are doing - is there no equivalent on the PX side to how you would approach this in a Server job? There you would simply change the Data element column value to XML and those kinds of 'conversions' would happen automagically. :?

Posted: Wed Nov 30, 2005 2:58 am
by bakul
Well, I haven't been able to find the equivalent for that as yet.

Posted: Wed Nov 30, 2005 3:04 am
by ArndW
The server function for what you are trying to do is EREPLACE and there is no PX equivalent function. If you only have a couple of conversions then an IF-THEN-ELSE construct in a transform stage would work; if there are many then you would best be served writing your own function to do this.
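
As a rough idea of what such a self-written function might look like, here is a minimal C++ sketch that replaces every occurrence of one substring with another. It is not the EREPLACE implementation and does not reproduce its optional arguments; `replace_all` is a hypothetical name, and an empty search string is simply treated as "nothing to replace".

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
// Assumption: an empty `from` leaves the text unchanged.
std::string replace_all(std::string text,
                        const std::string& from,
                        const std::string& to) {
    if (from.empty()) {
        return text;
    }
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip past the inserted text to avoid re-matching it
    }
    return text;
}
```

When several conversions are chained, for instance `replace_all(replace_all(s, "&", "&amp;"), "<", "&lt;")`, the '&' conversion should run first, otherwise the ampersands introduced by later replacements would be converted again.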

Posted: Wed Nov 30, 2005 4:02 am
by bakul
Thanks! We have not written routines in C or C++ as yet. If we decide to use a C or C++ routine, what are the pre-requisites in terms of compilers, libraries and other requirements ?

Posted: Wed Nov 30, 2005 4:13 am
by ray.wurlod
Whatever the installation guide and readme files for your platform specify!

Posted: Wed Nov 30, 2005 4:14 am
by ArndW
You have all the software you need on your machine - since the compiler is already a pre-requisite of installing PX.

What I would ask you to think about is the requirement for using PX for this. If you write a DataStage Server job to read from a sequential file, convert using the EREPLACE() function and write out to a sequential file you will get very respectable speeds - on even a single-cpu Windows server it will still be in excess of 10,000 rows per second and most likely much faster. Even using a BASIC Transform stage will give you pretty good performance for this type of operation.

So if you don't need to do this in PX it might be a lot cheaper to solve it with a server job or stage. The development time to put in your own operator when you have never done so before is going to be quite high compared to solving it with Server and BASIC - especially since using the search functionality in this forum will give you your answer already.