Generic job for transforming multiple files

Posted: Fri Feb 23, 2007 2:20 am
by chitrangadsingh
Hi all,

We have a requirement in which 17 files, each with different metadata, are to be read and some generic transformations/data cleansing applied. These mainly involve stripping white space from char fields and removing leading/trailing zeroes from decimal fields.
The idea is to design a generic job that can do this for any file passed as a parameter, along with its metadata. Designing a separate job for each file is the obvious approach, but it's not preferred because the number of files and their layouts may change in future.
The roadblock I foresee is how to generalize the transformations, because to perform trimming etc. one needs to know each column's name and data type.

Is there any way this can be achieved in PX? I would appreciate any pointers.

Thanks in advance.

Posted: Fri Feb 23, 2007 2:56 am
by ray.wurlod
Design a single job for each file, and maintain them into the future.

As soon as you wish to apply any kind of transformation, even a Trim() function, you must name the input column and output column explicitly. That rules out runtime column propagation.

Posted: Fri Feb 23, 2007 4:10 am
by chitrangadsingh
Thanks, Ray, for the prompt response, as always.

But I guess I'm unlucky not to have a premium membership yet!
Anyway, what I could gather from the visible part of your reply is: I should build a separate job for each file and add transformations and column names as and when the requirement comes? Isn't there a workaround in terms of routines or UNIX scripting? Can any non-premium poster help me :oops: ?
I will be getting a premium ID soon to read your entire reply. Thanks anyway.

I think you know your solution....

Posted: Fri Feb 23, 2007 5:27 am
by s_boyapati
There is a workaround for this that our gurus have used for years in UNIX with awk and Perl. Develop scripts that take a metadata file (containing field names and definitions). Use one metadata file per file you want to process. Your main script should read that metadata file and use the information to apply generic functions as required. The scripts might look ugly once they hold tons of code, and be difficult to maintain, so group the files based on their definitions and the type of functions to run on them, and develop template-based scripts to process them. I'd say one script per group (or family) of files. I did that kind of work before EE was on board.
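
A minimal sketch of the metadata-driven approach described above, written in Python rather than awk or Perl. The metadata format (one `name:type` line per field), the pipe delimiter, and all function names are assumptions made for illustration; none of them come from this thread.

```python
from decimal import Decimal, InvalidOperation


def clean_char(value):
    """Strip leading and trailing white space from a character field."""
    return value.strip()


def clean_decimal(value):
    """Remove leading and trailing zeroes from a decimal field."""
    text = value.strip()
    try:
        # normalize() drops trailing zeroes; the "f" format avoids
        # exponent notation for values such as 100.
        return format(Decimal(text).normalize(), "f")
    except InvalidOperation:
        return text  # leave values that are not valid decimals untouched


# One generic function per data type (hypothetical type names).
CLEANERS = {"char": clean_char, "decimal": clean_decimal}


def load_metadata(lines):
    """Parse 'name:type' lines into a list of (name, type) pairs."""
    metadata = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, kind = line.split(":")
        metadata.append((name.strip(), kind.strip().lower()))
    return metadata


def clean_record(line, metadata, delimiter="|"):
    """Apply the cleaner for each field's type, in metadata order."""
    values = line.rstrip("\n").split(delimiter)
    cleaned = [CLEANERS[kind](value)
               for (_, kind), value in zip(metadata, values)]
    return delimiter.join(cleaned)


meta = load_metadata(["cust_name:char", "balance:decimal"])
print(clean_record("  Smith  |000123.4500", meta))  # prints Smith|123.45
```

Supporting another file then means writing another metadata file, not another script, which is the point of the grouping suggested above.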


Need more Details...

Posted: Fri Feb 23, 2007 5:30 am
by novneet

Can you tell me whether the metadata will be the same for all the files, or will it differ?
If the metadata is going to differ, will the char fields store only alphabetic characters (no numbers)?
Also, is the source file going to be delimited or fixed-width, and are the character fields going to be in double quotes?
Please give me the above details and I might be able to help you out.

Posted: Fri Feb 23, 2007 8:15 am
by DSguru2B
I would opt for a sed/awk script. That would be much easier, and the code wouldn't be huge either. There are many sed one-liners that perform similar tasks. The script would even be independent of the file's metadata. Look into it.
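
As a rough illustration of the metadata-independent idea, here is a sketch in Python rather than sed or awk. It assumes a pipe-delimited file (the thread has not established the file format at this point) and treats any field made up only of digits, with an optional sign and decimal point, as numeric. The function names are hypothetical.

```python
import re

# A field counts as numeric if it is an optional sign, digits, and an
# optional fractional part. This is a guess made from the data alone.
NUMERIC = re.compile(r"([+-]?)(\d+)(?:\.(\d*))?")


def clean_field(field):
    """Trim white space; for numeric-looking fields also trim zeroes."""
    text = field.strip()
    match = NUMERIC.fullmatch(text)
    if match is None:
        return text  # not numeric-looking: white space trimmed only
    sign, whole, frac = match.groups()
    whole = whole.lstrip("0") or "0"   # keep a single 0 for "000"
    frac = (frac or "").rstrip("0")
    return sign + whole + ("." + frac if frac else "")


def clean_line(line, delimiter="|"):
    """Clean every field of one delimited record."""
    return delimiter.join(clean_field(f) for f in line.split(delimiter))


print(clean_line(" Smith |000123.4500| 0042 "))  # prints Smith|123.45|42
```

The catch with guessing from the data is that a char field holding only digits (a code such as 00501, say) would also lose its leading zeroes, so this only works if char fields never look numeric.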

Posted: Mon Feb 26, 2007 2:55 am
by chitrangadsingh
Thanks all for your time and suggestions.
Novneet: Metadata will differ. Files are fixed-length, with no quote character. Let's assume char fields will contain only character data and numeric fields only numbers.
Let me know if you have any suggestions on this.
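
For files like these (fixed-length records, no quote character, a different layout per file), a generic routine could slice each record by column width and clean each field by type. The following is only a sketch in Python: the layout structure of (name, width, type) tuples, the pipe-delimited output, and the function names are assumptions for illustration, not part of the actual requirement.

```python
def parse_fixed(line, layout):
    """Slice a fixed-length record into (type, raw value) pairs."""
    fields, pos = [], 0
    for _name, width, kind in layout:
        fields.append((kind, line[pos:pos + width]))
        pos += width
    return fields


def clean(kind, raw):
    """Trim white space; for decimal fields also trim zeroes."""
    text = raw.strip()
    if kind != "decimal" or not text:
        return text
    sign = "-" if text.startswith("-") else ""
    whole, _, frac = text.lstrip("+-").partition(".")
    whole = whole.lstrip("0") or "0"   # keep a single 0 for "000"
    frac = frac.rstrip("0")
    return sign + whole + ("." + frac if frac else "")


def clean_fixed_record(line, layout, delimiter="|"):
    """Return the cleaned record as a delimited string."""
    return delimiter.join(clean(kind, raw)
                          for kind, raw in parse_fixed(line, layout))


# Hypothetical layout: a 10-character name and an 11-character balance.
layout = [("cust_name", 10, "char"), ("balance", 11, "decimal")]
print(clean_fixed_record("Smith     000123.4500", layout))  # prints Smith|123.45
```

Each of the 17 files would then need only its own layout list, which could be loaded from a per-file metadata file.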



Posted: Mon Feb 26, 2007 3:53 am
by novneet
Sorry, my DataStage server is down :shock: . As soon as it is up I will test and post the code.