Hi all,
We have a requirement in which 17 files, each with different metadata, must be read and put through some generic transformations/data cleansing. These mainly involve stripping whitespace from char fields and removing leading/trailing zeroes from decimal fields.
The idea is to design a generic job which can do this for any file passed as a parameter, along with its metadata. Designing a single job for each file is the obvious approach, but it is not preferred since the number of files and their layouts may change in the future.
The roadblock I foresee is how to generalize the transformations, because to perform trimming etc. one needs to know the column name and its data type.
Is there any way this can be achieved in PX? I would appreciate any pointers.
Thanks in advance.
Generic job for transforming multiple files
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 12
- Joined: Mon Jul 18, 2005 4:07 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Design a single job for each file, and maintain them into the future.
As soon as you wish to apply any kind of transformation, even a Trim() function, you must name the input column and output column explicitly. That rules out runtime column propagation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 12
- Joined: Mon Jul 18, 2005 4:07 am
Thanks, Ray, for the prompt response as always,
but I guess I'm unlucky not to have a premium membership yet!
Anyway, what I could gather from the visible part of your reply is that I should build a separate job for each file and add transformations and column names as and when the requirement comes. Isn't there a workaround in terms of routines or UNIX scripting? Can any non-premium poster help me?
I will be getting a premium ID soon to read your entire reply. Thanks anyway.
-
- Premium Member
- Posts: 70
- Joined: Thu Aug 14, 2003 6:24 am
- Contact:
I think you already know your solution...
There is a workaround for this that UNIX gurus have used for years with awk and Perl. Develop scripts that take a metadata file (containing field names and definitions) as input, with one metadata file per data file you want to process. Your main script should read that metadata file and use the information in it to apply the generic functions as required. Such scripts can look ugly once there are tons of code in them, and they become difficult to maintain, so group the files by their definitions and by the type of functions you need to run on them, then develop template-based scripts to process them: one script per group (or family) of files. I did that kind of work before EE was on board.
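As a rough sketch of that metadata-driven approach, here is a small POSIX shell/awk function. The metadata file format (one line per field: name, start position, length, type), the field names, and the pipe-delimited output are all invented for illustration; adapt them to your own conventions.

```shell
# clean_fixed_width METAFILE DATAFILE
# Sketch only. Assumes a hypothetical metadata format, one field per line:
#   <field-name> <start-position> <length> <CHAR|DECIMAL>
clean_fixed_width() {
    awk -v metafile="$1" '
    BEGIN {
        # Load the field layout from the metadata file.
        nf = 0
        while ((getline line < metafile) > 0) {
            split(line, a, " ")
            nf++; start[nf] = a[2]; len[nf] = a[3]; type[nf] = a[4]
        }
        close(metafile)
    }
    {
        out = ""
        for (i = 1; i <= nf; i++) {
            v = substr($0, start[i], len[i])
            gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim whitespace (all fields)
            if (type[i] == "DECIMAL") {
                # strip leading zeroes, but keep one before a decimal point
                while (v ~ /^0[0-9]/) sub(/^0/, "", v)
                if (v ~ /\./) {                   # strip trailing zeroes after "."
                    sub(/0+$/, "", v)
                    sub(/\.$/, "", v)
                }
            }
            out = out (i > 1 ? "|" : "") v
        }
        print out
    }' "$2"
}
```

For example, with metadata `name 1 10 CHAR` / `amount 11 8 DECIMAL`, the record `John      00123.40` comes out as `John|123.4`. Grouping, as suggested above, would mean one such driver per family of files, each with its own set of cleansing rules.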
Sree
Need more Details...
Hi,
Can you tell me whether the metadata for all the files will be the same, or whether it will differ?
If the metadata is going to differ, will all the char fields store only alphabetic characters (not numbers)?
Also, will the source files be delimited or fixed-width, and will the character fields be enclosed in double quotes?
Please give me the above details and I may be able to help you.
Regards,
Novneet Jain
I would opt for a sed/awk script. That would be much easier, and the code wouldn't be huge either. There are many sed one-liners that perform similar tasks. The script could even be independent of the file's metadata. Look into it.
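To illustrate the sed route, here is a sketch of the one-liner style on delimited input (the sample data and comma delimiter are invented; the original files are fixed-width, so this only shows the flavour of metadata-independent cleansing):

```shell
# trim_line: a sketch of sed-based cleansing on comma-delimited text.
# Trims whitespace around commas and at line ends, and strips leading
# zeroes from numeric fields (a lone "0" or "0.5" is left intact,
# since the pattern requires a digit after the stripped zeroes).
trim_line() {
    printf '%s\n' "$1" | sed \
        -e 's/[[:space:]]*,[[:space:]]*/,/g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^0*\([0-9]\)/\1/' \
        -e 's/,0*\([0-9]\)/,\1/g'
}

trim_line '  00123 , foo bar  ,  45.600  '   # -> 123,foo bar,45.600
```

Because sed works on the whole line rather than named columns, nothing here depends on the file's metadata, which is the appeal of this approach.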
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 12
- Joined: Mon Jul 18, 2005 4:07 am
Thanks all for your time and suggestions.
Novneet: The metadata will differ. The files are fixed-length, with no quote character. Let's assume char fields will hold only character data and numeric fields only numbers.
Let me know if you have any suggestions on this.
Thanks.