Best approach for unknown varied source
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 38
- Joined: Fri Apr 22, 2005 6:07 am
Best approach for unknown varied source
Hi All,
We have a critical requirement in which any new file can come in any format, and we have to just plug that file into the existing transformation logic. The source file can be anything: fixed-width flat file, CSV file, XML, SOAP, etc. I am thinking of the following approach:
Create the DataStage job by manipulating a template export XML dump that contains the transformation logic. Manipulation here means adding the source part based on configuration information the users will enter, such as file type, source-to-target mapping, etc.
I am thinking of writing a Perl script that will manipulate the exported job XML and add the source information to it. The script will then import the modified export into my project and compile it.
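Just to make the idea concrete, the XML-manipulation step might look something like the sketch below (shown in Python rather than Perl, and with invented element and attribute names; the real DataStage export schema is far richer than this). After writing the modified export you would still have to import and compile it with the client tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical template export: the real DataStage job export is much more
# complex; the element and attribute names here are illustrative only.
TEMPLATE = """<Job Identifier="TemplateJob">
  <Record Type="TransformStage" Name="Xfm"/>
</Job>"""

def add_source_stage(template_xml, file_type, file_path):
    """Inject a source-stage record into a template job export."""
    root = ET.fromstring(template_xml)
    ET.SubElement(root, "Record",
                  {"Type": file_type, "Name": "Src", "Path": file_path})
    return ET.tostring(root, encoding="unicode")

job_xml = add_source_stage(TEMPLATE, "SequentialFile", "/data/vendor1.csv")
```

The hard part is not the XML surgery itself but getting the injected stage's column metadata right for every possible vendor layout.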
Can anyone tell me whether this approach is feasible?
Thanks in advance.
-Amit
Out of interest, why do you not know what format the file will come in?
Is there a finite set of file formats? If there is, create a small job that reformats each file into a standard format, then run the result through your main job that contains the transformations. Users can choose the formatting job based on the format of the file.
Regards,
Nick.
Re: Best approach for unknown varied source
Amit Jaiswal wrote: We have some critical requirement in which any new file can come in any format and we have to just plug that file with the existing transformation logic. Source file can be anything like fixed width flat file, csv file, XML, SOAP, etc. I am thinking on following approach: <snip>
Wow... that's just crazy talk. I would be thinking: go take a long walk on a short pier.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 38
- Joined: Fri Apr 22, 2005 6:07 am
Hi,
The requirement is to make everything flexible: any new vendor can be added, and each vendor can send files in his own format.
Since the data may run to millions of records (a 1-2 GB file), I am thinking it will be a regular, heavy overhead to reshuffle the columns and bring them into some common standard format. Converting every other file type into sequential-file format will be another overhead if we have to use a single job for processing.
Thanks,
-Amit
That is why you split it into two processes:
1) Reformat the input file into a common structure - one job per customer, which will be quick to build.
2) Process all reformatted files through your complex logic - only one multi-instance job.
It will be quicker to build a new reformat job for each customer than to build that script you were talking about!
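To illustrate the split (outside DataStage, as a plain Python sketch with made-up vendor layouts and column names): each vendor gets a tiny reformatter that emits the one common structure the main job expects.

```python
import csv
import io

# Common structure every reformatter must produce (illustrative columns).
COMMON_HEADER = ["vendor_id", "account", "amount"]

def reformat_fixed_width(text):
    """Vendor A sends fixed-width: cols 0-3 id, 4-11 account, 13-20 amount."""
    rows = []
    for line in text.splitlines():
        rows.append([line[0:4].strip(), line[4:12].strip(), line[13:21].strip()])
    return rows

def reformat_csv(text):
    """Vendor B sends CSV with the columns in a different order."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        amount, account, vid = rec            # vendor B's column order
        rows.append([vid, account, amount])
    return rows

def write_common(rows, out):
    """Emit the common format the single multi-instance job consumes."""
    w = csv.writer(out)
    w.writerow(COMMON_HEADER)
    w.writerows(rows)
```

Each reformatter is trivial to write and test in isolation, and the complex transformation logic only ever sees one layout.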
Regards,
Nick.
Regarding compiling on the fly... it is not as nice as it sounds. The command-line compilation tool dscc is available only with the Windows client, so if your scripts run on any non-Windows platform you cannot compile there.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Premium Member
- Posts: 38
- Joined: Fri Apr 22, 2005 6:07 am
Use of Java Pack in Datastage?
Hi All,
Thanks for all this valuable information and the suggestions.
I'm now thinking of the alternative solution, which uses two jobs: one job to change the source file format to a predefined format, and another multi-instance job to do the actual transformation.
I have another query related to the same topic: I would like to know what we can achieve with the Java Pack.
In short the requirement is as below:
We need to create a web-based application/package/service that will do the following tasks in batch, per the scheduler and configuration information:
1. Fetch files from various vendor locations using Java, over various protocols.
2. Uncompress and decrypt each file, and store it on the ETL server.
3. Use DataStage EE for the transformation and for loading the data into the Oracle target, to process the data faster.
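The fetch-and-stage part of such a batch is mostly plumbing. A minimal sketch of step 2 (in Python, with the transport and decryption deliberately elided; transport could be ftplib/paramiko and decryption a gpg subprocess call, for example):

```python
import gzip
import pathlib

def stage_vendor_file(compressed_path, landing_dir):
    """Uncompress a fetched .gz file into the ETL landing area.

    The fetch and decrypt steps that would precede this are elided;
    this only shows the staging hand-off to the ETL server.
    """
    compressed_path = pathlib.Path(compressed_path)
    data = gzip.decompress(compressed_path.read_bytes())
    target = pathlib.Path(landing_dir) / compressed_path.stem
    target.write_bytes(data)
    return target
```

The staged file path is then what gets passed as a parameter to the DataStage job in step 3.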
In the proposed solution, jobs will be executed both in batch and on demand. On demand means we may deploy the whole package to various groups so that each can take care of processing a particular category of vendor data. If the job for some feed/file fails, that group should be able to re-execute only that job from a web-based user interface.
My query is: can we achieve this web service, or the invocation of a DS job, using Java / the Java Pack?
Can we invoke DS services through JMS and an EJB client? If we have a JMS queue, couldn't the Java framework post messages to this queue, with DS reading the messages from the queue and invoking the other jobs? Essentially, the DS service listening for JMS messages would be the controller within DataStage, and we could build a config for DS similar to the Java framework's (defining which DS job to run for which kind of JMS message request).
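Setting aside whether the Java Pack supports it, the controller pattern itself — a config mapping message types to jobs, and a listener pulling from a queue and launching the mapped job — is easy to sketch. Here it is in Python, with an in-process queue standing in for JMS and a stub in place of a real `dsjob -run` call; all the names are made up:

```python
import queue

# Config: which DS job handles which message type (names are illustrative).
JOB_CONFIG = {
    "VENDOR_FEED": "ProcessVendorFeed",
    "PRICE_FILE":  "LoadPrices",
}

def run_ds_job(job_name, params):
    """Stub for launching a job, e.g. via a `dsjob -run project job` call."""
    return f"started {job_name} with {params}"

def controller(msg_queue):
    """Drain the queue, dispatching each message to its configured job."""
    results = []
    while not msg_queue.empty():
        msg = msg_queue.get()
        job = JOB_CONFIG[msg["type"]]
        results.append(run_ds_job(job, msg.get("params", {})))
    return results
```

With a real JMS provider, the listener would block on the queue instead of draining it, but the config-driven dispatch is the same.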
If it is not possible using the Java Pack, can we achieve this using DataStage SOA Edition?
Thanks in advance.
-Amit Jaiswal
Re: Use of Java Pack in Datastage?
I am not sure if you should open another post for this.
Well, to achieve job invocation from Java you will need the SOA edition. The Java Pack gives you the opposite: invocation of Java programs from DataStage.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Returning to the original topic, the "best approach for an unknown, varied source" is, in my opinion, to push back hard against such an insane requirement. At least demand a small and finite number of possible formats, and some easy means of detecting the format from the first line of the source file.
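For the "detect the format from the first line" part, a first-line sniff can be as simple as the sketch below (the delimiters and markers are illustrative heuristics, not a complete classifier):

```python
def detect_format(first_line):
    """Classify a source file from its first line (heuristic, illustrative)."""
    line = first_line.lstrip()
    if line.startswith("<?xml") or line.startswith("<"):
        return "xml"
    if "," in line:
        return "csv"
    if "|" in line:
        return "pipe-delimited"
    return "fixed-width"
```

The result can then drive which per-format reformatting job gets run.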
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray is 1000% correct; just don't allow your client "to kill" you. Ray's advice about a finite number of formats and easy detection is excellent. I have hands-on experience with the same type of "crazy" reqs. We followed the client, and only in the middle of the project, when money and time had already been lost, did everybody (including the client) realize that "pure theory is a lot different from harsh practice"...