Best approach for unknown varied source
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
Re: Use of Java Pack in Datastage?
Give XML a shot. Ask all Java applications to send whatever they want in a common XML format. (You can accept data in chunks as records and segregate accordingly - use a column separator stage, ...).
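To make the "common XML format" idea concrete, here is a minimal sketch in Python rather than DataStage. The envelope layout (`batch`, `record`, `field`, and the `type` and `name` attributes) is an assumption made up for illustration and does not come from any product. It only shows records arriving in one envelope and being segregated by type.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope: element and attribute names are illustrative only.
SAMPLE = """<batch source="app_a">
  <record type="customer"><field name="id">42</field><field name="name">Ann</field></record>
  <record type="order"><field name="id">7</field><field name="amount">19.90</field></record>
</batch>"""


def segregate(xml_text):
    """Group records by their declared type, keeping fields as name -> text."""
    groups = {}
    root = ET.fromstring(xml_text)
    for rec in root.findall("record"):
        fields = {f.get("name"): (f.text or "") for f in rec.findall("field")}
        groups.setdefault(rec.get("type"), []).append(fields)
    return groups


by_type = segregate(SAMPLE)
```

Each record type can then be routed to its own downstream link or job.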
Last edited by JoshGeorge on Mon May 07, 2007 2:23 am, edited 1 time in total.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
What if this requirement is the only way your client can survive because they may be "requesting data" from other people and have little or no control over the format. Then what?
kduke wrote: Wow, Ray. Give Craig a little credit once in a while. Besides I would be more impressed if you thought it and he wrote it.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
ray.wurlod wrote: Then specify the format in the request.
Let me explain in more detail. Many organisations collect data from various sources that are not under their control. Hence, the format of the incoming data varies from source to source. Also, since the number of sources is huge, it is very cumbersome to maintain a distinct set of ETL jobs for pulling data from each source. Hence, a strategy must exist to represent the metadata of each input file, along with a generic job that performs the ETL according to that metadata.
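As a rough illustration of what "represent the metadata of each input file and a generic job" could mean, here is a small Python sketch (not DataStage). The source names, delimiters and column mappings are invented for the example. It also assumes every source is delimited text with a header row, which is a far narrower case than "any format".

```python
import csv
import io

# Hypothetical metadata: one entry per source describing its delimiter and
# how its column names map onto the standard target columns.
SOURCE_METADATA = {
    "vendor_a": {"delimiter": ",", "columns": {"cust_id": "customer_id", "amt": "amount"}},
    "vendor_b": {"delimiter": "|", "columns": {"ID": "customer_id", "Total": "amount"}},
}


def read_source(source_name, text):
    """Parse one source's delimited text into rows keyed by target column names."""
    meta = SOURCE_METADATA[source_name]
    reader = csv.DictReader(io.StringIO(text), delimiter=meta["delimiter"])
    return [
        {target: row[src] for src, target in meta["columns"].items()}
        for row in reader
    ]


rows_a = read_source("vendor_a", "cust_id,amt\n1,10.50\n")
rows_b = read_source("vendor_b", "ID|Total\n2|7.25\n")
```

Adding a source then means adding a metadata entry rather than a new job, but only for sources that fit the shape the generic reader understands.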
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Of course, this happens all the time. I just don't see how anyone can expect a single job (or jobs) to be built that will somehow magically be able to figure out any and all incoming formats and produce what the consuming system needs. That, from what I recall, was the original request.
However, a series of dedicated 'pre-processing' jobs for each source is perfectly feasible and pretty common - they homogenize the data into your standard load format.
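The pre-processing pattern described above can be sketched in Python as one small dedicated function per source, all emitting the same standard layout. The source names, input layouts and the two-column "standard load format" are assumptions for illustration only. In DataStage each function would correspond to a separate pre-processing job.

```python
import json

STANDARD_COLUMNS = ("customer_id", "amount")  # hypothetical standard load format


def from_pipe_delimited(text):
    """Pre-processor for a hypothetical pipe-delimited source (id|amount per line)."""
    for line in text.splitlines():
        cust, amount = line.split("|")
        yield {"customer_id": cust.strip(), "amount": amount.strip()}


def from_json_lines(text):
    """Pre-processor for a hypothetical JSON-lines source with its own key names."""
    for line in text.splitlines():
        obj = json.loads(line)
        yield {"customer_id": str(obj["id"]), "amount": str(obj["total"])}


# One dedicated pre-processor per source; the load step sees only one format.
PREPROCESSORS = {"vendor_a": from_pipe_delimited, "vendor_b": from_json_lines}


def homogenize(source_name, text):
    """Run the source's dedicated pre-processor and emit standard-format rows."""
    rows = PREPROCESSORS[source_name](text)
    return [tuple(row[c] for c in STANDARD_COLUMNS) for row in rows]


loaded = homogenize("vendor_a", "1| 10.50\n") + homogenize("vendor_b", '{"id": 2, "total": 7.25}\n')
```

The load job downstream then only ever needs to understand the one standard layout.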
-craig
"You can never have too many knives" -- Logan Nine Fingers