Best approach for unknown varied source

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: Use of Java Pack in Datastage?

Post by JoshGeorge »

Give XML a shot :) Ask all the Java applications to send whatever they want in a common XML format. (You can accept the data in chunks as records and segregate it accordingly - use a column separator stage, ...).
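To make the idea concrete, here is a minimal sketch of the kind of "common envelope" each Java application could emit. The element names (`envelope`, `record`, `field`) are hypothetical, not any particular schema - the point is only that downstream jobs then parse one XML layout regardless of the source:

```python
import xml.etree.ElementTree as ET

def to_envelope(records):
    """Wrap a list of dicts in a single common XML document,
    so every source looks the same to the consuming job."""
    root = ET.Element("envelope")
    for rec in records:
        node = ET.SubElement(root, "record")
        for name, value in rec.items():
            # Field names travel as attributes, values as text,
            # so the envelope carries its own column metadata.
            field = ET.SubElement(node, "field", name=name)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_doc = to_envelope([{"id": 1, "city": "Melbourne"}])
```

Each sending application owns the mapping from its native layout into this envelope; the receiving side never changes.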
Last edited by JoshGeorge on Mon May 07, 2007 2:23 am, edited 1 time in total.
Joshy George
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

kduke wrote:Wow, Ray. Give Craig a little credit once in a while. Besides I would be more impressed if you thought it and he wrote it.
What if this requirement is the only way your client can survive because they may be "requesting data" from other people and have little or no control over the format. Then what :?:
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then specify the format in the request.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

ray.wurlod wrote:Then specify the format in the request.
Let me explain in more detail. There are many organisations that collect data from various sources, and the sources are not under their control. Hence, the format of the incoming data varies from source to source, and since the number of sources is huge, it is very cumbersome to maintain distinct sets of ETL jobs for pulling data from each source. Hence, a strategy must exist to represent the metadata of each input file, along with a generic job that performs the ETL according to that metadata.
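One way to picture the metadata-driven approach: each source ships a small spec (delimiter and column names are assumptions here, a real spec would also carry types, encodings, etc.), and a single generic parser reads any source against its spec:

```python
import csv, io

# Hypothetical per-source metadata registry.
SOURCE_META = {
    "source_a": {"delimiter": ",", "columns": ["id", "name", "amount"]},
    "source_b": {"delimiter": "|", "columns": ["ref", "amount", "date"]},
}

def parse(source, raw_text):
    """One generic job: look up the source's metadata,
    then parse its records into named fields."""
    meta = SOURCE_META[source]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    return [dict(zip(meta["columns"], row)) for row in reader]

rows = parse("source_b", "X1|9.99|2007-05-07")
```

Adding a new source then means adding a metadata entry, not writing a new job - which is exactly the maintenance win being asked for.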
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course, this happens all the time. I just don't see how anyone can expect a single job (or set of jobs) to be built that will somehow magically figure out any and all incoming formats and produce what the consuming system needs. That, from what I recall, was the original request. :shock:

However, a series of dedicated 'pre-processing' jobs for each source is perfectly feasible and pretty common - they homogenize the data into your standard load format.
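A rough sketch of that pre-processing pattern, with hypothetical field names: one small adapter per source, each mapping its native layout into the one standard load format the main job consumes:

```python
# Hypothetical standard load format: customer_id plus a numeric amount.
def adapt_source_a(rec):
    return {"customer_id": rec["id"], "amount": float(rec["amt"])}

def adapt_source_b(rec):
    return {"customer_id": rec["cust_ref"], "amount": float(rec["value"])}

# Registry of dedicated per-source adapters.
ADAPTERS = {"a": adapt_source_a, "b": adapt_source_b}

def preprocess(source, records):
    """Homogenize one source's records into the standard load format."""
    return [ADAPTERS[source](r) for r in records]

out = preprocess("b", [{"cust_ref": "C7", "value": "12.5"}])
```

The main load job stays generic; only the thin adapters (or, in DataStage terms, the dedicated pre-processing jobs) know about each source's quirks.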
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply