Best approach for unknown varied source

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: Use of Java Pack in Datastage?

Post by JoshGeorge »

Give XML a shot :) Ask all the Java applications to send whatever they want in a common XML format. (You can accept the data in chunks as records and segregate it accordingly - use a column separator stage, ...).
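To make the idea concrete, here is a minimal sketch of the kind of "common envelope" each Java application could emit. The element names (`envelope`, `record`, `field`) are hypothetical, not any particular schema - the point is only that downstream jobs then parse one XML layout regardless of the source:

```python
import xml.etree.ElementTree as ET

def to_envelope(records):
    """Wrap a list of dicts in a single common XML document,
    so every source looks the same to the consuming job."""
    root = ET.Element("envelope")
    for rec in records:
        node = ET.SubElement(root, "record")
        for name, value in rec.items():
            # Field names travel as attributes, values as text,
            # so the envelope carries its own column metadata.
            field = ET.SubElement(node, "field", name=name)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_doc = to_envelope([{"id": 1, "city": "Melbourne"}])
```

Each sending application owns the mapping from its native layout into this envelope; the receiving side never changes.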
Last edited by JoshGeorge on Mon May 07, 2007 2:23 am, edited 1 time in total.
Joshy George
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

kduke wrote:Wow, Ray. Give Craig a little credit once in a while. Besides I would be more impressed if you thought it and he wrote it.
What if this requirement is the only way your client can survive because they may be "requesting data" from other people and have little or no control over the format. Then what :?:
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then specify the format in the request.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

ray.wurlod wrote:Then specify the format in the request.
Let me explain in more detail. There are many organisations that collect data from various sources, and the sources are not under their control. Hence, the format of the incoming data varies from source to source, and since the number of sources is huge, it is very cumbersome to maintain distinct sets of ETL jobs for pulling data from each source. Hence, a strategy must exist to represent the metadata of each input file, along with a generic job that performs the ETL according to that metadata.
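One way to picture the metadata-driven approach: each source ships a small spec (delimiter and column names are assumptions here, a real spec would also carry types, encodings, etc.), and a single generic parser reads any source against its spec:

```python
import csv, io

# Hypothetical per-source metadata registry.
SOURCE_META = {
    "source_a": {"delimiter": ",", "columns": ["id", "name", "amount"]},
    "source_b": {"delimiter": "|", "columns": ["ref", "amount", "date"]},
}

def parse(source, raw_text):
    """One generic job: look up the source's metadata,
    then parse its records into named fields."""
    meta = SOURCE_META[source]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    return [dict(zip(meta["columns"], row)) for row in reader]

rows = parse("source_b", "X1|9.99|2007-05-07")
```

Adding a new source then means adding a metadata entry, not writing a new job - which is exactly the maintenance win being asked for.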
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course, this happens all the time. I just don't see how anyone can expect a single job (or set of jobs) to be built that will somehow magically figure out any and all incoming formats and produce what the consuming system needs. That, from what I recall, was the original request. :shock:

However, a series of dedicated 'pre-processing' jobs for each source is perfectly feasible and pretty common - they homogenize the data into your standard load format.
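A rough sketch of that pre-processing pattern, with hypothetical field names: one small adapter per source, each mapping its native layout into the one standard load format the main job consumes:

```python
# Hypothetical standard load format: customer_id plus a numeric amount.
def adapt_source_a(rec):
    return {"customer_id": rec["id"], "amount": float(rec["amt"])}

def adapt_source_b(rec):
    return {"customer_id": rec["cust_ref"], "amount": float(rec["value"])}

# Registry of dedicated per-source adapters.
ADAPTERS = {"a": adapt_source_a, "b": adapt_source_b}

def preprocess(source, records):
    """Homogenize one source's records into the standard load format."""
    return [ADAPTERS[source](r) for r in records]

out = preprocess("b", [{"cust_ref": "C7", "value": "12.5"}])
```

The main load job stays generic; only the thin adapters (or, in DataStage terms, the dedicated pre-processing jobs) know about each source's quirks.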
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply