Live Data Feed capturing through DS
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
Hi,
I have been given a scenario in which I have to capture data from a live feed (it can be in XML or CSV format), apply a couple of transforms, and then store the data in a database (DB2).
I wanted to know if this can be done using any of the DS stages? I obviously don't want to download the whole feed into one file and then read it through a Sequential File or XML file stage... instead I want to do some sort of near-real-time transformation!
Any ideas, anyone?
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
Actually, I have a custom Java application on a server that streams out some data every couple of seconds for about 4-5 hours a day. I can configure the port it uses for streaming the data out.
The data stream is in XML format.
Now I want to collect that data stream on another system (where I have DataStage installed). As and when data is collected, it has to be fed into the database (DB2).
I don't know how to connect the Real Time XML Input stage to the data feed, or where and how to parse the data and store the values in relational DB2 tables.
BTW, what is MQ?
You would have to purchase the SOA Edition of DataStage to get true 'real time' processing of data. And possibly the Java PACK if you needed to use a Java 'app' to process the data. SOA would allow you to deploy your ETL job as a web service and thus process your XML in real time as your Java app called it.
Otherwise you'd need to do something more like 'near real time' processing. Small or 'micro' batches depending on your terminology, launching your processing job every X minutes to grab whatever is available to be processed.
As to the last question: http://en.wikipedia.org/wiki/WebSphere_MQ
-craig
"You can never have too many knives" -- Logan Nine Fingers
You are the right candidate to use MQSeries. You can make the stage listen to the port where the data is being sent out, and you can feed it into the XML stage to get persistent storage.
Do you already have the RTI stage installed?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
"Do you already have the RTI stage installed?"
Under the Real Time category, I only have the XML stages installed; no MQSeries stages are installed.
I have been reading about MQSeries and SOA for a few hours now. I figured out that MQSeries would best suit my needs, as suggested by kumar. But for that I need to *BUY* the SOA Edition of DS AND the IBM MQSeries product. Is that correct?
I guess I would be better off writing a Java application to collect the feed and write it to a file, and then running a DS job at regular intervals to collect data from the file, to simulate a near-real-time feed (as suggested by chulett).
The only problem with that approach is how to clear the data after it has been read by the Sequential File stage. More data can come in while the job is in progress, so I cannot just use an after-job routine to delete and recreate the file.
The problem becomes more complex if I have to simulate the behaviour of the MQ stage, which deletes already-read data only after the job has *successfully* finished.
Any comments?
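One way the "clear the data only after a successful run" problem above could be handled is to rename the feed file before each run, so new records go into a fresh file while the job works on the renamed one. The sketch below is in Python for brevity, not DataStage or Java. The names `claim_batch`, `run_micro_batch` and `load_rows` are made up for illustration, and `load_rows` merely stands in for the DS job. It assumes the collector reopens the feed file for every append; a collector that keeps the file open would carry on writing into the renamed file, and on Windows the rename would typically fail while another process holds the file open.

```python
import os


def claim_batch(feed_path, batch_path):
    """Move the live feed file aside so the collector starts a fresh one.

    Returns True if a batch is waiting to be processed.
    """
    if os.path.exists(batch_path):
        return True  # a previous run failed; retry that batch first
    try:
        os.replace(feed_path, batch_path)  # atomic rename on the same filesystem
    except FileNotFoundError:
        return False  # nothing has arrived since the last run
    return True


def run_micro_batch(feed_path, batch_path, load_rows):
    """Process one batch and delete it only after load_rows succeeds."""
    if not claim_batch(feed_path, batch_path):
        return 0
    with open(batch_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n") for line in f if line.strip()]
    load_rows(rows)        # hypothetical stand-in for the DS job; raises on failure
    os.remove(batch_path)  # reached only when the load succeeded
    return len(rows)
```

If `load_rows` raises, the batch file is left in place and the next run picks it up again before claiming any newer data, which is roughly the delete-after-success behaviour described for the MQ stage.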
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
manish1005,
I've used a similar technique to read and process data streamed into a named pipe. The pipe was created with a 'mkfifo' at the start of the job and removed at the end (using before-job and after-job ExecSH entries).
In my case, the pipe was being populated by the 'top' command to capture performance statistics whilst numerous jobs were running.
You could just leave your job running for the duration - you wouldn't necessarily need to run it periodically, although that would also work. You could get your Java process (or whatever populates the pipe) to output some special character string (guaranteed not to appear in genuine data!) to inform your job that the data stream is finished, allowing your job to shut down gracefully.
HTH
J.
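A rough sketch of the named-pipe-plus-end-marker idea, in Python and outside DataStage, just to show the mechanics. It is POSIX only (`os.mkfifo` does not exist on Windows). The marker string `##END-OF-FEED##` and the function names are hypothetical; `write_feed` stands in for the Java feed and `read_feed` for the job reading the pipe.

```python
import os
import tempfile
import threading

END_MARKER = "##END-OF-FEED##"  # hypothetical sentinel; must never occur in real data


def write_feed(pipe_path, records):
    """Stand-in for the feed process: writes records, then the sentinel."""
    with open(pipe_path, "w", encoding="utf-8") as pipe:  # blocks until a reader opens
        for record in records:
            pipe.write(record + "\n")
            pipe.flush()  # push each record out as soon as it is produced
        pipe.write(END_MARKER + "\n")


def read_feed(pipe_path):
    """Stand-in for the job: yields records until the sentinel arrives."""
    with open(pipe_path, encoding="utf-8") as pipe:  # blocks until a writer opens
        for line in pipe:
            record = line.rstrip("\n")
            if record == END_MARKER:
                return  # graceful shutdown
            yield record


def demo(records):
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "feed.pipe")
    os.mkfifo(pipe_path)  # same role as the 'mkfifo' command in the before-job step
    writer = threading.Thread(target=write_feed, args=(pipe_path, records))
    writer.start()
    try:
        return list(read_feed(pipe_path))
    finally:
        writer.join()
        os.remove(pipe_path)  # mirrors removing the pipe in the after-job step
        os.rmdir(workdir)
```

Calling `demo(["a", "b"])` returns the two records and drops the marker. In a real setup the writer and reader would be separate processes that only share the pipe's path.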
John McKeever
Data Migrators (http://www.datamigrators.com/)
MettleCI - DevOps for DataStage (https://www.mettleci.com)
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
jhmckeever, thanks for the reply. So here I will have to create the named pipe beforehand, or through the Java application, and will have to share its path/name with the DS job.
If I get it right, the named pipe will be accessed through the Sequential File stage like any normal file.
Simple job design (keeping aside the XML requirements for the time being):
Sequential File (with path of named pipe) ---> Transformer ---> DB2.
"You could just leave your job running for the duration - you wouldn't necessarily need to run it periodically, although that would also work."
But what I couldn't get is: why will the job keep accessing the same Sequential File stage again and again as the data comes in? Or is there something else that needs to be done to map a named pipe in DataStage?
Also, I am using Windows 2000 Server, so I will probably need to figure out how to use named pipes on Windows.
You can use the same Sequential File stage, with the name of the pipe given as the file name.
Once you have read one record, the next record becomes available to read.
And you can read it periodically, as the data will be available in the pipe fed by your Java stream.
Windows has a named pipe option too. You can check for that.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
manish,
Yes - You'll need to synchronise the name of the pipe your Java app is writing to with the one your DS job is reading from.
Depending on your configuration, you could either get DS to invoke the Java app to populate the pipe, or get your Java app to invoke your DS job. Whichever one starts the other (Java app or DS job) passes it the name of the shared pipe.
Yes - The DS jobs will access the named pipe as a sequential file.
The job will continue to read the sequential file until the pipe is closed, or until your job shuts down. I don't know what would happen if you put an EOF on the pipe - maybe that would work?
HTH,
J.
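On the EOF question: with a POSIX FIFO there is no EOF character to write. A reader sees end-of-file once every writer has closed its end of the pipe. The Python sketch below only demonstrates that operating-system behaviour. Whether the Sequential File stage then finishes the job cleanly is an assumption that would need testing. The function names are hypothetical and the example is POSIX only.

```python
import os
import tempfile
import threading


def count_until_eof(pipe_path):
    """Read lines until end-of-file, i.e. until every writer has closed."""
    count = 0
    with open(pipe_path, encoding="utf-8") as pipe:
        for _ in pipe:  # the loop ends when the read hits end-of-file
            count += 1
    return count


def write_and_close(pipe_path, lines):
    """Write all lines; leaving the 'with' block closes the write end."""
    with open(pipe_path, "w", encoding="utf-8") as pipe:
        pipe.writelines(line + "\n" for line in lines)


def demo_eof(lines):
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "feed.pipe")
    os.mkfifo(pipe_path)  # POSIX only
    writer = threading.Thread(target=write_and_close, args=(pipe_path, lines))
    writer.start()
    try:
        return count_until_eof(pipe_path)
    finally:
        writer.join()
        os.remove(pipe_path)
        os.rmdir(workdir)
```

Here the reader needs no special marker: `demo_eof` returns as soon as the writer thread closes the pipe. The trade-off is that a feed which closes and reopens the pipe between bursts would end the read early, which is where an explicit end marker is more robust.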
John McKeever
Data Migrators (http://www.datamigrators.com/)
MettleCI - DevOps for DataStage (https://www.mettleci.com)