Hi,
Can DataStage read unstructured data? By unstructured, I mean tweets or Facebook messages. My understanding is that we can read them but will not be able to process them. Can you confirm my understanding?
Thank you
Unstructured data
The words "unstructured data" mean too many things to too many people, so it becomes critical that such discussions be very specific. Twitter data, for example, can be received in JSON format, which is fully readable by DataStage. Formal Excel files are fully readable by DataStage. Image data has had various strategies over the years, at least for "moving it" or "pointing to it".
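To illustrate why a JSON tweet is "readable" in this sense, here is a minimal Python sketch (outside DataStage) that flattens one tweet into a single row. The sample payload is hypothetical, and the field names (`id_str`, `created_at`, `user.screen_name`, `entities.hashtags`) are assumed from the classic Twitter v1.1 tweet object rather than taken from this thread. Only the `text` field is genuinely free-form; everything around it has a fixed structure.

```python
import json

# Hypothetical, abridged tweet payload. Field names are modelled on the
# classic Twitter v1.1 tweet object; treat them as an assumption.
SAMPLE_TWEET = """
{
  "id_str": "1001",
  "created_at": "Mon Jun 01 10:00:00 +0000 2015",
  "text": "Loading tweets with #DataStage",
  "user": {"screen_name": "example_user", "location": "Sydney"},
  "entities": {"hashtags": [{"text": "DataStage"}]}
}
"""


def tweet_to_row(raw: str) -> dict:
    """Flatten one JSON tweet into a single relational-style row."""
    tweet = json.loads(raw)
    user = tweet.get("user", {})
    hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
    return {
        "tweet_id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "screen_name": user.get("screen_name"),
        # The free-text part is carried through unchanged; nothing here "understands" it.
        "text": tweet["text"],
        # Repeating groups are joined into one column here; a real job might emit a child table instead.
        "hashtags": ",".join(hashtags),
    }


if __name__ == "__main__":
    print(tweet_to_row(SAMPLE_TWEET))
```

Reading and reshaping the envelope is straightforward; interpreting the `text` column (sentiment, topics) is the part that needs a separate text-analytics tool.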
What exactly is the format of the data you need to consume?
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Basically, yes. There is an Unstructured Data stage (most people use it to read directly from Excel). There is also a Big Data File stage (which connects to the Hadoop Distributed File System), and various other mechanisms as well. Why not research this on the IBM website and/or your favourite search engine?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
Information Server 11.3 added further support for this type of data by giving you the ability, in the XML input stage, to read from an API layer. For Twitter, you would use an API to read tweets in a specific XML format. Typically you can buy from Twitter a subset of tweet content filtered by region, topic or user type. Facebook is harder to read because its content is not open (which is why IBM announced a partnership with Twitter and not Facebook). You could connect to Facebook, but you can only get content from pages you are allowed to see. Again, you would connect via an API and read the content in as XML.
Once data is in DataStage as XML, you can flatten it to relational data, output it as XML, or write it to a NoSQL database or a Hadoop distributed file system. DataStage cannot do much with the content itself: it cannot do sentiment analysis or text analytics. You would write the data out and then use SPSS or BigInsights to analyse the content.
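The flattening step can be pictured with a small Python sketch that uses only the standard library, not DataStage. The XML layout, element names, and the `flatten` and `to_csv` helpers are hypothetical, chosen for illustration; a real API feed will have its own schema. The repeating hashtag elements become a child table keyed by tweet ID, which is the usual way to turn a nested document into relational rows.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout for a batch of tweets; element names are
# illustrative only and are not the format of any particular API.
SAMPLE_XML = """
<tweets>
  <tweet id="1001">
    <user>example_user</user>
    <text>Loading tweets with DataStage</text>
    <hashtags><tag>DataStage</tag><tag>ETL</tag></hashtags>
  </tweet>
  <tweet id="1002">
    <user>another_user</user>
    <text>No tags on this one</text>
    <hashtags/>
  </tweet>
</tweets>
"""


def flatten(xml_text):
    """Return (tweet_rows, hashtag_rows): a parent table and a child table."""
    root = ET.fromstring(xml_text)
    tweets, hashtags = [], []
    for t in root.findall("tweet"):
        tweet_id = t.get("id")
        tweets.append({
            "tweet_id": tweet_id,
            "user": t.findtext("user"),
            "text": t.findtext("text"),  # free text passed through, not analysed
        })
        # Each repeating <tag> becomes a row in a child table keyed by tweet_id.
        for tag in t.findall("hashtags/tag"):
            hashtags.append({"tweet_id": tweet_id, "hashtag": tag.text})
    return tweets, hashtags


def to_csv(rows, fieldnames):
    """Render rows as CSV text, e.g. for loading into a relational table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    tweets, tags = flatten(SAMPLE_XML)
    print(to_csv(tweets, ["tweet_id", "user", "text"]))
    print(to_csv(tags, ["tweet_id", "hashtag"]))
```

Splitting into parent and child tables avoids repeating the tweet columns once per hashtag; the `text` column still needs a tool such as SPSS or BigInsights if you want to analyse what it says.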
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney LinkedIn