Unstructured data

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

pillip
Premium Member
Posts: 50
Joined: Thu Dec 10, 2009 10:43 am

Unstructured data

Post by pillip »

Hi,

Can DataStage read unstructured data? By unstructured I mean things like tweets or Facebook messages. My understanding is that we can read them but will not be able to process them. Can you confirm my understanding?

Thank you
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The words "unstructured data" mean too many things to too many people, so it becomes critical that such discussions be very specific. Twitter data, for example, can be received in JSON format, which is fully readable by DataStage. Formal Excel files are fully readable by DataStage. Image data has had various strategies over the years, at least for "moving it" or "pointing to it".

What exactly is the format of the data you need to consume?

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'd also be curious what you mean by "process them". What kind of fate did you have in mind for this data once you've read it? Or is this just an academic question?
-craig

"You can never have too many knives" -- Logan Nine Fingers
pillip
Premium Member
Posts: 50
Joined: Thu Dec 10, 2009 10:43 am

Post by pillip »

It's kind of an academic question, which goes this way: can DataStage process all kinds of unstructured data available today? Can it be a replacement for Hadoop?

Thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Basically yes. There is an Unstructured Data stage (most people use it to read directly from Excel). There is also a Big Data File stage (which connects to the Hadoop distributed file system), and various other mechanisms as well. Why not research this on the IBM web site and/or with your favourite search engine?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Big Data File stage generates MapReduce under the covers. ;)
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Information Server 11.3 added some additional support for this type of data by giving you the ability in the XML input stage to read from an API layer. So for Twitter you would have an API to read Tweets in a specific XML format. Typically you can buy from Twitter a subset of Tweet content filtered by region or topic or user type. Facebook is harder to read because content is not open (which is why IBM announced a partnership with Twitter and not Facebook). You could connect to Facebook but you can only get content from pages you are allowed to see. Again you would connect via an API and read it in as XML.

Once data is in DataStage as XML, you can flatten it to relational data, output it as XML, or write it to NoSQL or a Hadoop distributed file system. DataStage cannot do a lot with the content itself: it cannot do sentiment analysis or text analytics. For that you would write the data out and then use SPSS or BigInsights to analyse the content.
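
As a rough illustration of what "flattening" hierarchical XML to relational rows means, here is a small sketch in plain Python. It is not DataStage code and does not reflect how any DataStage stage is implemented. The XML layout, the element names (tweets, tweet, user, text, hashtags, tag) and the function name flatten_tweets are all made up for the example; a real Twitter feed would look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the schema below is invented for illustration only.
SAMPLE_XML = """
<tweets>
  <tweet id="1">
    <user>alice</user>
    <text>DataStage reads XML</text>
    <hashtags><tag>etl</tag><tag>xml</tag></hashtags>
  </tweet>
  <tweet id="2">
    <user>bob</user>
    <text>No hashtags here</text>
    <hashtags/>
  </tweet>
</tweets>
"""


def flatten_tweets(xml_text):
    """Return one flat row (a dict) per tweet/hashtag pair."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for tweet in root.findall("tweet"):
        # Columns that come from the parent (tweet) level.
        base = {
            "tweet_id": tweet.get("id"),
            "user": tweet.findtext("user"),
            "text": tweet.findtext("text"),
        }
        tags = [t.text for t in tweet.findall("hashtags/tag")]
        # The repeating child (hashtag) drives the row count; a tweet with
        # no hashtags is kept as a single row with hashtag set to None.
        for tag in tags or [None]:
            rows.append({**base, "hashtag": tag})
    return rows


if __name__ == "__main__":
    for row in flatten_tweets(SAMPLE_XML):
        print(row)
```

The point is only that the repeating child element decides how many rows come out, with the parent columns repeated on each row. Nothing here analyses the text itself, which matches the limitation described above.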