Structured data has a known format, with typed fields such as char and integer, and can be queried to get the desired result.
To quote Bill Inmon, who is considered the father of data warehousing: "The challenge of integrating critical knowledge coordinates buried in volumes of unstructured data may become the single largest issue for IT organizations in coming years."
It is also said that 80% of data is unstructured, which means the data we populate into the warehouse from structured sources amounts to only 20% of the data.
This raises the question of how we handle unstructured data. How do we extract it? If anybody has good insights into this topic, could they throw some light on the points below?
I want to know how unstructured data is handled in:
1. Mails. How do we take care of attachments in mails, if any?
2. Word documents, Microsoft Excel spreadsheets and PDF documents.
Apart from the areas mentioned, is there any other area where unstructured data resides?