extract data from Emails, PDF files ?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 13
- Joined: Thu Jul 17, 2008 4:11 am
extract data from Emails, PDF files ?
How can we extract data from emails and PDF files?
Chandra Sekhar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There's that. There was also a ClickPack available that would allow you to access emails and weblogs from what I recall, but I say "was" because AFAIK support for it has been dropped and it's not a part of the 8.x release. It was kind of interesting as it brought Perl support into DataStage.
I don't think you could do much with a PDF, unless perhaps you had your hands on a third-party tool / bridge / something that could read it. [shrug]
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Charter Member
- Posts: 193
- Joined: Tue Sep 05, 2006 8:01 pm
- Location: Australia
Hi,
PDF files are usually hard to extract data from, but it is possible. There is software (not DataStage, though) that you can use to read a PDF and convert it into Excel or CSV files. One that comes to mind is called Able2Extract. Just Google it.
Once it's in CSV format, you can use DataStage to process it.
As for emails, it depends on the email system. The underlying email database may be queryable.
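If the mail system can export messages as plain RFC 822 text (.eml style), the same "convert to CSV first" idea applies to emails. Here is a minimal sketch using only the Python standard library. The sample message, the chosen columns, and the helper names email_to_row and rows_to_csv are all made up for illustration, and none of this is DataStage functionality; it only produces a CSV that a sequential file read could pick up afterwards.

```python
import csv
import io
from email import message_from_string
from email.policy import default

# Hypothetical sample message; a real job would read exported .eml files instead.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Order 1042
Date: Thu, 17 Jul 2008 04:11:00 +0000
Content-Type: text/plain; charset="utf-8"

Quantity: 3
"""


def email_to_row(raw):
    """Parse one raw message and return a few header fields plus the plain-text body."""
    msg = message_from_string(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().strip() if body is not None else ""
    return [str(msg["From"]), str(msg["To"]), str(msg["Subject"]), str(msg["Date"]), text]


def rows_to_csv(rows):
    """Write the rows, with a header line, to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["from", "to", "subject", "date", "body"])
    writer.writerows(rows)
    return buf.getvalue()


csv_text = rows_to_csv([email_to_row(RAW_MESSAGE)])
print(csv_text)
```

The csv module quotes any field containing a comma (the Date value here), so the output stays parseable as long as the reader is configured for quoted fields.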
Cheers,
JS
-
- Participant
- Posts: 13
- Joined: Thu Jul 17, 2008 4:11 am
ray.wurlod wrote: When's the interview?
IBM has lots of information it wants to share with you about accessing unstructured data. ...
Hi ray,
Thanks for the reply.
Actually, we are evaluating different ETL tools to recommend to my client. The client has a requirement to extract data from MS Excel, XML, emails and PDFs.
As far as I know, DataStage satisfies all the requirements, but I need to clarify whether it can extract email and PDF data.
At this point I am not sure what kind of data the email system holds.
Chandra Sekhar
-
- Participant
- Posts: 13
- Joined: Thu Jul 17, 2008 4:11 am