We have a requirement to read a PDF that is embedded in an XML document.
We need to parse the XML document and read the PDF content using DataStage 9.1.
Please let me know how this can be done.
Reading PDF document
Okay... while you should be able to parse the XML using DataStage and perhaps even retrieve the pdf document in some fashion, we'd need more details to provide cogent help. How is it stored in the XML? And what specifically do you mean by "read the PDF content" in an ETL context?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Alternatively,
Re: Extract an embedded pdf file from xml
1) Load the XML files into the database.
2) Extract the embedded data from the XML and store it in a CLOB.
3) Convert the CLOB data from base64 to binary and store the result in a BLOB.
4) Write the BLOB data to an OS file using UTL_FILE in raw (binary) mode.
5) Use Java, Python, pdf2txt, etc. to convert the PDF file to text and filter for what you're seeking.
https://community.oracle.com/thread/1114383
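For illustration only, steps 2 to 4 (pull the base64 text out of the XML, decode it, write the bytes in binary mode) can also be sketched outside the database. This is a minimal Python sketch, not the method from the linked thread. It assumes the PDF sits as base64 text inside a single element; the element name `Attachment` and the function names are hypothetical, since the original post does not say how the PDF is stored.

```python
import base64
import xml.etree.ElementTree as ET


def extract_embedded_pdf(xml_text: str, tag: str = "Attachment") -> bytes:
    """Return the decoded bytes of the first <tag> element's base64 text.

    The tag name is an assumption; adjust it to the real document layout.
    """
    root = ET.fromstring(xml_text)
    # root.iter() also yields the root itself, so a top-level match works too.
    node = next(root.iter(tag), None)
    if node is None or not (node.text or "").strip():
        raise ValueError(f"no non-empty <{tag}> element found")
    # With the default validate=False, b64decode ignores characters outside
    # the base64 alphabet, so line breaks inside the payload are harmless.
    data = base64.b64decode(node.text)
    # Cheap sanity check: PDF files start with the "%PDF" marker.
    if not data.startswith(b"%PDF"):
        raise ValueError("decoded payload does not look like a PDF")
    return data


def save_pdf(xml_text: str, out_path: str, tag: str = "Attachment") -> int:
    """Write the decoded PDF to out_path in binary mode; return bytes written."""
    data = extract_embedded_pdf(xml_text, tag)
    with open(out_path, "wb") as fh:
        return fh.write(data)
```

If the XML uses namespaces, the tag would need to be passed in the `{namespace-uri}Attachment` form that ElementTree expects. Step 5 (PDF to text) still needs a separate tool.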
Regards,
Ozgur
Ozgur GUL
Assumption is the mother of all mistakes!