Reading data from PDF file.

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'd use a Sequential File stage with a PDF converter as the filter command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Being an Informatica Dude now as well, I went looking for how it would do something like this as I don't recall seeing any mention of PDF files as a valid source. All I could find (so far) was this:
Some guy online wrote: If you want direct access to the PDF file, you could create a simple plugin in C++ or Java to read data from there. There is an SDK for that and Informatica development team will support you.
I like Ray's answer better. Wondering just how simple this "simple" plugin would turn out to be. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kondeti
Premium Member
Posts: 67
Joined: Sat Mar 04, 2006 11:38 am

Post by kondeti »

Hi Ray,
Thanks for your response. So my understanding of the Sequential File stage was wrong. Can you please elaborate on your explanation? Thank you.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The filter option lets a command deliver data to the stage: the command writes to standard output (stdout), and the stage reads that output as its input, essentially, instead of reading the file directly. A PDF converter would (hopefully) be able to deliver its converted text in that same fashion. You would, of course, need to find and purchase said converter.
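To make the stdin/stdout hand-off concrete, here is a minimal Python sketch of what the stage is effectively doing with a filter. It is not DataStage code, just an illustration of the pipe contract. The names `read_through_filter` and `STAND_IN_FILTER` are made up for this example, and the stand-in filter only upper-cases its input; a real PDF-to-text converter that writes to stdout would take its place, with whatever arguments that tool's own documentation calls for.

```python
import subprocess
import sys


def read_through_filter(filter_cmd, raw_bytes):
    """Pipe raw_bytes into filter_cmd's stdin and return its stdout as text lines."""
    result = subprocess.run(
        filter_cmd,
        input=raw_bytes,          # what the filter sees on stdin
        stdout=subprocess.PIPE,   # what the "stage" reads back
        check=True,               # raise if the filter exits non-zero
    )
    return result.stdout.decode("utf-8").splitlines()


# Hypothetical stand-in for a PDF converter: it simply upper-cases its input.
# Any command that reads stdin and writes text to stdout fits the same slot.
STAND_IN_FILTER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    for row in read_through_filter(STAND_IN_FILTER, b"id,name\n1,alice\n"):
        print(row)
```

The point is only that the consumer never parses the original bytes itself; it sees whatever text the filter emits, so the converter has to produce output that matches the column layout defined on the stage.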
-craig

"You can never have too many knives" -- Logan Nine Fingers