Data from Webpages To Datastage


hhh
Participant
Posts: 86
Joined: Tue Aug 02, 2005 7:39 am


Post by hhh »

Hello,

In my application I want to read data from web pages.

The web pages include JPEG images, text, and other content, but I need to extract only the text so that I can pass the extracted data through DataStage.

Could you share your experience with this?

Thanks
HH
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Where are your webpages being read from? Normally the pages will contain only HTML text and reference other non-text data. There is a Click Pack to help you work with the log files, but it doesn't help with the actual pages.

You can strip out unprintable text (using the 'MCP' conversion) to clean binary data out of a text stream, but in DataStage you would still need to parse the data into usable information.
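
If the clean-up is done before the data reaches DataStage, the same idea can be sketched in a few lines of Python. This is only an illustration, not the MCP conversion itself: the function name mask_unprintable is hypothetical, and replacing each non-printable byte with a period is an assumption about the behaviour you want.

```python
def mask_unprintable(data: bytes, replacement: str = ".") -> str:
    """Return `data` as text, replacing every byte outside printable
    ASCII (0x20 to 0x7E) with `replacement`.

    Hypothetical helper: the choice of a period as the default
    replacement is an assumption, not something DataStage mandates.
    """
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else replacement
        for b in data
    )


if __name__ == "__main__":
    # A text fragment with a NUL byte and a high byte mixed in.
    print(mask_unprintable(b"name\x00=\xffvalue"))
```

Note that this also masks tabs and line breaks, so it suits a single field or line rather than a whole page.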
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you have direct access to the web pages you can "View Source" and use DataStage to parse that HTML.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
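
For anyone who would rather strip the HTML before the data reaches DataStage, here is a minimal sketch using only Python's standard html.parser module. It is one possible approach, not something DataStage provides: the names TextExtractor and extract_text are hypothetical, and skipping the contents of script and style elements is an assumption about what counts as "text data".

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, one chunk per text node."""

    # Elements whose contents are code or styling, not page text (assumption).
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop empty chunks.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def extract_text(html: str) -> List[str]:
    """Return the text chunks found in `html`, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = (
        "<html><body><h1>Title</h1><img src='a.jpg'>"
        "<p>Hello &amp; welcome</p></body></html>"
    )
    # One chunk per line, e.g. for a flat file that a job could then read.
    for line in extract_text(page):
        print(line)
```

Images such as JPEGs appear in the HTML only as references in img tags, so they produce no text chunks and drop out without any extra handling.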