Reading HTML Files

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

RaviReena
Premium Member
Posts: 68
Joined: Tue Jul 29, 2008 10:01 am

Post by RaviReena

Hi,

Is there a way to read HTML files using DataStage and load them into a SQL Server database?

Any help greatly appreciated.
Rao V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW

HTML files are sequential files. If you don't need the contents parsed, but just loaded as strings into the database, it is easy. HTML is a structured format, though, and there are no built-in parsers for it like the ones available for XML. So a lot depends on what your HTML contains and how it needs to go into your database.
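For the unparsed case, a minimal Python sketch (not DataStage code) of what "loaded as strings" amounts to: each file becomes one row holding its name and its whole, uninterpreted text. The `*.html` pattern and the helper name `html_files_as_rows` are hypothetical, and the actual database write is left out.

```python
from pathlib import Path


def html_files_as_rows(directory: str) -> list[tuple[str, str]]:
    """Return one (file name, full text) row per *.html file, unparsed."""
    rows = []
    for path in sorted(Path(directory).glob("*.html")):
        # The whole page is kept as a single string; no tags are interpreted.
        rows.append((path.name, path.read_text(encoding="utf-8")))
    return rows
```

Each row maps onto a single wide text column on the database side, which is why this case needs no parsing rules at all.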
RaviReena
Premium Member
Posts: 68
Joined: Tue Jul 29, 2008 10:01 am

Post by RaviReena

Thank you for the reply. You are right: without parsing, it works. But I need to parse the contents of the HTML and then load them into the database. How can we parse and load it?
Rao V
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic

One trick is to treat it as XML. That could be really ugly, but it depends on which tags you are looking for. HTML tags are far more generic, but they have the same structure as a normal XML "chunk".
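A rough Python illustration of the "treat it as XML" trick (not DataStage code): if a page happens to be well-formed, a standard XML parser can pull out elements by tag name. The sample page and the helper name `texts_for_tag` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: it only parses because it happens to be well-formed
# (every tag closed, attributes quoted), which real HTML often is not.
PAGE = """<html><body>
<p class="question">What is 2 + 2?</p>
<ul>
  <li>3</li>
  <li>4</li>
</ul>
</body></html>"""


def texts_for_tag(document: str, tag: str) -> list[str]:
    """Parse the document as XML and return the text of every element named `tag`."""
    root = ET.fromstring(document)
    return [(element.text or "").strip() for element in root.iter(tag)]


print(texts_for_tag(PAGE, "li"))
```

The "ugly" part shows up as soon as a page contains something like an unclosed `<br>`: the XML parser rejects the whole document rather than skipping the bad tag.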

Either that, or use a nested set of UV/Basic string functions. FIELD and the like come to mind, but there are myriad ways to parse strings in the Transformer.
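To show the nesting idea, here is a Python stand-in that mimics FIELD(string, delimiter, occurrence) with `str.split`; it is an approximation for illustration, not the Transformer function itself, and only single-character delimiters are used so as not to depend on how FIELD treats longer ones. The input line is hypothetical.

```python
def field(text: str, delimiter: str, occurrence: int) -> str:
    """Rough Python stand-in for FIELD(text, delimiter, occurrence).

    Returns the 1-based `occurrence`-th piece of `text` split on `delimiter`,
    or an empty string when there is no such piece.
    """
    pieces = text.split(delimiter)
    if 1 <= occurrence <= len(pieces):
        return pieces[occurrence - 1]
    return ""


# Hypothetical input line: one table cell holding the question text.
line = '<td class="question">What is 2 + 2?</td>'

# Inner call: the piece between the first and second '>' (text plus closing tag).
# Outer call: the piece before the first '<' (just the text).
question = field(field(line, ">", 2), "<", 1)
print(question)
```

This works line by line only as long as each value sits on one line between one pair of tags; every new layout needs another nested expression, which is where the number of parsing rules grows.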

Ernie
Ernie Ostic

RaviReena
Premium Member
Posts: 68
Joined: Tue Jul 29, 2008 10:01 am

Post by RaviReena

Thanks for the reply.

So which stage do you recommend for reading and parsing HTML files? Each of my HTML pages contains one question and four answers, like an exam question. I need to parse the question and answers and load them into a SQL Server database.
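Outside DataStage, a page shaped like this is a small job for a tolerant HTML parser. A minimal Python sketch using the standard library's `html.parser`, which, unlike an XML parser, accepts markup such as an unclosed `<br>`. The markup it looks for (`<p class="question">`, `<li class="answer">`), the class `ExamPageParser` and the helper `exam_row` are all hypothetical; the SQL Server insert is not shown.

```python
from html.parser import HTMLParser


class ExamPageParser(HTMLParser):
    """Collect one question and its answers from an exam-style page.

    Assumes (hypothetically) that the question sits in <p class="question">
    and each answer in <li class="answer">; adjust to the real markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.question = ""
        self.answers: list[str] = []
        self._capture = None  # "question", "answer" or None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "p" and css_class == "question":
            self._capture = "question"
        elif tag == "li" and css_class == "answer":
            self._capture = "answer"
            self.answers.append("")

    def handle_endtag(self, tag):
        if tag in ("p", "li"):
            self._capture = None

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected.
        if self._capture == "question":
            self.question += data
        elif self._capture == "answer":
            self.answers[-1] += data


def exam_row(page: str) -> tuple[str, ...]:
    """Return (question, answer1, ..., answer4), ready for a parameterized insert."""
    parser = ExamPageParser()
    parser.feed(page)
    parser.close()
    answers = [a.strip() for a in parser.answers]
    if len(answers) != 4:
        raise ValueError(f"expected 4 answers, found {len(answers)}")
    return (parser.question.strip(), *answers)


PAGE = """<html><body>
<p class="question">What is 2 + 2?</p>
<ul>
  <li class="answer">3</li>
  <li class="answer">4</li>
  <li class="answer">5</li>
  <li class="answer"><b>22</b></li>
</ul>
<br>
</body></html>"""

print(exam_row(PAGE))
```

Failing loudly when the answer count is not four keeps a page with a different layout from silently producing a half-filled row.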
Rao V
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW

If the HTML page has a valid XML structure, then the XML input stage might work for you. Could you post a sample of the page contents?
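Whether a page "has a valid XML structure" can be checked before trying an XML-based read. A small Python sketch using the standard library's ElementTree parser; the helper name `is_well_formed` is hypothetical, and passing the check says nothing about how the XML input stage itself would need to be configured.

```python
import xml.etree.ElementTree as ET


def is_well_formed(page: str) -> bool:
    """Return True if the page parses as XML (a precondition for an XML-based read)."""
    try:
        ET.fromstring(page)
    except ET.ParseError:
        return False
    return True


# Closed tags and quoted attributes: parses as XML.
print(is_well_formed('<html><body><p class="q">Hi</p></body></html>'))
# Ordinary HTML habits (an unquoted attribute, an unclosed <br>) do not.
print(is_well_formed('<html><body><p class=q>Hi<br></p></body></html>'))
```

HTML named entities such as `&nbsp;` also fail this check, since plain XML only predefines a handful of entities.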
RaviReena
Premium Member
Posts: 68
Joined: Tue Jul 29, 2008 10:01 am

Post by RaviReena

Thank you, everyone, for the responses. We have too many parsing rules, so we decided not to do this with DataStage.
Rao V