
Reading HTML Files

Posted: Tue Sep 22, 2009 9:00 am
by RaviReena
Hi,

Is there a way to read HTML files using DataStage and load them into a SQL Server database?

Any help greatly appreciated.

Posted: Tue Sep 22, 2009 9:21 am
by ArndW
HTML files are sequential files. If you don't need the contents parsed but just loaded into strings in the database, it is easy. HTML is a structured format, but unlike XML it has no built-in parser in DataStage. So a lot depends upon what your HTML contains and how it needs to go into your database.

Posted: Thu Sep 24, 2009 1:57 pm
by RaviReena
Thank you for the reply. You are right: with no parsing it works. But I need to parse the contents of the HTML and then load them into the database. How can we parse it and load it?

Posted: Thu Sep 24, 2009 2:08 pm
by eostic
One trick is to treat it as XML. It could be really ugly, but it depends on which tags you are looking for. The tags are far more generic, but of the same structure as a normal XML "chunk".

Either that or use a nested set of UV/Basic functions. FIELD and such come to mind, but there are a myriad of ways to parse strings in the Transformer.
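To illustrate the nested-FIELD idea outside of a Transformer, here is a minimal Python sketch. The field() helper only approximates the delimiter/occurrence behaviour of FIELD, and the between() helper, the sample markup, and the tag names are all hypothetical; real pages will need their own delimiters.

```python
def field(text, delimiter, occurrence):
    """Return the occurrence-th (1-based) piece of text split on delimiter.

    Rough stand-in for a FIELD-style call; returns '' when the piece is absent.
    """
    if occurrence < 1 or delimiter == "":
        return ""
    parts = text.split(delimiter)
    return parts[occurrence - 1] if occurrence <= len(parts) else ""


def between(text, open_tag, close_tag, occurrence=1):
    """Nested field() calls: take what follows the n-th open_tag, then cut at close_tag."""
    return field(field(text, open_tag, occurrence + 1), close_tag, 1)


# Hypothetical one-line page fragment: one question and four answers.
row = '<p class="q">What is 2 + 2?</p><li>3</li><li>4</li><li>5</li><li>22</li>'

question = between(row, '<p class="q">', "</p>")
answers = [between(row, "<li>", "</li>", n) for n in range(1, 5)]

print(question)
print(answers)
```

This kind of delimiter slicing only holds up while the tags are written exactly the same way on every page; any variation in attributes or spacing means another rule.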

Ernie

Posted: Fri Sep 25, 2009 11:52 am
by RaviReena
Thanks for the reply.

So which stage do you recommend for reading and parsing HTML files? Each of my HTML pages contains one question and four answers, like an exam question. I need to parse the question and answers and load them into a SQL Server database.

Posted: Mon Sep 28, 2009 2:15 am
by ArndW
If the HTML page has a valid XML structure, then the XML input stage might work for you. Could you post a sample of the page contents?
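A quick way to find out whether a page is well-formed XML before trying an XML stage is to run it through any strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the function name and both sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the first closes every tag, the second does not.
well_formed = "<html><body><p>Question</p><ul><li>A</li><li>B</li></ul></body></html>"
not_well_formed = "<html><body><p>Question<br><ul><li>A<li>B</ul></body></html>"

print(is_well_formed_xml(well_formed))
print(is_well_formed_xml(not_well_formed))
```

Pages with unclosed tags such as a bare br or li fail this check, and would likely need cleaning up or a different parsing approach first.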

Posted: Fri Oct 02, 2009 7:15 am
by RaviReena
Thank you, everyone, for the responses. We have too many parsing rules and have decided not to do this with DataStage.