hi,
Can we use mail, HTML, or a URL as a source in DataStage? If it is possible, please tell me how to achieve this.
Thanks in advance.
You're getting into the realm of unstructured data here, and that can only easily be done in DataStage using custom components. I've been on a couple of sites where emails of known structure were used as a source, and even that needed careful coding (in a Transformer stage in a server job).
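The "emails of known structure" approach above can be sketched outside DataStage as well. Below is a minimal, illustrative Python sketch (not DataStage code, and not the poster's actual implementation) that extracts fields from a subject line of an assumed fixed layout and emits a delimited record a Sequential File stage could then read. The sender address, subject layout, and field names are all hypothetical assumptions.

```python
# Sketch: turn an email of known (assumed) structure into a flat,
# delimited record suitable for flat-file staging.
import email
from email import policy

# Hypothetical sample message with a pipe-delimited subject line.
RAW_EMAIL = """\
From: orders@example.com
To: etl@example.com
Subject: ORDER|12345|2010-04-01|149.99

Body text is ignored in this sketch.
"""

def email_to_record(raw: str, delimiter: str = ",") -> str:
    """Parse the message, split the pipe-delimited Subject header,
    and return a comma-delimited record for staging."""
    msg = email.message_from_string(raw, policy=policy.default)
    tag, order_id, order_date, amount = msg["Subject"].split("|")
    return delimiter.join([order_id, order_date, amount])

print(email_to_record(RAW_EMAIL))  # 12345,2010-04-01,149.99
```

In a server job the equivalent string handling would live in a Transformer stage, as described above; the point of the sketch is only that a known, fixed structure is what makes the parsing tractable.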
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I'll readily agree with our most active participants. Those sources do not have standard connection stages because they do not have a fixed format that is easy to handle within DataStage.
Rather than building custom components within DataStage for performing the processing from those sources, I would recommend using a tool better suited to turning unstructured data into structured data. There are multiple paths to go down to do this, but ultimately it comes down to parsing.
So, use your favorite parsing tool to give the data you want to look at a standard structure - store the formatted, structured data in your preferred staging location - and it will become highly usable for anything you would like to do with it in DataStage.
If you want to couple it more tightly with DataStage, it is possible to wrap your parser so that it becomes part of the ETL job(s), although that is a route I personally prefer to avoid. Modularize everything, make each module simple, and use well-defined interfaces between them. That makes the system easier to understand, easier to troubleshoot, easier to maintain, and easier to replace individual components of later.
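The "parse first, then stage structured data" advice above can be illustrated with a small sketch. This is an assumption-laden example, not anything from the thread: it uses only the Python standard library to turn an HTML table (the kind you might fetch from a URL) into CSV text that could be written to a staging file and read by a DataStage Sequential File stage. The sample markup and column names are hypothetical.

```python
# Sketch: parse an HTML table into structured CSV for flat-file staging.
import csv
import io
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect cell text from <td>/<th> elements, one list per row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

def html_table_to_csv(html: str) -> str:
    """Feed the markup through the parser and emit CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()

SAMPLE = ("<table><tr><th>id</th><th>name</th></tr>"
          "<tr><td>1</td><td>acme</td></tr></table>")
print(html_table_to_csv(SAMPLE))
```

Once the output lands in a staging file (or table), everything downstream is ordinary structured ETL, which is exactly the separation of concerns recommended above.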
Jack Thornton
----------------
Spectacular achievement is always preceded by spectacular preparation - Robert H. Schuller
Thanks to all.