
Can we use email and HTML (or a URL) as a source in DataStage?

Posted: Fri Jun 25, 2010 5:11 am
by pavankatra
Hi,
Can we use email and HTML (or a URL) as a source in DataStage? If it is possible, please tell me how to achieve this.

Thanks in advance.

Posted: Fri Jun 25, 2010 6:04 am
by chulett
The short answer is 'no', at least not directly. There may be some 'workarounds' for this but I'll leave that for others to post.

Posted: Fri Jun 25, 2010 6:25 am
by ray.wurlod
You're getting into the realm of unstructured data here, and in DataStage that is only practical with custom components. I've been on a couple of sites where emails of known structure were used as a source, and even that needed careful coding (in a Transformer stage in a server job).
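To illustrate what "emails of known structure" can mean, here is a minimal Python sketch (not the Transformer-stage code described above) that pulls one record out of an email whose plain-text body holds `Name: value` lines. The field names `OrderId`, `Customer` and `Amount` are hypothetical; the point is that sender and receiver agree on the layout in advance, and anything that does not match is rejected rather than guessed at.

```python
from email import message_from_string
from email.message import Message

# Hypothetical field names: the agreed layout between sender and loader.
EXPECTED_FIELDS = ("OrderId", "Customer", "Amount")


def parse_known_email(raw: str) -> dict:
    """Turn an email whose body holds 'Name: value' lines into one record."""
    msg: Message = message_from_string(raw)
    # Use the first text/plain part; walk() also yields a non-multipart message itself.
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            body = payload.decode(part.get_content_charset() or "utf-8")
            break
    record = {"Subject": msg.get("Subject", "")}
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() in EXPECTED_FIELDS:
            record[name.strip()] = value.strip()
    # Reject messages that do not follow the agreed layout.
    missing = [f for f in EXPECTED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"email does not match the agreed layout, missing {missing}")
    return record
```

Even this small case needs a rule for what to do with malformed messages, which is exactly the "careful coding" the post refers to.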

Posted: Fri Jun 25, 2010 10:00 am
by jcthornton
I'll readily agree with our most active participants. Those sources do not have standard connection stages because they do not have a fixed format that is easy to handle within DataStage.

Rather than building custom components within DataStage to process those sources, I would recommend using a tool better suited to turning unstructured data into structured data. There are several ways to go about this, but ultimately it all comes down to parsing.

So, use your favorite parsing tool to give the data you care about a standard structure, store that structured data in your preferred staging location, and it becomes easy to use for anything you want to do with it in DataStage.
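As one possible "parsing tool" for the HTML/URL case, here is a minimal Python sketch using only the standard library: it fetches a page, collects the cell text of every `<tr>` row, and writes the rows to a CSV staging file that a DataStage job could then read with an ordinary file stage. It assumes the data you want sits in plain HTML tables; the function names are hypothetical, and real pages (nested tables, scripts, pagination) usually need more handling.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def fetch_html(url: str) -> str:
    """Download a page and decode it with the charset the server declares."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def html_table_to_csv(html: str, out) -> int:
    """Write every table row found in `html` to the CSV file object `out`."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return len(parser.rows)
```

For a real staging file, open the target with `open(path, "w", newline="")` so the `csv` module controls the line endings; the resulting delimited file is then an ordinary, well-structured source for DataStage.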

If you want to couple it more tightly with DataStage, it is possible to wrap your parser so that it becomes part of the ETL job(s), although that is a route I personally prefer to avoid. Modularize everything, make each module simple, and use well-defined interfaces in between. That makes each piece easier to understand, troubleshoot, maintain, and replace later.
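One way to keep that "well-defined interface" is a standalone command-line parser whose whole contract is: input file in, CSV staging file out, exit code 0 on success and non-zero on failure. Something like DataStage's `ExecSH` before-job subroutine or an Execute Command activity in a sequence job could then run it and check the return code, but treat that wiring as an assumption to verify on your release. The script name and the blank-line-separated `Name: value` input format below are hypothetical.

```python
from __future__ import annotations

import argparse
import csv
import sys


def parse_records(text: str, fields: list[str]) -> list[list[str]]:
    """Split blank-line-separated 'Name: value' blocks into rows (hypothetical format)."""
    rows = []
    for block in text.split("\n\n"):
        values = {}
        for line in block.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                values[name.strip()] = value.strip()
        if values:
            # Missing fields become empty strings so every row has the same shape.
            rows.append([values.get(f, "") for f in fields])
    return rows


def main(argv: list[str] | None = None) -> int:
    ap = argparse.ArgumentParser(description="Parse a raw extract into a CSV staging file.")
    ap.add_argument("source")
    ap.add_argument("target")
    ap.add_argument("--fields", default="OrderId,Customer,Amount")
    args = ap.parse_args(argv)
    fields = args.fields.split(",")
    try:
        with open(args.source, encoding="utf-8") as src:
            rows = parse_records(src.read(), fields)
        with open(args.target, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            writer.writerow(fields)
            writer.writerows(rows)
    except (OSError, UnicodeDecodeError) as exc:
        # Report on stderr and signal failure through the exit code only.
        print(f"parse failed: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the only coupling is a file and an exit code, the parser can be tested, rerun, or replaced without touching the DataStage job.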

Posted: Sun Jun 27, 2010 9:35 pm
by pavankatra
Thanks to all.

Posted: Mon Jun 28, 2010 3:58 am
by Sreenivasulu
You can make the format simple: a blank email with only the subject line filled in. Something like that would be workable. But I do not know how to read from the Exchange server port using DataStage.
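Since DataStage itself has no mail stage, the mailbox would have to be read outside it. A minimal Python sketch, assuming the Exchange server has IMAP enabled and that subjects follow a hypothetical pipe-delimited convention such as `LOAD|SALES|2010-06-28`: it reads only the Subject headers (using `BODY.PEEK` in a read-only session so nothing is marked as read) and splits them into fields that could be written to a staging file. Host, credentials, mailbox, and the subject layout are all placeholders.

```python
import email
import imaplib
from email.header import decode_header, make_header


def split_subject(subject: str, sep: str = "|") -> list:
    """Split a subject like 'LOAD|SALES|2010-06-28' into trimmed fields (hypothetical convention)."""
    return [field.strip() for field in subject.split(sep)]


def fetch_subjects(host: str, user: str, password: str, mailbox: str = "INBOX") -> list:
    """Return the decoded Subject header of every message in `mailbox` via IMAP over SSL."""
    subjects = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)  # read-only: do not change message flags
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, msg_data = conn.fetch(num, "(BODY.PEEK[HEADER.FIELDS (SUBJECT)])")
            msg = email.message_from_bytes(msg_data[0][1])
            # Decode RFC 2047 encoded words, e.g. non-ASCII subjects.
            subjects.append(str(make_header(decode_header(msg.get("Subject", "")))))
    return subjects
```

Usage would be along the lines of `rows = [split_subject(s) for s in fetch_subjects("mail.example.com", "etl_user", "secret")]`, followed by writing `rows` to a delimited file for the job to pick up.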
Regards
Sreeni