
Making ETL refer a rule based engine

Posted: Mon May 31, 2004 1:00 am
by srinivasb
Hi All:

Just wanted to share a really complex issue which has been dogging me for some time now.

Has anyone got DataStage jobs to reference business rules held in a decision engine?

Let me give you the situation:

A leading finance giant wants a huge data transformation process done with DataStage. All the rules have been developed in complex Java code, and they want us to have DataStage call rules from these calculation engines.

While we tried duplicating code in PL/SQL and calling procedures, OUT parameters were the first blocker, and the second was the complexity of the code itself.
Is there anyone with prior experience in handling this sort of thing successfully?

Regards
Srini

Posted: Mon May 31, 2004 1:11 am
by ray.wurlod
I have created rule-based ETL using DataStage, however it's an entirely different model from yours in that mine was table-based rather than object-based.
That said, I'm sure it could be done, but not via stored procedures. Instead, you would need a C-callable wrapper to the objects, which you would call using the GCI (General Call Interface). And this is not territory for the faint hearted!

Posted: Mon May 31, 2004 8:38 am
by roy
Hi,
since you mentioned Java, maybe the Java Pack could be of some help?
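If the Java Pack route were taken, the stage would ultimately hand each row to a Java class fronting the rules engine. Here is a minimal sketch of that row-level facade; the class, method, and column names are illustrative, not the Java Pack API or any real rules engine:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row-level facade of the kind a Java Pack stage could call.
// All names here are invented for illustration.
public class RuleFacade {
    // Example rule: flag rows whose balance exceeds a threshold.
    public Map<String, String> applyRules(Map<String, String> row) {
        Map<String, String> out = new HashMap<>(row);
        double balance = Double.parseDouble(row.getOrDefault("balance", "0"));
        out.put("highValue", balance > 10000 ? "Y" : "N");
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("account", "A1");
        row.put("balance", "25000");
        System.out.println(new RuleFacade().applyRules(row).get("highValue")); // prints Y
    }
}
```

The point of the facade is that the DataStage side only ever sees a row in, a row out; all rule logic stays behind the Java interface.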

Good Luck,

Posted: Mon May 31, 2004 12:35 pm
by kcbland
The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.

Re: Making ETL refer a rule based engine

Posted: Mon May 31, 2004 1:38 pm
by ray.wurlod
srinivasb wrote:... we tried duplicating code in PL/SQL ...
Since duplicating the Java code is acceptable, what's the difficulty in duplicating it in DataStage?

Posted: Mon May 31, 2004 5:56 pm
by datastage
kcbland wrote:The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.
The amount of business logic within an organization is limitless. T is the strength of DataStage, but not everything needs to be done there. I think you can reach out to an external source to do some rules processing and still have PLENTY to do with T inside of DataStage. Using one of Ascential's data quality/profiling tools will reduce the T, but there will still be plenty of cleansing done in the ETL in an organization that implements those correctly.

Posted: Mon May 31, 2004 6:02 pm
by datastage
This won't apply to Srini's problem of rules in Java code, but with that type of question in general I'm intrigued by what Razza Dimension Server can do for an organization. There may be some integration with Ascential products in the future, and there is also a Master Data Management forum on this site with the early beginnings of a discussion on Razza.

Posted: Mon May 31, 2004 7:20 pm
by kcbland
I have a rules/metadata based data cleansing module/bolt-on for DataStage to enforce L0 data quality with some cross-column verification and enforcement. However, the question is transformation, not cleansing. That being said, T is mostly about enforcing business rules, from my experience. The cleansing is the oft-missed portion, but one that I believe should be metadata/rules driven. But transformation by rules takes the performance tweaking/tuning ability completely out of the hands of the product and the product is specifically not designed that way. That's why I made the comment that the wrong tool may have been chosen. If you're looking for a rules-driven transformation tool, it ain't DataStage.

Posted: Mon May 31, 2004 9:45 pm
by datastage
kcbland wrote: If you're looking for a rules-driven transformation tool, it ain't DataStage.
True. DataStage isn't strong in rules-driven transformation. Yet at the same time don't you feel quite powerful with it at your hands? It can certainly implement complex rules during transformation.

Any thoughts on whether it needs to gain strength as a product in this area? In a recent post about BI I made the point that I don't see a fit to incorporate this into DS and the ASCL suite of products... I still don't think there is a strong need to make it a rules-driven ETL tool.

Also, will some of the Web Services capabilities (I'm still a neophyte in Web Services) make external calls to a rules-based engine easier? If so, it may be possible to make this an easier task for DS without changing its style.

off topic story

Posted: Mon May 31, 2004 9:52 pm
by datastage
datastage wrote: Yet at the same time don't you feel quite powerful with it at your hands?
That reminds me of a story about one of my company's favorite managers at a client. One of the IT managers was quite a character, and once said, "I love DataStage, I can write a DataStage program to wipe my ass!"

You know you have a satisfied client and believer in the product when you hear that.

Posted: Mon May 31, 2004 9:57 pm
by kcbland
As much as I enjoy this thread, we're discussing last year's model. The Server product is not the future; it's the Parallel edition. The flexibility and ease of use of the Server product will allow you to do anything. Every custom solution has to be done with an eye towards open metadata and sharing.

I don't think Ascential wants to move into an amorphous "anything goes" rules based arena, but that's my opinion and I don't have any specific insider knowledge. You lose all ability to high-performance tune, because the rules will vary from row to row. The trade-off is flexibility versus performance, and right now performance is a bigger driving sales factor than flexibility.

Personally, I LOVE the Server product and believe that Parallel technology should have blended into it for the E and L and some T. Right now, my ETL solutions are mixed bag combinations of Server and Parallel jobs. Introducing a Rules Based Transformation stage would pretty much chuck the whole way I'd use DS.

Posted: Mon May 31, 2004 10:58 pm
by vmcburney
I have heard almost the same requirement from a bank: a lot of advanced calculation routines had been built in C code, and they wanted to know if they could be reused with an ETL tool. Worst case scenarios:
- build it again using DataStage routines
- use DataStage to touch the data down to flat files, write a script that starts the calculation engine and outputs to flat files, then read those back in using DataStage.
Best case scenario:
- find some way to deliver data directly to the calculation engine from DataStage, such as an API (which has a short use-by date) or web services.
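The touch-down scenario above amounts to a small stand-alone program sitting between two flat files: DataStage writes the first, the program applies the calculation rules and writes the second, and DataStage reads that back. A sketch, where the pipe-delimited layout and the flat 5.00 surcharge rule are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Touch-down pattern: read the file DataStage landed, apply the rules,
// write a result file for DataStage to read back.
public class TouchDownCalc {
    static List<String> applyRules(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\\|");
            double amount = Double.parseDouble(fields[1]);
            out.add(fields[0] + "|" + (amount + 5.0)); // illustrative rule
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path landed = Files.createTempFile("ds_landed", ".txt");   // written by DataStage
        Path result = Files.createTempFile("engine_out", ".txt");  // read back by DataStage
        Files.write(landed, List.of("A1|100.0", "A2|200.0"));
        Files.write(result, applyRules(Files.readAllLines(landed)));
        System.out.println(Files.readAllLines(result)); // prints [A1|105.0, A2|205.0]
    }
}
```

In the real worst-case scenario the middle step would of course be the existing C calculation engine rather than Java, but the shape of the hand-off is the same.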

I would avoid PL/SQL as the layer between DataStage and the calculation engine as it is too clumsy.

I agree that a centralised rules engine is probably out of the reach of Ascential and may distract them from the core objectives of scalability and easy integration. Products offering centralised rules services carry price tags several times larger than a full Ascential suite, and they are a very complex technology.

Posted: Mon May 31, 2004 11:29 pm
by ray.wurlod
INTEGRITY began life as a flat-file to flat-file tool. When it became Quality Stage the input and output flat files could (effectively) be replaced by named pipes. This approach might work for your API too; the Sequential File stage has been able to use named pipes for quite a few versions now.
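The named-pipe hand-off can be demonstrated outside DataStage: one side writes rows into a FIFO as though it were a flat file while the other reads them as they arrive, so no full file ever lands on disk. A POSIX-only sketch (the FIFO path and row contents are invented; the writer stands in for the Sequential File stage and the reader for the external engine):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Named-pipe hand-off: writer and reader run concurrently over a FIFO.
public class FifoHandoff {
    static List<String> transfer(List<String> rows) throws Exception {
        Path fifo = Paths.get(System.getProperty("java.io.tmpdir"),
                              "ds_fifo_" + System.nanoTime());
        new ProcessBuilder("mkfifo", fifo.toString()).start().waitFor();

        // Writer thread: opening the FIFO for write blocks until a reader opens it.
        Thread writer = new Thread(() -> {
            try {
                Files.write(fifo, rows);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        List<String> read = Files.readAllLines(fifo); // reads until the writer closes
        writer.join();
        Files.delete(fifo);
        return read;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer(List.of("row1", "row2"))); // prints [row1, row2]
    }
}
```

The blocking-open semantics of the FIFO are what make the hand-off safe: neither side runs ahead of the other, and the reader sees end-of-file only when the writer finishes.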

Posted: Tue Jun 01, 2004 1:46 pm
by mhester
Srinivas,

I understand what you want to do, and we (Ascential) accomplished this very task at BellSouth in 2001 and 2002. The requirement was to process data and apply very complex business rules (not only complex, but dozens of rules had to be applied to each incoming row of data), and these rules would certainly have slowed the output of rows to the DW. We chose to use the ILOG JRules (Java) rules engine and to create our own plug-in to call this interface. It did not take much time and worked very well.

A developer would translate the business rule into a Java class coded to the ILOG API, and the plug-in would allow a choice of which class to use. The bonus here was that you could choose to operate on one row of data or thousands of rows, so you can tune the process to meet your needs.

Writing sequential or procedural business rules within DS is not the same as calling an inference or rules engine. The rules engine (because of the algorithm used) is much faster and can act on multiple rules and rows of data simultaneously.
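The batching idea Michael describes (buffer rows, then hand the whole batch to the engine in one call) might look roughly like this. RulesEngine and its execute method are stand-ins invented for illustration, not the ILOG JRules API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching bridge between a row-at-a-time stage and a
// batch-oriented rules engine.
public class BatchingBridge {
    interface RulesEngine {
        List<double[]> execute(List<double[]> batch); // fire all rules over a batch
    }

    private final RulesEngine engine;
    private final int batchSize;
    private final List<double[]> buffer = new ArrayList<>();
    private final List<double[]> results = new ArrayList<>();

    BatchingBridge(RulesEngine engine, int batchSize) {
        this.engine = engine;
        this.batchSize = batchSize;
    }

    // Called once per incoming row; flushes when the batch is full.
    void onRow(double[] row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) flush();
    }

    // Call once more at end-of-data to push the final partial batch.
    void flush() {
        if (!buffer.isEmpty()) {
            results.addAll(engine.execute(new ArrayList<>(buffer)));
            buffer.clear();
        }
    }

    List<double[]> results() { return results; }

    public static void main(String[] args) {
        // Toy "engine" that doubles the first field of every row.
        BatchingBridge bridge = new BatchingBridge(batch -> {
            List<double[]> out = new ArrayList<>();
            for (double[] row : batch) out.add(new double[] { row[0] * 2 });
            return out;
        }, 2);
        bridge.onRow(new double[] { 1.0 });
        bridge.onRow(new double[] { 2.0 });
        bridge.onRow(new double[] { 3.0 });
        bridge.flush();
        System.out.println(bridge.results().size()); // prints 3
    }
}
```

Making the batch size a parameter is what gives the tuning knob Michael mentions: one row per call for low latency, thousands per call to amortize the cost of invoking the engine.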

If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.

Regards,

Michael Hester

Posted: Tue Jun 01, 2004 2:46 pm
by datastage
mhester wrote:If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.
Mike, you want to delve deeper with Srinivas, but I'd still like to see a post here going a step deeper into the methodology you used, or at least something to help us gain a better understanding of how to implement this in general, without the specifics of Srinivas's development or the work at BellSouth.

Thanks,

Byron