Making ETL refer a rule based engine
Moderators: chulett, rschirm, roy
Making ETL refer a rule based engine
Hi All:
Just wanted to share a real complex issue which has been dogging me for some time now:
Has anyone got DataStage jobs to refer business rules for data in a Decision engine:
Let me give you the situation:
A leading finance giant wants a huge data transformation process done with DataStage, all rules have been developed in complex Java code and they wanted us to have DataStage call rules from these calculation engines.
While we tried duplicating code in PL/SQL and tried calling procedures , OUT parameters were the first block and second was the code complexity by itself.
Is there anyone with prior experience in handling this sort of thing successfully?
Regards
Srini
Just wanted to share a real complex issue which has been dogging me for some time now:
Has anyone got DataStage jobs to refer business rules for data in a Decision engine:
Let me give you the situation:
A leading finance giant wants a huge data transformation process done with DataStage, all rules have been developed in complex Java code and they wanted us to have DataStage call rules from these calculation engines.
While we tried duplicating code in PL/SQL and tried calling procedures , OUT parameters were the first block and second was the code complexity by itself.
Is there anyone with prior experience in handling this sort of thing successfully?
Regards
Srini
Srinivasan Balasubramanian
Data Architect
London,
Phone :07840668861
Data Architect
London,
Phone :07840668861
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I have created rule-based ETL using DataStage, however it's an entirely different model from yours in that mine was table-based rather than object-based.
That said, I'm sure it could be done, but not via stored procedures. Instead, you would need a C-callable wrapper to the objects which you would call using the GCI. And this is not territory for the faint hearted!
That said, I'm sure it could be done, but not via stored procedures. Instead, you would need a C-callable wrapper to the objects which you would call using the GCI. And this is not territory for the faint hearted!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi,
since you mentioned java, maybe the java pack could be of some help ?
Good Luck,
since you mentioned java, maybe the java pack could be of some help ?
Good Luck,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: Making ETL refer a rule based engine
Since duplicating the Java code is acceptable, what's the difficulty in duplicating it in DataStage?srinivasb wrote:... we tried duplicating code in PL/SQL ...
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The amount of business logic without an organization is limitless. T is the strength of DataStage, but not everything needs to be done there. I think you can reach out to an external source to due some rules processing and still have PLENTY to do with T inside of DataStage. Using some one Ascential's data quality/profiling suite will reduce the T, but there will still be plenty of cleansing done in the ETL in an organization that does implemnent those correctly.kcbland wrote:The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
This won't apply to Srini's problem of rules in java code, but with that type of question in general I'm intrigued with what Razza Dimension Server can do for an organization. There may be some intregration with Ascential products in the future and there is also a Master Data Management forum on this site with the early beginning of discussion on Razza
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
I have a rules/metadata based data cleansing module/bolt-on for DataStage to enforce L0 data quality with some cross-column verification and enforcement. However, the question is transformation, not cleansing. That being said, T is mostly about enforcing business rules, from my experience. The cleansing is the oft-missed portion, but one that I believe should be metadata/rules driven. But transformation by rules takes the performance tweaking/tuning ability completely out of the hands of the product and the product is specifically not designed that way. That's why I made the comment that the wrong tool may have been chosen. If you're looking for a rules-driven transformation tool, it ain't DataStage.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
True. DataStage isn't strong in rules-driven transformation. Yet at the same time don't you feel quite powerful with it at your hands? It can certainly implement complex rules during transformation.kcbland wrote: If you're looking for a rules-driven transformation tool, it ain't DataStage.
Any thoughts on wheter it needs to gain strength as a product in this area? In a recent point about BI I made the point that I don't see a fit to incorporate this into DS and the ASCL suite of products...I still don't think there is probably a strong need to make it a rules-driven ETL tool.
Also, will some of the Web Services capabilities (I'm still a neophyte in Web Services) make external calls to a ruled based engine easier? If so, it may be possible to make this an easier task for DS without changing its style.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
off topic story
That reminds me of a story of one of my companies favorite managers at a client. One of the IT managers was quite a character, and once said, "I love DataStage, I can write a DataStage program to wipe my ass!"datastage wrote: Yet at the same time don't you feel quite powerful with it at your hands?
You know you have a satisfied client and believer in the product when you hear that.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
As much as I enjoy this thread, we're discussing last years model. The Server product is not the future, it's the Parallel edition. The flexibility and ease of use of the Server product will allow you to do anything. Every custom solution has to be done with an eye towards open metadata and sharing.
I don't think Ascential wants to move into an amorphous "anything goes" rules based arena, but that's my opinion and I don't have any specific insider knowledge. You lose all ability to high-performance tune, because the rules will vary from row to row. The trade-off is flexibility versus performance, and right now performance is a bigger driving sales factor than flexibility.
Personally, I LOVE the Server product and believe that Parallel technology should have blended into it for the E and L and some T. Right now, my ETL solutions are mixed bag combinations of Server and Parallel jobs. Introducing a Rules Based Transformation stage would pretty much chuck the whole way I'd use DS.
I don't think Ascential wants to move into an amorphous "anything goes" rules based arena, but that's my opinion and I don't have any specific insider knowledge. You lose all ability to high-performance tune, because the rules will vary from row to row. The trade-off is flexibility versus performance, and right now performance is a bigger driving sales factor than flexibility.
Personally, I LOVE the Server product and believe that Parallel technology should have blended into it for the E and L and some T. Right now, my ETL solutions are mixed bag combinations of Server and Parallel jobs. Introducing a Rules Based Transformation stage would pretty much chuck the whole way I'd use DS.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I have heard almost the same requirement from a bank, a lot of advanced calculation routines had ben built using C code and they wanted to know if it could be reused with an ETL tool. Worst case scenarios
- build it again using DataStage routines
- use DataStage to touch the data down to flat files, write a script that starts the calculation engine and outputs it to flat files, read them in using DataStage.
Best case scenario:
- Find some way to deliver data directly to the calculation engine from DataStage such as API (which has a short use by date) or web services.
I would avoid PL/SQL as the layer between DataStage and the calculation engine as it is too clumsy.
I agree that a centralised rules engine is probably out of the reach of Ascential and may distract them from the core objectives of scalability and easy integration. Products offering centralised rules services carry price tags several times larger then a full Ascential suite and they are a very complex technology.
- build it again using DataStage routines
- use DataStage to touch the data down to flat files, write a script that starts the calculation engine and outputs it to flat files, read them in using DataStage.
Best case scenario:
- Find some way to deliver data directly to the calculation engine from DataStage such as API (which has a short use by date) or web services.
I would avoid PL/SQL as the layer between DataStage and the calculation engine as it is too clumsy.
I agree that a centralised rules engine is probably out of the reach of Ascential and may distract them from the core objectives of scalability and easy integration. Products offering centralised rules services carry price tags several times larger then a full Ascential suite and they are a very complex technology.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
INTEGRITY began life as a flat-file to flat-file tool. When it became Quality Stage the input and output flat files could be replaced (effectively) by named pipes. This approach might be able to be used for your API too - the Sequential File stage has been able to used named pipes for quite a few versions now.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Srinivas,
I understand what you want to do and we (Ascential) accomplished this very task @ BellSouth in 2001 and 2002. The requirement was to process data and apply very complex business rules (not only complex, but dozens of rules had to be applied to each incoming row of data), these business rules that would certainly slow the output of rows to the DW. We chose to use ILOG Jrules (Java) rules engine and to create our own plug-in to call this interface. It did not take much time and worked very well.
A developer would translate the business rule into a Java class coded to the ILOG API and the plug-in would allow for a choice of what class to use. The bonus here was that you could choose to operate on one row of data or 1000's of rows of data so you can tune the process to meet your needs.
Writing sequential or procedural business rules within DS is not the same as calling an inference or rules engine. The rules engine (because of the algorithm used) is much faster and can act on multiple rules and rows of data simultaneously.
If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.
Regards,
Michael Hester
I understand what you want to do and we (Ascential) accomplished this very task @ BellSouth in 2001 and 2002. The requirement was to process data and apply very complex business rules (not only complex, but dozens of rules had to be applied to each incoming row of data), these business rules that would certainly slow the output of rows to the DW. We chose to use ILOG Jrules (Java) rules engine and to create our own plug-in to call this interface. It did not take much time and worked very well.
A developer would translate the business rule into a Java class coded to the ILOG API and the plug-in would allow for a choice of what class to use. The bonus here was that you could choose to operate on one row of data or 1000's of rows of data so you can tune the process to meet your needs.
Writing sequential or procedural business rules within DS is not the same as calling an inference or rules engine. The rules engine (because of the algorithm used) is much faster and can act on multiple rules and rows of data simultaneously.
If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.
Regards,
Michael Hester
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
Mike, you want to delver deeper with Srinivas, but I'd still like to see a post here going a step deeper into the methodology you used. Or at least something to help us gain a better understanding on how to implement this in general without detailed specifics to Srinivas's development or the work at BellSouth.mhester wrote:If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.
Thanks,
Byron
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure