Calling an Oracle Stored Procedure From DataStage (URGENT)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

raju711001
Participant
Posts: 3
Joined: Mon Jan 12, 2004 11:39 am

Calling an Oracle Stored Procedure From DataStage (URGENT)

Post by raju711001 »

I need to call an Oracle stored procedure that accepts input from DataStage and returns values back to the DataStage job.

I have come across a lot of material about this in the forum, but nowhere is it clearly explained how it works.

Can someone please give me a clear picture of this: which stage do I need to use to call the procedure, and where do I store the OUT parameters returned by the procedure in DataStage? Also, do I need to import the procedure definition into DataStage first before making a call to it?

Thanks in Advance.
Rakesh
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Executing an Oracle stored procedure that "does something", such as dropping constraints/indexes, rebuilding indexes, etc., can be done many ways.

You have not described fully what it is you want to do, so of course no one wants to respond. I shall attempt a generic response.

To execute a stored procedure from a standalone viewpoint, write a SQL script, then run a sqlplus session with the SQL script passed to it. Launch the sqlplus session either from a Command stage or by using DS BASIC to call the DSExecute API from somewhere (a before/after transformer/job subroutine, or a BATCH job).
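Something like this, for example. This is only a sketch: "my_proc", its argument, and the connect string are hypothetical placeholders for your own stored procedure and instance, and the sqlplus command is echoed rather than executed here:

```shell
# Sketch only: "my_proc", its argument, and the connect string are
# hypothetical placeholders for your own stored procedure and instance.

# 1. Write the SQL script that calls the stored procedure.
cat > call_proc.sql <<'EOF'
CALL my_proc('some_input');
EXIT
EOF

# 2. Hand that script to a sqlplus session, e.g. from a Command stage
#    or via DSExecute. Shown here as the command string only:
echo "sqlplus -s scott/tiger@mydb @call_proc.sql"
```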

To execute a stored procedure in conjunction with an OCI stage that either streams or receives data, put the CALL storedprocedure statement into the before/after SQL tab. Format the statement appropriately.

If you are using a stored procedure to stream results or receive streaming data, first create and compile the stored procedure in the instance. Use DS Manager to import plug-in metadata and tick the stored procedure checkbox. Open your transformation job and choose a SQL type of stored procedure; the tabs will change to stored procedure options, which you then configure. This is not available for Parallel jobs, just Server.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
raju711001
Participant
Posts: 3
Joined: Mon Jan 12, 2004 11:39 am

Post by raju711001 »

Thanks for the Reply.

What I am looking for:
---------------------------------------------------------------

I am reading data from a file. For each row of the file I need to pass one of the columns to an Oracle stored procedure; the procedure contains a SELECT/INSERT/UPDATE/DELETE statement and passes a couple of values (i.e. OUT parameters) back. I then need to write all the fields read from the file, plus the values returned by the procedure, to an output file.
We already have the Oracle stored procedure written and just want to make a call to it instead of rebuilding this functionality in DataStage.

Thanks a Lot.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

We just had a phone conversation concerning the requirements, so I'm going to lay it out here for the whole forum to comment on.

This poster is working on a batch-oriented load of several million rows on average. They are doing a proof of concept to show the viability of DataStage, notably PX technology, versus writing PL/SQL-based ETL.

No matter the choice made, they are considering writing common reference objects as stored procedures. This way, many PL/SQL scripts can use the same objects to perform lookups, and have one object to manage regarding resolving the lookup. They also want to use the same stored procedures as reference objects in DataStage. The stored procedures are really wrappers to simple or complex SQL statements, that could contain join, union, and/or aggregate SQL logic.

So, to summarize, the poster is going to use OCI style reference lookups using stored procedures FOR EVERY REFERENCE LOOKUP. They will never directly query tables. They will not use hash files in Server jobs. They will not use PX datasets for merges or lookups.


COULD THE FORUM PLEASE COMMENT (Directly to points please)

1. Is this a scalable solution?
2. Is this a performant solution?
3. Is this a manageable solution from a code maintenance point of view?
4. Is the solution tunable?
5. Is the solution utilizing the tool to its fullest processing throughput capability?
6. Is the solution utilizing the tool to its fullest graphical data flow metaphor?
7. Is the solution supporting a metadata/enterprise impact analysis framework?
8. Is the solution elegant?

Thanks
Kenneth Bland

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

kcbland wrote: 1. Is this a scalable solution?
2. Is this a performant solution?
3. Is this a manageable solution from a code maintenance point of view?
4. Is the solution tunable?
5. Is the solution utilizing the tool to its fullest processing throughput capability?
6. Is the solution utilizing the tool to its fullest graphical data flow metaphor?
7. Is the solution supporting a metadata/enterprise impact analysis framework?
8. Is the solution elegant?
1. Only to a limited degree. The bottleneck is likely to be the database server (and/or the network connection to it).
2. No. Even with a prepared query executing the stored procedure, there's a heck of a lot more overhead than pre-loading a hashed file, and no capacity for read-caching lookups in memory.
3. Yes, but with much greater difficulty than with the visual paradigm used by DataStage.
4. Yes, in the Oracle sense, by throwing more resources (e.g. listeners, licences, consultants, hardware) at it and making use of partition parallelism to use those resources.
5. No if the tool is DataStage. Probably if the tool is hand-coded PL/SQL.
6. Clearly not.
7. Clearly not.
8. That's a subjective judgment. Dilbert would possibly argue yes. I wouldn't.
Last edited by ray.wurlod on Tue Jan 13, 2004 3:49 am, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
well, all said and done,
the only thing I can add is that if you're not going to use the tool the way it was intended, I guess the benefit you gain from it would be minimal, to say the least.

I have some more things to say on this, but I'd rather not state them, hoping you understand by now that your concept of how to use DS is off the mark IMHO.

Good Luck,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kcbland wrote:This poster is working on a batch oriented load of several million rows on average. They are doing a proof of concept to show the viability of DataStage, notably PX technology, versus writing PL/SQL based ETL.
[snip]
So, to summarize, the poster is going to use OCI style reference lookups using stored procedures FOR EVERY REFERENCE LOOKUP. They will never directly query tables. They will not use hash files in Server jobs. They will not use PX datasets for merges or lookups.
IMHO, this 'Proof of Concept' is doomed to fail, especially considering their average data volumes. You can't test the viability of anything if you don't put it to work in the manner in which it was intended to be used. The constraint of using Stored Procedures for everything from reference lookups to handling target tables means you are still running "PL/SQL based ETL" when you get right down to it. :?

If they really wanted to do a POC, then build some DataStage jobs using their Stored Procedures, run them against some of these large datasets and set that up as the initial benchmark. Then create the same jobs again as they were meant to be created, be it Server or PX (or heck, do both), and see what kind of performance improvement you get. Hmmm...

I think Ray has done fine with the point-by-point issues. No need to belabor any of that other than to ask - are the source/target databases co-resident with the DataStage server? I don't recall that being specified anywhere.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I need to call a Oracle Stored Procedure that accepts input from Datastage and returns values back to the Datastage job.
I will answer what I believe this poster is asking in the statement above. The stored procedure can accept input, but as far as I know DS cannot handle any return other than standard application-raised errors in the form of

Code: Select all

raise_application_error 
These will be displayed in the log for the job.

I don't believe you can return a value that can be meaningfully used by the job in later processing. I grabbed the following from the help within DS -
DataStage supports the use of stored procedures (with or without input arguments) and the creation of a result set, but does not support output arguments or return values. A stored procedure may have a return value or output parameters defined, but these are ignored at run time.
I hope this helps.

Regards,

Michael Hester
raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Post by raju_chvr »

Okay, the last post totally contradicts what Kenneth says. Now even I am confused. I haven't used stored procedures from DataStage, and I have heard both statements, that it can and cannot be done.

Can someone clear this up? Can we have the result set of a stored procedure captured in DataStage or not?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

raju_chvr wrote: Okay, the last post totally contradicts what Kenneth says. Now even I am confused. I haven't used stored procedures from DataStage, and I have heard both statements, that it can and cannot be done.

Can someone clear this up? Can we have the result set of a stored procedure captured in DataStage or not?
You can use a stored procedure as a sourcing object, or a loading object. I do believe an ODBC stage has the ability to act as a reference object using a stored procedure. In this manner, you can design a job like:

Code: Select all

seq ---> xfm w/ODBC reference lookup ---> sequential file
and use a stored procedure as the reference lookup SQL mechanism.

I had a 30-minute conversation with the poster, so I fully understand what he is being asked to do. He's a savvy and experienced DataStage developer, so our conversation was quite intense. I framed the discussion so that he could see everyone's feedback and take the results back to his company. I also have not prejudiced the discussion with my own opinion yet; I simply tried to ask for feedback on the specific questions someone should ask when evaluating the ETL design.
Kenneth Bland

kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Here's my feedback:

1. Is this a scalable solution?
No. As you add more processes using the same stored procedure simultaneously to perform a "reference", the database will congest along that access path.
2. Is this a performant solution?
No. It does not scale, so for batch-oriented high-volume processing it cannot be performant. Your parallelism within the database is only being used to return the results for that one row. You would be better off joining tables of all your data and allowing the database the choice of a parallel query. The database will have to gear up for every single query: if you have 2 million source rows, that's 2 million stored procedure calls. Your query parsing and queuing overhead alone is a killer compared to a full dataset handed to the database and told to go to town.
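To put a rough number on that parsing/queuing overhead (the 5 ms per-call figure is purely an assumption for illustration):

```shell
# Back-of-the-envelope arithmetic: 2 million per-row procedure calls,
# assuming (hypothetically) ~5 ms of parse/queue/network overhead each.
rows=2000000
ms_per_call=5
total_s=$(( rows * ms_per_call / 1000 ))
echo "${rows} calls x ${ms_per_call} ms = $(( total_s / 3600 ))h $(( total_s % 3600 / 60 ))m of overhead alone"
```

Under that assumption, nearly three hours are gone before the database does any useful work on the actual rows.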
3. Is this a manageable solution from a code maintenance point of view?
From a PL/SQL perspective, yes. From the ETL perspective, no. Once the solution is prototyped, it will be more convoluted than a simple ODBC/OCI/hash/PX merge/PX lookup reference lookup.
4. Is the solution tunable?
From a PL/SQL perspective, yes. From the ETL perspective, no. You're not "inside" the ETL tool when it counts the most.
5. Is the solution utilizing the tool to its fullest processing throughput capability?
No. DataStage has high performance reference lookup mechanisms geared to offload processing from a database and put it into the hands of the tool where it can work magic. In Server you have hash files with preload to memory, caching, shared-in memory hashes, etc. In PX you have high-performance parallel partitioning with a merge paradigm. You're using none of this.
6. Is the solution utilizing the tool to its fullest graphical data flow metaphor?
No. The logic within the stored procedure, the tables accessed, are hidden from the job design. Therefore, the full data flow picture is basically empty.
7. Is the solution supporting a metadata/enterprise impact analysis framework?
No. You can't analyze unless you incorporate the stored procedure logic into your metadata framework. You are basically hiding the T in ETL in the database, instead of in the visible world.
8. Is the solution elegant?
In my opinion, NO. You're using DataStage as simply a task manager and a bulk loader/unloader. The true power of ETL is the T, not the E and L. The proposed design is more like EtL, instead of eTl.

My recommendation for a proof of concept is to develop your jobs three ways:
1. Using the current design proposal
2. Using DataStage Server with hash lookups
3. Using DataStage Parallel with merging datasets
Your performance will probably be something like on the order of 1X, 100X, 10000X, respectively.

Good luck!
Kenneth Bland

alexysflores
Participant
Posts: 18
Joined: Mon Jan 12, 2004 7:20 am
Location: USA

Re: Calling an Oracle Stored Procedure From DataStage (URGENT)

Post by alexysflores »

[quote="raju711001"]I need to call an Oracle stored procedure that accepts input from DataStage and returns values back to the DataStage job.
[snip][/quote]

If your server is a Unix box, this is a solution.

1. Write a Korn or other shell script that runs a SQL*Net (sqlplus) session and calls your PL/SQL code.

2. Execute the ExecSH command on your Before/After tab properties and enter your Korn or shell script name.
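A minimal sketch of such a wrapper (the connect-string variable and procedure name are hypothetical placeholders); the WHENEVER SQLERROR directive makes sqlplus return a non-zero exit status, so the job can see that the call failed:

```shell
# Generate the wrapper script that the ExecSH Before/After entry would name.
# "$ORA_CONNECT" and "my_proc" are hypothetical placeholders.
cat > run_proc.ksh <<'OUTER'
#!/bin/ksh
sqlplus -s "$ORA_CONNECT" <<'SQL'
WHENEVER SQLERROR EXIT FAILURE
EXEC my_proc;
EXIT SUCCESS
SQL
exit $?
OUTER
chmod +x run_proc.ksh
```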

On a Windoze box use the ODBC plug-in; performance will be the main issue because the ODBC plug-in is really sloooooooooow, like running an 8088 processor.

The stored procedure solution is only for installations that do not have any ETL tool in their shop. If you have DataStage up and running, FORGET stored procedures in your overall design concepts. IF IT'S WORKING, DON'T BREAK IT.

There are lots of ramifications to using stored procedures - bad stuff.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

kcbland wrote:This poster is working on a batch oriented load of several million rows on average.
A moderate-sized DataStage solution.
They are doing a proof of concept to show the viability of DataStage, notably PX technology, versus writing PL/SQL based ETL.
If they are doing a proof of concept, then they MUST utilize the tools the way they are designed to be used.
They also want to use the same stored procedures as reference objects in DataStage.
This is not a valid Proof of Concept effort.
1. Is this a scalable solution?
For that size of data? Hmm... to a point, and definitely not to the point where we are now: a 20-hour run (and still dropping) using a DataStage PX solution versus a 48-hour run using complex SQL or PL/SQL within DataStage. Depending on the database server, which CAN be hit by other requests, instead of on the local DataStage server can dramatically hurt performance.
2. Is this a performant solution?
No.
3. Is this a manageable solution from a code maintenance point of view?
No. It is more difficult to archive multiple diverse locations of code within a source repository, and to ensure that your gatekeeper (for code migration) migrates the correct version to the correct destinations. Identifying failure points is also more difficult. You will have the PL/SQL team and the DataStage team blaming each other for a bug. Conducting QA requires more test cases: you MUST independently validate the PL/SQL code AND the DataStage code to ensure each performs correctly on its own. Changes to the PL/SQL code may also have an adverse effect on the DataStage code, which flies against a core philosophy of data warehousing design: graceful change.
4. Is the solution tunable?
Anything is tunable. However, with this solution it is far more difficult to isolate the factors that affect the program. With a DataStage-only solution it's easy: just change the input to a test sequential file and go from there. But ensuring that the PL/SQL code provides the best performance for a specific ruleset requires a PL/SQL expert. You are effectively paying two people, an expert PL/SQL coder and an expert DataStage coder, for a job that one can do.
5. Is the solution utilizing the tool to its fullest processing throughput capability?
Yes, and no. With simple SQL, yes. With complex SQL, no.
6. Is the solution utilizing the tool to its fullest graphical data flow metaphor?
This is a question that is better asked as Question #7:
7. Is the solution supporting a metadata/enterprise impact analysis framework?
Absolutely not. It is far easier to pull the tables and compare them to DataStage's metadata to ensure compatibility than it is to figure out the metadata behind PL/SQL code.
8. Is the solution elegant?
Elegance is something I am a strong proponent of, but I am more concerned with questions #3 and #4 than with #8. If I can haul ass on #3 and #4, I couldn't care less whether it looks pretty or not.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

FYI: an elegant solution means it is simple in design, easy to maintain, powerful, flexible, and a joy to work with. Yugo: ugly, but you can't kill it. Ferrari: awesome, but high maintenance and cost. BMW 530i: sweeeet, worth every penny*.

* Ken's wish list item.
Kenneth Bland

vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Seems to me that what you are trying to do, getting DataStage to use PL/SQL for lookups, is the same as trying to get Oracle Warehouse Builder to use hash files. The first question you will get when presenting your POC is: why didn't you use OWB instead? It has superior interaction with Oracle components. I think your POC is doomed.

The best performance of your DataStage jobs requires local reference data on the DataStage server. I'm not strong on PX functionality, but in the standard version you could achieve this by keeping the logic in PL/SQL and dumping the output of these procedures into hash files in bulk runs.

You know those cooking shows where the chef prepares the ingredients for a cake and then says "here is one I cooked earlier" and whips out a perfect cake? By dumping your PL/SQL-generated reference data into hash files en masse, your extract gets its lookup information straight from these prepared hash files. These hash files can contain the output of unions, aggregations, joins, etc. You keep your logic in PL/SQL and you get the performance of local lookups.
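For example, the bulk "bake ahead" step might be no more than a spool script, whose delimited output a Server job then loads into a hashed file. The wrapper view name, columns, and file names here are all invented for illustration:

```shell
# Sketch: generate the spool script a bulk staging run would use.
# "ref_lookup_vw" (some SQL wrapper over the PL/SQL logic), the column
# names, and the file names are all hypothetical.
cat > dump_ref.sql <<'EOF'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 COLSEP '|'
SPOOL ref_lookup.dat
SELECT key_col, out_val1, out_val2 FROM ref_lookup_vw;
SPOOL OFF
EXIT
EOF
# A before-job routine would then run something like:
echo "sqlplus -s scott/tiger@mydb @dump_ref.sql"
# ...and the job loads ref_lookup.dat into a Hashed File stage.
```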

Now this isn't necessarily manageable or elegant, but it does get improved performance.