DataStage vs. Stored procedure

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

DataStage vs. Stored procedure

Post by yiminghu »

Hi,

Our company started building Data Mart two years ago. It used hard coded stored procedure to implement ETL (the platform is SQL Server + NT), it works perfectly fine. Now, the company wants to convert the database to DB2 on Unix, and we are told to re-write all the ETL stored procedure using DataStage.

During the design phase, we noticed that not all stored procedure can be easily re-written in datastage, because datastage does not provide too much flexibility. (some of our stored procdure uses cursor, lots of conditional checking. ) I am wondering how do you justify which action could be used in datastage, which could not.

By the way, what's the benefit of using datastage compared to stored procedure.

Thanks,

Yiming
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

because datastage does not provide too much flexibility
Uhmm, I think I would disagree with you. The problem is that stored procedures are a programming language. An ETL tool is a different way of thinking about how you're going to move data.

The benefits of an ETL solution over a stored-procedure are too many to list. All I can say is that you have to adopt a different approach to how you're doing things. Make use of DataStage's strengths of flat file staging, hash file reference lookups, job control, job instantiation, etc.

Unless you are doing intra-instance loading, a stored procedure is abysmally slow. You will be single threading your process, unless you incorporated parallel DML into parts of it. On SQL-Server, you're not really working with high volumes of data, but you will be seeing a difference in databases on DB2. I've never lost a competition using DataStage when the opponent was handwritten code like stored procedures.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

It is indeed a requirement to have a paradigm shift in thinking of how programming is like when you are using an ETL tool.

You are not having the program control the data. You are having the data control the program. Instead of thinking of how to individually force each data to fit the way it should appear, you are herding data toward the proper conclusion. Manhandling them will choke the process.

Instead of drilling down to the specific nitty gritty details and take shortcuts to reach a conclusion, you have to obtain a higher level of perspective of how data flows. Instead of being an engineer, be an architect. Let ETL handle the nitty gritty stuff itself. Design with an understanding of what each tool would give you.

It is really difficult to explain in words the paradigm shift. Just trust me on this -- let go of the innate details such as cursors, checking, and all, think in a designer perspective: "I want to protect the database from failure while loading this data on to this table." That would translate to "How can I restore from a failure of this program?" Again to, "What is the best method to load data to this table?" Yes, you can answer those things the engineer way. Don't. Answer them the architect way. Let data drive your decision.

Sorry if this is so confusing to you.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I strongly recommend you bring in a DataStage expert for a couple of weeks to put you on the right path in the conversion of your routines. This will make your conversion project far more efficient and avoid many of the pitfalls in ETL development. Some Ascential offices offer fast start help in getting DataStage projects up and running, there are also contractors and consultants out there.

I suggest this because on a normal ETL project a developer who is new to DataStage has time to build their skill in the product while development is in the early stages of analysis and design. On your project however you have the analysis already done, you have a requirement to convert from a fully functioning DW so you just need skilled DataStage developers who can convert that code quickly and who know the range of functions available.

At the simplest level an ETL job takes an input, maps columns to output columns, performs a lookup on some columns and outputs the results. There are plenty of functions, stages and code that adds flexibility to this approach:
- constraints with multiple output paths, that let you send rows from a single cursor down different processing paths.
- Routines that accept one or more input fields and process that data using the full BASIC language (which is as fully functional as any database language) and returns one or more values for output.
- Database input stages that let you write custom SQL to join tables, add filters, select unions etc.
- Additional stages such for aggregation, sorting, splitting, merging, bulk loads. These stages perform these steps with very little programming effort.
- Hash files for creating fast staging data files that can be used for lookups.

And the list goes on and on. If you took any stored procedure from your old system and showed it to a DataStage expert they will be able to tell you how to do it in DataStage.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Yiming

Vincent is correct you need someone to help transition you from one way to whole new mindset. We had a customer switch from DataStage to PLSQL. I wrote a program to convert a job to PLSQL to they could quickly verify the business rules in a language they were fimiliar with.

If you are experts in a single database and your source and target data are both in that database then stored procedures make sense. If your source data is scattered across several databases and you have flat files then ETL tools are far superior. The other advantage is flexiblity. You can quickly incorporate new data sources with ETL tools. The time it takes to take new data and turn it into information is critical in business today. That time lag can make or break a company in real competitive industries. Some companies realize this and they are still paying for top talent. To outsource your data warehouse is a very bad idea. A friend of mine is working in CA and one report saved their company 3 million. They pay top dollar and every implementation goes in without a problem. They proved that better talent saves money in the long term.

Kim.
Mamu Kim
afreas
Participant
Posts: 19
Joined: Wed Nov 12, 2003 3:56 pm
Location: USA, Virginia

Stored Procs

Post by afreas »

Keep in mind that there are some differences in the way stored procs work on the different databases. Particularly between SQL server and UDB. It is worth research on this to help solidify the decision to go to Datastage.
ART
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Post by shawn_ramsey »

I agree with Vincent bringing in a good consultant is the way to go. We had the same issues here with the developers (including myself) having to go from a stored procedure and Perl solution mindset to an ETL one. It only took a couple of weeks to get up to speed with a help of a consultant to get all of the developers on track with ETL development. We had the consultant stay on longer to tackle the harder jobs while the developers learned by building the simpler ones. Eventually our developers took on even the more challenging ETL work, even though we still leverage consultants on a regular basis.

I think you would be hard pressed to convince my team to go back to stored procedures.
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
Post Reply