reusability concepts

PREMILA
Participant
Posts: 4
Joined: Wed Oct 06, 2004 3:52 am

Post by PREMILA »

Hi All,
What reusability concepts can be followed in DataStage so that they carry across all the projects I do?
I would like more clarification on the reusability features available in DataStage.

Regards,
Premila
memrinal
Participant
Posts: 74
Joined: Wed Nov 24, 2004 9:13 pm

Post by memrinal »

Hi,
For reusability, you can use
1. New Template from Job and
2. New Job from Template
from the File -> New
In addition to this you can create and use Shared Containers. Routines created using DataStage Manager can also be reused. Hope I have answered your question.
Mrinal Kumar

Even the IMPOSSIBLE says "I M Possible"
PREMILA
Participant
Posts: 4
Joined: Wed Oct 06, 2004 3:52 am

Post by PREMILA »

Hi,

Perhaps I should have posted this in the management forum, because I need reusability concepts that I can apply across all the projects I do in any project-based company. Even before a project starts, I should be able to tell the client that these are a few reusability features I can deliver in the project. I would have a ready-made procedure for each feature, so that wherever and whoever my client is, I can present these features to them. Could anyone help me out in this regard?

Regards,
Premila
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Your question is not easily answered in a short form. You need to invite a DS professional for a few days and discuss all the facets with him or her; then you'll have a clear idea of how you could solve your problem or have it solved.

Wolfgang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The entire architecture of DataStage is designed around the concept of re-usable components. Anything you create you can save and re-use elsewhere. If that somewhere else is a different project, there are several ways of getting the component into the new project. Tools range from copy/paste through export/import to Version Control promotion.

The secret is to plan carefully so that your components are as re-usable as possible; use job parameters, arguments and other mechanisms to make them as flexible as you can.
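
As a rough illustration of the parameterization idea (in Python, not DataStage itself), the sketch below merges per-project overrides into a set of defaults and rejects names the component does not declare. The parameter names `SourceDSN`, `TargetDSN` and `CommitRows` and the helper `resolve_params` are hypothetical, chosen only for the example.

```python
# Hypothetical defaults for a reusable component; the names are illustrative only.
DEFAULTS = {"SourceDSN": "dev_src", "TargetDSN": "dev_tgt", "CommitRows": 1000}


def resolve_params(overrides, defaults=DEFAULTS):
    """Return the defaults with per-project overrides applied.

    Unknown names are rejected so that a typo in one project cannot
    silently fall back to a default value.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown job parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# The same component, pointed at a different target without being edited.
prod_params = resolve_params({"TargetDSN": "prod_tgt"})
```

The point of the design is that the component itself never changes between projects; only the values passed in do.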
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There will be a collection of shared components between projects, such as routines, table definitions, shared containers, job templates and some jobs. To get the full benefits of reusability you need to ensure everyone is aware of and using the common components; that means documenting them in a way that is brief and easy to access. You also need change control.

With shared components existing in multiple projects, developers will always have reasons, or be tempted, to change one. This throws the copies out of sync and introduces problems. You need to define a process for allowing changes to shared components, and processes for migrating each change across the other projects. MetaStage reporting or custom queries against the Reporting Assistant can help identify where the components are and whether they have been modified.

As Wolfgang points out, this is something that is usually overlooked by startup DataStage projects, and it is one of the things an expert can bring.
alvarez-m
Participant
Posts: 13
Joined: Tue Nov 16, 2004 3:12 pm

Post by alvarez-m »

Premila

I am finding that it is very useful to create stored procedures and call them from DataStage.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Only if you don't mind maintaining an extra programming language. If you can do it in DataStage, I would steer clear of stored procedures: they tend to be difficult to maintain, they can lack scalability, they don't report errors very well to DataStage administrators, and they require a separate source control and delivery mechanism.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I agree... you are defeating the whole purpose of using an ETL tool if you push the processing off to stored procedures.

Now, I've seen times when that was the only option, like two-phase commits or other situations that DataStage doesn't handle. Other than that, I personally see no reason to have stored procedures - rewrite them, turn them into DataStage jobs. For all of the reasons Vincent mentioned.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I consider methodologies and best practices to be reusable. If you develop sequences the same way every time and with the same parameter names, this helps speed up development and keeps jobs consistent. Job templates can also standardize methods. Using standard jobs to gather ETL stats and process errors can also help.

Metadata information can help drive these methods. The source keys become hash file keys for lookups. The source keys in the target help build hash file crossref lookups to decide if a row is an insert or an update. More than that is needed if you are doing SCD type 2. Row counts from manual processes or ProfileStage can help estimate target row counts. Metadata on column types and lengths, combined with these row counts, can be used to estimate disk space and, to some extent, performance.
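
The two ideas above can be sketched outside DataStage in a few lines of Python. This is only an illustration under stated assumptions: a plain set of key tuples stands in for the hash file crossref, and the function names `classify_rows` and `estimate_bytes` and the `overhead_per_row` argument are hypothetical. It does not cover the extra history handling that SCD type 2 needs.

```python
def classify_rows(source_rows, key_columns, existing_keys):
    """Split source rows into inserts and updates.

    existing_keys plays the role of the crossref lookup: a set of key
    tuples already present in the target.
    """
    inserts, updates = [], []
    for row in source_rows:
        key = tuple(row[col] for col in key_columns)
        if key in existing_keys:
            updates.append(row)   # key found in target: update
        else:
            inserts.append(row)   # key not found: insert
    return inserts, updates


def estimate_bytes(row_count, column_lengths, overhead_per_row=0):
    """Crude disk estimate: rows times the sum of declared column lengths."""
    return row_count * (sum(column_lengths) + overhead_per_row)


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
inserts, updates = classify_rows(rows, ["id"], existing_keys={(1,)})
size = estimate_bytes(1000, [10, 20, 4])
```

The estimate deliberately ignores compression, indexes and variable-length storage, so it is best read as an upper-level planning figure rather than a measurement.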

Routines to format dates to Oracle style or some specific database are reusable. All routines are reusable in that sense.
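
A minimal sketch of such a date routine, written in Python rather than DS Basic. The post does not say which Oracle format is meant, so the `DD-MON-YYYY` layout here is an assumption, and the function name `to_oracle_date` is hypothetical.

```python
from datetime import date

# Month abbreviations are spelled out so the result does not depend on locale.
_MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def to_oracle_date(d: date) -> str:
    """Format a date as DD-MON-YYYY (assumed target format)."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year:04d}"


formatted = to_oracle_date(date(2004, 10, 6))  # "06-OCT-2004"
```

Keeping the format in one routine means a change of target database touches one place instead of every job that writes dates.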

All my free jobs posted on my tips page and the methods in DwNav are reusable. They can help document your projects in a standard way. EtlStats can help standardize the way you do performance tuning. The concept of reusability is a way to isolate sections of code and not worry about it. It also allows developers to work independently. DataStage does this nicely.

Most of the real reusability is in the DataStage engine itself. All the methods to route data from lots of different databases to a separate database are reusable. This saves thousands of man hours building a data warehouse. There are thousands of man hours and code behind the DataStage engine.

I am not sure what you are looking for in an answer, but this should give you several ideas for asking a more specific question. We can easily expand our answers in any of these more specific categories. Ascential has lots of tools to help build a data warehouse, and maybe they do not fit some textbook idea of reusable code, but they are definitely taking development to the next level.
Mamu Kim