DataStage, Universe and TCL
Moderators: chulett, rschirm, roy
Hi my friends.
I have been working with DataStage for a few months now. I don't know anything about Universe, or about the structure of the tables that support DataStage.
What I would like is the complete structure of the Universe database supporting DataStage, and a manual (or a document) for the TCL commands.
Do you know where I can find these things?
Thanks in advance.
Guillermo P. Barsky
Buenos Aires - Argentina
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I find the IBM web pages very difficult to navigate, so when I want a Universe document I look through the tools4datastage archive, where it's been posted a couple of times:
http://www-3.ibm.com/software/data/u2/p ... ry/96univ/
Lots of PDFs there, including UniVerse BASIC, the SQL Reference and the Administrator Guide, which includes instructions for using the hash file analysis commands.
The structure of the DataStage repository database is not public domain information and is quite complex. If you need to retrieve information for reporting or documentation, you can use the Reporting Tool, which transfers information about projects into an Access database with a simpler table structure. You can then run your own queries against it.
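For anyone who hasn't seen them, the hash file analysis commands mentioned above are typed at the engine's TCL prompt (the Administrator client's Command window, or a telnet/uvsh session on the server). A few commonly used ones, sketched below; `MyHashFile` is a placeholder, and you should check the Administrator Guide before running anything against a live project:

```
>ANALYZE.FILE MyHashFile     (file type, modulo, separation and load statistics)
>FILE.STAT MyHashFile        (group-by-group statistics for the file)
>COUNT MyHashFile            (number of records in the file)
>LIST DS_JOBS                (list the jobs in the current project's repository)
```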
Vincent McBurney
Data Integration Services
www.intramatix.com
Guillermo
Ken is right: they do not want you to play around at the Universe level. It is now called the DataStage engine, not Universe. There is no documentation at the engine level. There is an API for the things Ascential wants you to deal with, like starting and stopping jobs. They do not give you the internals; besides, those change a lot from one release to the next. Most of what is in there is in doc_tool.mdb, which is updated by the Reporting Assistant in DataStage Manager. The interface to it is terrible; that is why I wrote DwNav.
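The supported API Kim mentions is also exposed on the command line as `dsjob`. A few typical invocations, as a sketch; the project and job names here are placeholders:

```
dsjob -lprojects                          list the projects on the server
dsjob -ljobs MyProject                    list the jobs in a project
dsjob -run -jobstatus MyProject MyJob     run a job and wait for its completion status
dsjob -stop MyProject MyJob               stop a running job
dsjob -logsum MyProject MyJob             summarize the job's log entries
```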
I know, another plug, but I built these tools because there are holes in DataStage which should be fixed. The metadata for the ETL should be easier to browse. It should automatically generate source-to-target mappings. No way. It will never do it. DataStage is way too complicated and way too flexible. The relationship between a user-defined query and the real metadata for that query is seriously complex. You need a SQL parser for every database. Good luck. This information is critical to analyzing ETL.
Let's say we want to treat ETL just like we treat any data warehouse. We want to data mine it. We want to treat ETL like a black box, ignore all the steps from start to finish, and see which columns in the data warehouse came from, or are dependent on, which source columns in which source tables. How many people have that kind of documentation? You should. Some day you may be forced to produce it. Ab Initio or Informatica may be able to do it because their ETL rules are not as complex as DataStage's. The more complex the ETL, the more difficult it is to analyze. Why not have an ETL star schema built in, and use your favorite drill-down tool, Business Objects or Cognos or whatever, to drill down into your ETL? What is your fact table? Maybe business rules. Maybe derivations. What about constraints? Are they dimensions or facts? What are your dimensions? Source tables, target tables and staging files are all dimensions. What else?
What if you built the same data warehouse in all the major ETL tools? How would you decide which tool was the best? Fastest? Fewest steps? Easiest to understand? Cheapest? Most cost-effective? What about other tools like QualityStage and MetaStage? What is the best tool? Which tool makes my company the most money as a user? That is the real answer. You are in business to make money, right?
To answer your original question about Universe: Jonathan Sisk writes the best books on Pick, Prime, Universe and UniData. These are all very similar databases, based on the original Pick database. O'Reilly, the Nutshell people, also published some good books on this topic.
There are very few TCL commands you need to know. The SDK has almost everything you will ever need. By the way, you can download a 2-user version of Universe for free from IBM. It is hard to find on the website, but the Linux and Windows versions are free for development. There is good documentation on the website too.
That is my three cents. Sorry, I need to stir things up now and then. Getting bored.
Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
Sorry, Vincent is faster on the trigger than I am.
Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
It's okay Kim, we all need to keep stirring the pot for those guys over at Ascential. It's unbelievable that the underlying storage structures for job designs are:
1. Not relational
2. Not open
3. Not browseable
Unless you reverse engineer the existing storage structures to be:
1. Relational
2. Open
3. Browseable
you will be faced with using MetaStage or the Reporting Assistant in order to do everyday work such as:
1. Where is this hash file used? (By physical hash file name, not those stupid definitions, which DON'T WORK AND BREAK THEIR CONNECTIONS ALL THE TIME!)
2. Where is this sequential file used?
3. What's the SQL used in these jobs? Because the DBAs are asking and I don't have all day to cut and paste, or I have 800+ jobs and the Reporting Assistant has decided that's waaaay too much information.
4. What jobs are touching this table? SQL parsing, bwaa haa haa haa, see #3 above.
And I could go on and on and on.
DataStage has connectivity to pretty much every database, so why can't it store its job design repository externally? Sure, the job designs are all Pick-flavored multivalued data, and it would be a lot of fun to normalize that information. So here they took Orchestrate and GUI-ized it; too bad they didn't move the Server jobs and the Orchestrate jobs under a single unified open, relational and browseable storage model. Now QualityStage and ProfileStage are all piled in as well. Wonderful.
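Short of writing a real SQL parser, a crude but serviceable stopgap for questions 3 and 4 above is to export the jobs to .dsx files and scan the export text for a table name. A minimal sketch in Python; the directory layout and function name are my own invention, not anything DataStage ships:

```python
import re
from pathlib import Path

def jobs_touching_table(dsx_dir, table_name):
    """Naively scan exported .dsx job files for occurrences of a table name.

    Returns {filename: occurrence_count} for every export that mentions it.
    This is a text match, not a SQL parse, so expect false positives on
    comments and similarly named objects.
    """
    pattern = re.compile(re.escape(table_name), re.IGNORECASE)
    hits = {}
    for dsx in sorted(Path(dsx_dir).glob("*.dsx")):
        text = dsx.read_text(errors="ignore")
        count = len(pattern.findall(text))
        if count:
            hits[dsx.name] = count
    return hits
```

Run it against a directory of nightly exports and you at least have an answer for the DBAs without cutting and pasting from 800 jobs.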
Kenneth Bland
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
If DataStage moved away from Universe to supporting multiple types of database repository, would that mean changing the scripting language as well? From what I understand, routines and jobs need Universe installed to compile and run UniVerse BASIC code.
Vincent McBurney
Data Integration Services
www.intramatix.com
Ken
That is my point about all these tools. They are all very closed systems. Business Objects import and export are terrible, and they use a relational repository. MicroStrategy has the same issue. Oracle's system tables, by contrast, are wide open. I, like a lot of people, benefit from knowing how their system tables work. Not only that, but they are clean.
How metadata is stored is going to change in the future. As people come to understand metadata, they will want more access under the covers. All these products need to be a lot more open. Good developers use this information in powerful ways: they can snapshot it, compare those snapshots to older versions, and automate change control. How metadata is stored is important.
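The snapshot-and-compare idea works today with nothing more than the .dsx export: keep dated exports of each job and diff them. A minimal sketch using only the Python standard library; the function name is mine:

```python
import difflib

def dsx_diff(old_export, new_export, label="job"):
    """Unified diff between two .dsx exports of the same job.

    Feed it the text of yesterday's and today's export to get a
    line-by-line change record, a poor man's change-control audit trail.
    Returns an empty list when the exports are identical.
    """
    return list(difflib.unified_diff(
        old_export.splitlines(),
        new_export.splitlines(),
        fromfile=f"{label} (old snapshot)",
        tofile=f"{label} (new snapshot)",
        lineterm="",
    ))
```

Schedule the export plus this diff nightly and you can see exactly which job designs changed between releases.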
XML may be the answer, but browsing XML data is not exactly easy. The XML export is as ugly as the dsx export.
All of these are textual representations of ETL or process designs. SQL is a textual representation of programs that batch-change data. I think most people understand the standardization of SQL and its importance. Metadata is heading down the same path. You wait. The repositories are all going to standardize wherever possible.
ETL tools will become interchangeable, just like UNIX systems and databases. Vendors already have translation tools. It should not be that hard to build XML processes to automate imports from other ETL tools into DataStage, or even the other way around. I know some of these already exist. As soon as customers figure out that the tools are more similar than different, we will be exposed to other ETL tools just like we all use many databases.
That is my 4 cents worth. Shooting for a dime.
Hey Dennis where is your metadata expert?
Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I sometimes get asked this question when running DataStage training.
Open the doc_tool.mdb with MS Access (bypassing the startup macro, for example by holding the Shift key when opening it), and get it to analyze the relationships between entities in a normalized version of the repository database.
There are usually no more questions after this. [:D]
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
quote:Originally posted by kcbland
[br]you will be faced with using MetaStage or the Reporting Assistant in order to do every day work such as:
Hey Ken --
Just in case you are not aware, the Reporting Assistant is a deprecated feature and is supposedly removed as of the Twister release. Ascential has been advising us to move to MetaStage.
Of course, we haven't yet set up MetaStage, so... Not a pretty sight at our place.
-T.J.
* * *
... now if this can make breakfast, my life is complete.
Ray, are you serious or kidding? [:D]
Is there a way to do a server-side 'UPDATE NOW' function?
Can you imagine how long an 'UPDATE NOW' takes for a big project against a remote Oracle database from a client?
Even granting that the Universe infrastructure for DataStage is the most suitable,
when will they provide a decent server-side backup/restore facility like any other RDBMS has?
There's more of the same, but still...
DataStage is the best ETL tool out there [:D]
Here Vincent mentioned that, to retrieve information for reporting or documentation purposes, one can use a reporting tool which transfers information about DataStage projects into an Access database with a simpler table structure.
-- Which reporting tool is he referring to? I need a little more information about it.
-- And what is the process to export the DataStage metadata from the Universe database to an MS Access database?
Thanks
Vinod
Indeed... and the reporting tool mentioned is long since gone.
Start a new thread and outline your objectives. There are lots of ways to get at various bits of metadata; it depends on what you need.
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
I just started a new thread with the subject line "DataStage, Universe & TCL"; the link is as follows:
viewtopic.php?t=145112
Thanks
Vinod