Query on DataStage Repository

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Query on DataStage Repository

Post by Umbix62 »

Hi

I need to query the DataStage repository to find the jobs that use an Oracle8i stage with a given table name, "pippo" for example, and to display the type of operation performed on that table.

I can't access the repository directly. Can I write a job, server or batch, that reads the repository to extract the information I need? If so, how can I do that?

Thank you

Umberto
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you used the metadata for the table "pippo" in your job design, you can go to the Manager, right-click on the table definition, and perform a usage analysis to get the list of jobs that use that object.
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

ArndW wrote:If you used the metadata for the table "pippo" in your job design, you can go to the Manager and right-click on the table definition and do a usage analysis to get the list of jobs that have used that ...
No, the customer doesn't use the metadata, so the only way to do that is to query the repository via a batch job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your customer is out of luck because of that choice. Details of the Repository are not in the public domain. You may get some ideas from searching here, but there's nothing official.

DataStage is metadata-driven. You really must have good metadata management practices.

Chuck Smith will post here about his "find table name" utility. That's one possibility. You could also export the project and scan the export file. Kim Duke's utilities may also be useful.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I included this in EtlStats. It is called DsSearch1.sql.

Code: Select all

SELECT 
   DS_JOBS.NAME AS JOB_NAME, 
   DS_JOBS.CATEGORY, 
   DS_JOBOBJECTS.NAME AS OBJECT_NAME, 
   DS_JOBOBJECTS.OLETYPE, 
   EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'PIPPO',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L'
FROM 
   DS_JOBS, 
   DS_JOBOBJECTS 
WHERE 
   DS_JOBS.JOBNO = DS_JOBOBJECTS.OBJIDNO 
   and FOUND = 'FOUND'
GROUP BY
   JOB_NAME, 
   DS_JOBS.CATEGORY, 
   OBJECT_NAME, 
   DS_JOBOBJECTS.OLETYPE, 
   FOUND
;
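
To search for a different table, only the string literal in the EVAL expression needs to change; it must be uppercase, because the expression upcases @RECORD before searching. For example, a hypothetical search for a table named CUSTOMER would swap in:

Code: Select all

   EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'CUSTOMER',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L'
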
Mamu Kim
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

You are an angel. Thank you very much. It is exactly what I was looking for.
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

I'm sorry, but I have another question for you. How can I embed this query in a UniVerse stage? How can I connect to the repository from a DataStage server job, if that is possible? Is it possible to connect via an ODBC driver? If so, how can I set it up?

Umberto
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Umberto, you can connect using the ODBC DSN "localuv".
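
In a server job, that means an ODBC stage pointing at the "localuv" DSN (or a UV stage), with the repository query entered as user-defined SQL and matching columns defined on the output link. A minimal sketch, using only columns that appear in the query posted above:

Code: Select all

SELECT NAME, CATEGORY FROM DS_JOBS;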
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Arnd is correct: "localuv". You can also start with a UV stage. A UV stage will let you run SQL against hashed files that you create, or against parts of the repository. You have to trick DataStage in order to import the table definitions: if you import UV table definitions, the repository ones are left out. You have to create Q-pointers in order to import these, or create the definitions manually after inspecting the dictionary:

LIST DICT DS_JOBS

Q-pointers have been discussed before so search for those answers.
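
For illustration, a Q-pointer is just a VOC record. A minimal sketch, assuming it is created in the project's own account (an empty second field means the current account) under a made-up name such as MY.DS.JOBS:

Code: Select all

ED VOC MY.DS.JOBS
001: Q
002:
003: DS_JOBS

You can then import the table definition against MY.DS.JOBS instead of DS_JOBS, which the UV import leaves out.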
Mamu Kim
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

Hi, thank you for your answer. I have followed Arnd's advice and it works well, but I haven't understood what you said. I only want to know whether I can read certain information from the repository, such as the name of the job that calls a given job (for example, a batch job that executes a server job through a DataStage BASIC script), or whether a passive stage is used for input or for output. Could you help me? Could you tell me whether any documentation exists on these topics?

Umberto
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

There is no documentation of the repository tables. All the SQL I post that runs at TCL against repository tables can also be used in a UV stage as a source and written to any table or file. Doing any of this in BASIC, either in a routine or in job control code, takes a lot more knowledge of UniVerse.

The hardest part of using these repository SQL statements is creating the metadata. You cannot import metadata directly from DS_JOBS, DS_JOBOBJECTS or any other repository table. You have to create it manually, or import it indirectly using Q-pointers or some other UniVerse mechanism, which again requires a lot more knowledge of UniVerse.

Most of us want and need some of this information, such as the relationship between the tables used on a link. Automating the discovery of that relationship is very difficult, especially if you did not create the original job, and it is even more complicated if everything is user-defined SQL or goes through views.

Your specific question was about "the job name calling a qualified job". The name of the job that executes another job is written, I think, to the second record of the called job's log. That log is a hashed file called RT_LOGnnn, where nnn is the job number, and there are lots of posts on how to read these log records; the line in question has the word "control" in it. The batch or sequence usually has a tab in its job properties showing all the jobs and routines it depends on or "runs". This list is built automatically as you create the sequence or batch, but it can get out of sync or lose entries, so it is not always accurate. I count on it being accurate when I get row counts for a sequence in EtlStats. If you ask for row counts for a sequence by running the job DSJobReportDbDriver, it has BASIC code to figure out all the jobs called by that sequence and will get row counts for every one of them. I have a new version called DSJobReportDbDriverNoWait which does the same thing in the background, so the sequence can finish quicker and the ETL is not slowed down any more than necessary. This job, and many of the jobs in EtlStats, get their information by reading the repository directly, either in BASIC or in a UV stage. There are probably 25 jobs or more in there that are examples of how to do this.
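
As a rough sketch of reading a log directly in DataStage BASIC (assuming the called job's number is 123, so its log file is RT_LOG123; the record layout is undocumented, so this simply scans the first few events for the word "control"):

Code: Select all

      * Sketch only: replace 123 with the job's number (JOBNO in DS_JOBS).
      Open "RT_LOG123" To LogFile Else Stop "Cannot open RT_LOG123"
      For EventId = 1 To 10
         Read LogRec From LogFile, EventId Then
            Convert @FM To " " In LogRec   ;* flatten field marks for display
            If Index(DownCase(LogRec), "control", 1) > 0 Then
               Print "Event " : EventId : ": " : LogRec
            End
         End
      Next EventId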

EtlStats is free. It is on my tips page. Most of the SQL I post can be run by these jobs, which can email you the results as an HTML web page. The UNIX version will email them as attachments, and as soon as I post the new DOS version it will use blat.exe to do the same thing. The attachments look great; they can carry your company logo and look very professional. I have been using this code and these jobs for years now. It is very stable, but a pain to install.

There are 40 SQL scripts used in EtlStats, in a folder called SqlScripts; most of them run against the repository. They can easily be run by the job named GenHtmlFromSqlDriver, which runs a job called GenHtmlFromSql. The jobs that end in "Driver" are batch jobs, that is, jobs with no stages, only job control code; they read directly from the repository. The HTML result pages are stored in the Sql2Html folder below the project, and you can do anything you want with them. GenHtmlFromSql will turn almost any SQL into a web page. It is a great documentation tool.

DwNav does almost exactly the same thing. It has a form for entering SQL, a toolbar icon and menu option to run the SQL and populate a grid, and a button to save the results as an HTML page; it is designed to meet special documentation requirements. The interface is a traditional explorer layout, with a tree control on the left and a grid on the right. As DwNav drills down into ETL designs and displays the table names or column names you have been looking for, anything shown in the tree or the grid can be tweaked by the user, and DwNav will display the SQL used to build the last level of the tree or the grid. I wanted this kind of flexibility so you could extend the tool beyond what I might have thought of, to meet some very specific reporting requirement: you can add levels to the tree, or add your own SQL so it shows up in the grid, because the popup menus can be changed by the end user.

ParameterNavigator is the same program with a different tree design and different popup menus. It has stand-alone VB code which reads and writes directly to the repository, so it can update job parameters in jobs. That means it updates DS_JOBOBJECTS directly, which I do not recommend unless you have a very good understanding of UniVerse.

Nothing in EtlStats updates repository tables directly. It is very safe.

All of these are little glimpses of how to organize your ETL better, whether by documenting it better or reporting on it better. How can you know you have an issue if you don't gather row counts, or don't have at least a bare minimum of tools beyond what the vendor gives you? If you just use what the vendor gives you, then at best you can only be an average developer. If you add your own thoughts and ideas on top of a good toolset, then you are adding value to the process: you get better, your team gets better, and you deliver better solutions that are more easily understood.

DataStage is a Rapid Application Development tool, and it can just as rapidly create a mess if there is no good metadata about your ETL. These tools are different ways of using that metadata.

Do me a favor. Write one job which extracts DS_JOBS into a table, then write a report of all the sequences in your project. Sort it by job name; next, sort it by category. How long did it take you? Post your answer to that last question.
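
As a starting point, here is a sketch of the report (assuming DS_JOBS carries a JOBTYPE dictionary item in which 2 marks a job sequence; verify with LIST DICT DS_JOBS first):

Code: Select all

SELECT
   DS_JOBS.NAME AS JOB_NAME,
   DS_JOBS.CATEGORY
FROM DS_JOBS
WHERE JOBTYPE = 2
ORDER BY JOB_NAME;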
Mamu Kim