Px DataSet information

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Px DataSet information

Post by ArndW »

I would like to get information on my system's datasets into a data file. Right now I am issuing an "orchadmin describe -l {mydsfile}" for each one, but I am having to parse the output myself. Is there some other method or existing procedure to collect all this information in a more compact form?
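
For what it's worth, the parse I have today amounts to something like the sketch below. It is rough and illustrative only: the descriptor location and the grep/awk stage are assumptions about the '-l' output layout, which seems to vary between versions.

    # Rough sketch, not production code: pull what look like data file
    # paths out of the "describe -l" listing. The pattern is a guess at
    # the output layout and will need adjusting for your version.
    for ds in /data/project/*.ds        # descriptor location: illustrative
    do
        orchadmin describe -l "$ds" |
            grep '^ */' |               # keep lines that start with a path
            awk '{print $1}' >> /tmp/ds_files.lst
    done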
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I use an XML export style sheet to extract a list of input and output datasets for each job. This gives me a list of all datasets and the jobs using them.

There is an XSL stylesheet already in the DataStage client directory. I modified it to show just the datasets.

If you do regular backups of your DataStage project then the export files will be readily available for a range of XML reporting.
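
If you have a command-line XSLT processor available, applying the modified stylesheet to an export can be scripted. A minimal sketch, assuming xsltproc is installed; "datasets.xsl" and "MyProject.xml" are placeholder names, not the actual files shipped with the client:

    # Placeholder names: substitute your modified stylesheet and your
    # project's XML export file.
    xsltproc datasets.xsl MyProject.xml > dataset_usage.txt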
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Vincent,

Thanks for the pointer; I'll have to look at the stylesheets. The problem we have here is from the other side: we have thousands of descriptors and data files and occasionally need to clean up. So I am writing some code which gets all the descriptors on the system and links them to the actual data files in the temp directories; whatever data files are left over are considered "relics" (i.e. someone did an "rm" on the descriptor) and can be deleted. But so far the only way I've found to get information about a descriptor from a program is to run the orchadmin command and parse the output, and I'm worried that the output format will change with the next version.

Come to think of it, it seems odd that this problem hasn't cropped up before.
leo_t_nice
Participant
Posts: 25
Joined: Thu Oct 02, 2003 8:57 am

Post by leo_t_nice »

We had a similar situation a few weeks ago. We almost ran out of disk space, so I wrote some code to search for 'orphaned' data files, and found that we had 142,000 data files with no descriptor... removing them released 60 GB!

Anyway, the method I used was to run 'orchadmin -f' rather than '-l' and then use 'fgrep' to extract only the lines with references to data files. I put this into a routine which searched the system for descriptor files and extracted the 'real' file names, writing them to a file. It was then a simple job to compare this list of files that 'should be' on the system against the list of files that 'actually were' on the system.
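
In outline the routine amounted to something like this (a simplified sketch only: the directory paths, the fgrep pattern and the exact orchadmin flags are assumptions and will differ per installation):

    # Simplified sketch; paths, patterns and flags are illustrative.
    # 1. Collect the data files each descriptor claims to own.
    find /data/projects -name '*.ds' |
    while read ds
    do
        orchadmin describe -f "$ds" | fgrep '/Datasets/'   # layout: assumption
    done | awk '{print $NF}' | sort -u > /tmp/expected.lst

    # 2. Collect the data files actually present on disk.
    find /data/Datasets -type f | sort -u > /tmp/actual.lst

    # 3. Anything on disk that no descriptor references is an orphan.
    comm -13 /tmp/expected.lst /tmp/actual.lst > /tmp/orphans.lst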

It did take all night to run, however :?

Hope this helps
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Leo_t_nice,

I put together some BASIC code that gathers the -l information for every dataset on the system into a hash file, then searched the TMP folders and crossed the valid dataset data files off the list; whatever remained was junk. Found over 50 GB of junked storage today! But my method is kludgy and ungainly and I'm still hoping for a better way of getting the dataset details out of the system.

This is a fast system, so it only took 30 minutes to process about 5000 datasets with the -l option.
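
Translated out of BASIC into shell for illustration, the sweep itself is roughly the following (paths are illustrative, and my real code keeps the valid-file list in a hash file rather than a flat file; note it only reports, the delete is a separate manual step):

    # Illustrative shell version of the sweep. expected.lst holds the
    # data files the descriptors account for; anything else in the TMP
    # area is reported as a relic (report only, no deleting here).
    find /tmp -type f 2>/dev/null |
    while read f
    do
        fgrep -qx "$f" /tmp/expected.lst || echo "relic: $f"
    done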
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

ArndW, we have a multi-node system (so the descriptor will be on one system and the data files will be spread over 3 servers) and are running into similar issues with orphaned datasets. Does your program handle that? If so, is the program something you can/would be willing to share?

Brad.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

At the moment this forum doesn't have a "code" section, but PM me with your e-mail address and I'll fire off the program to you.