Px DataSet information

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Px DataSet information

Post by ArndW »

I would like to get information on my system's datasets into a data file. Right now I am issuing an "orchadmin describe -l {mydsfile}" for each one, but I am having to parse the output myself. Is there some other method or existing procedure to collect all this information in a more compact form?
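
For what it's worth, the parse I have today amounts to something like the sketch below. It is rough and illustrative only: the descriptor location and the grep/awk stage are assumptions about the '-l' output layout, which seems to vary between versions.

    # Rough sketch, not production code: pull what look like data file
    # paths out of the "describe -l" listing. The pattern is a guess at
    # the output layout and will need adjusting for your version.
    for ds in /data/project/*.ds        # descriptor location: illustrative
    do
        orchadmin describe -l "$ds" |
            grep '^ */' |               # keep lines that start with a path
            awk '{print $1}' >> /tmp/ds_files.lst
    done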
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I use an XML export style sheet to extract a list of input and output datasets for each job. This gives me a list of all datasets and the jobs using them.

There is an XSL stylesheet already in the DataStage client directory. I modified it to show just the datasets.

If you do regular backups of your DataStage project then the export files will be readily available for a range of XML reporting.
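
If you have a command-line XSLT processor available, applying the modified stylesheet to an export can be scripted. A minimal sketch, assuming xsltproc is installed; "datasets.xsl" and "MyProject.xml" are placeholder names, not the actual files shipped with the client:

    # Placeholder names: substitute your modified stylesheet and your
    # project's XML export file.
    xsltproc datasets.xsl MyProject.xml > dataset_usage.txt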
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Vincent,

Thanks for the pointer; I'll have to look at the stylesheets. The problem we have here is from the other side: we have thousands of descriptors and data files and occasionally need to clean up. So I am writing some code which gets all the descriptors on the system and links them to the actual data files in the temp directories; whatever data files are left over are considered "relics" (i.e. someone did an "rm" on the descriptor) and can be deleted. But so far the only way I've found to get information about a descriptor from a program is to run the orchadmin command and parse the output, and I'm worried that the output format will change with the next version.

Come to think of it, it seems odd that this problem hasn't cropped up before.
leo_t_nice
Participant
Posts: 25
Joined: Thu Oct 02, 2003 8:57 am

Post by leo_t_nice »

We had a similar situation a few weeks ago. We almost ran out of disk space, so I wrote some code to search for 'orphaned' data files, and found that we had 142,000 data files with no descriptor... removing them released 60 GB!

Anyway, the method I used was to run 'orchadmin -f' rather than '-l' and then use 'fgrep' to extract only the lines with references to data files. I put this into a routine which searched the system for descriptor files and extracted the 'real' file names, writing them to a file. It was then a simple job to compare this list of files that 'should be' on the system against the list of files that 'actually were' on the system.
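
In outline the routine amounted to something like this (a simplified sketch only: the directory paths, the fgrep pattern and the exact orchadmin flags are assumptions and will differ per installation):

    # Simplified sketch; paths, patterns and flags are illustrative.
    # 1. Collect the data files each descriptor claims to own.
    find /data/projects -name '*.ds' |
    while read ds
    do
        orchadmin describe -f "$ds" | fgrep '/Datasets/'   # layout: assumption
    done | awk '{print $NF}' | sort -u > /tmp/expected.lst

    # 2. Collect the data files actually present on disk.
    find /data/Datasets -type f | sort -u > /tmp/actual.lst

    # 3. Anything on disk that no descriptor references is an orphan.
    comm -13 /tmp/expected.lst /tmp/actual.lst > /tmp/orphans.lst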

It did take all night to run, however :?

Hope this helps
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Leo_t_nice,

I put together some BASIC code that gathers the -l information for every dataset on the system into a hash file, then searched the TMP folders and crossed the valid dataset data files off the list; whatever remained was junk. Found over 50 GB of junked storage today! But my method is kludgy and ungainly and I'm still hoping for a better way of getting the dataset details out of the system.

This is a fast system, so it only took 30 minutes to process about 5000 datasets with the -l option.
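
Translated out of BASIC into shell for illustration, the sweep itself is roughly the following (paths are illustrative, and my real code keeps the valid-file list in a hash file rather than a flat file; note it only reports, the delete is a separate manual step):

    # Illustrative shell version of the sweep. expected.lst holds the
    # data files the descriptors account for; anything else in the TMP
    # area is reported as a relic (report only, no deleting here).
    find /tmp -type f 2>/dev/null |
    while read f
    do
        fgrep -qx "$f" /tmp/expected.lst || echo "relic: $f"
    done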
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

ArndW, we have a multi-node system (so the descriptor will be on one system and the data files will be spread over 3 servers) and are running into similar issues with orphaned datasets. Does your program handle that? If so, is the program something you can/would be willing to share?

Brad.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

At the moment this forum doesn't have a "code" section, but PM me with your e-mail address and I'll fire off the program to you.