Data set descriptor location

jxblack
Participant
Posts: 8
Joined: Tue Oct 19, 2004 3:30 pm
Location: Sydney

Post by jxblack »

Hi there,

I have a question regarding the set-up of data sets on a project. A data set has a descriptor file and one or more data files (the actual number depends on the number of nodes and the partitioning specified).

The Parallel Engine stores the data files on the resource disk, e.g. /disk1/Ascential/DataStage/DataSets, but the location of the data set descriptor is determined by the path name specified in the Data Set stage in each job.

Can I just confirm what the best practice is (if one exists) for where the data set descriptors should be located? Should they be in the same area as the data files, i.e. in the resource disk directory, or in a completely separate directory independent of the configuration file area?

Many thanks,

James
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Completely separate, on a separate file system for preference. I usually create a subdirectory called ControlFiles in the project directory on the server.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jxblack
Participant
Posts: 8
Joined: Tue Oct 19, 2004 3:30 pm
Location: Sydney

Post by jxblack »

Thanks Ray.

What would be the reasons for this specifically?

Is it for ease of maintenance of these files, or is it a general rule that we shouldn't write directly to the resource/scratch areas because these are internal to DataStage?

The reason I'm asking is that the proposed directory and file system organisation at the site where I'm working does not differentiate between where the descriptors and the data files of the data sets should be located.
Alokby
Premium Member
Posts: 9
Joined: Wed Sep 15, 2004 7:27 am

Post by Alokby »

I create a folder for data sets with two subfolders, one for the descriptors and one for the data,
e.g.
dataset
-data
-desc
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My main reason for keeping them on separate file systems is that if you lose one you don't lose the other, and may therefore be able to reconstruct at least the structure (maybe even restore from backups).
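
For anyone wanting to verify that the two locations really are on separate file systems, here is a minimal Python sketch. It assumes a POSIX-style server where different mounted file systems report different device IDs (st_dev). The function name and the example paths are hypothetical, not part of DataStage.

```python
import os


def on_separate_file_systems(path_a, path_b):
    """Return True if two existing paths report different device IDs (st_dev)."""
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations; substitute your own descriptor and resource disk paths.
    descriptor_dir = "/projects/myproject/ControlFiles"
    resource_disk = "/disk1/Ascential/DataStage/DataSets"
    if os.path.isdir(descriptor_dir) and os.path.isdir(resource_disk):
        print(on_separate_file_systems(descriptor_dir, resource_disk))
    else:
        print("One or both example paths do not exist on this machine.")
```

Comparing device IDs only shows that the mounts differ. It says nothing about whether they share the same physical disks underneath.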

My reason for keeping the control files in a subdirectory in the project directory is mainly "keeping everything together", with a secondary reason that I can compare between, say, dev and test to verify that they're behaving similarly.
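
The dev/test comparison can be scripted. The sketch below only compares descriptor file names, and it assumes descriptors use the conventional .ds extension and sit directly in each project's ControlFiles subdirectory. The paths and function names are illustrative only.

```python
from pathlib import Path


def descriptor_names(control_dir):
    """Return the set of descriptor file names (*.ds) directly inside control_dir."""
    return {p.name for p in Path(control_dir).glob("*.ds") if p.is_file()}


def compare_control_dirs(dev_dir, test_dir):
    """Report which descriptor names appear in only one directory or in both."""
    dev = descriptor_names(dev_dir)
    test = descriptor_names(test_dir)
    return {
        "only_in_dev": sorted(dev - test),
        "only_in_test": sorted(test - dev),
        "in_both": sorted(dev & test),
    }


if __name__ == "__main__":
    # Hypothetical project paths; substitute your own.
    report = compare_control_dirs(
        "/projects/dev/ControlFiles",
        "/projects/test/ControlFiles",
    )
    for key, names in report.items():
        print(key, names)
```

A name that shows up in only one environment is a quick hint that a job has not run there, or writes its descriptor somewhere else.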