Hi there,
I have a question regarding the set-up of data sets on a project. A data set has a descriptor and one or more data files (the actual number depends on how many nodes or partitions are specified).
These data files are stored by the Parallel Engine on the resource disk (e.g. /disk1/Ascential/DataStage/DataSets), but the location of the data set descriptor is determined by the path name specified in the Data Set stage of each job.
Can I just confirm what the best practice is (if one exists) for where the data set descriptors should be located? Should they be in the same area as the data files, i.e. in the resource disk directory, or in a completely separate directory independent of the configuration file area?
Many thanks,
James
Data set descriptor location
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Thanks Ray.
What would be the reasons for this specifically?
Is it for ease of maintenance of these files, or is it a general rule that we shouldn't write directly to the resource/scratch areas because these are internal to DataStage?
The reason I'm asking is that the proposed directory and file system organisation at the site where I'm working does not differentiate between where the descriptors and the data files of the data sets should be located.
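To audit a proposed layout like this, one option is to check whether any descriptor path falls inside a resource disk directory. The sketch below is a hypothetical Python helper, not a DataStage tool: the resource disk list is hard-coded for illustration (in a real installation it would come from the `resource disk` entries in the parallel configuration file), and the example paths are made up. The check is purely lexical, so it does not resolve symlinks.

```python
from pathlib import PurePosixPath

# Hypothetical resource disk directories, hard-coded for illustration.
# In practice these would be read from the "resource disk" entries
# of the parallel configuration file.
RESOURCE_DISKS = [
    PurePosixPath("/disk1/Ascential/DataStage/DataSets"),
]


def descriptor_in_resource_disk(descriptor: str, resource_disks=RESOURCE_DISKS) -> bool:
    """Return True if the descriptor path lies inside any resource disk directory.

    Lexical comparison only (no symlink resolution); requires Python 3.9+
    for PurePosixPath.is_relative_to.
    """
    path = PurePosixPath(descriptor)
    return any(path.is_relative_to(disk) for disk in resource_disks)


if __name__ == "__main__":
    # Hypothetical descriptor paths
    for p in ["/disk1/Ascential/DataStage/DataSets/customers.ds",
              "/projects/MyProject/datasets/customers.ds"]:
        print(p, descriptor_in_resource_disk(p))
```

Comparing path components rather than raw string prefixes avoids false matches such as a sibling directory named `DataSetsOld`.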
My main reason for keeping them on separate file systems is that if you lose one, you don't lose the other, and you may therefore be able to reconstruct at least the structure (maybe even restore from backups).
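If you want to verify that the descriptor directory and the resource disk really are on separate file systems, a rough check is to compare the device IDs of the two directories. This is a minimal Python sketch under that assumption; the function name is hypothetical, and device IDs can be misleading with bind mounts or some network file systems.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths reside on the same device.

    st_dev identifies the device that holds each path; equal values mean
    the two paths share a file system, so losing that file system would
    lose both. Raises FileNotFoundError if either path does not exist.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, `on_same_filesystem("/projects/MyProject/datasets", "/disk1/Ascential/DataStage/DataSets")` (hypothetical paths) returning True would suggest that descriptors and data files could be lost together.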
My reason for keeping the control files in a subdirectory in the project directory is mainly "keeping everything together", with a secondary reason that I can compare between, say, dev and test to verify that they're behaving similarly.
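The dev/test comparison could be as simple as diffing the descriptor file names in the two project subdirectories. This is a hypothetical Python sketch: it assumes the descriptors are named with a `.ds` suffix (a common convention, not a requirement) and that both subdirectories are reachable from one machine. It compares names only, not descriptor contents.

```python
from pathlib import Path


def descriptor_names(directory: str, pattern: str = "*.ds") -> set[str]:
    """Return the names of descriptor files in a directory (non-recursive).

    The "*.ds" pattern is an assumed naming convention.
    """
    return {p.name for p in Path(directory).glob(pattern) if p.is_file()}


def compare_environments(dev_dir: str, test_dir: str) -> dict[str, set[str]]:
    """Report which descriptors exist in only one environment, or in both."""
    dev = descriptor_names(dev_dir)
    test = descriptor_names(test_dir)
    return {
        "only_in_dev": dev - test,
        "only_in_test": test - dev,
        "in_both": dev & test,
    }
```

A non-empty `only_in_dev` or `only_in_test` set would be a prompt to check whether jobs are behaving differently between the two environments.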
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.