Dataset
-
- Participant
- Posts: 40
- Joined: Tue Nov 11, 2008 5:49 am
Re: Dataset
balaya.ds wrote: While loading a dataset, how many files are created internally?
And what is the default path of a dataset ...?
You need to specify your dataset path; the dataset will reside in whatever path you have defined.
While loading the dataset, the records are loaded into it based on the nodes you have defined in the configuration file.
Knowledge is Fair,execution is matter!
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
At least one file per resource disk mentioned in the node pool that the Data Set stage is using from the configuration file. More than one file if the operating system limits file size (for example to 2GB). More than one file (potentially) if you append to the Data Set.
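To make the "at least one file per resource disk" rule concrete, here is a minimal Python sketch that reads an APT configuration file and counts its resource disks. The sample configuration, host name and paths are hypothetical, and the sketch assumes every disk belongs to the default pool and that the Data Set stage uses that pool. It does not model the extra files that come from OS file-size limits or from appends.

```python
import re

# Hypothetical two-node configuration in the usual APT config file layout.
SAMPLE_CONFIG = """
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/ds/data1" {pools ""}
    resource scratchdisk "/ds/scratch1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/ds/data2" {pools ""}
    resource disk "/ds/data3" {pools ""}
    resource scratchdisk "/ds/scratch2" {pools ""}
  }
}
"""

NODE_RE = re.compile(r'\bnode\s+"([^"]+)"')
DISK_RE = re.compile(r'\bresource\s+(disk|scratchdisk)\s+"([^"]+)"')


def disks_by_node(config_text):
    """Map each node name to its resource disks and scratch disks, in file order."""
    layout = {}
    current = None
    for line in config_text.splitlines():
        node_match = NODE_RE.search(line)
        if node_match:
            current = node_match.group(1)
            layout[current] = {"disk": [], "scratchdisk": []}
            continue
        disk_match = DISK_RE.search(line)
        if disk_match and current is not None:
            kind, path = disk_match.groups()
            layout[current][kind].append(path)
    return layout


def minimum_data_files(config_text):
    """Lower bound on data files: one per resource disk (scratch disks not counted)."""
    return sum(len(disks["disk"]) for disks in disks_by_node(config_text).values())


layout = disks_by_node(SAMPLE_CONFIG)
print(layout["node2"]["disk"])              # ['/ds/data2', '/ds/data3']
print(minimum_data_files(SAMPLE_CONFIG))    # 3
```

Scratch disks are parsed but deliberately left out of the count, since data set data files are written to the resource disks.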
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: At least one file per resource disk mentioned in the node pool that the Data Set stage is using from the configuration file. More than one file if the operating system limits file size (for example t ...
My question is along the same lines as this post, so I am continuing here. Please let me know if I should start a new thread.
Until now I have been thinking that when we create a dataset, the descriptor file is stored on the resource disk and the data file(s) are stored in scratch disk space.
But from this post I understand that even the data files are stored on the resource disk. Does that mean the dataset has nothing to do with scratch disk?
I request you to kindly guide me in this regard.
Thanks,
Sudheer
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
The descriptor is stored in the folder specified in the Data Set stage properties (the project directory if no folder is specified), and the data files are stored on the resource disks specified in the configuration file. Hence, scratch disk is never used for data set storage unless the resource and scratch disks are the same in the configuration file, or possibly for virtual data sets (not for persistent data sets themselves).
Scratch disk is used as a buffer between and for processes, and should be cleaned up after the job completes.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
priyadarshikunal wrote: The descriptor is stored in folder specified in dataset stage property (project directory if no folder specified) and datafiles are stored in Resource disk specified in configuration file.
Right on.
The resource disk is permanent storage for the data set's data files. However, as this discussion is going on, I think the question I am about to post is relevant here. Our resource disk is on a path that is filling up pretty fast. I was going through that directory the other day and found data files sitting there from a couple of years ago. However, it is quite intermittent: the same data file is not present for every day, but it is present on random dates (at least they seem random to me).
The question is: why are these still sitting there on all these dates? All of these data sets are set to "Overwrite" on every run. So why are these data files sitting there from time immemorial?
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
There can be many reasons. The first one I would suspect is that you changed the path of your descriptor file, or someone deleted the descriptor file. When a data set is set to overwrite, the job reads the descriptor to find and delete the data in each of the node locations. If someone deleted your descriptor, that information would be gone. If you moved the path of your descriptor, it would be treated as a new data set and wouldn't overwrite anything.
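The reasoning above can be illustrated with a small in-memory model (plain Python, not DataStage code): each descriptor records the data files it owns, and an overwrite deletes only those files before writing new ones. The class, its method names and the file-naming scheme are all hypothetical; the point is only to show why losing a descriptor leaves orphaned data files on the resource disk.

```python
class ToyResourceDisk:
    """Hypothetical in-memory model of resource disk contents and descriptors."""

    def __init__(self):
        self.data_files = set()   # data files physically present on the resource disks
        self.descriptors = {}     # descriptor path -> names of the data files it lists
        self._counter = 0         # stands in for the hex suffix on real data file names

    def write_overwrite(self, descriptor_path, partitions=2):
        # Overwrite can only delete the files the existing descriptor lists.
        self.data_files -= self.descriptors.get(descriptor_path, set())
        base = descriptor_path.rsplit("/", 1)[-1]
        new_files = set()
        for part in range(partitions):
            self._counter += 1
            new_files.add(f"{base}.part{part}.{self._counter:04x}")
        self.data_files |= new_files
        self.descriptors[descriptor_path] = new_files

    def delete_descriptor_only(self, descriptor_path):
        # Simulates someone removing just the descriptor by hand (e.g. with rm).
        self.descriptors.pop(descriptor_path, None)

    def orphans(self):
        # Data files that no remaining descriptor references.
        referenced = set().union(*self.descriptors.values())
        return self.data_files - referenced


disk = ToyResourceDisk()
disk.write_overwrite("/proj/ds/customers.ds")
disk.write_overwrite("/proj/ds/customers.ds")
print(len(disk.orphans()))   # 0: a normal overwrite leaves nothing behind

disk.delete_descriptor_only("/proj/ds/customers.ds")
disk.write_overwrite("/proj/ds/customers.ds")
print(len(disk.orphans()))   # 2: the previous run's files were never cleaned up
```

A changed descriptor path behaves similarly in this model: calling write_overwrite with a new path never touches the files listed under the old one.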
Compare the date on the descriptors to the date on the data files in your resource locations and clean up the ones where the dates don't match.
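A minimal Python sketch of that date comparison, which only reports candidates and deletes nothing. It assumes, hypothetically, that each data file's name starts with the descriptor's file name followed by a dot (hex strings appended to the data set name), and it uses a generous one-day tolerance so that files written in the same run as the descriptor are not flagged. Both are assumptions to check against your own resource directories.

```python
import os
from pathlib import Path


def stale_data_files(descriptor, resource_dirs, slack_seconds=86400):
    """List data files that look older than the descriptor's last write.

    Assumption (hypothetical naming rule): data files in each resource
    directory are named "<descriptor file name>.<suffix>".
    Reports candidates only; nothing is deleted.
    """
    descriptor = Path(descriptor)
    cutoff = descriptor.stat().st_mtime - slack_seconds
    prefix = descriptor.name + "."
    stale = []
    for directory in resource_dirs:
        for entry in os.scandir(directory):
            # Only regular files whose name matches this data set are considered.
            if entry.is_file() and entry.name.startswith(prefix):
                if entry.stat().st_mtime < cutoff:
                    stale.append(Path(entry.path))
    return sorted(stale)
```

For example, stale_data_files("/proj/datasets/customers.ds", ["/ds/data1", "/ds/data2"]) would return the matching files in those two (hypothetical) resource directories whose modification time is more than a day older than the descriptor's.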
Keith Williams
keith@peacefieldinc.com
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
kwwilliams wrote: Compare the date on the descriptors to the date on the data files in your resource locations and clean up the ones where the dates don't match.
Thanks for your response. I am not sure whether somebody deleted the descriptor file, as I am relatively new here; since I got here, none of that has happened. Anyway, is it okay if I do a simple "rm" on those unnecessary data files? Would it have any other repercussions?
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
You would need to make sure that no data set descriptor still points at the data file you are removing. You wouldn't want to delete data that someone depends on. Most environments have a handful of environment variables used to direct the location of the descriptor. If you're not sure, ask someone who has been there for a while. If the files are as old as you say, I would think they are safe to remove.
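The "is anyone still using this file?" check boils down to a set difference. In this minimal Python sketch, the list of data files referenced by in-use descriptors is assumed to have been gathered separately (for example from the descriptors your environment variables point at); how to obtain it is site-specific and not shown. All paths are hypothetical.

```python
import os


def removable(candidates, referenced_paths):
    """Return the candidate data files that no in-use descriptor references.

    Paths are normalised first so that "/ds/data1//x" and "/ds/data1/x"
    compare equal; symlinks are not resolved.
    """
    referenced = {os.path.normpath(p) for p in referenced_paths}
    return sorted(p for p in candidates if os.path.normpath(p) not in referenced)


candidates = ["/ds/data1/customers.ds.b2", "/ds/data1/customers.ds.a1"]
still_in_use = ["/ds/data1//customers.ds.a1"]
print(removable(candidates, still_in_use))   # ['/ds/data1/customers.ds.b2']
```

Anything this returns is still only a candidate for removal, not proof that nobody depends on it.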
Keith Williams
keith@peacefieldinc.com
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
kwwilliams wrote: You would need to try to ensure that the locations of your dataset descriptor does not match to the data file you are removing. You wouldn't want to delete data that someone is dependent upon. Most environments will have a handful of environment variables used to direct the location of the descriptor. If you're not sure ask someone who has been there for a while. If they're as old as you say, then I would think that it would be safe to remove.
Righto!
Deleting the data files without the right information may lead to disaster.
Why not use "orchadmin delete" on the descriptor files you are sure about?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Premium Member
- Posts: 457
- Joined: Tue Sep 25, 2007 4:05 pm
kumar_s wrote: Why not "orchadmin delete" the descriptor file which you are sure about.
Normally we would do that. However, these files have been sitting there for 2 years. I see the same file again for later dates, including the latest date (of course, the extended name, with some hex strings appended to the data set name, is different).
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
vivekgadwal wrote:
kumar_s wrote: Why not "orchadmin delete" the descriptor file which you are sure about.
The question being posed is why he would have older dates on the data set's resource files than on the descriptor files. His data set overwrite function is not working, and he was seeking an answer as to why. Orchadmin is not needed for this situation.
Keith Williams
keith@peacefieldinc.com