Clearing Resource directory?

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Clearing Resource directory?

Post by whenry6000 »

All, I am using DataStage 8, and when I look at the directory /opt/IBM/InformationServer/Server/Datasets, I can see data written to files there, even though in the Dataset stage in my jobs I have chosen another location to physically write the dataset to.

Does DataStage always write the data to two locations? And how can I clean out this directory? Should it be doing this automatically? If so, where is this set?

Thanks!
Raghumreddy
Participant
Posts: 24
Joined: Fri Aug 26, 2005 3:52 pm

Re: Clearing Resource directory?

Post by Raghumreddy »

You can remove datasets with the orchadmin command; that works for datasets built in 7.5, but I am not sure about 8 and above.
HTH
Raghu Mule
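
For illustration, a minimal orchadmin sketch (assuming orchadmin is still present on an 8.x engine, that its bin directory is on the PATH, and that APT_CONFIG_FILE points at the same configuration file the job ran with; the paths here are made up):

# orchadmin needs the same config file the job used so it can find
# the data segment files on every node's resource disk.
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/32_Node.apt

# Remove the descriptor file AND all the data segments it points to.
orchadmin rm /data/output/PME_Fact_Entity_Fund_LU.ds

Deleting the .ds descriptor with a plain rm leaves the segment files orphaned on the resource disks, which is exactly the clutter described later in this thread.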
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Re: Clearing Resource directory?

Post by whenry6000 »

Raghumreddy wrote:You can remove datasets with the orchadmin command; that works for datasets built in 7.5, but I am not sure about 8 and above.
HTH
Raghu Mule
Thanks for the response. So it seems that removal of the "temporary" data created in the Datasets directory doesn't happen automatically? I am actually choosing a different location for the dataset, but it still writes to the Datasets directory (or whatever directory is declared in the configuration file named by APT_CONFIG_FILE) as well as to the location I've chosen in the Dataset stage. Is this normal behavior?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What exactly is it writing to this second location? A complete copy of the dataset? Something else? Take a peek at the files and let us know what you are seeing there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Completely normal behavior. The control file is written to the location specified in the stage, and the data files are written to the locations specified in the config file.

Mike
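
To make that split concrete, a hypothetical listing (MyDataset and /data/output are illustrative names, not taken from this thread):

# The "control" (descriptor) file lives wherever the Dataset stage's
# File property points; it is small because it only lists segment paths.
ls -lh /data/output/MyDataset.ds

# The data segment files live under each node's resource disk from the
# configuration file; this is where the real volume goes.
ls -lh /opt/IBM/InformationServer/Server/Datasets/MyDataset.*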
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

Mike wrote:Completely normal behavior. The control file is written to the location specified in the stage, and the data files are written to the locations specified in the config file.

Mike
I'm not sure it's a control file. At the end of this post is a sample from my configuration file (32_Node.apt). The job I am running has the dataset's destination set to a totally different location from the resource disk. When I run the job, a file with the following name appears in the resource disk location below, in addition to the .ds file at the output location named in the Dataset stage:

PME_Fact_Act_Fin_Emp_Time_LU.datastag.mclndwetl-dev.0000.0029.0000.6059.cb3499a5.001d.5b5cf9f8

This file is 107 MB; as I have multiple nodes set up, these files range up to 377 MB in size. It doesn't appear to be a temp file, as it remains after the job finishes. I can't open it with the Dataset Manager utility, and I can't tail it, as it appears to be some kind of binary file. So what is it, and how do I manage these files, since I don't want them kept permanently? Do I have to remove them manually? (A sketch for tracking them down follows the config fragment below.)

node "node0"
{
fastname "mclndwetl-dev"
pools ""
resource disk "/data/PME_DM_1/Datasets/ds0" {pools ""}
resource scratchdisk "/data/PME_DM_1/Scratch/scr0" {pools ""}
}
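
For what it's worth, a sketch for rounding up those segment files across the per-node resource disks (it assumes the other nodes follow the ds0, ds1, ... naming shown for node0 above):

# One segment file per node, all sharing the dataset's name prefix.
ls -lh /data/PME_DM_1/Datasets/ds*/PME_Fact_Act_Fin_Emp_Time_LU.*

# Total space the dataset occupies across all nodes.
du -ch /data/PME_DM_1/Datasets/ds*/PME_Fact_Act_Fin_Emp_Time_LU.* | tail -1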
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
I have attached my default.apt file below. The timestamps on the dataset files that are getting created correspond to the times when another developer on the team runs a job. He is using the configuration file I set up (32_Node.apt). I looked at his job: he is writing the actual dataset to the directory /opt/IBM/InformationServer/Server/Datasets (the File sub-property under Target is set to /opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU). If I look at that directory, there is a file with that name of 29,243 bytes.
If I go to the /data/PME_DM_1/Datasets/ directories, there are files with the same timestamp and names like PME_Fact_Ent_Fund_LU.datastag.mclndwetl-dev.0000.0000.0000.3060.cb34e2b0.0000.41acfe1f; that one is 1,179,648 bytes (there is one in each node's resource disk directory, some larger and some smaller).

I'm just trying to understand: 1) what they are (are they datasets, some kind of temp file, or something created when the original directory runs out of space); 2) how they are getting created; and 3) what I need to do to clean them up, since I'm not explicitly creating them as far as I know. (A quick way to check which configuration file a run picked up is sketched after the default.apt fragment below.)


{
   node "node1"
   {
      fastname "mclndwetl-dev"
      pools ""
      resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
      resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
   }
}
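
As a quick sketch for checking which configuration file a run picked up (the Configurations path is the usual default for an 8.x install, so treat it as an assumption):

# Jobs use this value unless $APT_CONFIG_FILE is overridden as a job
# parameter or at the project level in the Administrator client.
echo $APT_CONFIG_FILE

# Inventory the resource disk directories each candidate config declares.
grep -n "resource disk" /opt/IBM/InformationServer/Server/Configurations/*.apt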
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
Something I found on the Internet that might explain why we're having trouble with datasets:

'When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field. The resource scratchdisk field specifies the name of a directory where intermediate, temporary data is stored.'

It seems like the file name we give in the Dataset stage in the job is really just a pointer to a dataset that is physically created in the resource disk directories given in the configuration file. Is this the case?
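
One rough way to check this from the engine tier: the descriptor is binary overall, but the segment paths inside it are stored as readable text, so strings will usually surface them (a sketch using the file names quoted earlier in this thread):

# If the descriptor really is just a pointer, the resource disk paths
# from 32_Node.apt should show up in its readable text.
strings /opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU | grep /data/PME_DM_1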
jamach
Charter Member
Posts: 6
Joined: Mon Jul 03, 2006 1:38 pm
Location: Texas

Post by jamach »

whenry6000 wrote:
ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
Something I found on the Internet that might explain why we're having trouble with datasets:

'When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field. The resource scratchdisk field specifies the name of a directory where intermediate, temporary data is stored.'

It seems like the file name we give in the Dataset stage in the job is really just a pointer to a dataset that is physically created in the resource disk directories given in the configuration file. Is this the case?
That is correct. The .ds file is just a pointer to the files actually containing data. The dataset will consist of multiple physical files, typically one per computing node. In a cluster or grid configuration, those physical files will be distributed across the individual machines in the cluster or grid.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Usually, any file whose name contains LU, sitting in a directory named as resource disk in a configuration file, was created by a Lookup File Set. The next component of the file name is either the project or the job from which it was created (I can't recall which). Lookup File Sets have control files with a suffix of ".fs" if your developers are following recommended conventions.

If a resource disk becomes full, a fatal error is generated; no alternative file system is used as overflow.

Cleaning up File Sets and Lookup File Sets involves examining the control file, deleting every file mentioned therein, then deleting the control file (a rough sketch follows below). A "tips and tricks" article covering exactly this technique is on the DSXchange Learning Center Tips and Tricks page.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
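
As a rough sketch of that manual clean-up (it assumes the segment paths can be pulled out of the control file with strings; eyeball the list before deleting anything, and prefer orchadmin rm where it is available):

# Hypothetical control file written by a Dataset stage.
DS=/opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU

# 1. Inspect first: the absolute paths should be the data segment files.
strings "$DS" | grep '^/'

# 2. Delete every segment mentioned in the control file...
for f in $(strings "$DS" | grep '^/'); do
    rm -f "$f"
done

# 3. ...then delete the control file itself.
rm -f "$DS"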