Clearing Resource directory?

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Clearing Resource directory?

Post by whenry6000 »

All, I am using DataStage 8, and when I look at the directory /opt/IBM/InformationServer/Server/Datasets, I can see data written to files there, even though in the Dataset stage in my jobs I have chosen another location to physically write the dataset to.

Does DataStage always write the data to two locations? And how can I clean out this directory? Should it be doing this automatically? If so, where is this set?

Thanks!
Raghumreddy
Participant
Posts: 24
Joined: Fri Aug 26, 2005 3:52 pm

Re: Clearing Resource directory?

Post by Raghumreddy »

You can remove datasets with the orchadmin command; that works for datasets built in 7.5, but I am not sure about 8 and above.
HTH
Raghu Mule
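
For illustration, a minimal orchadmin sketch (assuming orchadmin is still present on an 8.x engine, that its bin directory is on the PATH, and that APT_CONFIG_FILE points at the same configuration file the job ran with; the paths here are made up):

# orchadmin needs the same config file the job used so it can find
# the data segment files on every node's resource disk.
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/32_Node.apt

# Remove the descriptor file AND all the data segments it points to.
orchadmin rm /data/output/PME_Fact_Entity_Fund_LU.ds

Deleting the .ds descriptor with a plain rm leaves the segment files orphaned on the resource disks, which is exactly the clutter described later in this thread.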
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Re: Clearing Resource directory?

Post by whenry6000 »

Raghumreddy wrote:You can remove datasets with the orchadmin command; that works for datasets built in 7.5, but I am not sure about 8 and above.
HTH
Raghu Mule
Thanks for the response. So it seems that removal of the "temporary" data created in the Datasets directory doesn't happen automatically? I am actually choosing a different location for the dataset, but it still writes to the Datasets directory (or whatever directory is declared in the configuration file named by APT_CONFIG_FILE) as well as to the location I've chosen in the Dataset stage. Is this normal behavior?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What exactly is it writing to this second location? A complete copy of the dataset? Something else? Take a peek at the files and let us know what you are seeing there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Completely normal behavior. The control file is written to the location specified in the stage, and the data files are written to the locations specified in the config file.

Mike
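
To make that split concrete, a hypothetical listing (MyDataset and /data/output are illustrative names, not taken from this thread):

# The "control" (descriptor) file lives wherever the Dataset stage's
# File property points; it is small because it only lists segment paths.
ls -lh /data/output/MyDataset.ds

# The data segment files live under each node's resource disk from the
# configuration file; this is where the real volume goes.
ls -lh /opt/IBM/InformationServer/Server/Datasets/MyDataset.*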
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

Mike wrote:Completely normal behavior. The control file is written to the location specified in the stage, and the data files are written to the locations specified in the config file.

Mike
I'm not sure it's a control file. At the end of this post is a sample from my configuration file (32_Node.apt). The job I am running has the dataset's destination set to a totally different location from the resource disk. When I run the job, a file with the following name appears in the resource disk location below, in addition to the .ds file at the output location named in the Dataset stage:

PME_Fact_Act_Fin_Emp_Time_LU.datastag.mclndwetl-dev.0000.0029.0000.6059.cb3499a5.001d.5b5cf9f8

This file is 107 MB; as I have multiple nodes set up, these files range up to 377 MB in size. It doesn't appear to be a temp file, as it remains after the job finishes. I can't open it with the Dataset Manager utility, and I can't tail it, as it appears to be some kind of binary file. So what is it, and how do I manage these files, since I don't want them kept permanently? Do I have to remove them manually? (A sketch for tracking them down follows the config fragment below.)

node "node0"
{
fastname "mclndwetl-dev"
pools ""
resource disk "/data/PME_DM_1/Datasets/ds0" {pools ""}
resource scratchdisk "/data/PME_DM_1/Scratch/scr0" {pools ""}
}
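
For what it's worth, a sketch for rounding up those segment files across the per-node resource disks (it assumes the other nodes follow the ds0, ds1, ... naming shown for node0 above):

# One segment file per node, all sharing the dataset's name prefix.
ls -lh /data/PME_DM_1/Datasets/ds*/PME_Fact_Act_Fin_Emp_Time_LU.*

# Total space the dataset occupies across all nodes.
du -ch /data/PME_DM_1/Datasets/ds*/PME_Fact_Act_Fin_Emp_Time_LU.* | tail -1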
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
I have attached my default.apt file below. The timestamps on the dataset files that are getting created correspond to the times when another developer on the team runs a job. He is using the configuration file I set up (32_Node.apt). I looked at his job: he is writing the actual dataset to the directory /opt/IBM/InformationServer/Server/Datasets (the File sub-property under Target is set to /opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU). If I look at that directory, there is a file with that name of 29,243 bytes.
If I go to the /data/PME_DM_1/Datasets/ directories, there are files with the same timestamp and names like PME_Fact_Ent_Fund_LU.datastag.mclndwetl-dev.0000.0000.0000.3060.cb34e2b0.0000.41acfe1f; that one is 1,179,648 bytes (there is one in each node's resource disk directory, some larger and some smaller).

I'm just trying to understand: 1) what they are (are they datasets, some kind of temp file, or something created when the original directory runs out of space); 2) how they are getting created; and 3) what I need to do to clean them up, since I'm not explicitly creating them as far as I know. (A quick way to check which configuration file a run picked up is sketched after the default.apt fragment below.)


{
   node "node1"
   {
      fastname "mclndwetl-dev"
      pools ""
      resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
      resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
   }
}
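
As a quick sketch for checking which configuration file a run picked up (the Configurations path is the usual default for an 8.x install, so treat it as an assumption):

# Jobs use this value unless $APT_CONFIG_FILE is overridden as a job
# parameter or at the project level in the Administrator client.
echo $APT_CONFIG_FILE

# Inventory the resource disk directories each candidate config declares.
grep -n "resource disk" /opt/IBM/InformationServer/Server/Configurations/*.apt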
whenry6000
Premium Member
Posts: 129
Joined: Thu Mar 02, 2006 8:28 am

Post by whenry6000 »

ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
Something I found on the Internet that might explain why we're having trouble with datasets:

'When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field. The resource scratchdisk field specifies the name of a directory where intermediate, temporary data is stored.'

It seems like the file name we give in the Dataset stage in the job is really just a pointer to a dataset that is physically created in the resource disk directories given in the configuration file. Is this the case?
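
One rough way to check this from the engine tier: the descriptor is binary overall, but the segment paths inside it are stored as readable text, so strings will usually surface them (a sketch using the file names quoted earlier in this thread):

# If the descriptor really is just a pointer, the resource disk paths
# from 32_Node.apt should show up in its readable text.
strings /opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU | grep /data/PME_DM_1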
jamach
Charter Member
Posts: 6
Joined: Mon Jul 03, 2006 1:38 pm
Location: Texas

Post by jamach »

whenry6000 wrote:
ray.wurlod wrote:You will need to check ALL your configuration files (particularly default.apt), and check with all the developers who might have used them. Are the data files recent?
Something I found on the Internet that might explain why we're having trouble with datasets:

'When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field. The resource scratchdisk field specifies the name of a directory where intermediate, temporary data is stored.'

It seems like the file name we give in the Dataset stage in the job is really just a pointer to a dataset that is physically created in the resource disk directories given in the configuration file. Is this the case?
That is correct. The .ds file is just a pointer to the files actually containing data. The dataset will consist of multiple physical files, typically one per computing node. In a cluster or grid configuration, those physical files will be distributed across the individual machines in the cluster or grid.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Usually, any file whose name contains LU, sitting in a directory named as resource disk in a configuration file, was created by a Lookup File Set. The next component of the file name is either the project or the job from which it was created (I can't recall which). Lookup File Sets have control files with a suffix of ".fs" if your developers are following recommended conventions.

If a resource disk becomes full, a fatal error is generated; no alternative file system is used as overflow.

Cleaning up File Sets and Lookup File Sets involves examining the control file, deleting every file mentioned therein, then deleting the control file (a rough sketch follows below). A "tips and tricks" article covering exactly this technique is on the DSXchange Learning Center Tips and Tricks page.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
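
As a rough sketch of that manual clean-up (it assumes the segment paths can be pulled out of the control file with strings; eyeball the list before deleting anything, and prefer orchadmin rm where it is available):

# Hypothetical control file written by a Dataset stage.
DS=/opt/IBM/InformationServer/Server/Datasets/PME_Fact_Entity_Fund_LU

# 1. Inspect first: the absolute paths should be the data segment files.
strings "$DS" | grep '^/'

# 2. Delete every segment mentioned in the control file...
for f in $(strings "$DS" | grep '^/'); do
    rm -f "$f"
done

# 3. ...then delete the control file itself.
rm -f "$DS"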