Deleting Datasets during Sequence run

Posted: Wed Mar 07, 2012 8:25 am
by Jboyd
I was wondering if anyone had ever had the issue of having to delete datasets at the end of a sequence run?

We created a large sequence for daily runs. At the end of the run, we want part of the sequence to delete the datasets created during the sequence, as we will just recreate them in the next day's run and the old ones are of no value by then.

Any ideas on the best way to delete these datasets at the end of the sequence run?

Posted: Wed Mar 07, 2012 8:39 am
by BI-RMA
You only have to do that if you parameterise the names of the descriptor-files of your intermediary datasets so that they differ between job-runs, or indeed if you are short of disk-space and need to recover space to run other jobs after your sequence has finished. Otherwise you can just leave the write-mode of the datasets at its default setting "Overwrite" and the datasets will be replaced automatically the next time the job runs.

If you really have to delete the datasets, use the $APT_ORCHHOME/bin/orchadmin rm command from a command-activity in a sequence to delete datasets. DO NOT delete the descriptor-files of the datasets only.
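
A minimal sketch of a wrapper that a command-activity could call for a single dataset. It goes through orchadmin rm rather than a plain rm, because a dataset's descriptor-file only points at data files stored elsewhere, and removing the descriptor alone would leave those files behind. The function name delete_dataset and the ORCHADMIN override variable are hypothetical, added for illustration; the sketch also assumes the parallel engine environment (for example APT_ORCHHOME and APT_CONFIG_FILE) is already set in the calling shell.

```shell
#!/bin/sh
# Sketch only: delete one dataset through orchadmin.
# ORCHADMIN is a hypothetical override; by default the path from the
# post above ($APT_ORCHHOME/bin/orchadmin) is used.

delete_dataset() {
    ds_path="$1"
    orchadmin_bin="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"

    # Nothing to do if the descriptor file is not there.
    if [ ! -e "$ds_path" ]; then
        echo "skip (no descriptor): $ds_path"
        return 0
    fi

    # Pass the descriptor path to orchadmin; its exit status becomes
    # the function's exit status.
    "$orchadmin_bin" rm "$ds_path"
}
```

Called as delete_dataset /fullpath/filename1.ds, the function returns orchadmin's exit status, so a trigger on the command-activity can react to a failed delete.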

Posted: Wed Mar 07, 2012 9:05 am
by chulett
I agree, unless you are timestamping your dataset names and need to prune out the older ones, why bother? They get overwritten the next day, what do you buy by deleting them temporarily? :?

Posted: Wed Mar 07, 2012 9:18 am
by Jboyd
The issue initially arose because our admins here would like them deleted at the end of the run to free up disk space.

So we had discussed orchadm rm as a solution but didn't know if there was any built-in functionality of DataStage that could help with this.

Posted: Wed Mar 07, 2012 9:29 am
by BI-RMA
Jboyd wrote: So we had discussed orchadm rm as a solution but didn't know if there was any built-in functionality of DataStage that could help with this.
orchadmin -rm is the built in functionality of datastage that definitely helps in this.

Posted: Wed Mar 07, 2012 9:33 am
by Jboyd
Our build for this is still down the road just in design discussions as of now. I will let you know how it goes and if there are any issues.

Thanks

Posted: Wed Mar 07, 2012 10:18 am
by PaulVL
Some sites run on a GRID and have a "shared environment" (mine for example). We try to enforce a "Zero Data Footprint" rule where you extract your data, transform it, load it, and delete it off my DataStage box because someone else needs the space.

We also have security concerns about data at rest.
We have concerns about STALE DATA that might pollute PROD.

There are many reasons why you WANT to delete the data. There are technical aspects that tell you that you don't have to. But the deletion of the data is not a technical issue. It's mostly politics and policy.


orchadmin rm (no dash in front of it) dataset_name

Posted: Wed Mar 07, 2012 12:27 pm
by Jboyd
If I wanted to delete a group of datasets, could we prefix the group we want with a catch-all phrase for the project and then run orchadmin rm 'catch all phrase'* ?

Posted: Wed Mar 07, 2012 1:12 pm
by PaulVL
Treat it as any other command line call.

orchadmin rm *.ds
orchadmin rm /fullpath/*.ds (better)
orchadmin rm /fullpath/filename1.ds (best)
orchadmin rm /fullpath/filename2.ds (best)
orchadmin rm /fullpath/filename3.ds (best)
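
For the prefix idea asked about above, one possible sketch combines the "full path" advice with a loop, so that each matching descriptor is handed to orchadmin rm individually and a failure on one dataset is reported without hiding the others. The function name delete_datasets_by_prefix, the ORCHADMIN override variable and the example directory and prefix are hypothetical; the sketch assumes the parallel engine environment (for example APT_ORCHHOME and APT_CONFIG_FILE) is already set.

```shell
#!/bin/sh
# Sketch only: delete every dataset in one directory whose descriptor
# file name starts with a given prefix and ends in .ds.
# ORCHADMIN is a hypothetical override for the orchadmin location.

delete_datasets_by_prefix() {
    ds_dir="$1"     # full path of the directory holding the descriptors
    prefix="$2"     # common name prefix of the datasets to delete
    orchadmin_bin="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"
    failures=0

    for ds in "$ds_dir"/"$prefix"*.ds; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$ds" ] || continue

        if "$orchadmin_bin" rm "$ds"; then
            echo "deleted: $ds"
        else
            echo "FAILED: $ds" >&2
            failures=$((failures + 1))
        fi
    done

    # Non-zero exit status if any delete failed.
    [ "$failures" -eq 0 ]
}
```

A call such as delete_datasets_by_prefix /fullpath MYPROJ_ only touches descriptors in that one directory, which keeps the wildcard from matching datasets that belong to other projects.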