Deleting Datasets during Sequence run
I was wondering if anyone had ever had the issue of having to delete datasets at the end of a sequence run?
We created a large sequence for daily runs. At the end of the run we want part of the sequence to delete the datasets created during the sequence, as we will just recreate them in the next day's run and the old ones are of no value by then.
Any ideas on the best way to delete these datasets at the end of the sequence run?
You only have to do that if you parameterise the names of the descriptor files of your intermediary datasets so that they differ from run to run, or if you are short of disk space and need to recover it for other jobs after your sequence has finished. Otherwise you can just leave the write mode of the datasets at the default setting, "Overwrite", and the datasets will be replaced automatically the next time the job runs.
If you really have to delete the datasets, use the $APT_ORCHHOME/bin/orchadmin rm command from a command activity in the sequence. DO NOT delete only the descriptor files of the datasets; that leaves the data files orphaned on the processing nodes.
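A minimal sketch of such a cleanup step, callable from an ExecCommand activity in the sequence. The dataset paths and the delete_datasets helper are illustrative, not from the original post; substitute your own descriptor-file locations:

```shell
#!/bin/sh
# Remove parallel datasets properly: orchadmin rm deletes the
# descriptor file AND the data files on every node, whereas a
# plain "rm" of the descriptor would orphan the data files.

delete_datasets() {
    for ds in "$@"; do
        # Only attempt deletion if the descriptor file exists,
        # so a dataset skipped earlier in the run is not an error.
        if [ -f "$ds" ]; then
            "$APT_ORCHHOME/bin/orchadmin" rm "$ds"
        fi
    done
}

# Example invocation (paths are illustrative):
delete_datasets /data/datasets/daily_extract.ds /data/datasets/daily_lookup.ds
```

Passing the list of descriptor files as arguments keeps the script reusable across sequences; you could also feed it a job parameter that names the datasets created in that run.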
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
There are the grateful those are happy." Francis Bacon
Jboyd wrote: So we had discussed orchadmin rm as a solution but didn't know if there was any built-in functionality of DataStage that could help with this.
orchadmin rm is the built-in functionality of DataStage that helps with this.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
There are the grateful those are happy." Francis Bacon
Some sites run on a GRID and have a "shared environment" (mine for example). We try to enforce a "Zero Data Footprint" rule where you extract your data, transform it, load it, and delete it off my DataStage box because someone else needs the space.
We also have security concerns about data at rest.
We have concerns about STALE DATA that might pollute PROD.
There are many reasons why you WANT to delete the data. There are technical aspects that tell you that you don't have to. But the deletion of the data is not a technical issue. It's mostly politics and policy.
orchadmin rm (no dash in front of it) dataset_name