Deleting Datasets during Sequence run

Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

I was wondering if anyone has ever had to delete datasets at the end of a sequence run.

We created a large sequence for daily runs. At the end of the run we want part of the sequence to delete the datasets created during the sequence, since we will just recreate them in the next day's run and the old ones are of no value by then.

Any ideas on the best way to delete these datasets at the end of the sequence run?
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

You only have to do that if you parameterise the names of the descriptor files of your intermediate datasets so that they differ between job runs, or if you are short of disk space and need to recover it for other jobs after your sequence has finished. Otherwise you can leave the write mode of the datasets at its default setting, "Overwrite", and the datasets will be replaced automatically the next time the job runs.

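If you really have to delete the datasets, run the $APT_ORCHHOME/bin/orchadmin rm command from an Execute Command activity in the sequence. DO NOT delete only the descriptor files of the datasets: the data itself lives in separate segment files on the resource disks, and removing just the descriptor leaves those files orphaned and still taking up space.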
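A minimal sketch of a script that an Execute Command activity could call with the dataset descriptor paths as arguments. It assumes the engine environment (dsenv) is already sourced and that APT_ORCHHOME and APT_CONFIG_FILE are set, since orchadmin generally needs the configuration file to locate the resource disks. The function name remove_datasets and the ORCHADMIN override (handy for a dry run with echo) are hypothetical, not part of DataStage.

```shell
#!/bin/sh
# Hypothetical cleanup script for an Execute Command activity, e.g.
#   cleanup_datasets.sh /data/ds/stage1.ds /data/ds/stage2.ds
# Assumes dsenv is sourced and APT_ORCHHOME / APT_CONFIG_FILE are set.

remove_datasets() {
    # ORCHADMIN is a hypothetical override; default to the engine's orchadmin.
    orch="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"

    if [ -z "${APT_CONFIG_FILE:-}" ]; then
        echo "APT_CONFIG_FILE is not set; refusing to run orchadmin" >&2
        return 2
    fi

    failed=0
    for ds in "$@"; do
        # orchadmin rm removes the descriptor and its data segment files;
        # a plain rm of the .ds file would leave the segments orphaned.
        if ! "$orch" rm "$ds"; then
            echo "failed to remove $ds" >&2
            failed=1
        fi
    done
    # Non-zero if any removal failed, so the sequence can react to it.
    return "$failed"
}

# Entry point: only act when dataset paths are passed on the command line.
if [ "$#" -gt 0 ]; then
    remove_datasets "$@"
    exit $?
fi
```

For a dry run, something like ORCHADMIN=echo sh cleanup_datasets.sh /data/ds/stage1.ds prints the orchadmin commands instead of running them (script name and path are placeholders). Returning non-zero on any failure lets a trigger on the activity route the sequence to an error branch.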
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I agree: unless you are timestamping your dataset names and need to prune the older ones, why bother? They get overwritten the next day, so what do you gain by deleting them in the meantime?
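If the dataset names are timestamped, the older ones can be pruned by age rather than by name. A minimal sketch, assuming the descriptors sit under one hypothetical directory, end in .ds, and that the descriptor file's modification time is a fair proxy for the dataset's age; the function name and the ORCHADMIN override are illustrative only. Note that find also descends into subdirectories.

```shell
#!/bin/sh
# Hypothetical pruning helper for timestamped datasets, e.g.
#   prune_old_datasets /data/ds 7
# Every match is removed with orchadmin rm, never plain rm,
# so the data segment files are removed along with the descriptor.

prune_old_datasets() {
    dir="$1"    # directory holding the .ds descriptor files
    days="$2"   # age threshold in days
    orch="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"

    # -mtime +N matches files whose age, counted in whole 24-hour
    # periods, is greater than N.
    find "$dir" -type f -name '*.ds' -mtime +"$days" |
    while IFS= read -r ds; do
        "$orch" rm "$ds"
    done
}
```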
-craig

"You can never have too many knives" -- Logan Nine Fingers
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

The issue initially arose because our admins would like the datasets deleted at the end of the run to free up disk space.

So we had discussed orchadmin rm as a solution, but didn't know if there was any built-in DataStage functionality that could help with this.
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Jboyd wrote: "So we had discussed orchadmin rm as a solution, but didn't know if there was any built-in DataStage functionality that could help with this."
orchadmin rm is the built-in DataStage functionality that definitely helps with this.
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

Our build for this is still down the road; we are only in design discussions for now. I will let you know how it goes and whether there are any issues.

Thanks
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Some sites run on a GRID and have a "shared environment" (mine for example). We try to enforce a "Zero Data Footprint" rule where you extract your data, transform it, load it, and delete it off my DataStage box because someone else needs the space.

We also have security concerns about data at rest.
We have concerns about STALE DATA that might pollute PROD.

There are many reasons why you WANT to delete the data. There are technical aspects that tell you that you don't have to. But the deletion of the data is not a technical issue. It's mostly politics and policy.


The syntax is orchadmin rm dataset_name (no dash in front of rm).
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

If I wanted to delete a group of datasets, could we prefix their names with a catch-all phrase for the project and then run orchadmin rm 'catch all phrase'* to remove them all?
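The prefix approach works as long as the wildcard stays outside the quotes, so that the shell expands it into individual descriptor paths before orchadmin runs. A minimal sketch; the directory, prefix and function name are hypothetical. It skips the unexpanded pattern when nothing matches and refuses an empty prefix, which would otherwise match every .ds file in the directory.

```shell
#!/bin/sh
# Hypothetical helper: delete every dataset whose descriptor name
# starts with a project prefix, e.g.
#   delete_by_prefix /data/ds myproj_

delete_by_prefix() {
    dir="$1"
    prefix="$2"
    orch="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"

    # An empty prefix would match every .ds file in the directory.
    if [ -z "$prefix" ]; then
        echo "refusing to run with an empty prefix" >&2
        return 2
    fi

    # The shell expands the unquoted * into full descriptor paths.
    for ds in "$dir/$prefix"*.ds; do
        # With no match, the pattern is left unexpanded; skip it.
        [ -e "$ds" ] || continue
        "$orch" rm "$ds"
    done
}
```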
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Treat it like any other command-line call: the shell expands the wildcard before orchadmin sees the names.

orchadmin rm *.ds
orchadmin rm /fullpath/*.ds (better)
orchadmin rm /fullpath/filename1.ds (best)
orchadmin rm /fullpath/filename2.ds (best)
orchadmin rm /fullpath/filename3.ds (best)
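The "best" option, naming each dataset by its full path, can be driven from a list file so that the sequence removes exactly the datasets it created. A minimal sketch; the list-file convention (one absolute descriptor path per line, # for comments) and the function name are hypothetical. Relative paths are rejected because they would resolve against whatever working directory the command happens to run in.

```shell
#!/bin/sh
# Hypothetical helper: delete the datasets named in a list file, e.g.
#   delete_listed_datasets /data/ds/daily_datasets.lst

delete_listed_datasets() {
    list="$1"
    orch="${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}"
    status=0

    # The || test also handles a last line without a trailing newline.
    while IFS= read -r ds || [ -n "$ds" ]; do
        case "$ds" in
            "" | "#"*) continue ;;   # skip blank lines and comments
            /*) ;;                   # absolute path: accepted
            *) echo "skipping relative path: $ds" >&2; status=1; continue ;;
        esac
        "$orch" rm "$ds" || status=1
    done < "$list"
    # Non-zero if any entry was rejected or failed to delete.
    return "$status"
}
```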