Dataset Magic

Posted: Wed Nov 10, 2010 12:14 pm
by daignault
I'm looking to update a Unix (Linux) magic file so that the file command can identify both Dataset headers and Dataset parts when I scan a directory. The goal is to clean up and purge older datasets on the system.

I've identified that all Dataset headers start with "Torrent Orchestrate" so this can be placed in the magic file, but I'm not sure about the dataset parts.

It appears that dataset parts start with x001 x000 x000 x000 x000, but I suspect this is too generic to be a reliable signature. Has anyone tried to set up magic for DataStage?
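To try the two signatures before committing them to a magic file, something like the following rough Python sketch might help. It assumes "x001 x000 x000 x000 x000" means the five raw bytes 0x01 0x00 0x00 0x00 0x00, and the names `classify` and `scan` are made up for illustration. The part check is only a weak hint, since unrelated binary files could begin the same way.

```python
from pathlib import Path

# Prefix reported in this thread for dataset headers (descriptor files).
HEADER_PREFIX = b"Torrent Orchestrate"
# Assumption: "x001 x000 x000 x000 x000" means these five raw bytes.
PART_PREFIX = b"\x01\x00\x00\x00\x00"


def classify(path):
    """Return 'header', 'maybe-part', or 'other' based on the leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(len(HEADER_PREFIX))
    if head.startswith(HEADER_PREFIX):
        return "header"
    if head.startswith(PART_PREFIX):
        # Weak signal only: other binary files may start with the same bytes.
        return "maybe-part"
    return "other"


def scan(directory):
    """Map each regular file directly under `directory` to its classification."""
    return {p.name: classify(p) for p in Path(directory).iterdir() if p.is_file()}
```

Running `scan` over a resource directory and eyeballing the "maybe-part" hits would show how many false positives the short prefix produces.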

Thanks in advance,

Ray D

Posted: Wed Nov 10, 2010 3:46 pm
by ray.wurlod
Don't know about magic, but Data Set and File Set segment files do have a very idiosyncratic naming convention that you might be able to leverage. There are probably some identifying bytes farther into the header than the first few bytes, but I haven't looked in there in some time.

Posted: Wed Nov 10, 2010 4:13 pm
by mhester
I believe you are approaching this incorrectly. You should not try to automate the cleanup/purge by identifying the dataset headers and then identifying the dataset data partitions. Instead, identify the headers and then use the tools (orchadmin) to handle the cleanup/archive/purge. That way you know it is handled correctly.

Posted: Thu Nov 11, 2010 3:21 am
by ArndW
mhester - unfortunately, the ".ds" descriptor files often get deleted using OS commands, leaving orphaned (large) data files, so one needs to find all descriptor files, and any data files that don't link to a descriptor are orphans and can get deleted.

I posted This FAQ on Magic/Orphans which you can use, Ray (Daignault, not Wurlod).
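The orphan check described above could be sketched roughly as follows in Python. This is not the method from the FAQ; it is a minimal illustration that assumes a descriptor stores each data file's name as plain bytes, so a substring search over the descriptor contents is enough. That assumption should be verified on a real system first. Descriptors are recognised by the "Torrent Orchestrate" prefix mentioned earlier in the thread rather than by the ".ds" suffix, and `find_orphans` is a made-up name.

```python
import os
from pathlib import Path

# Header signature reported earlier in this thread.
HEADER_PREFIX = b"Torrent Orchestrate"


def is_descriptor(path):
    """True if the file starts with the dataset header signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HEADER_PREFIX)) == HEADER_PREFIX


def find_orphans(descriptor_dir, data_dir):
    """List files in data_dir whose names appear in no descriptor.

    Assumption: descriptors contain the data file names as plain bytes.
    """
    blobs = [
        p.read_bytes()
        for p in Path(descriptor_dir).iterdir()
        if p.is_file() and is_descriptor(p)
    ]
    orphans = []
    for candidate in Path(data_dir).iterdir():
        # Skip directories and any descriptors that live alongside the data.
        if not candidate.is_file() or is_descriptor(candidate):
            continue
        needle = os.fsencode(candidate.name)
        if not any(needle in blob for blob in blobs):
            orphans.append(candidate.name)
    return sorted(orphans)
```

The sketch only reports candidates and deletes nothing, so the list can be reviewed by hand before any rm is run.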

Posted: Thu Nov 11, 2010 8:07 am
by daignault
Thanks Arnd, you took the words from my typing fingers :). I searched the system for dataset and magic but I did not discover your posting. That would have saved the group my rantings.

FYI, I'm doing some DataStage admin work for a large DataStage site that uses offshore contractors. Sometimes the jobs developed are not quite up to spec: for example, Datasets and Dataset parts given a ".txt" suffix, or job cleanup that runs rm on the dataset header but not on the dataset part files.

Background on the file command is here: ( http://unixhelp.ed.ac.uk/CGI/man-cgi?file ). "file" uses a number of methods to identify a file's type. It must either already be aware of the file structure, or have some unique part of that structure defined in a file normally named "magic". Not to be confused with this old poster some of us owned ( http://boldt.us/humor/unix_magic.html ).

Cheers,

Ray D

Posted: Thu Nov 11, 2010 8:36 am
by karrisuresh
Hi, use the orchadmin command followed by the filename.

1) Better to write a program to which you pass the filenames as arguments

2) Or in DataStage -> Designer -> Tools -> Data Set Management, go to the path, select the dataset and delete it

3) To flush the dataset

Develop a job in which the source is a Row Generator with the number of rows set to 0, and connect it to the dataset.
When you run the job, it flushes the dataset, i.e.
it leaves the dataset empty but does not remove it.


Depending on your requirement, you can use any of the above 3 options.

Posted: Thu Nov 11, 2010 8:48 am
by ArndW
karrisuresh - the problem is that if you delete the descriptor file by mistake, you have no way of knowing which of the files in the dataset directory are "orphaned" and will never be used again. Your suggestion, while the correct way to do things in a perfect world, will therefore not solve the OP's problem.

Posted: Thu Nov 11, 2010 9:09 am
by mhester
Ray,

I get it :D

This can be a problem and I have asked one of the framework developers at IBM if there is a way to find orphaned resource data.

I will let you know what I hear.

Posted: Fri Nov 12, 2010 9:14 am
by ArndW
mike - I'm a bit chagrined by your response since the FAQ that I wrote explains exactly how to locate such orphaned files.

Posted: Fri Nov 12, 2010 10:01 am
by chulett
I'm assuming he means something more... official. A button, perhaps. :wink:

Posted: Fri Nov 12, 2010 1:00 pm
by mhester
ArndW wrote:mike - I'm a bit chagrined by your response since the FAQ that I wrote explains exactly how to locate such orphaned files. ...
I guess if I paid the premium membership I would have seen your response and not responded in the first place :lol:

Sorry to trump your answer - did not mean to.

And Craig, nothing official, but I did verify with the framework developer @ IBM that there is nothing within the data that will identify the header other than the name on disk (which, if I had the premium membership, I would probably see that Arnd already mentioned).

You know us pilots Arnd - too cheap to buy anything :wink: