
process any and all sequential files in a directory

Posted: Tue Aug 17, 2004 5:13 am
by dzdiver
A new project will have a variable number of files dropped into a particular directory. Is there a way to have DataStage process each of them in turn until all are processed? I realise I can do something like run a loop in BASIC and move the files away once done, but can I tell a stage which sequential filename to work on? I think I could run a command to get the name of one of the files there.

I also thought about using a Wait For File activity waiting for the wildcard *.*, then moving the files in turn to where it would look, but the wildcard for 'any file' didn't work. Maybe it needs special syntax or something?
(I'm a newbie to the BASIC used, btw.)
TIA,
Brian.

Posted: Tue Aug 17, 2004 5:46 am
by denzilsyb
Do you have the Folder stage available to you? Perhaps that could help. I haven't used it myself, but one of our developers has.

Otherwise, write the BASIC code you mention. The only problem I foresee is determining, once a file has been placed there, that it is no longer growing - i.e. that all the data has been written to the file and it is ready to process.
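A rough shell sketch of that check - comparing sizes across a polling interval is just one way to do it, and the interval is an arbitrary choice:

    #!/bin/sh
    # Treat a file as complete once its size stops changing between
    # two checks a few seconds apart.
    f="$1"
    prev=-1
    size=$(wc -c < "$f")
    while [ "$size" -ne "$prev" ]; do
        prev=$size
        sleep 10                  # give the writer time to append more data
        size=$(wc -c < "$f")
    done
    echo "$f appears complete - ready to process"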

With BASIC, you could then start a job that depends on the file name by passing the file name as a parameter to the job (if the parameter is declared in the job). Therefore: if file1 is available, start job1.
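In shell terms, the "if file1 is available, start job1" idea might look roughly like this - the directory, the project name ("myproj"), the job names and the SourceFile parameter are all made-up placeholders:

    #!/bin/sh
    # Start a job chosen by the incoming file name, passing the file
    # name in as a declared job parameter.
    for f in /data/landing/*; do
        [ -f "$f" ] || continue
        case $(basename "$f") in
            file1*) job=job1 ;;
            file2*) job=job2 ;;
            *) echo "no job defined for $f" >&2; continue ;;
        esac
        dsjob -run -param SourceFile="$f" myproj "$job"
    done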

Posted: Tue Aug 17, 2004 6:04 am
by dzdiver
Yes, I saw the Folder stage, but I didn't want all the data in one column - and then what?
Since my post I've been thinking that I can have an external shell script copy the files one at a time, until they're all processed, into a fixed directory under the same copied-to name. No wildcard needed any more. I will experiment.
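Something like this is what I have in mind (all the paths and the fixed copied-to name are just placeholders):

    #!/bin/sh
    # Pick one waiting file, copy it under a fixed, known name - so no
    # wildcard is needed in the job - and move the original once done.
    f=$(ls /data/landing | head -1)               # name of one file there
    [ -n "$f" ] || exit 0                         # nothing to do
    cp "/data/landing/$f" /data/work/ProcessMe    # fixed destination name
    # ... run the DataStage job against /data/work/ProcessMe here ...
    mv "/data/landing/$f" /data/done/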
Thanks for the post.
B.

Posted: Tue Aug 17, 2004 6:11 am
by denzilsyb
dzdiver wrote: didn't want all data in one column and then what?
Well, use substring operations in a Transformer stage to get the data out into the various columns. You'd need to know what you are looking for and where in the column the data is situated.

Alternatively, you could have the shell script execute a DataStage job when the file is ready for processing. On Windows it's dsjob.exe; I'm not sure what the UNIX command is - the manual should have it documented. If anything, it should be in the DSEngine/bin directory.
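If it is the same dsjob command on UNIX, the call from the script might look something like this (the install path, project name and job name below are assumptions - check the documentation):

    #!/bin/sh
    # Run a DataStage job from the shell once the file is ready.
    DSHOME=/opt/Ascential/DataStage/DSEngine      # adjust to your install
    "$DSHOME/bin/dsjob" -run -wait \
        -param SourceFile=/data/work/input.dat \
        myproj LoadFileJob
    echo "dsjob finished with exit status $?"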

Posted: Tue Aug 17, 2004 6:46 am
by tonystark622
You can use the Row Splitter stage, if you have v7 or higher, to split the data out to individual columns.

Tony

Posted: Tue Aug 17, 2004 7:09 am
by chulett
You can use the Folder stage just to retrieve filenames by defining only the first column in the stage. The second 'all data' column is primarily intended for use with XML data to feed the XML stages.

Also, if you are running 7.5, they've added looping stages in Sequencer jobs to allow you to do exactly the kind of thing you are asking about without having to write any job control code. 8)

Posted: Tue Aug 17, 2004 8:41 am
by hassers
I have a group of files coming into a specific UNIX subdirectory; they are all of the same format.

I run a DataStage job where, in the job properties, I have the before-job routine set to ExecSH.
The script concatenates the files in the subdirectory into a single file and then moves the data files to an archive.
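The before-job script amounts to something like this (directory names are placeholders, and in real use you'd first check that at least one file matched):

    #!/bin/sh
    # Concatenate the same-format files into one input file, then
    # move the originals off to the archive.
    cat /data/incoming/*.dat > /data/work/combined.dat
    mv /data/incoming/*.dat /data/archive/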

It seems to work for me, but I'm not running in real-time.

Posted: Tue Aug 17, 2004 9:57 am
by dzdiver
Thanks for the info. However, I need access to the name of the file in a transform, and I can't seem to get that AND the data if I use the Folder stage and Row Splitter, so I will be doing as follows:-
copy the file to the processing area with a fixed destination name, e.g. ProcessMe.
copy a file into the processing area, called FileName, that contains the file's original name.
have the script then kick off a wrapper job that contains BASIC to:-
read the filename from FileName into a variable
start the actual transform job, setting the filename with DSSetParam.
This way I get the filename as needed.
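Roughly, the shell side of it would be as below (paths, project and job names are placeholders - and since dsjob can take the name directly as a parameter, the BASIC wrapper could arguably be skipped):

    #!/bin/sh
    # For each waiting file: fixed name for the job, original name
    # saved to FileName, then kick off the job with the name as a
    # parameter. All names here are placeholders.
    for f in /data/landing/*; do
        [ -f "$f" ] || continue
        cp "$f" /data/work/ProcessMe               # fixed name the job reads
        basename "$f" > /data/work/FileName        # original name for the wrapper
        dsjob -run -wait -param FileName="$(basename "$f")" myproj TransformJob
        mv "$f" /data/done/
    done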
Whew, a bit long-winded. Maybe I'll just write a PL/SQL job instead...
Anyway, thanks for all the suggestions and ideas.