process any and all sequential files in a directory


Post by dzdiver »

A new project will have a variable number of files dropped into a particular directory. Is there a way to have DataStage process each of them in turn until all are processed? I realise I can do something like run a loop in BASIC and move the files away once done (roughly the sketch below), but can I tell a stage which sequential filename to work on? I think I could run a command to get the name of one of the files there. I also thought about using a Wait For File activity waiting for the wildcard *.*, then moving files in turn to where it would look, but the wildcard for 'any file' didn't work. Maybe it needs special syntax or something?
(I'm a newbie to the BASIC used, btw.)
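
Something like this rough sketch is what I have in mind - the directory paths, the ProcessFile job and its FILENAME parameter are all made-up names:

Code:

    * Rough sketch (job control BASIC). /data/incoming, /data/done,
    * "ProcessFile" and its "FILENAME" parameter are made-up names.
    * In a routine you would $INCLUDE DSINCLUDE JOBCONTROL.H first.
    Dir = "/data/incoming"
    Call DSExecute("UNIX", "ls " : Dir, FileList, SysCode)

    Loop
       Remove FileName From FileList Setting MoreFiles
       If FileName # "" Then
          hJob = DSAttachJob("ProcessFile", DSJ.ERRFATAL)
          ErrCode = DSSetParam(hJob, "FILENAME", Dir : "/" : FileName)
          ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
          ErrCode = DSWaitForJob(hJob)
          ErrCode = DSDetachJob(hJob)

          * Move the processed file out of the way.
          Call DSExecute("UNIX", "mv " : Dir : "/" : FileName : " /data/done", MvOut, SysCode)
       End
    While MoreFiles Do Repeat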
TIA,
Brian.

Post by denzilsyb »

Do you have the Folder stage available to you? Perhaps that could help. I haven't used it myself, but one of our developers has.

Otherwise, write the BASIC code you mention. The only problem I foresee is determining, once a file has been placed there, that it is not growing any more - i.e. all the data has been written to the file and it is ready to process.

With BASIC, depending on the filename, you could then start a job by passing the filename as a parameter to the job (if it is declared in the job). Therefore: if file1 is available, start job1 - along the lines of the sketch below.
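
A rough sketch of that "not growing any more" check in job control BASIC - the path, the five-second pause and the job1/FILENAME names are just examples, not anything DataStage defines for you:

Code:

    * Wait until the file size stops changing, then run the job.
    FilePath = "/data/incoming/file1.dat"

    Loop
       Call DSExecute("UNIX", "wc -c < " : FilePath, Size1, SysCode)
       Sleep 5
       Call DSExecute("UNIX", "wc -c < " : FilePath, Size2, SysCode)
    Until Size1 = Size2 Do Repeat

    * The file is stable - run the job for this file, passing the name.
    hJob = DSAttachJob("job1", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "FILENAME", FilePath)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    ErrCode = DSDetachJob(hJob)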
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson

Post by dzdiver »

Yes, I saw the Folder stage, but I didn't want all the data in one column - and then what?
Since my post I've been thinking that I can have an external shell script copy the files one at a time into a fixed directory, each copied to the same fixed name, until they're all processed. No wildcard needed any more. I will experiment.
Thanks for the post.
B.

Post by denzilsyb »

dzdiver wrote: didn't want all data in one column and then what?

Well, use substrings in a Transformer stage to get the data out into the various columns - something like the example below. You'd need to know what you are looking for and where in the column the data is situated.
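
For instance, assuming the Folder stage's record column is called DATA and your layout is fixed-width (the column positions here are invented), the derivations would look roughly like:

Code:

    * Example Transformer derivations. The DATA column name and the
    * character positions are invented for illustration.
    InLink.DATA[1,8]              ;* CustId: characters 1 to 8
    InLink.DATA[9,8]              ;* OrderDate: characters 9 to 16
    Trim(InLink.DATA[17,12])      ;* Amount: characters 17 to 28

    * For comma-delimited data, Field() picks out the Nth token instead:
    Field(InLink.DATA, ",", 1)    ;* first field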

Alternatively, you could have the shell script execute a DataStage job when the file is ready for processing. On Windows it's dsjob.exe; I'm not sure what the Unix command is - the manual should have it documented. If anything, it should be in the DSEngine/bin directory.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson

Post by tonystark622 »

You can use the Row Splitter stage, if you have v7 or higher, to split the data out into individual columns.

Tony

Post by chulett »

You can use the Folder stage just to retrieve filenames by defining only the first column in the stage. The second 'all data' column is primarily intended for use with XML data, to feed the XML stages.

Also, if you are running 7.5, they've added looping stages in Sequencer jobs to allow you to do exactly the kind of thing you are asking about without having to write any job control code. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by hassers »

I have a group of files coming into a specific Unix subdirectory; they are all of the same format.

I run a DataStage job with the before-job subroutine set to ExecSH. The script concatenates the files in the subdirectory into a single file and then moves the data files to an archive - roughly the sketch below.
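
Roughly like this, if you wrote it as job control BASIC instead of the ExecSH input value - the directory names are just examples:

Code:

    * Concatenate the incoming files into one work file, then archive
    * the originals. /data/incoming, /data/work and /data/archive are
    * example paths; I actually run these commands through ExecSH.
    Call DSExecute("UNIX", "cat /data/incoming/*.dat > /data/work/combined.dat", Output, SysCode)
    Call DSExecute("UNIX", "mv /data/incoming/*.dat /data/archive/", Output, SysCode)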

It seems to work for me, but I'm not running in real time.
Thanks

Steve

Post by dzdiver »

Thanks for the info. However, I need access to the name of the file in a Transformer, and I can't seem to get that AND the data if I use the Folder stage and Row Splitter, so I will be doing as follows:
copy each file to the processing area under a fixed destination name, e.g. ProcessMe;
copy a file called FileName into the processing area, containing the file's original name;
have the script then kick off a wrapper job whose BASIC will:
read the filename from FileName into a variable, and
start the actual transform job, passing the name with DSSetParam - roughly the sketch below.
This way I get the filename as needed.
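
The wrapper's BASIC would be roughly this sketch - the paths and the TransformJob/FILENAME names are placeholders for whatever I end up using:

Code:

    * Read the original filename left by the script, then run the
    * transform job with it as a parameter. Paths and the names
    * "TransformJob" / "FILENAME" are placeholders.
    OpenSeq "/data/process/FileName" To FileVar Else Call DSLogFatal("FileName file not found", "Wrapper")
    ReadSeq OrigName From FileVar Else OrigName = ""
    CloseSeq FileVar

    hJob = DSAttachJob("TransformJob", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "FILENAME", OrigName)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    ErrCode = DSDetachJob(hJob)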
Whew, a bit long-winded. Maybe I'll just write a PL/SQL job instead...
Anyway, thanks for all the suggestions and ideas.