Waiting for multiple files

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Hi All

I have a situation where we get about 350 different source files (with different metadata). These files have a date suffix in the name, but each file name is distinct.

e.g. A_02122012, B_02122012, etc.

We need to wait for the files to appear before starting processing.

I checked the Wait For File stage, but it does not accept multiple file names or wildcards.

Is there a way to do this without using 350 WTF stages?

I tried putting the WTF stage between the Start Loop and End Loop stages and calculating the file name with an if-then-else expression based on the counter in a user activity stage. Although it is a very big if-then-else statement, it works.

But I cannot terminate the loop if a file is not available after the wait time.

Is it possible to terminate the loop from inside using a Terminator activity (i.e. before all repetitions are complete)?

Right now I have written a routine to do this. I want to know if there is a better way to achieve this.

Note: the requirement is that all files be present before processing starts.

Thanks
ravindras83
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, but it is abbreviated as the WFF stage; the WTF stage would be something else entirely. :wink:

And you're correct in that the WFF stage does not support wildcards, as has been discussed here a number of times. You are better off writing a script or routine to ensure that "all files" have arrived and using that in a Sequence job to conditionally run the load job only when that condition has been met.
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

With 350 source files on your plate, I suggest you have a different sort of problem that could use a different approach. Not knowing what sort of job scheduling automation you have, that would be my first choice for a solution: wait for the process (job) that creates or sends the file(s) to finish rather than wait for the files themselves.

A hybrid of that would be a job that is your file watcher. It looks like you would want several such jobs so you don't have so many files being watched by just one job.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

WTF stage... too funny!

Create a file containing the list of file names or file name prefixes. Create a Unix shell script that reads the file list, appends date info using the Unix date command and format options, and checks whether each file exists. If any file does not exist, fail; otherwise pass. The script can return different codes based on pass/fail, such as 0 or 1. A DataStage sequence job can call the script and act according to its return code.
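A minimal sketch of the script described above, assuming the list file holds one file name prefix per line and that files are named PREFIX_MMDDYYYY as in the original post. The function name check_all_files and its arguments are made up for illustration, not anything DataStage provides.

```shell
#!/bin/sh
# check_all_files LIST_FILE LANDING_DIR [DATE_SUFFIX]
# Returns 0 when every expected file exists, 1 when any is missing.
# LIST_FILE holds one file name prefix per line (e.g. A, B, ...).
# The function name and arguments are hypothetical.
check_all_files() {
    list_file=$1
    landing_dir=$2
    # Default suffix is today's date in MMDDYYYY form, as in A_02122012.
    suffix=${3:-$(date +%m%d%Y)}
    missing=0

    # The "|| [ -n ... ]" part picks up a last line with no trailing newline.
    while IFS= read -r prefix || [ -n "$prefix" ]; do
        # Skip blank lines in the list.
        [ -z "$prefix" ] && continue
        if [ ! -f "$landing_dir/${prefix}_${suffix}" ]; then
            echo "missing: ${prefix}_${suffix}" >&2
            missing=1
        fi
    done < "$list_file"

    return "$missing"
}
```

Called as, say, check_all_files expected_files.txt /data/landing (both paths are placeholders), it reports every missing file rather than stopping at the first one, which makes the log more useful when several files are late.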
Choose a job you love, and you will never have to work a day in your life. - Confucius
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Craig, that's a good one.. :)

This may not cover all negative scenarios/error handling/exceptions, but can be a simple check.

Assumption: You will not have any other files in that folder other than the expected 350 files ;)

Instead of having a WFF stage in the loop, just have a command stage with ls -1 | wc -l (or something similar to that), which will give you the file count. Once the count reaches 350, trigger your load jobs!!
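A rough sketch of that count check wrapped in a polling loop with a retry limit, under the same assumption that the folder holds nothing but the expected files. The function names, retry count, and sleep interval are hypothetical and would need tuning.

```shell
# count_files DIR: print the number of entries in DIR
# (the same idea as ls -1 | wc -l, with padding stripped).
count_files() {
    ls -1 "$1" | wc -l | tr -d ' '
}

# wait_for_count DIR EXPECTED MAX_TRIES SLEEP_SECS
# Polls until DIR holds at least EXPECTED entries.
# Returns 0 on success, 1 if the count is never reached.
wait_for_count() {
    dir=$1
    expected=$2
    max_tries=$3
    sleep_secs=$4
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        if [ "$(count_files "$dir")" -ge "$expected" ]; then
            return 0
        fi
        try=$((try + 1))
        sleep "$sleep_secs"
    done
    return 1
}
```

For example, wait_for_count /data/landing 350 60 60 (placeholder path) would check once a minute for about an hour. A count says nothing about whether a file is still being written, so this is only as reliable as the delivery process.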
Kandy
_________________
Try and Try again…You will succeed atlast!!
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Waiting for multiple files

Post by SURA »

If I were in your position, I would ask for two folders to be created.

One is for LANDING and the other is for STAT. All the real data files are placed in LANDING. Once all the files have arrived, a zero-byte loaded.ok file is placed in the STAT folder as the final step. Your Wait For File stage then looks for this loaded.ok file. At the end of the process, delete the .ok file so that the same check works on the next run.

Let me know if my understanding is not correct.
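The two ends of that handshake could be sketched like this, assuming the sending process is able to run a final step after its last file is written. The function names are hypothetical; only the loaded.ok name and the STAT folder come from the description above.

```shell
# Sender side: call only after every data file has been written to LANDING.
# Creates (or empties) a zero-byte flag file in the STAT folder.
mark_delivery_complete() {
    stat_dir=$1
    : > "$stat_dir/loaded.ok"
}

# Consumer side: call at the end of a successful load so that
# the next run has to wait for a fresh flag.
clear_delivery_flag() {
    stat_dir=$1
    rm -f "$stat_dir/loaded.ok"
}
```

With this in place a single Wait For File stage pointed at the flag file is enough, since the flag only appears once the whole delivery is done.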
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

qt_ky wrote:Create a file having the list of file names or file name prefixes. Create a Unix shell script that reads the file list, appends date info using the Unix date command and format options, and checks if each file exists.
1) Can anybody explain why the date info needs to be appended using the Unix date command and format options?

I think this step is not required if we create the file with the list of file names; please correct me if I am wrong.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Re: Waiting for multiple files

Post by kandyshandy »

ravindras83 wrote:I have a situation where we get about 350 different source files (with different metadata). These files have date suffix in the name but the file name is distinct.
OP mentioned the suffix and that's why Eric suggested the date appending logic ;)
Kandy
_________________
Try and Try again…You will succeed atlast!!
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Re: Waiting for multiple files

Post by qt_ky »

The reason is given in the first 2 lines of this topic, but here it is again:
ravindras83 wrote:These files have date suffix in the name but the file name is distinct.

e.g A_02122012,B_02122012 etc.
And one can easily imagine that the date suffix in each file name will change every day.

The Unix date command is one way to generate the required date format. For example (MMDDYYYY format for April 18, 2012):

date +%m%d%Y
04182012
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I tend to shy away from date specific solutions like that as you have no ability to 'catch up' if for whatever reason you miss a day or files arrive late with a different date on them. Better to have a mechanism to take whatever is there regardless of date and then ensure they are archived / moved / recorded so they aren't processed again.
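One way to picture the "take whatever is there, then move it" idea is the sketch below. It assumes the PREFIX_MMDDYYYY naming from the original post, uses a hypothetical function name, and stands in an echo for the real load. It does not cover the separate check that all 350 files have arrived.

```shell
# process_and_archive LANDING_DIR ARCHIVE_DIR
# Handles every PREFIX_MMDDYYYY file present, whatever its date,
# then moves it aside so it is not picked up again.
process_and_archive() {
    landing_dir=$1
    archive_dir=$2
    for path in "$landing_dir"/*_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        # With no matches the pattern stays literal, so skip non-files.
        [ -f "$path" ] || continue
        echo "processing $(basename "$path")"   # placeholder for the real load
        mv "$path" "$archive_dir/"
    done
}
```

Because nothing here depends on today's date, a late file or a missed day is simply picked up on the next run.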

To a point made earlier about tying into the delivery system - that would always be the best solution if available. Most of the time you are at the mercy of whoever is delivering anything, so you need to do your best to make sure you have everything. An Enterprise scheduler that could kick off the load job after the delivery process is complete would be ideal. A semaphore that you wait for that is delivered last from the source system would be nice as well. Most of the time you're on your own, however.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks all for your replies.

Sorry for the wrong abbreviation.

I am doing what qt_ky suggested, but through a DS routine.

Could anyone kindly help me with the other question?

Can I terminate a loop (using a Terminator activity) before the loop repetitions are complete?


thanks
ravindras83
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You can (not) code for an implied end to the loop before it completes its maximum iterations.

I have a loop that runs up to 20 times based on the contents of up to 20 files. If the next file exists but has nothing in it, I have an abort link. If it has more than one row in it, I have a processing link. A file that has exactly one row in it has no link for it, and the loop ends on that condition with an Info message that the loop did not complete its defined number of iterations.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You can exit the loop early if you need to: simply branch from inside the loop to a Terminator (if you want things to stop abruptly), or branch to a stage past the End Loop stage if you just want to sneak out early. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

thanks all

closing the topic