Wait for file routine - handling incomplete files?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Wait for file routine - handling incomplete files?

Post by sjordery »

Hello All,

I have a sequence that runs sub-sequences once files are in place in a specified directory. I am using a routine for this, as specified by Ray Wurlod at this link viewtopic.php?t=115198&highlight=wait+file+wildcard - thanks Ray! I cannot use Wait For File because the file name is not always the same.

It is all working well, but I stumbled across a problem: in one case, when the routine executed, a large file was midway through being FTP'd to the directory. The routine saw that the file was there and kicked off the sub-sequence too early.

Can anyone suggest a way around this please? I presume that the Wait For File Activity must do something internally to only kick off jobs once the whole file is in place?

Many thanks as ever.

S
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, the WFF stage would have the same problem - see file, get file.

The typical solution would be to utilize a semaphore file, a small usually zero-byte file that is sent after the main file. You poll for the semaphore and, when it arrives, go get the main file.
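The semaphore pattern can be sketched as a small polling loop. The directory layout and the ".done" suffix below are illustrative assumptions, not anyone's actual naming convention:

```shell
#!/bin/sh
# wait_for_semaphore DIR NAME INPUT: block until NAME.done appears in
# DIR, then move NAME into INPUT and clean up the semaphore.
# The sender must write NAME.done only AFTER NAME is fully transferred.
wait_for_semaphore() {
    dir=$1; name=$2; input=$3
    until [ -f "$dir/$name.done" ]; do
        sleep 30                      # poll interval - tune to your SLA
    done
    mv "$dir/$name" "$input/$name"    # semaphore present, file is complete
    rm -f "$dir/$name.done"
}
```

A zero-byte semaphore transfers almost instantly, so there is no window in which it can itself be seen half-written.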

Either that, or you need to build logic into your routine to see if the file is complete. I've seen people grep for the presence of known trailer information, check multiple times to see if the file is growing, or use something like 'fuser' to know if the file is open by another user.
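The "check whether the file is still growing" idea can be sketched as a size-stability test. This is a heuristic, not a guarantee - a stalled transfer also looks stable:

```shell
#!/bin/sh
# file_is_stable FILE [PAUSE]: succeed if FILE's size is unchanged
# across a short interval (default 10 seconds).
file_is_stable() {
    file=$1; pause=${2:-10}
    size1=$(wc -c < "$file" | tr -d ' ')
    sleep "$pause"
    size2=$(wc -c < "$file" | tr -d ' ')
    [ "$size1" -eq "$size2" ]
}
```

In practice you would call this in a loop and only hand the file to the sub-sequence once it reports stable, possibly combined with a `fuser` check to confirm no process still holds it open.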

I prefer the semaphore approach. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

chulett wrote:I prefer the semaphore approach. :wink:
Thanks for the quick reply and your time Craig - I'll try and work the semaphore approach into my plan. :D

Cheers
S
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I like to FTP the file using a script, giving it a suffix such as ".in_process", and then, once the FTP is successful, rename the file to its correct name.
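A minimal sketch of this rename trick, using a local copy to stand in for the slow transfer (with real FTP you would `put` to the temporary name and then issue the client's rename command). Paths and the suffix are assumptions:

```shell
#!/bin/sh
# deliver SRC DEST: write the file under a temporary ".in_process"
# suffix, then rename it into place only after the write completes.
# Watchers polling for DEST never see a partially written file,
# because rename is atomic on the same filesystem.
deliver() {
    src=$1; dest=$2
    cp "$src" "$dest.in_process"      # the slow transfer happens here
    mv "$dest.in_process" "$dest"     # instant, atomic rename
}
```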
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Thanks very much for that ArndW - another top tip!

Cheers,
S
stefanfrost1
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

I use a different approach (on Linux and Unix platforms): any file is FTP'd or written to a specific directory, say /arrival. On completion, the file(s) are moved (using mv) to another directory where DataStage is listening for files, say /complete.

DataStage then moves any files to be used to another directory, say /inprogress, and, when finished processing, to an archive directory (if needed), say /archive.

This way I can ensure restartability and integrity, not only in delivery but also when (if) the data integration job fails. On restart it may add new files to the process and/or reprocess any from the previous run. Meanwhile, new files can be delivered without disturbing the data integration process.
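The staged-directory handoff described above boils down to moving everything from one stage directory to the next between steps. A minimal sketch, with the directory names taken from the post:

```shell
#!/bin/sh
# advance_stage FROM TO: move every regular file from one stage
# directory to the next, e.g. /complete -> /inprogress before the
# job runs, then /inprogress -> /archive after it succeeds.
advance_stage() {
    from=$1; to=$2
    for f in "$from"/*; do
        [ -f "$f" ] && mv "$f" "$to/"
    done
    return 0
}
```

Because mv within a filesystem is atomic, a file is always wholly in exactly one stage, which is what makes the restart behaviour safe.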
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure, excellent point, we do the same thing when multiple files are involved - especially if they trickle in over the course of the day. Didn't mention that as we were talking about a single file. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

stefanfrost1 wrote:I use a different approach (on Linux and Unix platforms): any file is FTP'd or written to a specific directory, say /arrival. On completion, the file(s) are moved (using mv) to another directory where DataStage is listening for files, say /complete.

DataStage then moves any files to be used to another directory, say /inprogress, and, when finished processing, to an archive directory (if needed), say /archive.

This way I can ensure restartability and integrity, not only in delivery but also when (if) the data integration job fails. On restart it may add new files to the process and/or reprocess any from the previous run. Meanwhile, new files can be delivered without disturbing the data integration process.
Thanks Stefan. Can I ask what triggers the 'mv' from /arrival to /complete?

My original process executed a job that waited for a file to arrive and moved it, once present, from a landing directory to an input directory - but the mv fired as soon as the file first hit the landing directory, so it moved an incomplete file...

Thanks
S
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Whoever was transferring the file would need to do that, post transfer. You can't, unless you first check to ensure the file is completely transferred - and if you do that, we're back where we started. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
sjordery
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

chulett wrote:Whoever was transferring the file would need to do that, post transfer. You can't, unless you first check to ensure the file is completely transferred - and if you do that, we're back where we started. :wink:
Ah, got you - so the 'mv' would be issued by whatever application posted the file. I'm there now! :lol:

Cheers,
S
Post Reply