Downloading FTP based on a flag file

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

I have a requirement to download files from an FTP site using the following rules.
1. This will be a daily job run via a scheduler, cron, or a batch process.
2. Connect to a known FTP site and check if a file with the pattern "FILE_PREFIX_YYYYMMDDHHMMSS.ready" exists.
2.a The HHMMSS is not constant and might vary on a given day.
2.b Ideally speaking we should look for "FILE_PREFIX_YYYYMMDD*.ready"
3. If the above file is not found, sleep for a configured amount of time (e.g., 3 minutes) and try again.
4. This can be repeated for a configurable number of retry attempts (e.g., 5 times).
5. If the "FILE_PREFIX_YYYYMMDDHHMMSS.ready" file is found, it signals the presence of a "FILE_PREFIX_YYYYMMDDHHMMSS.zip" file in that same FTP folder.
6. Download the "FILE_PREFIX_YYYYMMDDHHMMSS.zip" file to the local box.
7. Unzip this file to a configured folder.
8. Another job is launched that uses this file for parsing and loading into a database table. This job for step 8 alone is ready and has been tested.
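
For illustration, steps 2 to 5 could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names, the `max_attempts` and `wait_seconds` parameters, and the reading of "5 retry attempts" as five checks in total are all hypothetical, and the directory listing is passed in as a callable so the logic does not depend on any particular FTP client.

```python
import fnmatch
import time
from datetime import date
from typing import Callable, Iterable, Optional


def find_ready_flag(names: Iterable[str], prefix: str, day: date) -> Optional[str]:
    """Return the first name matching PREFIX_YYYYMMDD*.ready, or None."""
    pattern = f"{prefix}_{day:%Y%m%d}*.ready"
    matches = sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
    return matches[0] if matches else None


def poll_for_zip(
    list_names: Callable[[], Iterable[str]],
    prefix: str,
    day: date,
    max_attempts: int = 5,          # assumption: total number of checks
    wait_seconds: float = 180,      # assumption: 3 minutes between checks
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the listing until a .ready flag appears.

    Returns the name of the matching .zip file, or None if the flag
    never showed up within max_attempts checks.
    """
    for attempt in range(1, max_attempts + 1):
        flag = find_ready_flag(list_names(), prefix, day)
        if flag is not None:
            # Same base name as the flag, with .zip instead of .ready.
            return flag[: -len(".ready")] + ".zip"
        if attempt < max_attempts:
            sleep(wait_seconds)     # no point sleeping after the last check
    return None
```

With the standard `ftplib` module, the bound method `ftp.nlst` of a logged-in `FTP` object (after `cwd` to the right folder) could be passed as `list_names`. Some servers answer an empty directory listing with an error instead of an empty list, so a real script would probably need to handle that case as well.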

For steps 1 to 7, from various other posts, I understand that shell scripts would be ideal for the FTP operation.
I can handle the FTP command by supplying the FTP parameters in an external file. I am not very conversant with shell scripting and was wondering whether it would be better to do the polling, sleep and retry in DataStage, or to do some more searching (Google et al.) and script it out completely.
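
As a rough illustration of keeping the FTP parameters in an external file and covering steps 6 and 7, something like the following Python sketch might work. The INI layout (an `[ftp]` section with `host`, `user`, `password`, `remote_dir`) and all function names are assumptions made for this example, not anything DataStage prescribes; only standard-library modules are used.

```python
import configparser
import zipfile
from ftplib import FTP
from pathlib import Path


def load_ftp_params(path: str) -> dict:
    """Read connection settings from a hypothetical [ftp] section of an INI file."""
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(path)
    section = parser["ftp"]
    return {
        "host": section["host"],
        "user": section["user"],
        "password": section["password"],
        "remote_dir": section.get("remote_dir", fallback="."),
    }


def download_file(ftp, remote_name: str, local_dir: str) -> Path:
    """Fetch one file in binary mode from the current remote directory."""
    target = Path(local_dir) / remote_name
    with open(target, "wb") as handle:
        ftp.retrbinary(f"RETR {remote_name}", handle.write)
    return target


def unzip_to(archive: Path, dest_dir: str) -> list:
    """Extract the archive into dest_dir and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def fetch_and_unzip(params: dict, zip_name: str, work_dir: str, dest_dir: str) -> list:
    """Connect, download zip_name into work_dir, then unzip it into dest_dir."""
    with FTP(params["host"]) as ftp:
        ftp.login(params["user"], params["password"])
        ftp.cwd(params["remote_dir"])
        archive = download_file(ftp, zip_name, work_dir)
    return unzip_to(archive, dest_dir)
```

Keeping the password in a plain file is a trade-off; file permissions on the parameter file would need to be restricted accordingly.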

The other option is to write a Perl script to do all this and invoke it via an Execute Command stage. In both cases, I would have to do some additional coding to handle the return code from these scripts.
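
On the return code handling, one possible approach is for the script to map each outcome to a distinct exit code that the calling job can branch on. The sketch below is in Python and the specific code values (0, 1, 2) and the `run_step` name are hypothetical choices for this example; `poll` and `fetch` stand in for whatever polling and download functions the script actually uses.

```python
import sys
from typing import Callable, Optional

# Hypothetical exit codes; any distinct small integers would do.
EXIT_OK = 0          # flag found, file downloaded and unzipped
EXIT_NOT_READY = 1   # flag never appeared within the retry budget
EXIT_FAILED = 2      # connection, transfer or unzip error


def run_step(poll: Callable[[], Optional[str]], fetch: Callable[[str], object]) -> int:
    """Run poll-then-fetch and translate the outcome into an exit code."""
    try:
        zip_name = poll()
        if zip_name is None:
            print("flag file not found after all attempts", file=sys.stderr)
            return EXIT_NOT_READY
        fetch(zip_name)
    except Exception as exc:
        # Anything unexpected is reported on stderr and mapped to one code.
        print(f"transfer failed: {exc}", file=sys.stderr)
        return EXIT_FAILED
    return EXIT_OK
```

The script's entry point would then end with `sys.exit(run_step(...))`, so the caller only has to distinguish "not ready yet" from a genuine failure.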

The final option is to write a DS BASIC routine that does all this in one shot, taking all the input parameters. Would that be the best solution?

I am not picky about Server or PX jobs. Whichever is easier to implement and maintain can be chosen.

On pattern-based FTP downloading, I came across wget, which has powerful built-in features for FTP download and allows retries, wait/sleep, et al. in one command line. Has anyone used it from within DataStage?

Please let me know if you need any additional details that will help you help me.

Thanks in advance for your invaluable time and help.
-V