retrieving a list of files in a remote ftp location

Posted: Mon Nov 10, 2008 3:44 pm
by smoffa
I need to retrieve a list of file names from an FTP server directory and then process some of the files found in it. I don't think I can use a Folder stage in a server job, because I need to supply a username and password. I don't think I can use an FTP stage either.

I am not familiar enough with parallel jobs to know whether this is possible there.

Does anyone have any suggestions?

Posted: Mon Nov 10, 2008 4:06 pm
by chulett
You're going to have to script this, as far as I know. Connect to the remote server via ftp and then issue a 'dir' command. Using DSExecute() from a routine to run the 'ftp script' will capture the filenames in a dynamic array which you can then loop through.

Posted: Mon Nov 10, 2008 4:07 pm
by Nagaraj
Try the Sequential File stage. It has an option for specifying a file name pattern, which lets you process more than one file.

Posted: Mon Nov 10, 2008 4:22 pm
by chulett
The Sequential File stage on a remote server location? No can do unless it's also mounted locally in some fashion but then you wouldn't need ftp.

Posted: Mon Nov 10, 2008 4:24 pm
by Nagaraj
I meant that we can ftp the files from the remote server, land them on our local server, and then process them there.

Posted: Mon Nov 10, 2008 4:26 pm
by chulett
Of course, but from what I read the OP wants to get a list of all of the remote files and then bring only some of them down for processing.

Posted: Mon Nov 10, 2008 4:30 pm
by Nagaraj
Yes, we can do that through a shell script.

I don't think it is best practice to put all of this logic in a DataStage job, which would take more time to get the job done.

Posted: Tue Nov 11, 2008 7:39 am
by smoffa
Craig is correct. I only want to retrieve the remote files that I need (i.e. not processed). There could be a lot of files in the ftp server directory, so I definitely don't want to get all of them.

Thank you for taking the time to respond. I guess I need to write a script to retrieve the file list.
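
For the "only the files not yet processed" part, one possible sketch is to keep a local text file of names that have already been handled and subtract it from the remote listing. The function name `pending_files` and the two list files are hypothetical, and the sketch assumes one file name per line in each list.

```shell
#!/bin/sh
# pending_files REMOTE_LIST PROCESSED_LIST
# Prints every name in REMOTE_LIST that does not appear in PROCESSED_LIST.
# Both arguments are plain text files with one file name per line.
pending_files() {
    # Load PROCESSED_LIST (first awk argument) into a lookup table, then
    # print only the REMOTE_LIST lines that are not in it. Comparing
    # FILENAME with ARGV[1] keeps this correct when PROCESSED_LIST is empty.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}
```

After a file has been processed successfully, its name would be appended to the processed list so that the next run skips it.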

Posted: Tue Nov 11, 2008 9:54 am
by Nagaraj
Yes, that is what I was talking about.
You can handle that easily through scripting.

Posted: Tue Nov 11, 2008 10:15 am
by chulett
So, a snippet from a script for a job where I needed to do almost the exact same thing:

    ftp -ni ${ftp_host} <<!!
    user ${user} ${pwd}
    cd ${destination}
    dir ${pattern}
    quit
!!

Wrapped in whatever level of error handling you feel you need. When the script is called from DSExecute() the output from the dir command will be captured in whatever you've called the Output dynamic array variable. You can then use functions like DCount to see how many files you've found and array notation and looping structures to iterate through the list.
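
If bare file names are easier to work with than raw dir lines, the script itself can trim the listing before DSExecute() captures it. The following is only a sketch: the function names `fetch_listing` and `names_from_listing` are made up for illustration, and it assumes the server answers dir with Unix "ls -l" style lines, where the file name starts at the ninth field. Other servers format the listing differently and would need a different parse.

```shell
#!/bin/sh
# Sketch only. Assumes dir returns Unix "ls -l" style lines such as:
#   -rw-r--r-- 1 owner group 1024 Nov 10 15:44 name.txt

# Run the ftp session from the snippet above and print its raw output.
# ftp_host, user, pwd, destination and pattern must be set by the caller.
fetch_listing() {
    ftp -ni "${ftp_host}" <<EOF
user ${user} ${pwd}
cd ${destination}
dir ${pattern}
quit
EOF
}

# Keep only regular-file lines (leading "-") and print the name, which
# starts at field 9 in this layout. Directories, the "total" line and any
# ftp status messages are dropped. Runs of spaces inside a name collapse
# to a single space.
names_from_listing() {
    awk '/^-/ {
        name = $9
        for (i = 10; i <= NF; i++) name = name " " $i
        print name
    }'
}

# Usage: fetch_listing | names_from_listing
```

With this in place the Output variable would hold one file name per entry instead of a full listing line.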

Posted: Tue Nov 11, 2008 3:33 pm
by smoffa
Craig/Nagaraj,

Thanks for your help so far.

Part 2:

I have created my script and now have a list of files that I would like to process. I know that I can use a Start Loop stage in a sequence job to loop through and process all the files in my list. My question is: how can I pass the file list to this stage?

Can I use a Routine Activity stage to call my file list script and pass the output directly to the Start Loop stage?

Posted: Tue Nov 11, 2008 6:31 pm
by chulett
Use a UserVariables Activity stage: call the routine there so that the list ends up in a variable the Start Loop stage can reference.
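
As far as I know, the Start Loop stage in list mode expects a single delimited string rather than one name per line, so it can help to have the script emit the list in that shape. A minimal sketch, assuming a comma is the delimiter chosen in the stage and that no file name contains a comma; `to_loop_list` is a hypothetical helper name.

```shell
#!/bin/sh
# to_loop_list: read one file name per line on stdin and print them as a
# single comma-separated line. Blank lines are skipped.
to_loop_list() {
    awk 'NF { printf "%s%s", sep, $0; sep = "," } END { print "" }'
}

# Usage: some_listing_command | to_loop_list
```

The resulting string is what the routine would return into the user variable.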