I wanted to share this private email with everyone so that we all benefit from the discussion:
I just noted your response on the DSXchange to a request for a way to transport files by a method other than FTP.
You say never to use the FTP stage as it is a gimmick. We use this stage fairly extensively throughout our system (DS 5.2 on Unix) without any problems. Granted, most file sizes are relatively small, but we very rarely have any hitches with the jobs.
I am curious to understand why we should not be using it when we have had very few problems with it.
I would very much appreciate your advice on this, as we are looking at an upgrade to version 7.1 very shortly, and if there is good reason to replace the FTP stages then now would be the time.
Please let me know.
The FTP stage "reads" the file, while a command-line FTP preserves the file without prejudice in the transfer. So, using the FTP stage just to "move" a file involves "reading" and "writing" rows and columns, whereas command-line FTP doesn't care about the content.
For small-volume files, an NFS or Samba mount is really elegant. Network performance is not so critical because the volume is low, and it means you can use the Sequential stage, which has a lot more features and is easier to work with.
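As an illustration only (syntax varies by Unix flavor; host, export, and paths here are made up), on Linux the mount side of that might look roughly like this:

    # Hypothetical host, export and mount point -- mount the remote
    # directory once (as root), then point the Sequential stage at it.
    mkdir -p /mnt/extracts
    mount -t nfs sourcehost:/export/extracts /mnt/extracts
    # The job now reads /mnt/extracts/orders.dat as if it were local.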
For large-volume sources, you must consider a multiple-job-instance design to process the source data in parallel. In order to process a source data set in parallel, you need to be able to "partition" or "cut" the data into equal groups. You can't do this if the data is remote via FTP, and it's really difficult if it's in a table. With a local sequential file, you have many options and it's really easy.
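As a rough sketch of what "cutting" a local file can look like (the file name and instance count are invented), the Unix split command gives each job instance its own roughly equal chunk:

    # Hypothetical file and instance count -- split a local extract into
    # roughly equal chunks, one per job instance.
    FILE=/data/landing/orders.dat
    INSTANCES=4
    TOTAL=`wc -l < $FILE`
    LINES=`expr $TOTAL / $INSTANCES + 1`
    split -l $LINES $FILE /data/landing/orders.part.
    # Produces orders.part.aa, orders.part.ab, ... each read by one instance.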
Basically, if high volumes are your concern, parallel job instances are your solution. Having the data local is required for maximum throughput, both inbound and outbound. Once the data is produced, moving it to the remote system is optimally done with command-line FTP, where compression and a dedicated transfer can take place.
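A minimal sketch of that outbound move (paths, host, and login are invented for illustration): compress the file, then push it with a scripted, binary-mode FTP session.

    # Hypothetical paths, host and account.
    gzip -c /data/out/extract.dat > /tmp/extract.dat.gz
    echo "user etluser etlpassword" >  /tmp/ftp.cmds
    echo "binary"                   >> /tmp/ftp.cmds
    echo "put /tmp/extract.dat.gz extract.dat.gz" >> /tmp/ftp.cmds
    echo "bye"                      >> /tmp/ftp.cmds
    ftp -n target.host < /tmp/ftp.cmds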
If the FTP stage is working for you, great. But you may want to benchmark the stage on a 30 GB, 50-million-row remote file. Just write a simple job that parses the file, eliminates some columns, and writes the output. Benchmark the FTP stage as the reader versus a command-line transfer followed by a Sequential stage as the reader in the transform job.
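One rough way to time the comparison, assuming the standard dsjob command-line interface and made-up project and job names (transfer_file.sh here stands for a scripted ftp get like the put example above):

    # Time the design that reads the remote file through the FTP stage...
    time dsjob -run -wait MyProject ReadViaFtpStage
    # ...then time the command-line transfer plus the local Sequential read.
    time sh transfer_file.sh
    time dsjob -run -wait MyProject ReadViaSequentialStage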
Don't even mention restart capability. If your process dies halfway through, you will re-incur the full transfer, whereas the local file lets the job skip rows it has already processed if you build in a restart check, such as a constraint comparing @INROWNUM to a job parameter holding the last committed row. That same check run against the FTP stage versus a Sequential stage is light-years apart in performance.