Reading a remote data file without using the FTP Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

murur
Participant
Posts: 19
Joined: Wed Apr 14, 2004 7:55 am

Reading a remote data file without using the FTP Stage

Post by murur »

Is there any way, other than the FTP Stage, to read and write a data file on a remote server in a DataStage Server job?


DataStage Server is running on a UNIX server.
Source/target files reside on a Windows NT server.


kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Never use this stage; it's a sales gimmick. Use a command-line FTP script to move files locally, then distribute them when done. The FTP stage is not practical: the only reason to use it for large files is to avoid copying the file locally, but a large file will take FOREVER to read over the network through this stage, which increases the likelihood of a dropped connection.
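The command-line approach Ken describes can be scripted in a few lines. A minimal sketch, with hypothetical host, user, and path names (none of them come from this thread); it builds an FTP command file and would feed it to `ftp -n`, which is commented out here so the sketch runs without a live host:

```shell
#!/bin/sh
# Hypothetical sketch: stage a remote file locally with command-line FTP.
# HOST, USER, PASS, and the paths are placeholders.
HOST=ntserver.example.com
USER=dsload
PASS=secret
REMOTE=/exports/source.dat
LOCAL=/tmp/source.dat

# Build the FTP command file; binary mode preserves the bytes untouched.
cat > /tmp/ftp_get.cmds <<EOF
user $USER $PASS
binary
get $REMOTE $LOCAL
bye
EOF

# Real transfer (commented out so the sketch runs offline):
# ftp -n "$HOST" < /tmp/ftp_get.cmds
```

Because the command file is plain text, the same script is easy to extend with `mget`, retries, or logging.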
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you can mount the disks, for example via Samba, then they are accessible. In older versions of DataStage you may need to enable NFS via the ALLOWNFS configuration parameter.

Or, as Ken suggests, you can create a shell script that retrieves the file to the local machine (perhaps using rcp, perhaps using FTP) before processing it with DataStage. Such a script could be run from a before-job subroutine (ExecSH) or a Command Activity in a sequence.
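A before-job script of that kind should also fail loudly, so the job aborts rather than reading a missing or partial file. A minimal sketch with placeholder paths; the rcp call is commented out so it runs offline, and a stand-in file is created purely to exercise the check:

```shell
#!/bin/sh
# Hypothetical ExecSH before-job script. Uncomment the rcp line (or swap
# in an ftp call) in a real setup; paths are placeholders.
LOCAL=/tmp/execsh_source.dat
# rcp ntserver:/exports/source.dat "$LOCAL" || exit 1

printf 'row1\nrow2\n' > "$LOCAL"   # stand-in for the transferred file

# A non-zero exit status here makes the before-job subroutine fail,
# aborting the job before it reads a bad file.
[ -s "$LOCAL" ] || { echo "missing or empty $LOCAL" >&2; exit 1; }
echo "transfer OK"
```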
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I wanted to share this private email with everyone so that we all benefit from the discussion:
I just noted your response on DSXchange to the request for a method of transporting files other than FTP.

You say never to use the FTP stage as it is a gimmick. We use this stage fairly extensively throughout our system (DS 5.2 on UNIX) without any problems. Granted, most file sizes are relatively small, but we very rarely have any hitches with the jobs.

I am curious to understand why we should not be using it when we have had very few problems with it.

I would very much appreciate your advice on this, as we are looking at an upgrade to version 7.1 very shortly, and if there is good reason to replace the FTP stages, now would be the time.

Please let me know.
The FTP stage "reads" the file, while command-line FTP preserves the file without prejudice in the transfer. So, using the FTP stage just to "move" a file involves "reading" and "writing" rows and columns, whereas command-line FTP doesn't care about content.

For small volume files, an NFS or Samba mount is really elegant. The network performance is not so critical as the volume is low. This means that you use the Sequential stage, which has a lot more features and is easier to work with.

For large volume sources, you must consider using multiple job instance designs to parallel process the source data. In order to parallel process a source data set, you need to be able to "partition" or "cut" the data into equal groups. You can't do this if the data is remote via FTP and it's really difficult if it's in a table. In a local sequential file, you have many options and it's really easy.
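To make the "partition a local file" idea concrete, here is a minimal sketch (placeholder file names, stand-in data) that round-robins rows into N slices by row number with awk, so N job instances can each read one slice:

```shell
#!/bin/sh
# Hypothetical sketch: cut a local sequential file into N near-equal
# partitions, one per parallel job instance. Names are placeholders.
N=4
SRC=/tmp/part_demo_src.dat
seq 1 100 > "$SRC"     # stand-in source data: 100 rows

rm -f /tmp/part_demo_*.out
# Row NR goes to partition NR mod N, giving near-equal slice sizes.
awk -v n="$N" '{ print > ("/tmp/part_demo_" (NR % n) ".out") }' "$SRC"
wc -l /tmp/part_demo_*.out
```

This only works because the file is local; a remote file read through the FTP stage offers no such handle to cut on.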

Basically, if high volumes are your concern, parallel job instances is your solution. Having the data local is required for maximum throughput inbound and outbound. Once the data is produced, moving it remotely is optimally achieved using command line FTP, whereby compression and dedicated transfer can take place.
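The "compression and dedicated transfer" step can be sketched as follows, with placeholder host and paths (the ftp call is commented out so the sketch runs without a live host):

```shell
#!/bin/sh
# Hypothetical sketch of "compress, then push with command-line FTP".
# Host, credentials, and paths are placeholders.
OUT=/tmp/extract_demo.dat
seq 1 1000 > "$OUT"    # stand-in extract output
gzip -f "$OUT"         # produces /tmp/extract_demo.dat.gz

# Real push (commented out so the sketch runs offline):
# ftp -n target.example.com <<EOF
# user dsload secret
# binary
# put /tmp/extract_demo.dat.gz
# bye
# EOF

# Verify the archive is intact before (or after) shipping it.
gzip -t /tmp/extract_demo.dat.gz && echo "archive OK"
```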

If the FTP stage is working for you, great. But you may want to benchmark the stage on a 30 GB, 50-million-row remote file. Just write a simple job that parses the file, eliminates some columns, and writes the output. Benchmark the FTP stage as the reader versus a command-line transfer followed by a Sequential stage as the reader in the transform job.

Don't even mention restart capability. If your process dies halfway through, you will re-incur the full transfer, whereas the localized file gives you the ability to have the job skip rows, if you build in restart via a constraint check of @INROWNUM against a job parameter. That same comparison of FTP versus Sequential is light-years apart on performance.
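The restart-by-skipping idea can be illustrated outside DataStage. Inside a job it would be a Transformer constraint comparing @INROWNUM to a job parameter; this hypothetical sketch (placeholder names, stand-in data) shows the same effect on a local file with plain awk:

```shell
#!/bin/sh
# Hypothetical illustration of restart: on a rerun, skip the rows that
# were already processed before the failure. SKIP plays the role of the
# job parameter; the file and row counts are stand-ins.
SKIP=3
SRC=/tmp/restart_demo.dat
seq 1 5 > "$SRC"       # stand-in data: rows 1..5

# NR here plays the role of @INROWNUM in the Transformer constraint.
awk -v skip="$SKIP" 'NR > skip' "$SRC" > /tmp/restart_demo.out
cat /tmp/restart_demo.out
```

Only rows 4 and 5 are reprocessed; with the FTP stage as the reader, the whole transfer would be repeated before any row could be skipped.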
Kenneth Bland
