Accessing files on a remote server

raj4756 · Post by **raj4756** » Thu Apr 08, 2004 8:30 am

Hi All,

I have DataStage installed on server A. How can I read a dataset on server B, which does not have DataStage installed? I also want to be able to write to server B.

Thanks.

Raj

xcb · Post by **xcb** » Thu Apr 08, 2004 8:46 am

If you are running on Windows, you can map a drive to the remote server and read/write the dataset over your network. I don't know how to do it on UNIX, but I'm sure it has equivalent functionality.

chulett · Post by **chulett** » Thu Apr 08, 2004 8:53 am

NFS mount or an equivalent mechanism. Or work locally and use FTP to get the files back and forth.
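To illustrate the mount approach: once server B's directory is mounted on server A, it behaves like any local path, so ordinary file operations are enough. The sketch below assumes a hypothetical mount point `/mnt/serverB/data`; the function names are made up for illustration and are not part of DataStage.

```python
import shutil
from pathlib import Path

# Hypothetical mount point: a directory exported by server B and
# mounted on server A (for example via NFS). Adjust to your site.
REMOTE_ROOT = Path("/mnt/serverB/data")


def pull_from_mount(name, local_dir, remote_root=REMOTE_ROOT):
    """Copy a file from the mounted remote directory into a local work directory."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / name
    shutil.copyfile(Path(remote_root) / name, target)
    return target


def push_to_mount(local_file, remote_root=REMOTE_ROOT):
    """Copy a locally produced file back onto the mounted remote directory."""
    local_file = Path(local_file)
    target = Path(remote_root) / local_file.name
    shutil.copyfile(local_file, target)
    return target
```

A job could also read and write the mounted path directly; copying to a local work directory first simply keeps network I/O out of the job's run time.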

Deepak_J · Post by **Deepak_J** » Thu Apr 08, 2004 8:56 am

You can write an FTP script to bring your file to server A and execute it as a before-job routine for your job. You can also write the output on server A and then have another FTP script, run as an after-job subroutine, that sends the file to server B.
Hope this helps.

Deepak
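The two transfer scripts described above could be sketched roughly as follows, using Python's standard `ftplib`. The host, credentials and file names are placeholders, and the function names are hypothetical; how the script is hooked into the before-job and after-job routines depends on your setup.

```python
from ftplib import FTP


def fetch_file(ftp, remote_name, local_path):
    """Download remote_name from the FTP server into local_path (binary mode)."""
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


def push_file(ftp, local_path, remote_name):
    """Upload local_path to the FTP server as remote_name (binary mode)."""
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)


def run_before_job(host, user, password, remote_name, local_path):
    """Before-job step: bring the source file from server B to server A."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        fetch_file(ftp, remote_name, local_path)


def run_after_job(host, user, password, local_path, remote_name):
    """After-job step: send the job's output file from server A to server B."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        push_file(ftp, local_path, remote_name)
```

Binary mode is used for both directions so the file contents are transferred unchanged.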

raj4756 · Post by **raj4756** » Thu Apr 08, 2004 9:12 am

Could you please explain how an NFS mount works? Any examples would be appreciated.
Also, does FTP on a .ds file work the same way as on a .dat file?

Let me know.

Thanks.

Raj

Deepak_J · Post by **Deepak_J** » Thu Apr 08, 2004 9:23 am

Please clarify, what do you mean by this?

does the FTP on a .ds file work the same way as a .dat file

Also, FTP would be a better solution in terms of speed and reliability.

Deepak

raj4756 · Post by **raj4756** » Thu Apr 08, 2004 9:51 am

What I mean is that the .ds file is just a pointer to the underlying data files in the DataFiles directory. Do I have to FTP all the underlying files as well?

Thanks.

Raj

kcbland · Post by **kcbland** » Thu Apr 08, 2004 9:57 am

raj4756 wrote: What I mean is that the .ds file is just a pointer to the underlying data files in the DataFiles directory. Do I have to FTP all the underlying files as well?

Thanks.

Raj

A .ds file is a specific work file structure that PX uses. It cannot be used by anything other than PX, so why would you put it anywhere else? If you need to create a data file to send to another server, then create a sequential file and FTP it. An alternative is to share a filesystem on the DS server that is visible to the remote server, for example via NFS (Network File System; I suggest you google it and learn about it, it's been around for quite a long time). Either way, your premise of distributing a .ds file is invalid.

raj4756 · Post by **raj4756** » Thu Apr 08, 2004 11:46 am

Ken,

We are in the process of comparing the performance of reads from partitioned Oracle tables versus reads from .ds files.
The Oracle database is on a remote server, so we want to mimic this with the .ds files by reading them from a remote server. Moreover, we don't want to test on the production server and meddle with the production files.

Any suggestions?

Thanks.

Raj

kcbland · Post by **kcbland** » Thu Apr 08, 2004 12:44 pm

Your comparison is Oracle references versus a .ds Merge or Lookup operation. In the words of Tom Kyte, you're comparing apples to toaster ovens. The whole problem with database reference lookup calls is the saturation issue on the database server side, combined with repetitive queries, network traffic, query queuing, and shifting degrees of parallelism based on instantaneous parallel query slave usage on the server.

Both flavors of DataStage, Server and PX, encourage the use of localized reference structures (hash files, .ds, etc) to remove the database from the equation and put it into an optimized place, the DataStage transformation server.

raj4756 · Post by **raj4756** » Thu Apr 08, 2004 2:39 pm

Kenneth,

We are basically an Oracle shop (90%), and are thinking of using Oracle tables not just as reference, but as source data as well. So you don't think this is a great idea?

Raj

kcbland · Post by **kcbland** » Thu Apr 08, 2004 2:50 pm

Databases are sources and targets. As reference objects they don't do well for a host of reasons. I'm not talking about joins, I'm talking about references, a totally different concept. As sources and targets, obviously, they are fine. But, when you have to manipulate, massage, or enrich data then temporary scratch pad objects not in a database have the highest performance.

ray.wurlod · Post by **ray.wurlod** » Thu Apr 08, 2004 5:05 pm

The question was about datasets in the PX environment.

All you should need to do, provided that the machines are in the same cluster, is to define the hostname along with other definitions of each resource, in the PX configuration file associated with the job. Information on editing configuration files is in the DataStage Manager Guide (man_gde.pdf) in your Docs folder. In particular, see Chapter 11 (The Parallel Engine Configuration File).
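As a rough illustration of what defining the hostname for each resource looks like, the sketch below renders a two-node configuration in the general layout of a parallel configuration file (`node`, `fastname`, `pools`, `resource disk`, `resource scratchdisk`). The node names, hostnames and paths are hypothetical, and the Python helper is only a convenient way to show the structure; check the guide mentioned above for the authoritative syntax.

```python
def render_node(name, fastname, disk, scratch):
    """Render one node entry; fastname is the hostname the node runs on."""
    return (
        f'  node "{name}"\n'
        "  {\n"
        f'    fastname "{fastname}"\n'
        '    pools ""\n'
        f'    resource disk "{disk}" {{pools ""}}\n'
        f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
        "  }\n"
    )


def render_config(nodes):
    """Render a whole configuration file from a list of node definitions."""
    body = "".join(render_node(**node) for node in nodes)
    return "{\n" + body + "}\n"


if __name__ == "__main__":
    # Hypothetical hosts and directories: one node per machine in the cluster.
    print(render_config([
        {"name": "node1", "fastname": "serverA",
         "disk": "/data/a/datasets", "scratch": "/data/a/scratch"},
        {"name": "node2", "fastname": "serverB",
         "disk": "/data/b/datasets", "scratch": "/data/b/scratch"},
    ]))
```

With a layout like this, the dataset segments that belong to `node2` would live under that node's `resource disk` path on its own host, which is why the machines need to be part of the same cluster.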