Best practice

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

akrzy
Participant
Posts: 121
Joined: Wed Dec 08, 2004 4:46 am


Post by akrzy »

Hi,

What is the best practice for setting up a production environment with DS PX on an SMP system?

Should DS PX run on a separate server? How many resources does DS need? Is it a good idea to install DS and Oracle on the same machine?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

akrzy,

this question is far too broad to answer effectively. There are just too many factors involved to make any response useful.

Generally it is best to separate the DataStage server machine from the machine that runs the database. A PX configuration is also highly dependent upon both the hardware and software environment. The installation guides give some rough numbers and, although I haven't checked, I am sure that there are white papers available from IBM/Ascential on configuration and sizing.

You will need to break your question down into more specific ones, e.g. "I have a 4-CPU Solaris system for DS, shared with a Control-M server, with 2GB of memory and ~200GB of disk on RAID-5. What tuning options do I have in the configuration file for Version 7.5?"
akrzy
Participant
Posts: 121
Joined: Wed Dec 08, 2004 4:46 am

Post by akrzy »

:) ok, more details:


I have an 8-CPU Solaris system for the server, with 16GB of memory and ~4TB of disk on RAID. What tuning options do I have in the configuration file for Version 7.5?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As many as you like.

You can create configuration files for one processing node (for small jobs) through to 16 or even more processing nodes (for large jobs). Depending on how your file systems are laid out, you can specify disk and scratch disk resources to spread the I/O around as much as possible. One "best practice" is "use all the disks".
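As a rough illustration of the point above, here is a minimal Python sketch that writes out a parallel configuration file with a chosen number of processing nodes, listing every disk and scratch disk resource on each node. The host name and directory paths are hypothetical placeholders. Rotating the resource order per node, so that each node starts with a different file system, is an assumption made for this sketch and not a documented requirement.

```python
def rotate(items, offset):
    """Return a copy of items rotated left by offset positions."""
    k = offset % len(items)
    return items[k:] + items[:k]


def build_config(fastname, node_count, disk_dirs, scratch_dirs):
    """Return the text of a configuration file with node_count nodes.

    fastname, disk_dirs and scratch_dirs are placeholders supplied by the
    caller; substitute the real host name and file systems.
    """
    lines = ["{"]
    for i in range(node_count):
        lines.append(f'  node "node{i + 1}"')
        lines.append("  {")
        lines.append(f'    fastname "{fastname}"')
        lines.append('    pools ""')
        # Every node lists every file system, starting at a different one.
        for path in rotate(disk_dirs, i):
            lines.append(f'    resource disk "{path}" {{pools ""}}')
        for path in rotate(scratch_dirs, i):
            lines.append(f'    resource scratchdisk "{path}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(build_config("etlhost", 2, ["/data1", "/data2"], ["/scratch1", "/scratch2"]))
```

Generating several such files (for example with 1, 8 and 16 nodes) and selecting one per job through APT_CONFIG_FILE is one way to keep small and large jobs on suitably sized configurations.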

More memory would be useful. 2GB per CPU is the minimum recommended amount of memory for DataStage EE.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Akrzy,

See, progress already with Ray's response. Here's my 2-pence worth: the PX configuration specifies scratch devices. These are true scratch devices in that they won't be needed after a job finishes or aborts, so why use relatively expensive (in terms of I/O time) RAID constructs for them? A RAID-5 array is wasted on scratch; scratch space is better put on a non-RAID or RAID-0 device.

As Ray has already stated, spread your I/O across as many controllers and spindles as possible.

You should have at least 8 nodes in your configuration file. Since you didn't mention a database on the system, I'll assume it is on another machine. Is this database partitioned? If it is, it makes sense to make the number of nodes on your DS server a multiple of the number of database partitions.
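One possible reading of this sizing advice, sketched in Python: pick the smallest node count that is both at least the number of CPUs and a multiple of the number of database partitions. The function name and the rounding rule are assumptions for illustration only; the right figure still depends on the workload.

```python
def suggest_node_count(cpu_count, db_partitions):
    """Smallest multiple of db_partitions that is at least cpu_count."""
    if cpu_count < 1 or db_partitions < 1:
        raise ValueError("cpu_count and db_partitions must be positive")
    # Ceiling division: how many whole groups of partitions cover the CPUs.
    groups = -(-cpu_count // db_partitions)
    return groups * db_partitions


if __name__ == "__main__":
    # 8 CPUs and a hypothetical 4-partition database.
    print(suggest_node_count(8, 4))
```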