Can we run Parallel Extender on a single processor

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

king999
Participant
Posts: 7
Joined: Thu Sep 29, 2005 3:46 pm

Can we run Parallel Extender on a single processor

Post by king999 »

Hi,
Can we run a parallel job on a single-processor unit, for example if we configure the stage to run sequentially and then run the parallel job?
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Yes you can run a parallel job on a single CPU. You can also type with the cap lock off and start sentences with a capital letter but that's getting off the topic.

If you have a one CPU dev environment and you are delivering to a multiple CPU prod environment I recommend you configure at least two nodes to verify that your jobs are partitioning correctly. A single CPU should be able to handle two nodes.

Sometimes you get a requirement to run a job on a single node even though the default configuration for the server is multiple nodes, for example a small job that runs faster as a single instance, or a job that reads from and writes to a sequential file. Just create a config file with one node in it, add the $APT_CONFIG_FILE environment variable to your job and set it to the name of the one-node file.
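
As a rough sketch of the one-node and two-node files mentioned above, the snippet below renders configuration text in the usual parallel configuration file layout (node, fastname, pools, resource disk, resource scratchdisk). The host name `devhost` and both directory paths are placeholders made up for illustration, not values from this thread, and the helper `render_config` is hypothetical. Every node points at the same fastname, which is how several logical nodes end up on one single-CPU machine.

```python
# Sketch only: render a parallel configuration file with N logical nodes on one host.
# "devhost" and the directory paths are placeholders, not real values.

def render_config(node_count, fastname, disk_dir, scratch_dir):
    """Return configuration-file text defining node_count logical nodes on one host."""
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{fastname}"',
            '    pools ""',
            f'    resource disk "{disk_dir}" {{pools ""}}',
            f'    resource scratchdisk "{scratch_dir}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"

# One node for the "run it on a single node" case, two nodes for partition testing.
one_node = render_config(1, "devhost", "/data/ds/datasets", "/data/ds/scratch")
two_node = render_config(2, "devhost", "/data/ds/datasets", "/data/ds/scratch")
print(one_node)
```

Saving each result to its own file and pointing $APT_CONFIG_FILE at one or the other lets the same job be run with either node count.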
pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

Post by pavanns »

I feel you can try it out by changing the APT config file in the configurations in the DS Manager module.
pavan
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Of course you can....but don't expect great things. You could even run it in a single process if you really want to get crazy. :shock:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That would be a good trick! I assume you're not counting the conductor or player process in this, and combining all operators?

Hey, let's totally defeat the functionality of this parallel architecture that we've just spent a squillion dollars on! :roll:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sudheer05
Participant
Posts: 30
Joined: Sun Oct 02, 2005 1:36 pm

Post by sudheer05 »

Hi,
Enterprise Edition is the version of DataStage that allows you to develop
parallel jobs. These run on DataStage UNIX servers that are SMP, MPP, or
cluster systems, but you can install it on a Windows server in order to
develop jobs which can subsequently be run on a UNIX server.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, there is a specific version of DataStage EE that allows you to run PX jobs on Windows - 7.5.X2 - from what I recall.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:Well, there is a specific version of DataStage EE that allows you to run PX jobs on Windows - 7.5.X2 - from what I recall.
No dot, little "X". 7.5x2
SPI
Participant
Posts: 5
Joined: Wed Oct 05, 2005 1:34 am
Location: Bordeaux

Check data integrity in a one-node environment?

Post by SPI »

I would like to pick up on one of Vincent's points. Can we get problems, or different results, between development and production if we develop jobs on a single node (no matter the number of processors) and run them in a multiple-node environment?
Is it really necessary to have at least two nodes in the development environment to check data integrity?
I would like somebody to reassure me about the unpleasant feeling that I'm not doing a good job by working in a single-node development environment...
SPI
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You're better off with 2 CPUs minimum so you can verify your partitioning logic is OK for multiple nodes.

There is no way of verifying it otherwise.
Even experts make mistakes.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

SPI--you definitely can (and more than likely will) have problems if you develop in one node and deploy to production in multiple nodes. Hash partitioning, joins, removing duplicates, and loading databases in parallel come to mind......
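
To illustrate why a single node can hide this kind of problem, here is a toy model in Python. It is not DataStage code: the function names are invented for the example, and each "node" simply removes duplicates among the rows it happens to receive. With one node the result looks correct; with two nodes and round-robin partitioning, rows sharing a key land on different nodes and the duplicates survive; hash partitioning on the key fixes it.

```python
# Toy model (not DataStage code): each "node" de-duplicates only its own rows.
from zlib import crc32

def round_robin(rows, nodes):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def hash_by_key(rows, nodes):
    """Send every row with the same key (row[0]) to the same node."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(row[0].encode()) % nodes].append(row)
    return parts

def remove_duplicates(parts):
    """Keep the first row per key within each partition, then collect all output."""
    out = []
    for part in parts:
        seen = set()
        for row in part:
            if row[0] not in seen:
                seen.add(row[0])
                out.append(row)
    return out

rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)]

print(len(remove_duplicates(round_robin(rows, 1))))  # 3: looks right on one node
print(len(remove_duplicates(round_robin(rows, 2))))  # 5: duplicates survive
print(len(remove_duplicates(hash_by_key(rows, 2))))  # 3: key-based partitioning
```

The same kind of mismatch applies to joins and any other stage that needs all rows for a key on the same node.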

You can run PX on a single CPU (like to do UAT), just don't overload that single CPU. Roy is right, you're better off with multiple CPUs, but you can run a two node config file on a single CPU system.

Ray....the env var to set is APT_EXECUTION_MODE=single process. It's a pretty good tool for debugging--and not much else!!!
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I would never develop on a single node if my server could handle two. A single CPU server should be able to handle two nodes. If I had two CPUs I would consider four nodes. The more nodes you use the more likely it is that you will spot partitioning errors during your unit testing. If you cannot do it in your development environment then make sure you are doing it in your testing environment.

To move a job into production without ever checking it against multiple nodes is very dangerous.

We also keep several different configuration files in our dev environment so if we see something that we think is a partition problem we switch to a different number of nodes to see if we get a different result.
SPI
Participant
Posts: 5
Joined: Wed Oct 05, 2005 1:34 am
Location: Bordeaux

No multiple-node configuration with Sun OS!!!

Post by SPI »

Thank you all for your answers; my anxiety was justified. Meanwhile, I was given the following explanation: on our machine (SUN 5.8, PX V7.0.1), jobs failed every time they were run with multiple-node configurations. The problem has reportedly been identified by Ascential support as specific to Sun OS. Increasing the number of processors and the RAM made no difference. I admit I find this explanation odd, because the nodes only define directories, and thus the disk space across which the datasets are distributed. As I see it, there is no direct link with the core parameters or the dynamic capacity of the machine. Have you ever heard about this bug on Sun? Thank you for your answers.
SPI
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

If you don't have it in your LD_LIBRARY_PATH for the projects, make sure /usr/lib/lwp is in there. Sun had a known issue with some of the libraries at 2.8 and provided those new ones to resolve the issues. I don't know if that would help with your issue of jobs failing when run with multi-node configs or not, but it might help. You could also drop some design-specific info into a new post--there might be something else going on.
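
A quick way to check the LD_LIBRARY_PATH suggestion is sketched below. The helper name `has_library_dir` is made up for this example, and the script only tells you something useful if it is run in the same environment the jobs run in, since that is where the variable matters.

```python
# Sketch: check whether a directory is listed in LD_LIBRARY_PATH.
import os

def has_library_dir(path_value, directory):
    """True if directory is one of the colon-separated entries in path_value."""
    wanted = directory.rstrip("/")
    return any(
        entry.rstrip("/") == wanted
        for entry in path_value.split(":")
        if entry  # skip empty entries such as a trailing colon
    )

current = os.environ.get("LD_LIBRARY_PATH", "")
print("/usr/lib/lwp present:", has_library_dir(current, "/usr/lib/lwp"))
```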