DataStage job uses 1 CPU while remaining 7 are idle

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Umeshkn1704
Participant
Posts: 26
Joined: Fri Aug 01, 2014 11:47 am

DataStage job uses 1 CPU while remaining 7 are idle

Post by Umeshkn1704 »

Hi Team,
Advance wishes to you all for a Happy New Year.

I have a scenario for which I have not found any solution so far, and I seek your expert help.

IIS version - 8.7.0.1
OS - Red Hat Enterprise Linux 6.7 (Santiago)
CPU - 8 CPUs

I have a sequencer which, when run, consumes 98% of a CPU. I ran mpstat and found that it is using just 1 CPU while the remaining 7 CPUs are sitting idle.
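
For reference, this is the kind of per-CPU view I looked at; mpstat comes from the sysstat package, and the 2-second interval with 3 samples below is just an example:

    mpstat -P ALL 2 3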

Can you please let me know what steps or configuration changes need to be made to utilize all the available CPUs?

Thanks in advance.
Thanks
Umesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Run more jobs. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

DataStage sequences themselves are of "server job" type, not parallel, so if you don't have any parallel jobs in that sequence it will not spread work across CPUs.

What is the value of the APT_CONFIG_FILE environment variable? Are you including that environment variable as a parameter in your sequence?
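
In case it helps, a parallel configuration file is just a plain-text list of logical nodes. A minimal two-node sketch, with a placeholder host name and placeholder disk paths rather than anything from your environment, looks roughly like this:

    {
        node "node1"
        {
            fastname "your_etl_host"
            pools ""
            resource disk "/ds/data" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "your_etl_host"
            pools ""
            resource disk "/ds/data" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
    }

The number of node entries in the file that the job actually picks up is what drives how many ways the parallel engine partitions the work.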
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I know my reply sounded a bit facetious but it really wasn't. DataStage doesn't have any control over the number of CPUs that get leveraged - your operating system does. And unless you want to start worrying about processor affinity I would suggest you not worry about it all that much and let the O/S do its job.
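
If you really do want to check whether something has pinned a process to a single core, Linux lets you inspect the affinity of a running process with taskset; the PID below is just a placeholder for one of your running DataStage processes:

    taskset -pc 12345

If that reports the full list of CPUs (0-7 on your box), nothing is restricting the process and the single busy CPU is simply the O/S doing its normal scheduling.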
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Make sure the parallel jobs are set to run in parallel.

You can force the issue by splitting the data stream and forking the chunks into repeated copies of the same job (container, parallel job, or even a particular stage) if you must. I dislike the practice, but it works when nothing else will. This technique will give your job an unfair share of the available resources when many jobs are running at once, and the others may "starve" if you overdo it. I highly recommend that you never manually fork beyond half of your available CPUs, so if you have 8, fork it 4 times at most, or maybe try just 3.

I assume that your server and OS are set up and configured to actually run things in parallel correctly. If you are set up for single-threaded execution only, the above won't help a bit.
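
If in doubt, a quick sanity check of what the O/S itself exposes is to run nproc (number of processing units available) and lscpu (core and thread layout), assuming both are installed on your box:

    nproc
    lscpu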

---- HAH, I remember the Excitable Boy song. Classic alternative stuff :)
Umeshkn1704
Participant
Posts: 26
Joined: Fri Aug 01, 2014 11:47 am

Post by Umeshkn1704 »

Hi,
Thanks for your response.

All the jobs within the sequencer are parallel jobs. The environment variable APT_CONFIG_FILE is set to use the 4-node configuration file, which is the default in our environment, both in the sequencer and in the individual jobs.

Thanks
Thanks
Umesh
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

Umesh,
What happens when you run one of the parallel jobs from that DS sequence by itself from DS Director? Does it execute on one node or on four?

Could you post here the value of the environment variable APT_CONFIG_FILE from the logs? (I know you stated that it is set to 4 nodes by default; we just need to see the actual value to be able to determine what is causing the issue.)
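
If you want to double-check the file itself, counting the fastname entries (one per logical node) tells you how many nodes it really defines; the path below is just the typical default install location, so adjust it to wherever your configuration files live:

    grep -c fastname /opt/IBM/InformationServer/Server/Configurations/default.apt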
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

When your sequencer runs, are the underlying jobs set to run sequentially or at the same time? Note that this is NOT a question of whether they are Parallel Canvas jobs. I am asking whether your sequencer kicks them off at the same time, or whether they are chained together sequentially.


When you run "ps -ef | grep DSD.RUN", do you see multiple jobs running?
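
For example, something along these lines gives a quick count of the job processes currently running (the second grep just filters out the grep command itself):

    ps -ef | grep DSD.RUN | grep -v grep | wc -l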

Are you executing all on one host or does your APT file send them to other hosts (SMP/Grid setup)?