
Fatal Error: Unable to allocate communication resources

Posted: Thu Jun 11, 2009 6:43 am
by JPalatianos
Hi,
We have a job that starts with a sequential file and then does a lookup (via the lookup stage) on an ODBC stage. When we run the job we get the following error on the lookup stage.
1st warning:
main_program: Operator "parallel APT_KeyGenerator in skNewPlan" is not wave aware; the operator will be reset and rerun on each wave if multiple waves present. This may lead to incorrect results and memory issues. Update the operator to make it wave aware and calls setWaveAware() in describeOperator() to inform the framework that the operator knows how to handle waves.
2nd warning:
skNewPlan: When checking operator: When binding output interface field "METRIX_PLAN_KEY" to field "METRIX_PLAN_KEY": Implicit conversion from source type "uint64" to result type "int32": Possible range limitation.

3rd warning:
Sequential_File_11: When checking operator: A sequential operator cannot preserve the partitioning of the parallel data set on input port 0.

Fatal error 1:
lkup_existingPlans,0: Fatal Error: Unable to allocate communication resources.

Fatal error 2:
node_node0: Player 4 terminated unexpectedly.
Player 2 terminated unexpectedly.

Fatal error 3:
main_program: APT_PMsectionLeader(1, node0), player 2 - Unexpected exit status 1.

last fatal error:
main_program: Step execution finished with status = FAILED.

Thanks - - John
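
For readers new to the second warning above: it flags that a uint64 key is being written into an int32 field, so any key above 2,147,483,647 will not fit. A minimal Python sketch of that range check follows. The function names and sample keys are made up for illustration, and how DataStage itself treats an out-of-range value is not stated in this thread.

```python
# Illustrative only: bounds of a signed 32-bit integer field.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1  # 2147483647


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in an int32 field without loss."""
    return INT32_MIN <= value <= INT32_MAX


def keys_out_of_range(keys):
    """Return the keys that would not fit in an int32 field."""
    return [k for k in keys if not fits_int32(k)]


if __name__ == "__main__":
    # Hypothetical key values, including the largest possible uint64.
    sample_keys = [1, 2_147_483_647, 2_147_483_648, 2 ** 64 - 1]
    print(keys_out_of_range(sample_keys))
```

If the key generator can never realistically pass the int32 limit, the warning is harmless; otherwise the target field needs a wider type.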

Posted: Thu Jun 11, 2009 7:15 am
by JPalatianos
Just wanted to add that our developers have only recently started developing parallel jobs in our dev environment, and I am wondering if this is a configuration issue on my part. Being new to the Enterprise Edition, could I have missed something during the install/setup/configuration of DataStage?
Thanks - - John

Posted: Thu Jun 11, 2009 7:27 am
by ArndW
Are you doing wave processing (intentionally)? What stage is setting up end-of-waves? Are you sure that you've seen all the error messages? If, for instance, a disk fills up you will also get the "Unable to allocate communication resources." message, but that is not the cause of the problem, just an effect.

Posted: Thu Jun 11, 2009 7:33 am
by JPalatianos
Definitely not doing wave processing intentionally, since neither I nor the developer knows what that is. Sorry for the ignorance... but we are new to the world of parallel processing.

Regarding the errors, these are all I see in Director. There are other messages, but they are all informational.

Posted: Thu Jun 11, 2009 9:59 am
by priyadarshikunal
This is a common warning when you use an ODBC Connector stage as a source with a Surrogate Key Generator stage downstream. In that case it should not cause this error.

You should have some other information in the logs, since the error is coming from a lookup stage (I believe). Check for another log entry that says "Lookup failed" or something similar.

Check if you have any message handler attached.

Posted: Thu Jun 11, 2009 10:53 am
by JPalatianos
The only message I see from the Lookup is this:
Fatal error 1:
lkup_existingPlans,0: Fatal Error: Unable to allocate communication resources.

Posted: Thu Jun 11, 2009 11:23 am
by JRodriguez
JPalatiano,

Did you implement these suggested steps in your Windows PX processing environment?

http://publib.boulder.ibm.com/infocente ... e_win.html

If yes, then you can try a couple of steps to fix the issue, in order of how easy they are:

Add the environment variable APT_DISABLE_COMBINATION=FALSE

or increase the DSIPC_OPEN_TIMEOUT value
or add the following 2 environment variables to the job, compile and re-run it.

APT_PM_CONDUCTOR_TIMEOUT=60
APT_PM_NODE_TIMEOUT=60

I have given you the initial settings for these environment variables. You can increase them in 30-second increments until the job runs without error.


Finally, check the OS user/group permissions on the WINDOWS\TEMP directory and make sure the Users group is set to 'All Access'.
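
The "start at 60 and raise in 30-second steps" advice above can be sketched as a list of settings to try in turn. This is a hypothetical Python helper, not a DataStage tool; the upper limit of 180 seconds is an assumption chosen for the example, since the post gives no ceiling.

```python
def timeout_settings(start=60, step=30, limit=180):
    """Build the successive environment-variable settings to try, one dict per attempt.

    start and step follow the post (60 seconds, 30-second increments);
    limit is an assumed ceiling for this example only.
    """
    return [
        {
            "APT_PM_CONDUCTOR_TIMEOUT": str(seconds),
            "APT_PM_NODE_TIMEOUT": str(seconds),
        }
        for seconds in range(start, limit + 1, step)
    ]


if __name__ == "__main__":
    # Print each candidate pair of values in the order they would be tried.
    for attempt in timeout_settings():
        print(attempt)
```

Each dictionary is one pair of values to set on the job before recompiling and re-running it; stop at the first pair that lets the job finish.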

Posted: Thu Jun 11, 2009 11:27 am
by chulett
JRodriguez wrote:Add the environment variable APT_DISABLE_COMBINATION=FALSE
Guessing you actually meant True here.

Posted: Thu Jun 11, 2009 11:30 am
by JRodriguez
Nope, this should be set to "FALSE" to use less time between processes ...

Posted: Thu Jun 11, 2009 11:39 am
by chulett
OK... but isn't that the default? I thought the suggestion was to disable it so perhaps the problem stage/area would be more obvious in the score or logs but I guess not.

Carry on.

Posted: Thu Jun 11, 2009 11:54 am
by JRodriguez
Chulett,

Normally you disable operator combination to facilitate debugging, as you stated. In this case we want the job to use fewer resources.

When you set the variable to TRUE, more processes are generated and the job will use more system resources, which can cause this error.

Posted: Thu Jun 11, 2009 12:10 pm
by chulett
Got that... but still, I don't believe adding that will change anything as the option of "operator combinality" is on and happening by default regardless. You specifically add the one you mentioned in order to set 'disable' to 'true' and thus stop it from happening.

That was my only point in the last post.
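
The point being argued here can be put as a small truth table. The Python sketch below models only the behaviour described in this thread (combination is on unless the 'disable' variable is TRUE). How the engine actually parses the value is an assumption, and the function name is hypothetical.

```python
def combination_enabled(env):
    """Model of the thread's description: operators are combined unless
    APT_DISABLE_COMBINATION is set to TRUE. The parsing rule is assumed."""
    value = env.get("APT_DISABLE_COMBINATION", "")
    return value.strip().upper() != "TRUE"


if __name__ == "__main__":
    # Variable not set at all: the default, combination stays on.
    print(combination_enabled({}))
    # Explicit FALSE: same outcome as the default under this model.
    print(combination_enabled({"APT_DISABLE_COMBINATION": "FALSE"}))
    # TRUE: combination is switched off, so each operator runs separately.
    print(combination_enabled({"APT_DISABLE_COMBINATION": "TRUE"}))
```

Under this model, adding the variable with FALSE changes nothing relative to not adding it, which is the objection raised above.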

Posted: Thu Jun 11, 2009 12:28 pm
by JPalatianos
Thanks guys!!!
I added the variable APT_DISABLE_COMBINATION and set it to TRUE. All warnings and errors are now gone.

Posted: Thu Jun 11, 2009 12:44 pm
by JRodriguez
Great!

If your warnings and errors are gone after setting the environment variable to TRUE, it means that your timeout settings were not sufficient.

Leaving the environment variable set to true disables internal job optimizations and will cause other issues in the future ...

I would suggest fixing the root of this issue ... and then setting the environment variable back to FALSE.

Posted: Thu Jun 11, 2009 1:30 pm
by chulett
As noted, it's not really a solution, per se... still best to see if you can track down the root cause.