
waitForWriteSignal(): Premature EOF on node xxx

Posted: Mon Jun 22, 2015 2:41 pm
by hsahay
Hi

We have a job that reads from a sequential file and then, based on the record type, writes to one of several Oracle Enterprise stages.


seq_file --> transformer --> oraclestg1
                         --> oraclestg2
                         --> oraclestg3
                         ...
                         --> oraclestgn

At each Oracle Enterprise stage we get the following error:

waitForWriteSignal(): Premature EOF on node xxxx Socket operation on non-socket

We use a 4-node configuration, so we get the above error 4 times for every Oracle stage.

This is the only informative error message we get. The rest are about players and section leaders terminating all over the place.

I have searched the forums but have not found anything that could help me.

Posted: Tue Jun 23, 2015 12:29 am
by singhald
I suggest you create a sample job and check whether you are able to read your text from the Sequential File stage.

Please share your findings.

Posted: Tue Jun 23, 2015 1:41 am
by ArndW
That specific error message is typically not the cause of the problem, but a symptom of job processes terminating; i.e. you should look elsewhere in the job and logs for the root cause of the problem.
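
As a rough illustration of that triage, the sketch below filters a list of job log lines and keeps only those that do not match the known symptom messages. The function name and the pattern list are hypothetical; the patterns are simply substrings of the messages quoted in this thread, not an official catalogue of DataStage messages.

```python
# Hypothetical helper: these substrings come from the messages quoted in this
# thread and are an assumption, not an official list of symptom messages.
SYMPTOM_PATTERNS = (
    "waitForWriteSignal(): Premature EOF",
    "terminated unexpectedly",
    "Unexpected exit status",
)


def root_cause_candidates(log_lines):
    """Return the log lines that do not match any known symptom pattern."""
    candidates = []
    for line in log_lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if any(pattern in text for pattern in SYMPTOM_PATTERNS):
            continue  # symptom of processes dying, not the cause itself
        candidates.append(text)
    return candidates
```

Whatever remains after filtering is where to start reading. If nothing useful remains, the log itself does not contain the root cause and the job has to be narrowed down by experiment.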

Posted: Tue Jun 23, 2015 2:06 am
by priyadarshikunal
As Arnd already noted, you will have to look through all the other error and warning messages. If you cannot find the cause, try posting the error and warning lines.

Posted: Tue Jun 23, 2015 11:59 am
by hsahay
Thanks for your responses, gentlemen.

Reading from the sequential file is not a problem, because we are able to view the data from the Designer.

All the errors appear for the Oracle Enterprise stages.

There is no other message.

The only error messages are:

grh,0: Failure during execution of operator logic.
grh,0: Input 0 consumed 0 records.
grh,0: Fatal Error: waitForWriteSignal(): Premature EOF on node server.domain.com Socket operation on non-socket

(grh is the name of one oracle stage)

The above set of messages repeat for each oracle stage.

Then we get

node_node01: Player 1 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node01), player 1 - Unexpected exit status 1.
node_node01: Player 25 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node01), player 25 - Unexpected exit status 1.

followed by similar messages for player 2, player 3, and so on, on node02, node03, etc.

Posted: Tue Jun 23, 2015 12:28 pm
by qt_ky
How many jobs are running at once when these errors show up? And how many processes are running at once?

Posted: Tue Jun 23, 2015 12:52 pm
by hsahay
This is the test environment.

Only this job was running.
Please note that other jobs using the Oracle Enterprise stage are working fine, so the problem is with this job in particular. Also, this job works in version 8.1; it is being run for the first time in 11.3, where it fails.

ulimit -a inside the job shows:

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) 131072
threads(per process) unlimited
processes(per user) unlimited

Posted: Wed Jun 24, 2015 1:49 am
by ArndW
I recommend doing the following steps to simplify your error search:

1. Add $APT_CONFIG_FILE to your job with a 1-node configuration. This will reduce your processes and make debugging much easier.
2. After doing (1) and checking to make sure that the error is still present (and that you still cannot find details in the log), replace the Oracle output stage with a PEEK stage and re-run the job; the error will go away if the root cause is indeed in the Oracle stage.
3. What type of write is being done in the Oracle stage - insert, update, bulk load? Anything different from the other jobs which work?
4. Add a reject link to the stage and see if the job still aborts or just writes lots of rejects.
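
For step 1, a one-node configuration file might be generated like this. This is only a sketch: the host name and the two directories passed in are placeholders that must be replaced with values valid on your engine tier, and the function name is made up for the example.

```python
def one_node_config(fastname, disk_dir, scratch_dir):
    """Render the text of a one-node parallel configuration file.

    All three arguments are placeholders supplied by the caller; nothing
    here is read from a real installation.
    """
    return (
        "{\n"
        '    node "node1"\n'
        "    {\n"
        f'        fastname "{fastname}"\n'
        '        pools ""\n'
        f'        resource disk "{disk_dir}" {{pools ""}}\n'
        f'        resource scratchdisk "{scratch_dir}" {{pools ""}}\n'
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(one_node_config("server.domain.com", "/tmp/ds/datasets", "/tmp/ds/scratch"))
```

Save the output to a file and set $APT_CONFIG_FILE to that file as a job parameter, so that only this job runs on one node.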

Posted: Wed Jun 24, 2015 1:00 pm
by hsahay
Created a copy of the job with just one Oracle stage.
Job failed.

Replaced the Oracle stage with a Sequential File stage.
Job failed.

Removed all stage derivations from the transformer.
Job was successful.

Went back to the original job that had one Oracle stage and changed the first stage variable derivation to just -1, i.e. the variable is simply set to -1.
Job failed.

Added APT_DISABLE_COMBINE=True
Job was successful.

It seems the problem is with the stage variables in the transformer. But this job works in 8.1 and fails only in 11.3, which leads me to think that something is wrong with our C compiler and the way the transformer has been compiled.

We are now working with IBM tech support to figure out what went wrong.

Posted: Wed Jun 24, 2015 5:53 pm
by ray.wurlod
What data type is the stage variable whose value you set to -1? If it's a string of some kind, any literal value needs to appear inside quotation marks, for example "-1".

Posted: Wed Jun 24, 2015 9:49 pm
by qt_ky
hsahay wrote:
Added APT_DISABLE_COMBINE=True
Job was successful.

I think it's actually APT_DISABLE_COMBINATION but...

This clue alone rings a bell.

I ran into this same behavior on 11.3 because we were using a newer compiler than the supported one. Even the same compiler with newer PTF levels applied caused this problem.

The compiler must be exactly the version specified on the system requirements page.
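
A check along these lines can be scripted. The sketch below only compares two version strings; how you obtain the installed version depends on your platform and compiler and is not shown. The function names are hypothetical, and the required version must be taken from the system requirements page for your release.

```python
def parse_version(text):
    """Turn a dotted version string such as '12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def compiler_matches(installed, required):
    """Return True only if installed agrees with every component of required.

    A newer version counts as a mismatch, not as "good enough". To also
    catch a different PTF level, pass the full required version string.
    """
    installed_parts = parse_version(installed)
    required_parts = parse_version(required)
    return installed_parts[: len(required_parts)] == required_parts
```

For example, compiler_matches("13.1", "12.1") is False, while compiler_matches("12.1", "12.1") is True.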

See this topic:

viewtopic.php?t=153699&highlight=compiler

Posted: Fri Jun 26, 2015 10:26 am
by hsahay
Thanks, qt_ky.

That was exactly the problem. We had compiler version 13.1, whereas DataStage is apparently certified only with 12.1.

Once we downgraded our compiler to 12.1, it all worked fine.

I will mark the thread resolved.