waitForWriteSignal(): Premature EOF on node xxx

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

hsahay
Premium Member
Posts: 175
Joined: Wed Mar 21, 2007 9:35 am

Post by hsahay »

Hi

We have a job that reads from a sequential file and then, based on the record type, writes to one of several Oracle Enterprise stages.


seq_file ...........transformer .....oraclestg1
                                 ......oraclestg2
                                   ........oraclestg3
                                   ...
                                   .....
                                     .............oraclestgn

Each Oracle Enterprise stage reports the following error:

waitForWriteSignal(): Premature EOF on node xxxx Socket operation on non-socket

We use a 4-node configuration, so we get the above error four times for every Oracle stage.

This is the only error message we get. The rest of the log entries are about players and section leaders terminating all over the place.

I have searched the forums but have not found anything that could help me.
vishal
singhald
Participant
Posts: 180
Joined: Tue Aug 23, 2005 2:50 am
Location: Bangalore

Post by singhald »

I suggest you create a sample job and check whether you are able to read your text file with the Sequential File stage.

Share your findings.
Regards,
Deepak Singhal
Everything is okay in the end. If it's not okay, then it's not the end.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That specific error message is typically not the cause of the problem, but a symptom of job processes terminating; i.e. you should look elsewhere in the job and logs for the root cause of the problem.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

As Arnd already noted, you will have to look through all the other error and warning messages. If you cannot find the cause yourself, try posting the error and warning lines here.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
hsahay
Premium Member
Posts: 175
Joined: Wed Mar 21, 2007 9:35 am

Post by hsahay »

Thanks for your response gentlemen.

Reading from the sequential file is not a problem, because we are able to view the data from the Designer.

All the errors appear for the Oracle Enterprise stages.

There is no other message.

The only error messages are -

grh,0: Failure during execution of operator logic.
grh,0: Input 0 consumed 0 records.
grh,0: Fatal Error: waitForWriteSignal(): Premature EOF on node server.domain.com Socket operation on non-socket

(grh is the name of one oracle stage)

The above set of messages repeat for each oracle stage.

Then we get

node_node01: Player 1 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node01), player 1 - Unexpected exit status 1.
node_node01: Player 25 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node01), player 25 - Unexpected exit status 1.

followed by similar messages for player 2, player 3, and so on, on node02, node03, etc.
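
A minimal sketch of the log triage suggested earlier in the thread: set aside the lines that are usually just symptoms of processes dying, so that whatever remains can be checked for a root cause. It assumes the job log has been exported as plain text lines. The function name `split_symptoms` and the choice of patterns are illustrative assumptions based only on the messages quoted above; they are not part of DataStage.

```python
import re

# Patterns copied from the messages quoted in this thread. Treating them as
# "symptoms" is an assumption for illustration, not an official classification.
SYMPTOM_PATTERNS = [
    re.compile(r"waitForWriteSignal\(\): Premature EOF"),
    re.compile(r"Player \d+ terminated unexpectedly"),
    re.compile(r"Unexpected exit status"),
]


def split_symptoms(log_lines):
    """Return (symptoms, others): lines matching a symptom pattern, and the rest."""
    symptoms, others = [], []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if any(pattern.search(line) for pattern in SYMPTOM_PATTERNS):
            symptoms.append(line)
        else:
            others.append(line)
    return symptoms, others


sample = [
    "grh,0: Failure during execution of operator logic.",
    "grh,0: Input 0 consumed 0 records.",
    "grh,0: Fatal Error: waitForWriteSignal(): Premature EOF on node server.domain.com Socket operation on non-socket",
    "node_node01: Player 1 terminated unexpectedly.",
    "main_program: APT_PMsectionLeader(1, node01), player 1 - Unexpected exit status 1.",
]
symptoms, others = split_symptoms(sample)

# The remaining lines are the ones worth reading first.
for line in others:
    print(line)
```

In this thread the remaining lines still do not name a cause, which is why the later replies move on to simplifying the job itself.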
vishal
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

How many jobs are running at once when these errors show up? And how many processes are running at once?
Choose a job you love, and you will never have to work a day in your life. - Confucius
hsahay
Premium Member
Posts: 175
Joined: Wed Mar 21, 2007 9:35 am

Post by hsahay »

This is the test environment.

Only this job was running.
Please note that other jobs using the Oracle Enterprise stage are working fine, so the problem is with this job in particular. Also, this job worked in version 8.1; it is being run for the first time in 11.3, where it is failing.


ulimit -a inside the job shows

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) 131072
threads(per process) unlimited
processes(per user) unlimited
vishal
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I recommend doing the following steps to simplify your error search:

1. Add $APT_CONFIG_FILE to your job with a 1-node configuration. This will reduce your processes and make debugging much easier.
2. After doing (1) and checking to make sure that the error is still present (and that you still cannot find details in the log), replace the Oracle output stage with a PEEK stage and re-run the job; the error will go away if the root cause is indeed in the Oracle stage.
3. What type of write is being done in the Oracle stage - insert, update, bulk load? Anything different from the other jobs which work?
4. Add a reject link to the stage and see if the job still aborts or just writes lots of rejects.
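
For step 1, a single-node configuration file can be sketched as below. This is only an illustration: the helper name `one_node_config`, the node name, and the disk and scratch directories are hypothetical placeholders, and the host name is the anonymised one from the log above. Use the values from your own existing configuration file.

```python
def one_node_config(fastname, disk_dir, scratch_dir, node_name="node1"):
    """Render the text of a parallel configuration file with a single node.

    All arguments are placeholders chosen by the caller; nothing here is
    read from a real installation.
    """
    return (
        "{\n"
        f'  node "{node_name}"\n'
        "  {\n"
        f'    fastname "{fastname}"\n'
        '    pools ""\n'
        f'    resource disk "{disk_dir}" {{pools ""}}\n'
        f'    resource scratchdisk "{scratch_dir}" {{pools ""}}\n'
        "  }\n"
        "}\n"
    )


# Hypothetical values for illustration only.
config_text = one_node_config(
    fastname="server.domain.com",
    disk_dir="/tmp/ds/datasets",
    scratch_dir="/tmp/ds/scratch",
)
print(config_text)
```

Save the printed text to a file and point the job's $APT_CONFIG_FILE parameter at it for the debugging run.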
hsahay
Premium Member
Posts: 175
Joined: Wed Mar 21, 2007 9:35 am

Post by hsahay »

Created a copy of the job with just one Oracle stage.
The job failed.

Replaced the Oracle stage with a Sequential File stage.
The job failed.

Removed all stage derivations from the Transformer.
The job was successful.

Went back to the original job that had one Oracle stage and changed the first stage variable derivation to just -1, so the variable is set to the constant -1.
The job failed.

Added APT_DISABLE_COMBINE=True
Job was successful.

It seems the problem is with the stage variables in the Transformer. But this job works in 8.1 and fails only in 11.3, which leads me to think that something is wrong with our C compiler and the way the Transformer has been compiled.

We are now working with IBM tech support to figure out what went wrong and where.
vishal
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What data type is the stage variable whose value you set to -1? If it's a string of some kind, any literal value needs to appear inside quotation marks, for example "-1".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

hsahay wrote:
Added APT_DISABLE_COMBINE=True
Job was successful.

I think it's actually APT_DISABLE_COMBINATION but...

This clue alone rings a bell.

I ran into this same behavior on 11.3 due to using a newer compiler than what was supported. Even the same compiler with newer PTF levels applied caused this problem too.

The compiler must be exactly the version specified on the system requirements page.

See this topic:

viewtopic.php?t=153699&highlight=compiler
hsahay
Premium Member
Posts: 175
Joined: Wed Mar 21, 2007 9:35 am

Post by hsahay »

Thanks QT_KY

That was exactly the problem. We had compiler version 13.1, whereas DataStage is apparently certified only with 12.1.

Once we downgraded our compiler to 12.1 it all worked fine.

I will mark the thread resolved.
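
A minimal sketch of the check that would have caught this earlier: compare the installed compiler version with the certified one. The function names are hypothetical, and the version strings are typed in by hand rather than read from the compiler. State the certified version to whatever level of detail the system requirements page gives, since fix levels can matter too.

```python
def parse_version(text):
    """Turn a dotted version string such as '12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def matches_certified(installed, certified):
    """True only if every component of the certified version matches the installed one."""
    inst = parse_version(installed)
    cert = parse_version(certified)
    # Compare only as many components as the certified version specifies.
    return inst[:len(cert)] == cert


# Versions mentioned in this thread.
before_downgrade = matches_certified("13.1", "12.1")
after_downgrade = matches_certified("12.1", "12.1")
print(before_downgrade, after_downgrade)
```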
vishal