ODBC enterprise stage fatal error

Rohit_ben · Post by **Rohit_ben** » Fri May 04, 2012 12:41 am

One of our ETL jobs on DataStage 8.1 failed in the Live environment with the following errors:

APT_CombinedOperatorController,0: [IBM(DataDirect OEM)][ODBC SQL Server Driver]20157
odbcwrt_EX_POST_RESULTS,0: Failure during execution of operator logic.
...
main_program: APT_PMsectionLeader(1, node1), player 2 - Unexpected exit status 1.
...
orared_EX_POST_RESULTS,0: Fatal Error: Unable to allocate communication resources
...
APT_CombinedOperatorController,1: Fatal Error: waitForWriteSignal(): Premature EOF on node hkeosp08 Cannot allocate memory

The ETL job fetches data from an Oracle database and populates a table in a SQL Server database. To connect to SQL Server we use the original DataDirect ODBC drivers that shipped with Information Server 8.1.

The job failed after running for 30 minutes and processing 4 million records. We re-ran it once and it failed again at the same point. During the second run we monitored the system and found that after about 3.7 million records, swap usage suddenly started climbing until almost 100% of swap was in use, at which point the job failed. CPU utilization stayed between 30% and 60%. Also, what is the significance of the error [IBM(DataDirect OEM)][ODBC SQL Server Driver]20157?
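For anyone wanting to reproduce the swap observation above, here is a minimal sketch of how swap usage could be sampled. It assumes a Linux host that exposes `/proc/meminfo` (the thread does not say which OS the server runs), and the function names `parse_meminfo` and `swap_used_percent` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_percent(text):
    """Return the percentage of swap in use, or 0.0 if no swap is configured."""
    info = parse_meminfo(text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total
```

Calling `swap_used_percent(open("/proc/meminfo").read())` every few seconds while the job runs, and logging the result next to the row count from the job monitor, would show whether the climb starts at the same record count on each run.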

We have 35 GB of free scratch space and a two-node configuration. Scratch space usage did not increase significantly during the job runs.

The /tmp space is 12 GB, of which around 9 GB was available before job execution. The UAT server has the same amount of space allocated and available, yet the job has run successfully a number of times on UAT and fails only on the Live server.

We also found that the DSN entry in the .odbc.ini file contains the line QEWSD=39262.

DSN entry for SQL Server:

[DPServer]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMmsss23.so
Description=DataDirect SQL Server Wire Protocol
Database=
Address=
QuotedId=No
AnsiNPW=No
QEWSD=39262
DescribeAtPrepare=0

This job was originally a server job with an insert array size of 1 in the ODBC Enterprise stage. It was converted to a parallel job, and it is the parallel version that failed. In the new parallel job we changed the insert array size to 50 to boost performance. The ODBC Enterprise stage also executes in parallel mode (the default).

As a fallback, we ran the server job again and it finished successfully.

qt_ky · Post by **qt_ky** » Fri May 04, 2012 6:44 pm

You can safely remove the QEWSD=39262 line. Those get added when testing DSNs from the command line. It will generate a warning about the license.
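If several DSNs have picked up this line, removing it could be scripted. The following is only a sketch: `strip_qewsd` is a hypothetical helper name, and it works on the file's text so that every other line is kept byte for byte. Back up .odbc.ini before writing anything back.

```python
def strip_qewsd(ini_text):
    """Return ini_text with every QEWSD=... line removed.

    All other lines, including section headers and blank lines,
    are kept exactly as they were.
    """
    kept = []
    for line in ini_text.splitlines(keepends=True):
        key, sep, _ = line.partition("=")
        if sep and key.strip().upper() == "QEWSD":
            continue  # drop the QEWSD entry
        kept.append(line)
    return "".join(kept)
```

Read the file, pass its contents through `strip_qewsd`, and write the result to a new file so it can be diffed against the original before replacing it.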

For the parallel job you may want to check the ulimit settings on your OS as those can limit available memory.
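As a rough illustration of the ulimit check, the sketch below reads the current process limits through Python's standard `resource` module (Unix only). The list of limits and the helper names `fmt` and `current_limits` are my own choices, not anything DataStage prescribes. The values reflect the shell the script is started from, so it would need to be run as the same user and from the same environment that starts the DataStage engine.

```python
import resource

# Limits that commonly affect memory-hungry processes; names not
# available on a given platform are skipped.
LIMIT_NAMES = ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_NOFILE"]


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {limit_name: (soft, hard)} for the current process."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            continue
        soft, hard = resource.getrlimit(const)
        limits[name] = (fmt(soft), fmt(hard))
    return limits
```

Printing `current_limits()` on both the UAT and Live servers and comparing the output is a quick way to spot a difference between the two environments. Running `ulimit -a` in the shell gives the same information.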

Rohit_ben · Post by **Rohit_ben** » Thu May 31, 2012 6:19 am

We raised the issue with IBM, and they asked us to lower the insert array size. We have also increased swap from 2 GB to 24 GB. We will test the job again and monitor it.