Oracle Connection Dead

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Oracle Connection Dead

Post by ashik_punar »

Hi Everyone,

I have one job in which I read data from a Sybase table and, after doing transformations on the data, write it to an Oracle table. The requirement is that I need to read all the Sybase records before processing any of them, because the processing of each record depends on the outcome of the last record processed. For example, if I have two records A and B, the processing of record B depends on the outcome of record A's processing.

The Sybase table has nearly 10 million records, and I need to read them all before processing any. For the source Sybase table I am using an ODBC stage, and it takes nearly 2 hours to read all the records. By the time I have read and processed them, the target Oracle table's connection has gone dead and the job aborts. So the job runs for 2 hours and then aborts. For the Oracle target I am also using an ODBC stage.


The log entries for the job are as follows:

-------------------------------------------------------------------------------
Occurred: 2:33:56 AM On date: 12/11/2006 Type: Control
Event: Starting Job Build_FO_Load_job. (...)

Occurred: 2:33:57 AM On date: 12/11/2006 Type: Info
Event: Environment variable settings: (...)

Occurred: 2:33:57 AM On date: 12/11/2006 Type: Info
Event: Parallel job initiated (...)

Occurred: 2:33:57 AM On date: 12/11/2006 Type: Info
Event: main_program: Ascential DataStage(tm) Enterprise Edition 7.5.1A (...)

Occurred: 2:33:57 AM On date: 12/11/2006 Type: Info
Event: main_program: The open files limit is 2000; raising to 2147483647.

Occurred: 2:33:57 AM On date: 12/11/2006 Type: Info
Event: main_program: orchgeneral: loaded (...)

Occurred: 2:33:58 AM On date: 12/11/2006 Type: Info
Event: main_program: APT configuration file: /opt/Ascential/DataStage/Configurations/default.apt (...)

Occurred: 2:34:05 AM On date: 12/11/2006 Type: Warning
Event: footnotesttext_od: When checking operator: When binding output interface field "company_cd" to field "company_cd": Converting a nullable source to a non-nullable result; (...)

Occurred: 2:34:05 AM On date: 12/11/2006 Type: Warning
Event: ProcessError_FOData_xf: When checking operator: When binding input interface field "sequence_nr" to field "sequence_nr": Implicit conversion from source type "string" to result type "string[6]": Possi (...)

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Warning
Event: ProcessError_FOData_xf: When checking operator: When binding input interface field "data_desc" to field "data_desc": Implicit conversion from source type "string" to result type "string[max=1024]": Po (...)

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Fatal
Event: APT_CombinedOperatorController(1),0: [DataDirect][ODBC Oracle Wire Protocol driver][Oracle]Connection Dead.

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Fatal
Event: BuildRTTagcount_od,0: Failure during execution of operator logic.

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Info
Event: BuildRTTagcount_od,0: Input 0 consumed 1 records.

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Fatal
Event: APT_CombinedOperatorController(1),0: Fatal Error: {0}

Occurred: 5:14:40 AM On date: 12/11/2006 Type: Fatal
Event: node_node1: Player 12 terminated unexpectedly.

Occurred: 5:14:45 AM On date: 12/11/2006 Type: Fatal
Event: main_program: Unexpected exit status 1 (...)

Occurred: 5:14:45 AM On date: 12/11/2006 Type: Fatal
Event: main_program: Step execution finished with status = FAILED.

Occurred: 5:14:45 AM On date: 12/11/2006 Type: Info
Event: main_program: Startup time, 0:19; production run time, 2:40:28.

Occurred: 5:14:45 AM On date: 12/11/2006 Type: Control
Event: Job Build_FO_Load_job aborted.

Occurred: 5:14:46 AM On date: 12/11/2006 Type: RunJob
Event: (Build_Main_Seq) <- Build_FO_Load_job: Job under control finished.

End of report.

------------------------------------------------------------------------------


If anyone has any information about this problem, please help me out.

Thanks a lot for all the help you have been providing.

Thanks in advance,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Oracle Connection Dead

Post by chulett »

ashik_punar wrote:Occurred: 5:14:40 AM On date: 12/11/2006 Type: Fatal
Event: APT_CombinedOperatorController(1),0: [DataDirect][ODBC Oracle Wire Protocol driver][Oracle]Connection Dead.
I would guess you may be running afoul of an idle timeout in the Oracle database. In other words, you connected to it and then didn't write anything to it until two hours later. Somewhere in that two hour window it closed the connection.

Your DBA should be able to determine if that is the case.
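If you have SQL*Plus access, the check the DBA would run is roughly the sketch below -- the connect string and the ETL_USER account name are placeholders for whatever your Oracle ODBC DSN actually connects as:

# Ask Oracle which profile the ETL account uses, and whether that profile
# enforces an idle timeout (IDLE_TIME is expressed in minutes).
sqlplus -s system@yourdb <<'EOF'
SELECT username, profile FROM dba_users WHERE username = 'ETL_USER';
SELECT profile, limit FROM dba_profiles WHERE resource_name = 'IDLE_TIME';
EOF

If IDLE_TIME comes back at two hours or less, that would line up with the timing in your log.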
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Otherwise, perform your Extraction and Transformation into a staging area (Data Set). Load Oracle from that (in a separate job) when it's all ready.
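If you don't want to touch your existing sequence, the two jobs can also be driven from a small shell script -- purely a sketch: the project and job names below are placeholders, and the exit-code test may need adjusting to whatever your dsjob documentation says means "Finished OK":

#!/bin/ksh
# Job 1: Sybase -> transformations -> persistent Data Set.
# No Oracle connection is open while this long extract runs.
dsjob -run -jobstatus YourProject Extract_To_Dataset_job
rc=$?
# -jobstatus waits for completion and returns a code derived from the
# job status; treat anything beyond "Finished OK" / "Finished with
# warnings" as a failure here.
if [ $rc -gt 1 ]; then
    echo "Extract job did not finish cleanly (rc=$rc)" >&2
    exit 1
fi
# Job 2: read the Data Set and load Oracle. The connection is opened
# only after all ten million rows are already staged locally.
dsjob -run -jobstatus YourProject Load_Oracle_job

The same split works with two Job Activity stages in a sequence job, of course.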
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Post by ashik_punar »

Hi Ray,

As you advised, I tried loading my data into a Data Set and then moving it from there to the Oracle table, but while writing the data to the Data Set I am getting a heap size error. I have searched this forum for a solution and learned that the data and stack limits must be set to 'unlimited', which they already are. Here is the output of the ulimit command:

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) 32768
coredump(blocks) 2097151
nofiles(descriptors) 2000


I am also posting the log of the job in which I got the heap size error:


Occurred: 9:59:10 AM On date: 12/11/2006 Type: Info
Event: Join_seqnum_jn,0: The current soft limit on the data segment (heap) size (2147483645) is less than the hard limit (2147483647), consider increasing the heap size limit

Occurred: 9:59:10 AM On date: 12/11/2006 Type: Info
Event: Join_seqnum_jn,0: Current heap size: 555,339,600 bytes in 7,071 blocks

Occurred: 9:59:10 AM On date: 12/11/2006 Type: Fatal
Event: Join_seqnum_jn,0: Failure during execution of operator logic.

Occurred: 9:59:10 AM On date: 12/11/2006 Type: Info
Event: Join_seqnum_jn,0: Input 0 consumed 1 records. (...)

Occurred: 9:59:10 AM On date: 12/11/2006 Type: Info
Event: Join_seqnum_jn,0: Output 0 produced 0 records.

Occurred: 9:59:10 AM On date: 12/11/2006 Type: Fatal
Event: Join_seqnum_jn,0: Fatal Error: Throwing exception: APT_BadAlloc: Heap allocation failed.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: node_node1: Player 6 terminated unexpectedly.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: main_program: Unexpected exit status 1

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: buffer(1),0: Error in writeBlock - could not write 32

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: buffer(1),0: Failure during execution of operator logic.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Info
Event: buffer(1),0: Input 0 consumed 0 records.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Info
Event: buffer(1),0: Output 0 produced 0 records.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: buffer(1),0: Fatal Error: APT_BufferOperator::writeAllData() write failed. This is probably due to a downstream operator failure.

Occurred: 9:59:11 AM On date: 12/11/2006 Type: Fatal
Event: node_node1: Player 9 terminated unexpectedly.

Occurred: 9:59:18 AM On date: 12/11/2006 Type: Fatal
Event: main_program: Unexpected exit status 1 (...)

Occurred: 9:59:18 AM On date: 12/11/2006 Type: Fatal
Event: main_program: Step execution finished with status = FAILED.

Occurred: 9:59:18 AM On date: 12/11/2006 Type: Info
Event: main_program: Startup time, 0:19; production run time, 22:34.

Occurred: 9:59:18 AM On date: 12/11/2006 Type: Control
Event: Job CopyOfBuild_FO_Load_job aborted.

End of report.


Please provide some input on this issue. If I need to set anything else to unlimited, please guide me on that as well.

Thanks a lot for all the help that you have been providing.

Thanks in advance,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you running out of disk space on your resource disk (where the Data Set data files will be placed)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Post by ashik_punar »

No, there is nothing like that. I have checked that.
johnthomas
Participant
Posts: 56
Joined: Mon Oct 16, 2006 7:32 am

Post by johnthomas »

Since heap memory is allocated from physical memory, try to increase the available physical memory. Can you try using a Lookup stage in place of the Join stage?
JT
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you "checked the same" while the job is running?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Post by ashik_punar »

Hi Ray,

I have checked the disk space while running the job; there is no problem there. What I think is happening is this: the ulimit was set to unlimited the day before yesterday, and I had the server restarted yesterday so that the change would take effect. When I issue the command from the UNIX prompt under my user ID, it reports the values as unlimited. But when I put the 'ulimit -a' command in the before-job ExecSH of a dummy job and execute it under my user ID, I get the following values:

Build_Pricing_Dummy_Monthly_Job..BeforeJob (ExecSH): Executed command: ulimit -a
*** Output from command was: ***
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2097151
nofiles(descriptors) 2000

This is the situation after the restart of the server. Can you please guide me on why this is so, and what can be done to get the new values reflected?

Thanks a lot for your continual help,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Take a look in the job log at the environment variables to determine the user under whose ID the job is actually executed. It is this user (reported as $USER) whose ulimit needs to be raised.
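If the box is AIX (the format of the ulimit -a output earlier in the thread suggests it is), the per-user limits live in /etc/security/limits, so the change for dsadm would look roughly like the sketch below. The attribute names, the DSHOME path and the engine stop/start commands are assumptions to verify against your own install:

# Run as root; on AIX a value of -1 means "unlimited".
chuser data=-1 stack=-1 fsize=-1 dsadm

# Limits are inherited when the engine daemons start, so restart the
# engine as dsadm afterwards (default 7.x DSHOME shown; adjust if yours differs).
su - dsadm -c "cd /opt/Ascential/DataStage/DSEngine && . ./dsenv && bin/uv -admin -stop"
su - dsadm -c "cd /opt/Ascential/DataStage/DSEngine && . ./dsenv && bin/uv -admin -start"

Then re-run the dummy job with the before-job 'ulimit -a' to confirm the parallel processes now see the raised limits.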
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ashik_punar
Premium Member
Posts: 71
Joined: Mon Nov 13, 2006 12:40 am

Post by ashik_punar »

Hi Ray,
I have checked the job log. I am running my job with the user ID 'sidhartb', but in the job log I see LOGIN as 'dsadm'. For my ID the ulimit has been set to unlimited, but for the dsadm ID I think it has not. I am posting the environment variables from the job log; please have a look. After seeing the log, I think we need to set the values to unlimited for the dsadm user. Please correct me if I am wrong on some point. The log goes like this:

Environment variable settings:
_=/usr/bin/nohup
LANG=en_GB
LOGIN=dsadm
INFORMIXC=CC
APT_ORCHHOME=/opt/Ascential/DataStage/PXEngine
PATH=/data/Ascential/projects/ROCKVILLE/RT_BP1418.O:/opt/Ascential/DataStage/DSCAPIOp:/opt/Ascential/DataStage/RTIOperators:/opt/Ascential/DataStage/DSParallel:/data/Ascential/projects/ROCKVILLE/wrapped:/data/Ascential/projects/ROCKVILLE/buildop:/opt/Ascential/DataStage/PXEngine/user_osh_wrappers:/opt/Ascential/DataStage/PXEngine/osh_wrappers:/opt/Ascential/DataStage/PXEngine/bin:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java14/jre/bin:/usr/java14/bin:/opt/informix/product/csdk32bit/bin:/usr:/usr/vacpp:/usr/vacpp/bin
THREADLIB=POSIX
LC__FASTMSG=true
LOCPATH=/usr/lib/nls/loc

DSHOME=/opt/Ascential/DataStage/DSEngine
ODMDIR=/etc/objrepos

ODBCINI=/opt/Ascential/DataStage/DSEngine/.odbc.ini
HOME=/
INFORMIXDIR=/opt/informix/product/csdk32bit
ONCONFIG=onconfig.thomson
PWD=/opt/Ascential/DataStage/DSEngine
TZ=GMT0BST,M3.5.0/01:00:00,M10.5.0/02:00:00
INFORMIXSERVER=thomson_tcp
UDTHOME=/opt/Ascential/DataStage/ud41
UDTBIN=/opt/Ascential/DataStage/ud41/bin
LOGNAME=sidhartb
DS_USERNO=-16656
WHO=ROCKVILLE
TERM=
BELL=^G
FLAVOR=-1
TFMFBuild=Build
TFMFCurrent=Current
TFMFD34ACTDBDsnName=D34ACTDB_DEV
TFMFD34ACTDBDsnNameTemp=d34actdb
TFMFD34ACTDBPassword=temp_update
TFMFD34ACTDBPasswordTemp=newpass1
TFMFD34ACTDBUserName=d34act_update
TFMFD34ACTIndexDBDsnName=D34ACTINDEX_P3
TFMFD34ACTIndexDBPassword=timefeb28
TFMFD34ACTIndexDBUserName=tempuser1
TFMFDMSDsnName=DMS_P3
TFMFDMSPassword=timefeb28
TFMFDMSUserName=tempuser1
TFMFDelta=Delta
TFMFDistribute=Distribute
TFMFFrequency=WEEKLY
TFMFHome=/data/rockville
TFMFInput=Input
TFMFKPMGEmail=sethu.manickam@contractor.thomson.com
TFMFMigrationSECDsnName=secqa
TFMFMigrationSECPassword=secqa
TFMFMigrationSECUserName=secqa
TFMFNameChange=NameChange
TFMFOPID='DS_ADMIN'
TFMFOutput=Output
TFMFProfileTagYrNum=0
TFMFRefer=Refer
TFMFReportRecepientEmail=sethu.manickam@contractor.thomson.com
TFMFReportSenderEmail=joanne.duchez@thomson.com
TFMFSECDsnName=sec
TFMFSECPassword=welcome
TFMFSECUserName=hcl
TFMFSMTPAddress=smtp.datastream.com
TFMFShell=Shell
TFMFSourceFilePath=/data/rockville/Migration/
TFMFStage=Stage
TFMFSybase=Sybase
TFMFSybaseCommitSize=500
TFMFSybaseRepD34Text=ENABLE
TFMFTemp=Temp
TFMFTempTablesPrefix=TEMPDB.GUEST
TFMFVOYHome=/data/rockville/Voyager
APT_CONFIG_FILE=/opt/Ascential/DataStage/Configurations/default.apt
APT_MONITOR_MINTIME=10
APT_STRING_PADCHAR=0x20
DS_OPERATOR_BUILDOP_DIR=buildop
DS_OPERATOR_WRAPPED_DIR=wrapped
DSIPC_OPEN_TIMEOUT=30
DS_ENABLE_RESERVED_CHAR_CONVERT=0
DS_TDM_PIPE_OPEN_TIMEOUT=720
DS_TDM_TRACE_SUBROUTINE_CALLS=0
APT_COMPILEOPT=-O -c -qspill=32704
APT_COMPILER=/usr/vacpp/bin/xlC_r
APT_LINKER=/usr/vacpp/bin/xlC_r
APT_LINKOPT=-G
APT_OPERATOR_REGISTRY_PATH=/data/Ascential/projects/ROCKVILLE/buildop
LIBPATH=/data/Ascential/projects/ROCKVILLE/RT_BP1418.O:/opt/Ascential/DataStage/DSCAPIOp:/opt/Ascential/DataStage/RTIOperators:/opt/Ascential/DataStage/DSParallel:/opt/Ascential/DataStage/PXEngine/user_lib:/opt/Ascential/DataStage/PXEngine/lib:/data/Ascential/projects/ROCKVILLE/buildop:/opt/Ascential/DataStage/branded_odbc/lib:/opt/Ascential/DataStage/DSEngine/lib:/opt/Ascential/DataStage/DSEngine/uvdlls:/opt/Ascential/DataStage/DSEngine/java/jre/bin/classic:/opt/Ascential/DataStage/DSEngine/java/jre/bin:/opt/Ascential/DataStage/DSEngine/lib:/opt/informix/product/csdk32bit/lib:/opt/informix/product/csdk32bit/lib/esql:/opt/informix/product/csdk32bit/lib/cli:/usr/lib:/lib
NLSPATH=/usr/lib/nls/msg/%L/%N:/usr/lib/nls/msg/en_US/%N:/usr/lib/nls/msg/%L/%N.cat:/usr/lib/nls/msg/en_US/%N.cat
OSH_STDOUT_MSG=1
APT_ERROR_CONFIGURATION=severity, !vseverity, !jobid, moduleid, errorIndex, timestamp, !ipaddr, !nodeplayer, !nodename, opid, message

--------------------------------------------------------------------------------------

Also, please guide me: when I am running the job with my user ID, how is it that the dsadm user ID is shown as running the process?

Thanks a lot for all your help,
durgaps
Participant
Posts: 74
Joined: Sat Jul 08, 2006 4:09 am
Location: Melbourne, Australia
Contact:

Post by durgaps »

Is this correct?

APT_CONFIG_FILE=/opt/Ascential/DataStage/Configurations/default.apt

Are you using this .apt file in your job? Cross-check this entry in your Job Parameters window.
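A quick way to cross-check from the command line, rather than the Designer -- the path is the one reported in your log, and the grep/df lines are only illustrative:

# Show the configuration file the job picked up and the disk resources it defines.
cat /opt/Ascential/DataStage/Configurations/default.apt

# List every resource disk / scratchdisk path, then check free space on each
# of those paths (substitute them into df) while the job is running.
grep -E 'resource (disk|scratchdisk)' /opt/Ascential/DataStage/Configurations/default.apt
df -k /path/from/config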
Durga Prasad
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you double-click any log event, does the user ID show as dsadm?
Have you tried after the server restart?
Try restricting the number of rows for testing purposes.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'