
Dataset Write Failure

Posted: Fri Feb 29, 2008 9:48 pm
by just4geeks
I am getting the following fatal errors (in order) while loading data into datasets.

1. SRC_oe_order_header_all,0: Write to dataset on [fd 24] failed (Success) on node node1, hostname lxdscon.beckman.com
2. SRC_oe_order_header_all,0: Orchestrate was unable to write to any of the following files:
3. SRC_oe_order_header_all,0: /dstage1/Server/Datasets/Data_frm_OAGCRD.txt.dsadm.lxdscon.beckman.com.0000.0000.0000.5ba6.c9920787.0000.246beea0
4. SRC_oe_order_header_all,0: Block write failure. Partition: 0
5. SRC_oe_order_header_all,0: Failure during execution of operator logic.
8. SRC_oe_order_header_all,0: Fatal Error: File data set, file "/dstage1/store/Data_frm_OAGCRD.txt".; output of "SRC_oe_order_header_all": DM getOutputRecord error.
9. node_node1: Player 1 terminated unexpectedly.
10. main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected exit status 1.
11. main_program: Step execution finished with status = FAILED.

I read related previous posts in the forum and did the following research.

1. I was running the job as isadmin.

2. Checked permissions on the folders where the datasets are saved (a write test as the job's runtime user is sketched after the listing below).
We have set the following directories for datasets as well as for the scratch disk:

resource disk "/dstage1/Server/Datasets"
resource scratchdisk "/dstage1/Server/Scratch"


We have full permissions on 'Server', where all the datasets are stored. 'store' is another folder where we save output files; it also has full permissions.

drwxrwxrwx 4 root root 4096 Feb 12 14:56 Server
drwxrwxrwx 6 root root 4096 Feb 13 10:18 Projects
drwxrwxrwx 4 root root 4096 Feb 29 19:11 store

drwxrwxrwx 2 root root 4096 Feb 22 15:16 Scratch
drwxrwxrwx 2 root root 4096 Feb 29 19:11 Datasets
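To confirm that these permissions are actually effective for the user the job runs under (the temporary file name in the error contains 'dsadm'), a quick write test could look like this; the .write_test file name is just an illustration:

# run as root, or as any account allowed to su to dsadm
su - dsadm -c 'touch /dstage1/Server/Datasets/.write_test && rm /dstage1/Server/Datasets/.write_test && echo write OK'
su - dsadm -c 'touch /dstage1/Server/Scratch/.write_test && rm /dstage1/Server/Scratch/.write_test && echo write OK'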


3. Checked available space using the df command. We have plenty of space left in '/dstage1', where the data is stored (an inode check is sketched after the output below).

[isadmin@lxdscon dstage1]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/VG00-LogVol00  3.1G  175M  2.8G   6% /
/dev/cciss/c0d0p1          190M   13M  169M   7% /boot
none                       3.8G     0  3.8G   0% /dev/shm
/dev/mapper/VG01-LogVol00   29G   13G   15G  47% /dstage1
/dev/mapper/VG00-LogVol01  6.0G  4.1G  1.7G  72% /home
/dev/mapper/VG00-LogVol05   10G  7.6G  1.9G  81% /opt
/dev/mapper/VG00-LogVol02  3.1G   54M  2.9G   2% /tmp
/dev/mapper/VG00-LogVol03   10G  2.5G  7.0G  26% /usr
/dev/mapper/VG00-LogVol04  3.1G   99M  2.9G   4% /var
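Since df -h only reports block usage, a "no space" style write failure can also be caused by inode exhaustion or a per-user quota even when plenty of blocks are free. A quick way to rule those out (the quota command only applies if quotas are enabled on /dstage1):

df -i /dstage1     # inode usage on the dataset filesystem
quota -s dsadm     # quota for the job's runtime user, if quotas are in use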


4. Checked file size limits using the ulimit -a command (a check for the job's runtime user is sketched after the output below).

[isadmin@lxdscon dstage1]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 131071
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
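Note that the ulimit output above is for my isadmin shell; the failing temporary file is created by dsadm, so the limits that matter are the ones the engine processes inherit. Assuming I can switch to that user, the comparison would be:

su - dsadm -c 'ulimit -a'   # limits seen by the account the dataset files are written as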


Any thoughts?

Thanks in Advance!

Posted: Fri Feb 29, 2008 11:07 pm
by kumar_s
Are you trying to write a dataset? If so, change the extension from .txt to .ds.

Posted: Sat Mar 01, 2008 12:08 am
by just4geeks
kumar_s wrote:Are you trying to write a dataset? If so, change the extension from .txt to .ds.
Thanks! I did try the .ds extension earlier, but the results were the same. I just tried .txt out of curiosity.

Posted: Sat Mar 01, 2008 12:16 am
by kumar_s
Try writing an empty dataset file with a .ds extension using the current configuration.
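Before re-running, it may also be worth validating the configuration file itself. A sketch, assuming dsenv and orchadmin are available on the server and that orchadmin's check subcommand validates the file pointed to by APT_CONFIG_FILE:

cd $DSHOME && . ./dsenv      # source the engine environment (adjust the path if needed)
echo $APT_CONFIG_FILE        # confirm which configuration file the job uses
orchadmin check              # validate the node/resource disk entries in that file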

Posted: Sat Mar 01, 2008 8:26 am
by just4geeks
kumar_s wrote:Try writing an empty dataset file with a .ds extension using the current configuration.
Did that again, with the same results:
1. SRC_oe_order_header_all,0: Write to dataset on [fd 23] failed (Success) on node node1, hostname lxdscon.beckman.com
2. SRC_oe_order_header_all,0: Orchestrate was unable to write to any of the following files:
3. SRC_oe_order_header_all,0: /dstage1/Server/Datasets/data_frm_oagcrd1.ds.dsadm.lxdscon.beckman.com.0000.0000.0000.1071.c9925ba0.0000.4bb0671b
4. SRC_oe_order_header_all,0: Block write failure. Partition: 0
5. SRC_oe_order_header_all,0: Failure during execution of operator logic.
6. SRC_oe_order_header_all,0: Fatal Error: File data set, file "/dstage1/store/data_frm_oagcrd1.ds".; output of "SRC_oe_order_header_all": DM getOutputRecord error.
7. node_node1: Player 1 terminated unexpectedly.
8. main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected exit status 1.
9. main_program: Step execution finished with status = FAILED.

Posted: Sat Mar 01, 2008 9:39 am
by kumar_s
Does your user ID have enough privileges?
Try
cd $DSHOME/bin
uv
from the command prompt.

Otherwise, try with the dsadm user ID.

Posted: Sat Mar 01, 2008 12:44 pm
by just4geeks
kumar_s wrote:Does your user ID have enough privileges?
Try
cd $DSHOME/bin
uv
from the command prompt.

Otherwise, try with the dsadm user ID.
As you can see from the first post, I have all the privileges. I tried with the dsadm as well as the isadmin user IDs. Please let me know if anything else can be looked at; I can provide other details as well.
Where do I run the $DSHOME/bin command? I execute it to invoke dssh for viewing/clearing locks, etc. What do I do after running cd $DSHOME/bin and uv?

Posted: Fri Apr 16, 2010 12:25 am
by Ananda
I faced the same issue. The resolution was to delete datasets and free some space; the job then ran fine. (A safer way to do that cleanup is sketched after the messages below.)

tCopy,0: Write to dataset on [fd 8] failed (Success) on node node1, hostname mphewddes001
tCopy,0: Orchestrate was unable to write to any of the following files:
tCopy,0: /node1/res/DS_C_MP_MSTR_PFL.ds.dsadm.dstaged1.hew.us.ml.com.0000.0000.0000.ffe.cd913cf2.0000.e4187cb3
tCopy,0: Block write failure. Partition: 0
tCopy,0: Failure during execution of operator logic.
tCopy,0: Fatal Error: File data set, file "/cedp_data/cedpor/mstr_pfl/datasets/DS_C_MP_MSTR_PFL.ds".; output of "APT_TransformOperatorImplV0S9_ext_C_MP_MSTR_PFL_tCopy in tCopy": DM getOutputRecord error.
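For anyone cleaning up to free space: it is safer to remove datasets through orchadmin than with plain rm, so that the descriptor file and its data segments on the resource disks are removed together. A sketch with a placeholder dataset name:

orchadmin ll /path/to/old_dataset.ds   # list the segment files behind the descriptor
orchadmin rm /path/to/old_dataset.ds   # remove descriptor and segments together (alias: delete)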

Posted: Fri Apr 16, 2010 12:44 am
by ray.wurlod
Space: the final frontier.

:lol:

Posted: Fri Apr 16, 2010 1:25 am
by sureshreddy2009
I have one possible solution.
Are you using the same dataset name in two different jobs?

Or have you already loaded a sequential file whose name is exactly the same as the one you are now giving the dataset (e.g. name.ds)?
Thanks

Posted: Fri Apr 16, 2010 1:58 am
by ArndW
Run the job to reproduce the error. Immediately go to UNIX and do an "ls -al" on your dataset descriptor file ("/dstage1/store/Data_frm_OAGCRD.txt"), and then do an "orchadmin ll /dstage1/store/Data_frm_OAGCRD.txt" as well. Does the data file from the error message, the one ending with a unique hex identifier, actually exist in the datasets directory? How big is it?
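In terms of commands, something like the following, with the paths taken from the first post (the segment file name is unique per run, so use the one from your own error message):

ls -al /dstage1/store/Data_frm_OAGCRD.txt
orchadmin ll /dstage1/store/Data_frm_OAGCRD.txt
ls -l /dstage1/Server/Datasets/Data_frm_OAGCRD.txt.dsadm.lxdscon.beckman.com.0000.0000.0000.5ba6.c9920787.0000.246beea0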