Dataset write Failure
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 644
- Joined: Sat Aug 26, 2006 3:59 pm
- Location: Mclean, VA
I am getting the following fatal errors (in order) while loading data into datasets:
1. SRC_oe_order_header_all,0: Write to dataset on [fd 24] failed (Success) on node node1, hostname lxdscon.beckman.com
2. SRC_oe_order_header_all,0: Orchestrate was unable to write to any of the following files:
3. SRC_oe_order_header_all,0: /dstage1/Server/Datasets/Data_frm_OAGCRD.txt.dsadm.lxdscon.beckman.com.0000.0000.0000.5ba6.c9920787.0000.246beea0
4. SRC_oe_order_header_all,0: Block write failure. Partition: 0
5. SRC_oe_order_header_all,0: Failure during execution of operator logic.
8. SRC_oe_order_header_all,0: Fatal Error: File data set, file "/dstage1/store/Data_frm_OAGCRD.txt".; output of "SRC_oe_order_header_all": DM getOutputRecord error.
9. node_node1: Player 1 terminated unexpectedly.
10. main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected exit status 1.
11. main_program: Step execution finished with status = FAILED.
I read related previous posts in the forum and did the following research.
1. I was running the job as isadmin.
2. Checked permissions for the folder where datasets are saved.
We have set the following directories for the datasets as well as the scratch disk:
resource disk "/dstage1/Server/Datasets"
resource scratchdisk "/dstage1/Server/Scratch"
We have full permissions on 'Server', where all the datasets are stored. 'store' is another folder where we save output files; it also has full permissions.
drwxrwxrwx 4 root root 4096 Feb 12 14:56 Server
drwxrwxrwx 6 root root 4096 Feb 13 10:18 Projects
drwxrwxrwx 4 root root 4096 Feb 29 19:11 store
drwxrwxrwx 2 root root 4096 Feb 22 15:16 Scratch
drwxrwxrwx 2 root root 4096 Feb 29 19:11 Datasets
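For what it's worth, rwxrwxrwx on the leaf directory alone does not tell the whole story: the user the engine actually writes as (note the ".dsadm." embedded in the failing data-file name, even though the job was started as isadmin) needs execute (search) permission on every component of the path. A quick sketch to walk the path; DS_DIR is a placeholder, point it at the resource disk from the configuration file:

```shell
# Walk from the resource disk up to / and show each component's mode;
# the engine user needs at least x (search) on every directory listed.
# DS_DIR is a placeholder, e.g. /dstage1/Server/Datasets on this box.
DS_DIR=${DS_DIR:-/tmp}
dir=$DS_DIR
while [ "$dir" != "/" ]; do
    ls -ld "$dir"
    dir=$(dirname "$dir")
done
ls -ld /
```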
3. Checked available space using the df command. We have plenty of space left in '/dstage1', where the data is stored.
[isadmin@lxdscon dstage1]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VG00-LogVol00 3.1G 175M 2.8G 6% /
/dev/cciss/c0d0p1 190M 13M 169M 7% /boot
none 3.8G 0 3.8G 0% /dev/shm
/dev/mapper/VG01-LogVol00 29G 13G 15G 47% /dstage1
/dev/mapper/VG00-LogVol01 6.0G 4.1G 1.7G 72% /home
/dev/mapper/VG00-LogVol05 10G 7.6G 1.9G 81% /opt
/dev/mapper/VG00-LogVol02 3.1G 54M 2.9G 2% /tmp
/dev/mapper/VG00-LogVol03 10G 2.5G 7.0G 26% /usr
/dev/mapper/VG00-LogVol04 3.1G 99M 2.9G 4% /var
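One thing df -h does not show: a write can also fail when the filesystem runs out of inodes even though plenty of blocks are free. A quick check worth adding (FS is a placeholder; substitute /dstage1):

```shell
# -P forces POSIX single-line output so the awk column positions are
# stable; -i reports inode usage instead of block usage.
FS=${FS:-/tmp}
df -Pi "$FS"
iuse=$(df -Pi "$FS" | awk 'NR==2 {print $5}')
echo "inode use on $FS: $iuse"
```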
4. Checked file size limits using the ulimit -a command:
[isadmin@lxdscon dstage1]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 131071
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
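Also note that ulimit -a reports the limits of your interactive shell; the parallel-engine player processes are spawned by the DataStage daemon and can run under different limits. On Linux you can read the kernel-enforced limits of a running process directly (checking a player process named "osh" is an assumption; substitute whatever pgrep shows while the job runs):

```shell
# Print the effective limits of a running process, given its pid.
limits_of() {
    cat "/proc/$1/limits"
}

# While the job is running, something like:
#   limits_of "$(pgrep -x osh | head -n 1)"
# Demonstrated here on the current shell:
limits_of $$
```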
Any thoughts?
Thanks in Advance!
Attitude is everything....
kumar_s wrote: Try writing an empty Dataset file with a .ds extension with the current configuration.

Did that again, with the same results:
1. SRC_oe_order_header_all,0: Write to dataset on [fd 23] failed (Success) on node node1, hostname lxdscon.beckman.com
2. SRC_oe_order_header_all,0: Orchestrate was unable to write to any of the following files:
3. SRC_oe_order_header_all,0: /dstage1/Server/Datasets/data_frm_oagcrd1.ds.dsadm.lxdscon.beckman.com.0000.0000.0000.1071.c9925ba0.0000.4bb0671b
4. SRC_oe_order_header_all,0: Block write failure. Partition: 0
5. SRC_oe_order_header_all,0: Failure during execution of operator logic.
6. SRC_oe_order_header_all,0: Fatal Error: File data set, file "/dstage1/store/data_frm_oagcrd1.ds".; output of "SRC_oe_order_header_all": DM getOutputRecord error.
7. node_node1: Player 1 terminated unexpectedly.
8. main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected exit status 1.
9. main_program: Step execution finished with status = FAILED.
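Since the data-file names in both logs are created as dsadm, it may be worth proving outside DataStage that the dsadm account can create a file in the resource disk. A minimal sketch (TARGET is a placeholder for the resource disk directory; sudo -n fails fast instead of prompting for a password):

```shell
# Try to create and remove a scratch file in the resource disk as the
# engine user. TARGET is a placeholder for the resource disk directory.
TARGET=${TARGET:-/tmp}
result=$(sudo -n -u dsadm sh -c "touch '$TARGET/.ds_write_test' && rm -f '$TARGET/.ds_write_test'" 2>/dev/null \
    && echo "write OK" || echo "write FAILED")
echo "$result"
```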
Attitude is everything....
kumar_s wrote: Does your user id have enough privileges?? Try
cd $DSHOME/bin
UV
from the command prompt. Else try with the dsadm user id.

As you can see from my first post, I have all the privileges. I tried with the dsadm as well as the isadmin user ids. Please let me know if anything else can be looked at; I can provide other details as well. Also, where do I run the $DSHOME/bin commands? I already execute them to invoke dssh for seeing/clearing locks etc. What do I do after cd $DSHOME/bin and UV?
Attitude is everything....
I faced the same issue. The resolution was to delete old datasets and free up some space. The job then ran fine.
tCopy,0: Write to dataset on [fd 8] failed (Success) on node node1, hostname mphewddes001
tCopy,0: Orchestrate was unable to write to any of the following files:
tCopy,0: /node1/res/DS_C_MP_MSTR_PFL.ds.dsadm.dstaged1.hew.us.ml.com.0000.0000.0000.ffe.cd913cf2.0000.e4187cb3
tCopy,0: Block write failure. Partition: 0
tCopy,0: Failure during execution of operator logic.
tCopy,0: Fatal Error: File data set, file "/cedp_data/cedpor/mstr_pfl/datasets/DS_C_MP_MSTR_PFL.ds".; output of "APT_TransformOperatorImplV0S9_ext_C_MP_MSTR_PFL_tCopy in tCopy": DM getOutputRecord error.
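If the fix really is freeing space held by old datasets, delete them with orchadmin rather than plain rm: removing only the .ds descriptor leaves the per-node data files behind on the resource disk. A sketch, assuming orchadmin is on PATH in the engine environment (the subcommand spellings here are from memory; check orchadmin's help text on your install):

```shell
# List and then delete a dataset: the descriptor (.ds) and the data
# files it points to are removed together. DS is a placeholder path.
DS=${DS:-/dstage1/store/data_frm_oagcrd1.ds}
if command -v orchadmin >/dev/null 2>&1; then
    orchadmin ll "$DS"     # show the data files behind the descriptor
    orchadmin rm "$DS"     # remove the descriptor plus its data files
else
    echo "orchadmin not on PATH; source the engine environment first"
fi
```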
If you don't fail now and again, it's a sign you're playing it safe.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 62
- Joined: Sat Mar 07, 2009 4:59 am
- Location: Chicago
- Contact:
I have one possible solution. Are you using the same dataset name in two different jobs? Or have you already loaded a sequential file whose name is exactly the same as the name (name.ds) you are now giving the dataset?
Thanks
Suresh Reddy
ETL Developer
Research Operations
"its important to know in which direction we are moving rather than where we are"
Run the job to reproduce the error. Immediately go to UNIX and do an "ls -al" on your dataset descriptor file ("/dstage1/store/Data_frm_OAGCRD.txt"), and then do an 'orchadmin ll /dstage1/store/Data_frm_OAGCRD.txt' as well. Does the data file named in the error message, the one ending with a unique hex identifier, actually exist in the datasets directory? How big is it?