
Write to dataset failed: File too large

Posted: Fri Feb 10, 2006 10:31 am
by lakshya
Hi-

One of our jobs aborts once the dataset size grows beyond 2 GB, throwing the following error:

CpyRecs,0: Write to dataset failed: File too large
The error occurred on Orchestrate node node2 (hostname XXX)

We have had the limits raised to the maximum for the user ID through which we run our jobs.

Current settings:
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) 999999999999
coredump(blocks) unlimited
nofiles(descriptors) 2000

The jobs are still aborting with the same error.

Has anyone faced the same problem? Can you please suggest a fix?

Please help me with this, as we have several jobs designed the same way that will produce datasets over 2 GB in size.

Thanks in advance

Posted: Fri Feb 10, 2006 10:45 am
by ArndW
I think you need to stop and restart the DataStage server after making the ulimit change. Have you done that?
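
For reference, a typical engine restart on a UNIX install (this is only a sketch - it assumes a standard $DSHOME layout and that the restart is done as dsadm; your paths and procedure may differ):

su - dsadm
cd $DSHOME
. ./dsenv                # pick up the engine environment, including any new ulimit settings
bin/uv -admin -stop      # stop the DataStage server engine
bin/uv -admin -start     # start it again so its child processes inherit the new limits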

Posted: Fri Feb 10, 2006 10:50 am
by lakshya
Hi Arndw-

Yes! That has been done after the limits were changed.

Thanks

Posted: Fri Feb 10, 2006 10:54 am
by ArndW
You also need to remove the dataset and re-create it.

Posted: Fri Feb 10, 2006 10:58 am
by lakshya
The dataset gets deleted from the nodes as soon as the job aborts. If the job completes, it writes to the processing folder.

Posted: Fri Feb 10, 2006 11:00 am
by ArndW
OK, put a "ulimit -a" external command into your job to make sure that the background process is getting the same limits; perhaps one of your initialization scripts resets them.
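
If you want to hunt for a login script that resets the limits, something along these lines will show any ulimit calls (the file list is just the usual suspects for a ksh login; adjust for your shell and site):

grep -n ulimit /etc/profile $HOME/.profile $HOME/.kshrc 2>/dev/null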

Posted: Fri Feb 10, 2006 11:04 am
by gbusson
You also have to kill all the processes owned by the user who runs the jobs.
Otherwise the new ulimit settings won't be picked up!
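
A quick way to check for leftovers (substitute the real user ID):

ps -fu <userid>      # anything listed here keeps its old limits until it is restarted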

Posted: Fri Feb 10, 2006 11:08 am
by lakshya
There are no processes hanging around for the user ID in question.

Posted: Fri Feb 10, 2006 11:30 am
by lakshya
Arndw-

Can you please help me with where/how to add the "ulimit -a" external command to my job to make sure that the background process is getting the same limits?

Thanks

Posted: Fri Feb 10, 2006 11:39 am
by ArndW
In the job properties you can specify a before-job subroutine and I believe one of the options is ExecSh or something similar to execute a UNIX shell command.
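
From memory (the exact field names may differ slightly in your Designer release), the job properties entries would be roughly:

Before-job subroutine:  ExecSH
Input value:            ulimit -a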

Posted: Fri Feb 10, 2006 12:36 pm
by lakshya
Hi-

I ran the job after adding "ulimit -a" as a before-job subroutine and am getting the following:

XXX..BeforeJob (ExecSH): Executed command: ulimit -a
*** Output from command was: ***
time(seconds) unlimited
file(blocks) 4194303
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 65536
coredump(blocks) 2097151
nofiles(descriptors) 2000

So the job is not picking up the changed limits, and that's why it is aborting.

How can I fix this?

Thanks
Lakshman

Posted: Fri Feb 10, 2006 1:14 pm
by ArndW
Check the dsenv script in your project directory and also the DataStage startup script in /bin for ulimit settings.
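
A quick check, assuming you can read those files (dsenv normally lives in the DSEngine home, and the startup script name varies by install, so ask your SA where yours are):

grep -n ulimit $DSHOME/dsenv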

Posted: Fri Feb 10, 2006 2:49 pm
by ray.wurlod
You changed your ulimit, but not the one for the ID under which DataStage processes run. That's why Arnd had you check via a before-job subroutine. The dsenv script is executed by all DataStage processes. However, on some UNIXes, only superuser can increase ulimit - you may need to ask your System Administrator to assist.
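
One quick way to see whether you are up against a hard limit (the -S/-H flags are standard in ksh and most POSIX shells):

ulimit -Sa     # soft limits - what a process actually runs with
ulimit -Ha     # hard limits - the ceiling; only root can raise these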

Posted: Tue Feb 14, 2006 7:10 am
by gbusson
Maybe you've not set impersonation mode!

Check it!
Otherwise, set the ulimit for dsadm.
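
On AIX, for example, the System Administrator would put something like this in /etc/security/limits and have dsadm log in again (the values are illustrative; -1 means unlimited, and fsize is counted in 512-byte blocks):

dsadm:
        fsize = -1
        data = -1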

Posted: Tue Feb 14, 2006 3:21 pm
by lakshya
Hi All-

Thank you very much for your inputs on this issue.

At last the jobs are able to create datasets larger than 2 GB.

Earlier we had changed the ulimit settings to the maximum for the ID through which we run our jobs, but the jobs kept aborting with the same error.
The jobs were being passed the default values from the admin ID instead.

Now we have the administrator ID's ulimit values set to maximum, and after bouncing the server and restarting the job it worked. The jobs finished successfully with datasets of more than 3 million rows.
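
For anyone who hits the same problem: the before-job "ulimit -a" trick is a handy way to confirm the fix took (file(blocks) should now report unlimited in the job log), and a quick size check on the dataset's data files shows they really pass 2 GB (the path below is purely illustrative; the real resource disks are listed in your configuration file):

du -k /your/resource/disk/*      # individual data files over 2097152 KB confirm the limit is gone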

Thanks again