Write to dataset failed: File too large

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Write to dataset failed: File too large

Post by lakshya »

Hi-

One of our jobs aborts when the dataset size grows past 2 GB, throwing the following error:

CpyRecs,0: Write to dataset failed: File too large
The error occurred on Orchestrate node node2 (hostname XXX)

We have had the limits raised to the maximum for the user ID through which we run our jobs.

Current settings:
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) 999999999999
coredump(blocks) unlimited
nofiles(descriptors) 2000

The jobs are still aborting with the same error.

Has anyone faced the same problem? Can you please suggest a fix?

Please help me with this, as we have several jobs designed in the same way that will produce datasets over 2 GB in size.

Thanks in advance
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think you need to stop and restart the DataStage server after making the ulimit change. Have you done that?
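If it helps, the restart on the server engine is usually something like the following, run from the engine account (a sketch; $DSHOME is assumed to point at the engine directory, and your install may differ):

cd $DSHOME
. ./dsenv                  # pick up the engine environment
bin/uv -admin -stop        # stop the DataStage server engine
bin/uv -admin -start       # start it again so new processes inherit the raised limits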
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

Hi Arndw-

Yes! That has been done after the limits were changed.

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You also need to remove the dataset and re-create it.
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

The dataset gets deleted from the nodes as soon as the job aborts. If the job completes, it writes to the processing folder.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

OK, put a "ulimit -a" external command into your job to make sure that the background process is getting the same limits; perhaps one of your initialization scripts resets them.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

You also have to kill all the processes owned by the user who runs the jobs.
Otherwise the new ulimit values won't be picked up!
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

There are no processes hanging around for the user ID in question.
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

Arndw-

Can you please tell me where/how to add the "ulimit -a" external command to my job to make sure that the background process is getting the same limits?

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In the job properties you can specify a before-job subroutine and I believe one of the options is ExecSh or something similar to execute a UNIX shell command.
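In other words, something along these lines in the job properties (field names from memory, so check them in your Designer):

Before-job subroutine:  ExecSH
Input value:            ulimit -a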
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

Hi-

I ran the job after adding the "ulimit -a" command in a before-job subroutine, and am getting the following:

XXX..BeforeJob (ExecSH): Executed command: ulimit -a
*** Output from command was: ***
time(seconds) unlimited
file(blocks) 4194303
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 65536
coredump(blocks) 2097151
nofiles(descriptors) 2000

So the job is not picking up the changed limits, and that's why it is aborting: file(blocks) 4194303 at 512 bytes per block works out to just under 2 GB, exactly where the writes fail.

How can I fix this?

Thanks
Lakshman
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Check the dsenv script in your project directory and also the DataStage startup script in /bin for ulimit settings.
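If one of those scripts is clamping the values, the usual fix is to raise them in dsenv so every DataStage process inherits them. A sketch (whether a non-root shell is allowed to raise them this far depends on the hard limits your SA has set):

# in $DSHOME/dsenv - raise the per-process file size and data segment limits
ulimit -f unlimited
ulimit -d unlimited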
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You changed your ulimit, but not the one for the ID under which DataStage processes run. That's why Arnd had you check via a before-job subroutine. The dsenv script is executed by all DataStage processes. However, on some UNIXes, only superuser can increase ulimit - you may need to ask your System Administrator to assist.
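A quick way to see what that ID actually gets is to check its limits directly, for example (dsadm below stands in for whatever ID your DataStage daemon runs under):

su - dsadm -c "ulimit -a"    # run as root; shows the limits a fresh login shell for that ID receives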
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

Maybe you've not set impersonation mode!

Check it!
Otherwise, set the ulimit for dsadm.
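How you raise dsadm's limits is OS-specific; on AIX, for instance, it would be something like this as root (only a sketch, since I don't know your platform):

chuser fsize=-1 data=-1 dsadm    # -1 means unlimited in /etc/security/limits
# then bounce the DataStage engine so new processes pick up the change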
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

Hi All-

Thank you very much for your inputs on this issue.

At last the jobs are able to create datasets larger than 2 GB.

Earlier we had changed the ulimit settings to the maximum for the ID through which we run our jobs, but the jobs kept aborting with the same error.
The jobs were still inheriting the default limits from the admin ID.

Now we have the administrator ID's ulimit values set to the maximum, bounced the server, and restarted the job, and it worked. The jobs finished successfully with datasets of more than 3 million rows.
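For anyone who hits the same wall, one way to sanity-check the result is to look at the data files the dataset writes on each node's resource disk; the path below is only a placeholder for whatever your configuration file names as the resource disk:

ls -l /placeholder/resource/node2    # individual data files can now grow past the old 2 GB ceiling
du -k /placeholder/resource/node2    # total size in kilobytes of that node's segment files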

Thanks again