
sequential files - flush option?

Posted: Thu Jun 25, 2009 12:03 am
by jgreve
Is there a way to force a record-by-record flush for sequential files?
In C, I'd use fflush( seqfile );
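
Something like this minimal sketch is what I mean (the file name and
record loop are invented for illustration; the timestamp on stderr is
there so you can see whether writes land steadily or in bursts):

    /* Flush after every record and log the wall-clock time of each write. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        FILE *seqfile = fopen("seqfile.dat", "w");
        if (seqfile == NULL)
            return 1;

        for (int i = 0; i < 300; i++) {
            fprintf(seqfile, "record %d\n", i);
            fflush(seqfile);  /* push this record to the OS right now */
            fprintf(stderr, "%ld: wrote record %d\n", (long)time(NULL), i);
        }

        fclose(seqfile);
        return 0;
    }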

I would like a better idea of the output rate - is it steady or bursty?
My job is buffering about 300 records between flushes,
and at 3 to 5 minutes per flush, that only gives me a coarse
average.

I suppose I could get a better rate estimate with a Peek stage and by pulling
time stamps from the job log. *shrug* I'm just looking for ideas
("more than one way to do it" and all that).
John G.

Posted: Thu Jun 25, 2009 12:57 am
by ArndW
Buffering is always a tradeoff - by increasing the buffer sizes you increase throughput but, in the case of a failure, you will lose more data. Since DataStage jobs are generally all-or-nothing, there is no benefit in force-writing data more frequently. But if you do require more frequent flushing to disk, you can play with the APT settings for:

APT_BUFFER_MAXIMUM_MEMORY, APT_BUFFER_MAXIMUM_TIMEOUT, APT_BUFFER_DISK_WRITE_INCREMENT, APT_BUFFERING_POLICY
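
For illustration only - the values below look like the documented
defaults, but check the Parallel Job Advanced Developer's Guide for
your release before relying on them:

    APT_BUFFERING_POLICY=AUTOMATIC_BUFFERING    # or FORCE_BUFFERING / NO_BUFFERING
    APT_BUFFER_MAXIMUM_MEMORY=3145728           # bytes of RAM per buffer (3 MB)
    APT_BUFFER_DISK_WRITE_INCREMENT=1048576     # bytes per disk write (1 MB)
    APT_BUFFER_MAXIMUM_TIMEOUT=1                # maximum wait in seconds before a retry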

Posted: Thu Jun 25, 2009 10:35 am
by jgreve
Ahhh. That helps my perspective.
Thank you!

I was focusing on designer seqfile options, like this:
  • Output->Properties->Source->File=xyz.dat
    Output->Properties->Source->ReadMethod=...
ArndW wrote:
  APT_BUFFER_MAXIMUM_MEMORY,
  APT_BUFFER_MAXIMUM_TIMEOUT,
  APT_BUFFER_DISK_WRITE_INCREMENT,
  APT_BUFFERING_POLICY