sequential files - flush option?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

sequential files - flush option?

Post by jgreve »

Is there a way to force a record-by-record flush for sequential files?
In C, I'd use fflush(seqfile);
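Roughly what I have in mind, as a minimal C sketch (the file name is just a placeholder):

    #include <stdio.h>

    int main(void) {
        FILE *seqfile = fopen("xyz.dat", "w");   /* placeholder name */
        if (seqfile == NULL)
            return 1;
        for (int i = 0; i < 300; i++) {
            fprintf(seqfile, "record %d\n", i);
            fflush(seqfile);   /* push each record to the OS right away */
        }
        fclose(seqfile);
        return 0;
    }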

I would like a better idea of the output rate: is it steady or bursty?
My job is buffering about 300 records between flushes, and at 3 to 5
minutes per flush that only gives me a coarse average.

I suppose I could get a better rate estimate with a Peek stage and by
pulling timestamps from the job log. *shrug* I'm just looking for ideas
("more than one way to do it" and all that).
John G.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Buffering is always a tradeoff: increasing the buffer sizes increases throughput, but in the case of a failure you will lose more data. Since DataStage jobs are generally all-or-nothing, there is no benefit in force-writing data more frequently. But if you do require more frequent flushing to disk, you can play with the APT settings for:

  • APT_BUFFER_MAXIMUM_MEMORY
  • APT_BUFFER_MAXIMUM_TIMEOUT
  • APT_BUFFER_DISK_WRITE_INCREMENT
  • APT_BUFFERING_POLICY
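For illustration only: these are ordinary environment variables that the PX engine reads at job start, so in practice you would set them as job parameters or in the engine environment (e.g. dsenv), not from code. A sketch with made-up example values, not recommendations:

    #include <stdlib.h>

    int main(void) {
        /* Example values only -- tune for your own job. */
        setenv("APT_BUFFERING_POLICY", "AUTOMATIC_BUFFERING", 1);
        setenv("APT_BUFFER_MAXIMUM_MEMORY", "1048576", 1);       /* bytes per buffer */
        setenv("APT_BUFFER_DISK_WRITE_INCREMENT", "262144", 1);  /* bytes per disk write */
        setenv("APT_BUFFER_MAXIMUM_TIMEOUT", "1", 1);            /* seconds */
        return 0;
    }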
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

Post by jgreve »

Ahhh. That helps my perspective.
Thank you!

I was focusing on Designer seqfile options, like these:
  • Output->Properties->Source->File=xyz.dat
  • Output->Properties->Source->ReadMethod=...
ArndW wrote: APT_BUFFER_MAXIMUM_MEMORY,
APT_BUFFER_MAXIMUM_TIMEOUT,
APT_BUFFER_DISK_WRITE_INCREMENT,
APT_BUFFERING_POLICY