Commit in Sequential file

Posted: Tue Mar 01, 2005 8:16 pm
by sumitgulati
Hi,

I have a job that loads data into a sequential file. The job takes close to 30 minutes to finish.

Until a few days ago, if I viewed the data in the sequential file from DataStage Designer while the job was running, I could see the records in the file. But for the past few days, if I try to view the sequential file data while the job is still writing records into it, it says "data source is empty" even though the performance statistics for the link show that some records have gone into the file. Another interesting thing: if I wait for more records to be processed into the file and then try to view the data, the data shows up.

I think the data does not physically go into the sequential file until the job has either finished or processed at least a certain number of records.

Is there a commit size (like a transaction size for an RDBMS) for sequential files as well? If yes, where do we define it?

Thanks and Regards,
-Sumit

Posted: Tue Mar 01, 2005 8:36 pm
by chulett
No, I'm afraid there's no such thing as a 'transaction size' or a commit for sequential files in DataStage that could explain what you are seeing.

What you may be seeing is a product of your disk subsystem, especially with 'enterprise storage' like EMC and the like. Depending on the settings and the amount of cache involved (which can be substantial), you may be right in thinking that the data is not being flushed to disk until a certain number of records are involved... or a certain buffer size is reached, or the system simply needs the space for something else. When that happens, your cached / dirty information is flushed to disk and starts to show up when you 'View Data'. That's my take on it, anyway.
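To illustrate the layering (a generic C sketch, nothing DataStage-specific, and the file name is just a placeholder): a plain write() only lands data in the kernel's page cache, where it can sit as 'dirty' pages; fsync() forces those pages out to disk. Even then, an enterprise array's own cache is beyond the application's control.

/* Sketch: write() fills the kernel page cache; fsync() flushes it to disk. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("demo.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char *row = "some,record,data\n";
    if (write(fd, row, strlen(row)) < 0)   /* data lands in the page cache */
        perror("write");

    if (fsync(fd) < 0)                     /* force the dirty pages to disk */
        perror("fsync");

    close(fd);
    return 0;
}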

The person to talk to would be your SA, whoever is in charge of the physical disk. They should be able to shed more light on the issue.

Posted: Tue Mar 01, 2005 9:27 pm
by kcbland
There is no hope; this is an OS issue with buffered writes. Remember your C days and flushing? Attempting to view the data mid-run can give errant results, because data is written in blocks, not "rows". Rows are a database concept; files are just blocks of bytes. Odds are the last row will be partial because it falls across a block boundary. You'll just have to live with it.
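If you want to see it for yourself, here is a minimal C sketch (the file name is a placeholder): rows written through stdio accumulate in a user-space buffer and reach the file in block-sized chunks, so a reader polling mid-run sees nothing, or a torn last row, until a flush happens.

/* Sketch: stdio buffers rows; they hit the file only on flush or close. */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("demo.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    for (int i = 0; i < 1000; i++)
        fprintf(fp, "row %d\n", i);   /* sits in the stdio buffer */

    fflush(fp);   /* only now is the buffered block handed to the OS */
    fclose(fp);   /* fclose() would also flush */
    return 0;
}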

Posted: Tue Mar 01, 2005 10:21 pm
by ray.wurlod
There are a lot of clever tricks built into the Sequential File stage, including read-ahead and write-ahead buffers.

This means there will be some delay before you see any rows actually appear; it seems to take 1000 rows before any rows are synced.

These buffers cannot be accessed or configured.

Posted: Wed Mar 02, 2005 3:32 am
by Sainath.Srinivasan
Try using a DBMS or a hashed file instead.

Posted: Wed Mar 02, 2005 7:54 am
by chulett
Better yet, don't worry about it. :wink: :lol: As noted, that's just the way it is.

Posted: Wed Mar 02, 2005 8:22 am
by roy
Hi,
(A shot in the dark)
On top of all this, check your file's modified time. If it is not consistent with the time the job's log says it wrote to the file, the file might have been truncated since, hence empty when you look at it.
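For example, a small C sketch of that check (the path is a placeholder); stat() reports the modification time you would compare against the job log:

/* Sketch: print a file's last-modified time via stat(). */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(void)
{
    struct stat st;
    if (stat("/path/to/target.seq", &st) != 0) {
        perror("stat");
        return 1;
    }

    printf("last modified: %s", ctime(&st.st_mtime));
    return 0;
}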

Please post your solution either way.

Posted: Wed Mar 02, 2005 11:56 am
by sumitgulati
Thanks to all for your replies. I suspected it might be an OS-side issue.

Thanks again
-Sumit