Hi
I have several jobs (somewhere between 4 and 10) appending to one sequential file simultaneously.
As of now, it seems to be working fine.
Are there any adverse effects that I am supposed to look out for during production?
TIA
Can two jobs append to the same sequential file simultaneously?
-
- Participant
- Posts: 37
- Joined: Thu Nov 25, 2004 8:44 pm
- Location: Bangalore, Karnataka, India
Regards,
Vivek RS
It's not working fine... it can't be. Sequential files are, by definition, single-writer structures and cannot be (successfully) written to simultaneously by multiple processes. This is not a DataStage 'limitation' but the nature of the beast.
You either need to append to the file in serial fashion, one job at a time, or write to multiple files and then combine them after the jobs finish. The latter is typically what is done in this circumstance.
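The write-to-separate-files-then-combine pattern can be sketched in shell. The file names and the two simulated "jobs" below are illustrative assumptions, not anything produced by DataStage itself:

```shell
# Sketch: each job appends only to its OWN file, then one serial step
# concatenates them after all jobs finish. Names are hypothetical.
set -e
dir=$(mktemp -d)

# Simulate two jobs, each writing to its own file (safe, single writer each):
printf 'row1\nrow2\n' > "$dir/target_job1.out"
printf 'row3\nrow4\n' > "$dir/target_job2.out"

# After all jobs complete, combine in one serial step:
cat "$dir"/target_job*.out > "$dir/target.out"

wc -l < "$dir/target.out"   # all 4 rows, no interleaved or lost writes
```

Because each writer owns its file exclusively, there is no contention; the only serialization point is the final `cat`.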
-craig
"You can never have too many knives" -- Logan Nine Fingers
Hello Vivek,
The file set is, underneath the covers, just a collection of sequential files, so you get the same limitations as with sequential files, unless you choose your partitioning algorithm in such a way that each process (node) writing to the file set actually writes to its own file. That is most likely too much work and too error-prone for your situation. I would also recommend what Craig suggested: write to different sequential files and, at the end of your data run, concatenate the files together (fast) or merge them on some key (somewhat slower).
If you really wish to have all processes write to the same target then you must use an RDBMS table of your choosing, but this will inherently be slower.
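The concatenate-versus-merge choice at the end of the run can be sketched in shell. The paths, delimiter, and key position here are illustrative assumptions; the point is that a plain `cat` is fast, while `sort -m` merges already-sorted files on a key:

```shell
# Sketch of the two combine options: fast concatenation vs. keyed merge.
set -e
dir=$(mktemp -d)

# Each job/node produced its own file, already sorted on the key column:
printf '1,alpha\n3,gamma\n' > "$dir/part1.txt"
printf '2,beta\n4,delta\n'  > "$dir/part2.txt"

# Option 1: concatenate (fast; rows stay grouped by source file):
cat "$dir"/part*.txt > "$dir/concat.txt"

# Option 2: merge pre-sorted files on the key (somewhat slower,
# but preserves key order across all input files):
sort -m -t, -k1,1n "$dir"/part*.txt > "$dir/merged.txt"

head -1 "$dir/merged.txt"   # 1,alpha — key order restored across files
```

`sort -m` only merges, it does not re-sort, so the inputs must each already be ordered on the key for the result to be correct.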
Hi Vivek,
There is a roundabout way to do this. Put each of the two jobs in a container, with the output that previously went to the file becoming the container's output. Then use the two containers in the same job and collect their outputs with a Funnel stage; from the Funnel you can write to a single sequential file. This way you won't face any problems.
Since the query was posted on the PX forum, I assumed you are working in PX.
:D Mrinal
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Write into separate hash files with a consistent naming convention - something like YourSequentialFile : JobThatProduced - and concatenate all of these files when you use them.
If you have the same number of processors as sequential files, you can try to reproduce a file set header and move the data content into the corresponding data set directories to simulate a file set. But this option depends on a variety of factors and is not a quick and easy solution.
Hi Vivek,
As described in one of the previous posts, it is a better idea to give the files different names and, after all the jobs have executed, merge all the files with a batch script. That's a neat and simple way of doing it. Any more ways?
Vignesh.
"A conclusion is simply the place where you got tired of thinking."