Can two jobs append to the same sequential file simultaneously?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

vivek_rs
Participant
Posts: 37
Joined: Thu Nov 25, 2004 8:44 pm
Location: Bangalore, Karnataka, India

Can two jobs append to the same sequential file simultaneously?

Post by vivek_rs »

Hi
I have several jobs (somewhere between 4 and 10) appending to one sequential file simultaneously.
As of now, it seems to be working fine.
Are there any adverse effects that I am supposed to look out for during production?
TIA
Regards,
Vivek RS
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's not working fine... it can't be. Sequential files are, by definition, single-writer structures and cannot be (successfully) written to simultaneously by multiple processes. This is not a DataStage 'limitation' but the nature of the beast.

You either need to append to the file in a serial fashion, one job at a time, or write to multiple files and then combine them after the jobs complete. The latter is typically what is done in this circumstance.
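
Something along these lines would handle the combine step - just a rough sketch, assuming each job writes its own part file (the names here are made up):

    # run after all the jobs have finished; simple concatenation in whatever order you list them
    cat /data/target.part1 /data/target.part2 /data/target.part3 > /data/target.final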
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivek_rs
Participant
Posts: 37
Joined: Thu Nov 25, 2004 8:44 pm
Location: Bangalore, Karnataka, India

Post by vivek_rs »

Oh My God!
Thank you for that!!!

Can I use a DataSet or something like that?
Something like Hash Files on the Server canvas???
Regards,
Vivek RS
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Hello Vivek,

the file set is, underneath the covers, just a collection of sequential files, so you will get the same limitations as with sequential files, unless you choose your partitioning algorithm in such a way that each process (node) that writes to the file set actually writes to its own file. That is most likely too much work and too error prone for your situation. I would also recommend what Craig suggested - write to different sequential files and, at the end of your data run, concatenate the files together (fast) or merge them on some key (somewhat slower).
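
To illustrate the two options at the shell level - purely a sketch with made-up file names, and assuming for the merge case that each per-job file is already sorted on its key:

    # fast: plain concatenation, no ordering across files
    cat job_a.out job_b.out job_c.out > combined.out

    # somewhat slower: merge files that are each pre-sorted on the first comma-separated field
    sort -m -t ',' -k 1,1 job_a.out job_b.out job_c.out > combined_sorted.out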

If you really wish to have all processes write to the same target then you must use an RDBMS table of your choosing; but this will inherently be slower.
memrinal
Participant
Posts: 74
Joined: Wed Nov 24, 2004 9:13 pm

Post by memrinal »

Hi Vivek,
there is a roundabout way to do this. Put each of the two jobs into a container, with the link that earlier went to the file becoming the container's output. Now use these two containers in the same job and use a Funnel stage to collect the output. From the Funnel you can take the output to a single Sequential File stage. This way you won't face any problem.
Since the query was posted on the PX forum, I assumed you are working in PX.

:D Mrinal
vivek_rs
Participant
Posts: 37
Joined: Thu Nov 25, 2004 8:44 pm
Location: Bangalore, Karnataka, India

Post by vivek_rs »

Hi mrinal,
the thing is, the number of jobs running at any given moment is not fixed, so I cannot put them into containers and collect them.
The separate sequential files approach is working for now.
Does anyone have a better idea?
Regards,
Vivek RS
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Write into separate files following a consistent naming convention - something like YourSequentialFile : JobThatProduced - and concatenate all these files when you use them.
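
As a rough sketch of that (using a dot rather than ' : ' as the separator so the shell glob stays simple - the names are hypothetical):

    # each job writes YourSequentialFile.<JobName>; pull all the pieces together before use
    cat YourSequentialFile.* > YourSequentialFile.all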

If you have the same number of processors as sequential files, you can try to reproduce a fileset header and move the data content into the corresponding DataSet directories to simulate a fileset. But this option depends on a variety of factors and is not a quick and easy solution.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Hi Vivek,
As described in one of the previous posts, it is a better idea to keep different names for the files and, after all the jobs have executed, just merge all the files using a batch script. That's a neat and simple way of doing it. Any more ways? :roll:
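
For example, a small script along these lines could do the merge once everything has run (paths and names are made up):

    # merge all per-job part files into one, then clean up the parts
    PARTS="/data/out/target.part.*"
    ls $PARTS > /dev/null 2>&1 || exit 1   # nothing to merge yet
    cat $PARTS > /data/out/target.final
    rm -f $PARTS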
Vignesh.

"A conclusion is simply the place where you got tired of thinking."