Hi
I have several jobs (somewhere between 4 and 10) appending to one sequential file simultaneously.
As of now, it seems to be working fine.
Are there any adverse effects that I am supposed to look out for during production?
TIA
Can two jobs append to the same sequential file simultaneously?
-
- Participant
- Posts: 37
- Joined: Thu Nov 25, 2004 8:44 pm
- Location: Bangalore, Karnataka, India
Regards,
Vivek RS
It's not working fine... it can't be. Sequential files are, by definition, single-writer structures and cannot be (successfully) written to simultaneously by multiple processes. This is not a DataStage 'limitation' but the nature of the beast.
You either need to append to the file in serial fashion, one job at a time, or write to multiple files and then combine them after the jobs finish. The latter is typically what is done in this circumstance.
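The write-to-separate-files-then-combine pattern can be sketched in shell. The file names and the two simulated "jobs" below are illustrative assumptions, not anything produced by DataStage itself:

```shell
# Sketch: each job appends only to its OWN file, then one serial step
# concatenates them after all jobs finish. Names are hypothetical.
set -e
dir=$(mktemp -d)

# Simulate two jobs, each writing to its own file (safe, single writer each):
printf 'row1\nrow2\n' > "$dir/target_job1.out"
printf 'row3\nrow4\n' > "$dir/target_job2.out"

# After all jobs complete, combine in one serial step:
cat "$dir"/target_job*.out > "$dir/target.out"

wc -l < "$dir/target.out"   # all 4 rows, no interleaved or lost writes
```

Because each writer owns its file exclusively, there is no contention; the only serialization point is the final `cat`.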
-craig
"You can never have too many knives" -- Logan Nine Fingers
Hello Vivek,
The file set is, underneath the covers, just a collection of sequential files, so you get the same limitations as with sequential files, unless you choose your partitioning algorithm in such a way that each process (node) writing to the file set actually writes to its own file. That is most likely too much work and too error-prone for your situation. I would also recommend what Craig suggested: write to different sequential files and, at the end of your data run, concatenate the files together (fast) or merge them on some key (somewhat slower).
If you really wish to have all processes write to the same target then you must use an RDBMS table of your choosing, but this will inherently be slower.
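The concatenate-versus-merge choice at the end of the run can be sketched in shell. The paths, delimiter, and key position here are illustrative assumptions; the point is that a plain `cat` is fast, while `sort -m` merges already-sorted files on a key:

```shell
# Sketch of the two combine options: fast concatenation vs. keyed merge.
set -e
dir=$(mktemp -d)

# Each job/node produced its own file, already sorted on the key column:
printf '1,alpha\n3,gamma\n' > "$dir/part1.txt"
printf '2,beta\n4,delta\n'  > "$dir/part2.txt"

# Option 1: concatenate (fast; rows stay grouped by source file):
cat "$dir"/part*.txt > "$dir/concat.txt"

# Option 2: merge pre-sorted files on the key (somewhat slower,
# but preserves key order across all input files):
sort -m -t, -k1,1n "$dir"/part*.txt > "$dir/merged.txt"

head -1 "$dir/merged.txt"   # 1,alpha — key order restored across files
```

`sort -m` only merges, it does not re-sort, so the inputs must each already be ordered on the key for the result to be correct.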
Hi Vivek,
There is a roundabout way to do this. Put each of the two jobs in a container, with the output that previously went to the file becoming the container's output. Then use the two containers in the same job and collect their outputs with a Funnel stage; from the Funnel you can write to a single sequential file. This way you won't face any problems.
Since the query was posted on the PX forum, I assumed you are working in PX.
:D Mrinal
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Write into separate hash files with a consistent naming convention - something like YourSequentialFile : JobThatProduced - and concatenate all of these files when you use them.
If you have the same number of processors as sequential files, you can try to reproduce a file set header and move the data content into the corresponding data set directories to simulate a file set. But this option depends on a variety of factors and is not a quick and easy solution.
Hi Vivek,
As described in one of the previous posts, it is a better idea to give the files different names and, after all the jobs have executed, merge all the files with a batch script. That's a neat and simple way of doing it. Any more ways?
Vignesh.
"A conclusion is simply the place where you got tired of thinking."