Hi All,
I have a conceptual doubt. When data flows from one stage (process) to another, where is the intermediate data stored, i.e. after the first stage finishes processing a record and before the second stage starts reading it? Does it use any temporary data sets? Also, when does buffering come into the picture?
Thanks in advance.
where data is stored between stages when the job is running
Moderators: chulett, rschirm, roy
In virtual data sets and in buffers, both in memory and in dataset/scratch disk space.
viewtopic.php?t=118718&highlight=virtual+dataset
Mark Winter
Nothing appeases a troubled mind more than good music
Realize that the vast majority of the time there's no... 'break'... between stages: they run in a pipelined fashion, and a single record can go all the way through the job before the next one is even read. So you don't typically have one stage finishing before the next stage even starts.
-craig
"You can never have too many knives" -- Logan Nine Fingers
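Craig's point about records flowing through all stages at once can be sketched in plain Python (an illustration only, not DataStage internals): each "stage" below is a generator that pulls one record at a time from the stage upstream, so a record can traverse the whole job before the next record is read. The stage names and sample data are hypothetical.

```python
def read_stage():
    # hypothetical source stage emitting raw comma-separated records
    for rec in ["alice,10", "bob,20", "carol,30"]:
        yield rec

def transform_stage(upstream):
    # hypothetical transformer: parse each record and double the value
    for rec in upstream:
        name, value = rec.split(",")
        yield (name, int(value) * 2)

def write_stage(upstream):
    # terminal stage: drain the pipeline, pulling one record at a time
    return list(upstream)

# Chaining generators means nothing runs until write_stage pulls a record;
# each pull propagates all the way back to read_stage for ONE record.
result = write_stage(transform_stage(read_stage()))
print(result)  # [('alice', 20), ('bob', 40), ('carol', 60)]
```

Because generators are lazy, no stage "finishes" before the next one starts; that is the essence of pipeline parallelism.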
There are two schemes of parallelism: pipeline parallelism (the one implied by the original question) and partition parallelism (the one implied by Craig's response to Oritech's unrelated question).
You can read about both in the Parallel Job Developer's Guide.
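For contrast with the pipeline scheme, partition parallelism can be roughly sketched as follows (plain Python, not the DataStage engine; the record data and partition count are made up). Records are hash-partitioned on a key so that equal keys land in the same partition, and each partition is then processed independently — in a real engine, on separate nodes in parallel.

```python
records = [("alice", 10), ("bob", 20), ("carol", 30), ("bob", 5)]
num_partitions = 2  # analogous to the node count in a configuration file

# hash-partition on the key so records with equal keys stay together
partitions = [[] for _ in range(num_partitions)]
for key, value in records:
    partitions[hash(key) % num_partitions].append((key, value))

def sum_by_key(part):
    # each partition can be aggregated with no cross-partition traffic
    totals = {}
    for key, value in part:
        totals[key] = totals.get(key, 0) + value
    return totals

# process every partition independently, then merge the partial results
merged = {}
for part in partitions:
    merged.update(sum_by_key(part))
print(merged)  # {'alice': 10, 'bob': 25, 'carol': 30}
```

The key-based partitioning is what makes the independent aggregation correct: both "bob" records are guaranteed to be in the same partition.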
To address the original question, ideally the data are not stored anywhere, but remain resident in memory as they are "passed" from one operator to the next. If there is not enough memory, then the overflow lands on disk, either scratch disk as configured or paging disk, depending on a number of factors.
The only way that data are "stored" by a parallel job is if you have a stage type that causes the data to be stored.
Associated with each link is a "virtual Data Set" (a data set structure in memory). Each of these is managed as a configurable number of buffers (usually two) - one is being written to by the upstream, or producer, operator while the other is being read from by the downstream, or consumer, operator. Thresholds for switching buffers and/or buffers beginning to resist input are also configurable.
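The two-buffer arrangement described above can be sketched in plain Python (a simplified assumption-laden model, not the actual virtual data set implementation; the class, threshold, and swap policy here are illustrative only): the producer fills one buffer while the consumer drains the other, and the buffers swap roles when the write buffer reaches a threshold.

```python
from collections import deque

BUFFER_THRESHOLD = 3  # illustrative switch point, analogous to a configurable buffer size

class TwoBufferLink:
    """Toy model of a link's paired buffers: one written, one read."""

    def __init__(self):
        self.write_buf = deque()  # filled by the upstream (producer) operator
        self.read_buf = deque()   # drained by the downstream (consumer) operator

    def produce(self, record):
        self.write_buf.append(record)
        if len(self.write_buf) >= BUFFER_THRESHOLD and not self.read_buf:
            # swap: the full buffer becomes readable, the empty one writable
            self.write_buf, self.read_buf = self.read_buf, self.write_buf

    def consume(self):
        if not self.read_buf and self.write_buf:
            # consumer caught up: take over whatever the producer has so far
            self.write_buf, self.read_buf = self.read_buf, self.write_buf
        return self.read_buf.popleft() if self.read_buf else None

link = TwoBufferLink()
for i in range(5):
    link.produce(i)

out = []
while (rec := link.consume()) is not None:
    out.append(rec)
print(out)  # [0, 1, 2, 3, 4]
```

A real implementation would add the back-pressure Ray mentions (buffers "resisting input" past a threshold, and spilling to scratch disk when memory runs out), but the producer/consumer swap is the core idea.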
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.