What is "persistent form" in this context?

splayer · Post by **splayer** » Mon Jun 19, 2006 2:33 pm

I am reading the Parallel Job Developer's Guide. For the Data Set stage,
I came across this sentence: "The Data Set stage allows you to store data
in a persistent form, which can then be used by other DataStage jobs".

I also searched through this forum but did not get an answer.

What does "persistent form" mean?

seanc217 · Post by **seanc217** » Mon Jun 19, 2006 2:43 pm

It means the data is stored on disk instead of in memory, so it is still there after the job that wrote it has finished.

HTH.

kcbland · Post by **kcbland** » Mon Jun 19, 2006 3:33 pm

If you build a reference lookup set of data that multiple jobs could share, write it as a persistent dataset (.ds) so that all of those jobs can leverage it. Otherwise, every job that needs that data must also contain all the logic to acquire it, and each one redundantly uses resources to acquire the same data again (and potentially transform, sort, and partition it).

splayer · Post by **splayer** » Mon Jun 19, 2006 6:20 pm

Kenneth, so are you saying that we can choose when creating a data set whether we want it persistent or not? Where can we make that choice?

kcbland · Post by **kcbland** » Mon Jun 19, 2006 7:22 pm

From your manuals under the Start button:

You can use the Data Set stage:

The Data Set stage is a file stage. It allows you to read data from or
write data to a data set. The stage can have a single input link or a
single output link. It can be configured to execute in parallel or
sequential mode.
What is a data set? DataStage parallel extender jobs use data sets to
manage data within a job. You can think of each link in a job as
carrying a data set. The Data Set stage allows you to store data being
operated on in a persistent form, which can then be used by other
DataStage jobs. Data sets are operating system files, each referred to by
a control file, which by convention has the suffix .ds. Using data sets
wisely can be key to good performance in a set of linked jobs. You can
also manage data sets independently of a job using the Data Set
Management utility, available from the DataStage Designer, Manager,
or Director, see Chapter 57.
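To make the "control file plus operating system data files" idea concrete, here is a toy Python model. It is not DataStage's real .ds format, which is internal to the engine; the JSON layout, the function names `write_data_set`/`read_data_set`, and the round-robin partitioning are all assumptions for illustration. The point is that a second job needs only the control file's path to find every partition file.

```python
import json
import os
import tempfile


def write_data_set(control_path, rows, num_partitions):
    """Toy model: spread rows round-robin over per-partition files and record
    their paths in a small control file (not the real .ds format)."""
    base = os.path.splitext(control_path)[0]
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    segment_paths = []
    for p, part_rows in enumerate(parts):
        seg = f"{base}.part{p}.json"
        with open(seg, "w") as f:
            json.dump(part_rows, f)
        segment_paths.append(seg)
    # The control file only points at the data files; it holds no rows itself.
    with open(control_path, "w") as f:
        json.dump({"partitions": segment_paths}, f)


def read_data_set(control_path):
    """A later job follows the control file to every partition file."""
    with open(control_path) as f:
        control = json.load(f)
    rows = []
    for seg in control["partitions"]:
        with open(seg) as f:
            rows.extend(json.load(f))
    return rows


work_dir = tempfile.mkdtemp()
orders_ds = os.path.join(work_dir, "orders.ds")
write_data_set(orders_ds, [{"id": i} for i in range(5)], num_partitions=2)
print(sorted(r["id"] for r in read_data_set(orders_ds)))  # [0, 1, 2, 3, 4]
```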

Or the Lookup File Set stage:

The Lookup File Set stage is a file stage. It allows you to create a lookup
file set or reference one for a lookup. The stage can have a single input link
or a single output link. The output link must be a reference link. The stage
can be configured to execute in parallel or sequential mode when used
with an input link.
When creating Lookup file sets, one file will be created for each partition.
The individual files are referenced by a single descriptor file, which by
convention has the suffix .fs.
When performing lookups, Lookup File stages are used in conjunction
with Lookup stages.
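A toy Python model of the lookup file set layout quoted above (one file per partition, referenced by a single .fs descriptor) and of a lookup that uses it. This is an illustration only: the JSON files, the function names, and assigning rows to partitions by integer key modulo the partition count are assumptions, not how the engine actually lays out or loads the files.

```python
import json
import os
import tempfile


def create_lookup_file_set(descriptor_path, rows, key, num_partitions):
    """Write one file per partition and a descriptor listing them.
    Partition choice (integer key % num_partitions) is an assumption."""
    base = os.path.splitext(descriptor_path)[0]
    files = []
    for p in range(num_partitions):
        part_rows = [r for r in rows if r[key] % num_partitions == p]
        path = f"{base}.p{p}.json"
        with open(path, "w") as f:
            json.dump(part_rows, f)
        files.append(path)
    with open(descriptor_path, "w") as f:
        json.dump({"key": key, "files": files}, f)


def lookup(descriptor_path, stream_rows):
    """Load every partition via the descriptor, then enrich each stream row.
    Rows with no match pass through unchanged."""
    with open(descriptor_path) as f:
        desc = json.load(f)
    key = desc["key"]
    table = {}
    for path in desc["files"]:
        with open(path) as f:
            for r in json.load(f):
                table[r[key]] = r
    return [{**s, **table.get(s[key], {})} for s in stream_rows]


fs_dir = tempfile.mkdtemp()
products_fs = os.path.join(fs_dir, "products.fs")
create_lookup_file_set(
    products_fs,
    [{"prod_id": 1, "name": "bolt"}, {"prod_id": 2, "name": "nut"},
     {"prod_id": 3, "name": "washer"}],
    key="prod_id", num_partitions=2)
out = lookup(products_fs, [{"prod_id": 3, "qty": 10}, {"prod_id": 9, "qty": 1}])
print(out)
# [{'prod_id': 3, 'qty': 10, 'name': 'washer'}, {'prod_id': 9, 'qty': 1}]
```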

ray.wurlod · Post by **ray.wurlod** » Mon Jun 19, 2006 8:41 pm

splayer wrote: Kenneth, so are you saying that we can choose when creating a data set whether we want it persistent or not? Where can we make that choice?

You don't get any choice. If you create a Data Set - using a Data Set stage - then it's persistent.

Only running jobs create virtual Data Sets. You can inspect the generated OSH to determine the names of their control files, which always end in ".v".
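The naming conventions mentioned in this thread (".ds" for persistent data set control files, ".fs" for lookup file set descriptors, ".v" for virtual data sets) can be summarized in a small Python helper. The helper and the example file names are hypothetical; it only sorts names by suffix and does not read OSH or the files themselves.

```python
from pathlib import PurePath

# Suffix conventions as described in this thread.
SUFFIX_KINDS = {
    ".ds": "persistent data set",
    ".fs": "lookup file set",
    ".v": "virtual data set",  # exists only while a job is running
}


def classify(control_file_name):
    """Return the kind of data set a control/descriptor file name denotes."""
    return SUFFIX_KINDS.get(PurePath(control_file_name).suffix, "unknown")


def group_by_kind(names):
    """Group file names (e.g. copied by hand from generated OSH) by kind."""
    groups = {}
    for name in names:
        groups.setdefault(classify(name), []).append(name)
    return groups


print(group_by_kind(["customers.ds", "lnk_sort_out.v", "products.fs"]))
# {'persistent data set': ['customers.ds'], 'virtual data set': ['lnk_sort_out.v'], 'lookup file set': ['products.fs']}
```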

kumar_s · Post by **kumar_s** » Tue Jun 20, 2006 2:02 am

It would be better to search on 'Virtual Dataset' and have a look at the results; that way you can easily see the difference and understand what a persistent dataset is.