Question: Active/Passive Stage and Intermediate Files

maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Hi,
I want to know the difference between an Active Stage and a Passive Stage.

Secondly, every job creates some intermediate files. Is there any way to remove those files automatically? For example, when I check the project folder I see files named after the links. How can I remove those files automatically? I am working with a very large data set and these files take up a lot of space on the server.

Any tips and tricks for handling intermediate files would be welcome.

Thanks in Advance.
Regards,
Affan
"Questioning is Half Knowledge"
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia

Post by rleishman »

Affan,

I want to get rid of those files from the project directory too, so I'll be interested to see the posts you get. If there's no way to do it automatically, hopefully someone can tell us which ones are redundant so that they can be removed in a Post-Job routine.

As to active and passive stages:

This is an interesting one because Informatica uses the same terminology although I believe that they use them slightly differently to DS.

In DS, an Active Stage transforms the data (e.g. Transformer, Aggregator); it must have both inputs and outputs to be meaningful. A passive stage may be a terminal point; it does nothing to the data except store it or supply it (e.g. file, ODBC, Oracle OCI).

Of course a passive stage may have both inputs and outputs together, but this is really only a graphical representation; one file icon in Designer with an input and an output link may actually represent different files (i.e. the input link may write to file A while the output link reads from file B). The only relationship implied is that file A must finish writing before file B is read.

Active/Passive in DataStage is initially confusing for Informatica programmers though. In Informatica, passive "stages" include sources and targets, as in DS, but also include any "stage" (quotes because they are not actually called Stages in Informatica) where the data must collect through to the last row before being passed on. Informatica's Sorter and Aggregator are examples of Passive "stages" that would be defined as Active in DS.

I have never needed to use an aggregator in DS (I let Oracle do the work with Materialized Views). When adding further insight to this thread, can someone tell me whether the DS Server Aggregator requires the data to be pre-sorted and, if so, whether it passes through the groups as soon as the key changes?
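
To make the question concrete, here is a minimal Python sketch of what "passing through the groups as soon as the key changes" means in general. It is an illustration of streaming aggregation over pre-sorted input, not a statement about how the DS Server Aggregator actually behaves; the function name and sample rows are made up.

```python
from itertools import groupby
from operator import itemgetter


def stream_aggregate(rows):
    """Yield (key, total) for each run of consecutive rows sharing a key.

    Assumes the rows arrive already sorted by key. A group is emitted as
    soon as the key changes, so only one group is held at a time.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical pre-sorted input: (key, amount) pairs.
rows = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]
result = list(stream_aggregate(rows))

# Unsorted input shows why the sort matters: key "A" is split in two.
split = list(stream_aggregate([("A", 1), ("B", 2), ("A", 3)]))
```

With sorted input each key comes out once; with unsorted input the same key can come out several times, which is why this style of aggregation depends on the sort.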
Ross Leishman
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Passive stage - files, tables, ftp of a file, etc. Physical objects that are either sources to an operation or targets of an operation.

Active stage - an operation that manipulates the data, i.e. transformers, which derive outputs from input and reference sources, or aggregator/sort/collect/partition stages, which alter the data stream at a row level.
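
As a rough analogy only, the split can be sketched in Python: the reader and writer below play the passive role (they supply or store rows unchanged), and the function in the middle plays the active role (it derives new columns). None of these names come from DataStage; the column names and sample data are invented for the illustration.

```python
import csv
import io


def read_rows(handle):
    """Passive source: supplies rows without changing them."""
    return csv.DictReader(handle)


def derive(rows):
    """Active step: derives output columns from the input, row by row."""
    for row in rows:
        yield {
            "name": row["name"].upper(),
            "total": int(row["qty"]) * int(row["price"]),
        }


def write_rows(rows, handle):
    """Passive target: stores the rows exactly as delivered."""
    writer = csv.DictWriter(handle, fieldnames=["name", "total"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# In-memory stand-ins for a source file and a target file.
source = io.StringIO("name,qty,price\nbolt,3,2\nnut,10,1\n")
target = io.StringIO()
write_rows(derive(read_rows(source)), target)
output = target.getvalue()
```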

As for your intermediate files, are you referring to temporary sort files, and where are you finding them? Can you give an example? Maybe you're just not fully qualifying file names and they're defaulting into your project directory? A little more detail will help us.
Kenneth Bland

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

maffan76 wrote: Secondly, every job creates some intermediate files. Is there any way to remove those files automatically? For example, when I check the project folder I see files named after the links.
Some points if you are seeing files with link names in a project directory...

1) Object names should never be left at their defaults; people working under my tutelage get whacked for that. Do you mean names like DSLink27 or - even worse - CopyofDSLink27?

2) The 'filename' that both a hashed file and a sequential file default to is the name of the input link to the stage. Again, these names should never be left defaulted, as they are meaningless. I've seen times when people 'forget' and eventually fix them, but because they've run the job at least once the file gets created with the link name and then sits there getting older and older. Delete them. Sequential files (and hashed files created in a 'directory path') are deleted from the operating system. Account-based hashed files are deleted in a similar manner to how they were created: typically that was CREATE.FILE, so remove them with the DELETE.FILE command from the Administrator.

3) Lastly, if these files are being created in the job's Project, then as noted you are using relative paths when you should always be (as a Best Practice) using absolute or 'fully qualified' paths. That way you have full control over exactly where they go, especially if you standardize on having a job parameter for every main source/target file directory that your jobs will be using. This makes it fairly trivial to move your working directories around if needed due to space issues or hardware changes. I just went through that, as a matter of fact: it took less than two minutes to change the job control file's parameter values from one directory set to another. This also revealed all of the people who hard-coded paths or didn't follow the parameter naming rules, as their jobs suddenly 'blowed up'... quickly followed by a personal head thumping.
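
The idea in point 3 can be sketched outside DataStage. The Python below is only an illustration of "one directory parameter per source/target area, always absolute"; the parameter names, directories, and file name are hypothetical, and in a real project the values would come from job parameters rather than a dict.

```python
from pathlib import PurePosixPath

# Hypothetical directory parameters, one per main source/target area.
PARAMS = {"SOURCE_DIR": "/data/etl/source", "TARGET_DIR": "/data/etl/target"}


def qualified_path(params, dir_param, filename):
    """Build a fully qualified path from a directory parameter and a file name."""
    base = PurePosixPath(params[dir_param])
    if not base.is_absolute():
        # A relative directory would land wherever the job happens to run.
        raise ValueError(f"{dir_param} must be an absolute path, got {base}")
    if PurePosixPath(filename).name != filename:
        # Keep directory decisions in the parameter, not in the file name.
        raise ValueError(f"{filename!r} must be a bare file name")
    return str(base / filename)


before = qualified_path(PARAMS, "TARGET_DIR", "customers.dat")

# Moving the working area means changing one parameter value, nothing else.
relocated = {**PARAMS, "TARGET_DIR": "/bigdisk/etl/target"}
moved = qualified_path(relocated, "TARGET_DIR", "customers.dat")
```

Because every path is assembled from a parameter, relocating the working directories touches only the parameter values, which is the two-minute change described above.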

Otherwise, as Ken notes, there may be temporary files left over from things like sorts or aggregators. We wouldn't be able to help with that unless you provide specific examples of the names and/or locations of these files.
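
To gather those names and locations, something like the following read-only Python sketch could list entries in a directory whose names look like default link names. The pattern is an assumption based only on the DSLink27 and CopyofDSLink27 examples in this thread, and the function name is made up; it reports and deletes nothing, since how to remove each kind of file depends on the points above.

```python
import re
from pathlib import Path

# Assumed pattern for default link names such as DSLink27 or CopyofDSLink27.
# Adjust it to match what actually appears in your project directory.
DEFAULT_LINK_NAME = re.compile(r"^(Copyof)*DSLink\d+$")


def find_link_named_entries(project_dir):
    """Return (name, kind, size_bytes) for entries named like default links.

    Read-only: nothing is deleted or modified.
    """
    found = []
    for entry in sorted(Path(project_dir).iterdir()):
        if not DEFAULT_LINK_NAME.match(entry.name):
            continue
        if entry.is_dir():
            # Report a directory by the total size of the files inside it.
            size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
            kind = "directory"
        else:
            size = entry.stat().st_size
            kind = "file"
        found.append((entry.name, kind, size))
    return found
```

Calling `find_link_named_entries` on the project directory returns a list you can paste into a reply, showing which link-named entries exist and how much space each one takes.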
-craig

"You can never have too many knives" -- Logan Nine Fingers