Question: Active/Passive Stage and Intermediate Files

Posted: Sat Sep 24, 2005 4:14 am
by maffan76
Hi,
I want to know the difference between Active Stage and Passive Stage.

Secondly, every job creates some intermediate files. Is there any way to remove those files automatically? For example, when I check the project folder I see files named after the links. How can I remove those files automatically? I am working on a very large data set and these files take up plenty of space on the server.

Any tips and tricks on handling intermediate files would be welcome.

Thanks in Advance.

Posted: Sat Sep 24, 2005 4:55 am
by rleishman
Affan,

I want to get rid of those files from the project directory too, so I'll be interested to see the posts you get. If there's no way to do it automatically, hopefully someone can tell us which ones are redundant so that they can be removed in a Post-Job routine.

As to active and passive stages:

This is an interesting one because Informatica uses the same terminology although I believe that they use them slightly differently to DS.

In DS, an Active Stage transforms the data (eg. Transformer, Aggregator); it must have both inputs and outputs to be meaningful. A passive stage may be a terminal point; it does nothing to the data except store it or supply it (eg. file, ODBC, Oracle OCI).

Of course a passive stage may have both inputs and outputs together, but this is really only a graphical representation; one file icon in designer with an input and output link may actually represent different files (ie. the input may write to file A and then read from file B for the output link). The only relationship implied is that file A must finish writing before file B is read.

Active/Passive in DataStage is initially confusing for Informatica programmers though. In Informatica, passive "stages" include sources and targets like DS, but also include any "stage" (quotes because they are not actually called Stages in Informatica) where the data must collect through to the last row before being passed on. Informatica's Sorter and Aggregator are examples of Passive "stages" that would be defined as Active in DS.

I have never needed to use an aggregator in DS (let Oracle do the work with Materialized Views). When adding further insight to this thread, can someone tell me whether the DS Server Aggregator requires the data to be pre-sorted and - if so - whether it passes the groups through as soon as the key changes?

Posted: Sat Sep 24, 2005 7:49 am
by kcbland
Passive stage - files, tables, ftp of a file, etc. Physical objects that are either sources to an operation or targets of an operation.

Active stage - an operation that manipulates the data, ie transformers which derive outputs from input and reference sources or aggregator/sort/collect/partition stages that alter the data stream at a row level.

As for your intermediate files, are you referring to temporary sort files, and where are you finding them? Can you give an example? Maybe you're just not fully qualifying files and they're defaulting into your project directory? A little more detail will help us.

Posted: Sat Sep 24, 2005 8:19 am
by chulett
maffan76 wrote: Secondly every job makes some intermediate files is there any way to remove those files automatically. e.g. when i go and check the project folder i see some files with the names of the links names.
Some points if you are seeing files with link names in a project directory...

1) Object names should never be left with their default name; people working under my tutelage get whacked for that. Do you mean names like DSLink27 or - even worse - CopyofDSLink27? :?

2) The 'filename' that both a hashed file and a sequential file default to is the input link name to the stage. Again, these names should never be left defaulted as they are meaningless. I've seen times when people 'forget' and eventually fix them, but because they've run the job at least once the file gets created with the link name, and then sits there getting older and older. Delete them. Sequential files (and hashed files created in a 'directory path') are deleted from the operating system. Account based hashed files are deleted in a manner similar to how they were created - typically CREATE.FILE, so remove them with the DELETE.FILE command from the Administrator.

3) Lastly, if these files are being created in the job's Project, then as noted you are using relative paths when you should always be (as a Best Practice) using absolute or 'fully qualified' paths. That way you have full control over exactly where they go, especially if you standardize on having a job parameter for every main source/target file directory that your jobs will be using. This makes it fairly trivial to move your working directories around if needed due to space issues or hardware changes. Just went through that, as a matter of fact. Took less than two minutes to change the job control file's parameter values from one directory set to another. This also revealed all of the people that hard-coded paths or didn't follow the parameter naming rules as their jobs suddenly 'blowed up'... quickly followed by a personal head thumping. :twisted:
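To make point 3 a bit more concrete, here is a rough sketch in Python (not DataStage code) of what a directory parameter buys you. In a job you would typically put something like #TargetDir#/customers.txt in the stage's file name property; the parameter name TargetDir, the file name and both helper functions below are made up for illustration, and the check assumes POSIX-style paths.

```python
import posixpath
import re


def expand_params(template, params):
    """Replace #Name# tokens with values from params (hypothetical helper)."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return params[name]
    return re.sub(r"#(\w+)#", lookup, template)


def qualified_path(template, params):
    """Expand the template and insist that the result is fully qualified."""
    path = expand_params(template, params)
    if not posixpath.isabs(path):
        # A relative name is resolved against the current directory,
        # which for a job is how files end up in the project folder.
        raise ValueError(f"{path!r} is relative, not a fully qualified path")
    return posixpath.normpath(path)


if __name__ == "__main__":
    # Moving the working directory means changing one parameter value.
    print(qualified_path("#TargetDir#/customers.txt", {"TargetDir": "/data/target"}))
    print(qualified_path("#TargetDir#/customers.txt", {"TargetDir": "/bigdisk/target"}))
```

A defaulted name such as DSLink27 contains no parameter and no directory, so the check rejects it, which is the same failure the hard-coded jobs ran into when the directories moved.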

Otherwise, as Ken notes, there may be temporary files left over from things like sorts or aggregators. We wouldn't be able to help with that unless you provide specific examples of the names and/or locations of these files.
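
If the leftovers do turn out to be files with defaulted link names, a small script can at least list them before anyone deletes anything. This is only a sketch in Python, run outside DataStage: the name pattern is a guess based on the DSLink27 and CopyofDSLink27 examples above, the function name is made up, and you would pass in your own project directory. It only reports and deliberately does not delete, since account based hashed files should go through DELETE.FILE rather than the operating system.

```python
import os
import re
import time

# Assumed pattern for defaulted link names, e.g. DSLink27 or CopyofDSLink27.
DEFAULT_NAME = re.compile(r"^(copyof)*dslink\d+$", re.IGNORECASE)


def find_default_named(directory, min_age_days=0, now=None):
    """List entries in directory whose names look like defaulted link names.

    Returns sorted (name, is_directory, size_in_bytes, age_in_days) tuples.
    For a directory the size is that of the directory entry itself,
    not of its contents.
    """
    now = time.time() if now is None else now
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not DEFAULT_NAME.match(entry.name):
                continue
            info = entry.stat()
            age_days = (now - info.st_mtime) / 86400
            if age_days >= min_age_days:
                hits.append((entry.name, entry.is_dir(), info.st_size, age_days))
    return sorted(hits)


if __name__ == "__main__":
    # "." is a placeholder; point this at the project directory to inspect.
    for name, is_dir, size, age in find_default_named(".", min_age_days=30):
        kind = "dir " if is_dir else "file"
        print(f"{kind} {name:30} {size:>12} bytes  {age:6.0f} days old")
```

Entries that show up as directories are likely hashed files created in a directory path, so check which kind of hashed file each one is before removing it.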