Create Multiple Files Dynamically without using scripts

sampitke1 · Post by **sampitke1** » Wed Apr 30, 2008 3:22 pm

Hi

We are looking for a way to create multiple output files depending on the input data. Each input row has a column, "File Name", which holds different values, and we want to use those values as the names of the output files. If a file with that name already exists, the row should be written to it; otherwise the file should be created and then written to.

We are trying to avoid scripts and routines for this. Can we achieve it using the available stages?

Also, please let me know whether we can change the value of a parameter dynamically and use it as a file name.

Thanks and Regards
Sam

jhmckeever · Post by **jhmckeever** » Wed Apr 30, 2008 6:13 pm

Why do you want to avoid using scripts? Is it because you don't want to have to manage a deliverable asset external to the DataStage environment?

Personally, I'd be inclined to develop the script and implement it in an External Target stage.
Your script would:
- Read from stdin,
- Determine the target filename from the input row (probably using the cut command) and,
- Append (using the '>>' redirection; a single '>' would overwrite) the remainder of the input line to the specified file. The shell would create files as necessary.

Would this fulfill your requirements? There are other solutions, but this is fairly quick and easy to maintain/debug.
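As a rough illustration of those three steps, here is a minimal sketch in Python rather than shell. It assumes the file name is the first comma-delimited field of each row and that it is a plain file name with no directory part; `route_lines`, `out_dir` and `delimiter` are made-up names for this example, not anything DataStage provides.

```python
from pathlib import Path


def route_lines(lines, out_dir=".", delimiter=","):
    """Append each input line to the file named in its first field.

    Assumption: the first delimited field is a plain file name and the
    rest of the line is the payload to write.
    """
    out_dir = Path(out_dir)
    written = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split off the first field (the target file name) from the payload.
        name, _, payload = line.partition(delimiter)
        name = name.strip()
        if not name:
            continue  # no file name on this row, nothing to route
        # Mode "a" creates the file if it is missing, otherwise appends,
        # which is the same behaviour as '>>' in the shell.
        with open(out_dir / name, "a", encoding="utf-8") as target:
            target.write(payload + "\n")
        written.add(name)
    return sorted(written)
```

To use it as a filter, the script would end with something like `route_lines(sys.stdin, "/some/target/dir")` after an `import sys`. Reopening the file for every row is slow but avoids running out of file handles when there are many distinct names.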

Can you elaborate on the 'parameter' you're suggesting might also provide a filename? How does this work alongside the filename column you discussed? The value of a parameter cannot be changed (easily).

John.

sampitke1 · Post by **sampitke1** » Thu May 01, 2008 1:20 pm

Hey, thanks for the reply. Yes, we can do it using the External Target stage and a script.

I was thinking of taking the "FILE NAME" from the input row into a PARAMETER, COMPARING it with the PREVIOUS "FILE NAME" and, depending on the result, either creating a new file or writing to the existing one.

In short, I was trying to use a PARAMETER as a VARIABLE. However, as you said, it is very difficult to change a parameter at run time.

I believe the values of all parameters are cached at a very early stage of job execution.

Please correct me if I am wrong or missing something.

Thanks
Sam

jhmckeever · Post by **jhmckeever** » Thu May 01, 2008 5:18 pm

That 'procedural programming' approach to the problem would be tackled by sorting by your filename column and using stage variables within a transformer. They are an ideal mechanism for comparing neighbouring rows in sorted datasets. You could also create a 'keyChange' column in the preceding sort to identify the occurrence of a new filename. There's no parallel function, however, to 'create/append to file MyFile' - unless of course you wanted to craft one yourself in C++?
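The neighbouring-row comparison can be sketched outside DataStage like this. It is only an illustration of the logic, assuming the rows are dictionaries already sorted by a hypothetical `file_name` column; `flag_key_changes` is an invented name, and `previous` plays the role a stage variable would play in the transformer.

```python
def flag_key_changes(rows, key="file_name"):
    """Yield (row, key_change) pairs for rows already sorted by `key`.

    key_change is True on the first row of each new key value.
    """
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        current = row[key]
        # Compare with the value remembered from the previous row.
        key_change = current != previous
        previous = current
        yield row, key_change
```

The flag is True on the first row for each filename and False on the rows that follow it, which is the point where a 'create' would turn into an 'append'.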

Because you don't know ...
- How many files you need to create (potentially one per input row), or
- Their respective names (until you've read each input row)
... you can't use the Sequential File stage. You wanted to specify the Sequential File's filename as a parameter and change this value on the fly using a transformer. I'd be typing a long time if I discussed my thoughts on that approach! Suffice it to say, effectively moving the functionality outside of DataStage and into the shell will solve your problem.

You ask if parameters are cached at the start of a job - 'substituted with their respective values' might be a better description. Almost like a text search and replace.
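To make the search-and-replace analogy concrete, here is a small sketch. It is an analogy only, not a description of how the engine is implemented: `resolve_parameters` and the parameter names are invented, and the `#Name#` tokens are modelled on the way job parameters are usually referenced in stage properties.

```python
import re


def resolve_parameters(text, params):
    """Replace every #Name# token in text with its value from params.

    The substitution happens once, up front; nothing downstream can
    change the values afterwards.
    """
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return str(params[name])

    return re.sub(r"#(\w+)#", lookup, text)
```

Called with a path such as `/data/#TargetDir#/#FileName#.txt`, it returns a fixed string before any row has been read, which is why a per-row filename cannot come from a parameter.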

Think of a job parameter as read only - and think of the crazy mess we'd get into if multiple concurrent stages could each modify input parameters willy-nilly during a job execution. Then think about the consequences of a single stage running across separate partitions applying different values to the same parameter depending upon which partition it was running on ... !!!
Clearly, job parameters are best left as they are: Global (within the scope of their parent job) and immutable.

John.

ray.wurlod · Post by **ray.wurlod** » Thu May 01, 2008 5:33 pm

There is a very easy solution if you're prepared to use a server job. Make the target stage a Hashed File stage and choose Type 19 when creating it (or simply do nothing if the directory already exists). A "Type 19 hashed file" is actually a directory, and the record IDs (Keys) are file names in that directory.

jhmckeever · Post by **jhmckeever** » Thu May 01, 2008 5:37 pm

... need to get that parallel to server class sorted sometime ...

sampitke1 · Post by **sampitke1** » Fri May 02, 2008 10:57 am

Thanks for your valuable information.