
Dynamic Output File Names

Posted: Thu Feb 21, 2008 4:17 am
by wjfitzgerald
Hi,

I have spent the last day or so trawling previous posts but cannot find anything to satisfy my requirement. If anyone can help, I would really appreciate it.

I need to extract a large volume of data from a DB and then split the data into appropriately named files.

For the purposes of the exercise, the task can be simplified as follows:
1. extract all orders, sorted by customer number
2. create a new output file for each customer, such that for customer C002 the file name would be EXT/CUST/C002

Obviously the "EXT/CUST/" part is fine. I just cannot see how to get the C002 onto the end.

Thanks in advance for any suggestions or help.

Posted: Thu Feb 21, 2008 4:37 am
by ArndW
There is no easy, built-in way to do this with DataStage. There are several ways to code it, of varying complexity. If you have a known and limited number of output files, then you can hard-code the paths in a DataStage job.
If your list of customers/output files is large or changes over time, then that method won't work and you will need another approach. Which one is right depends more on your particular skill set or site standards than anything else.
I like DS/BASIC and would use a custom routine, called from a server job, that reads the file and uses the OPENSEQ/WRITESEQ commands to direct the output depending on the data.
A similarly designed routine written in C++ and called from a PX job could do the same thing.
You can also use UNIX shell scripting for this, with Perl, awk, or sed, or even iterations of simple cat and grep commands.
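(For illustration, here is a minimal C++ sketch of the splitting logic common to the BASIC, C++, and scripting suggestions above. It assumes the extract is already sorted by customer number, that the input file is named extract.txt, that the customer number is the first comma-delimited field, and that the EXT/CUST directory already exists; none of those details come from the thread.)

// split_extract.cpp - illustrative sketch only (compile with C++11 or later).
// Assumes a comma-delimited extract sorted by customer number, with the
// customer number in the first field, and an existing EXT/CUST directory.
#include <fstream>
#include <string>

int main() {
    std::ifstream in("extract.txt");   // the sorted extract (assumed name)
    std::ofstream out;
    std::string row, current;          // current = customer now being written

    while (std::getline(in, row)) {
        std::string cust = row.substr(0, row.find(','));  // first field
        if (cust != current) {         // key changed: start the next file
            if (out.is_open()) out.close();
            out.open("EXT/CUST/" + cust);   // e.g. EXT/CUST/C002
            current = cust;
        }
        out << row << '\n';
    }
    return 0;
}

Because the extract is sorted, each customer's rows are contiguous, so only one output file needs to be open at a time and a plain open-on-key-change is safe.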

Posted: Thu Feb 21, 2008 4:41 am
by battaliou
Try a type 1 hashed file. Key the hashed file on customer number and name the hashed file EXT/CUST.

Posted: Thu Feb 21, 2008 4:45 am
by wjfitzgerald
Any chance you could expand on the idea of the hashed file approach, please? I'm fairly new to all this, so not too clued in, I'm afraid.

Posted: Thu Feb 21, 2008 4:48 am
by ArndW
Martin - GREAT suggestion, that will work. A type 1 or 19 hashed file is really just an operating-system directory in which each record is stored as a separate file named after its key, so writing records keyed on customer number to a hashed file called EXT/CUST yields exactly the EXT/CUST/C002 style of file the OP wants. The OP didn't mention whether they are willing to use a server job, but if so, updating/appending records in a type 1 or 19 hashed file will work like a charm. The only downside is performance, but the coding will take only minutes.

Posted: Thu Feb 21, 2008 6:24 am
by ray.wurlod
Prefer type 19 if the names are going to exceed 14 characters in length.

Posted: Thu Feb 21, 2008 5:10 pm
by JoshGeorge
For a parallel job, a parallel (C++) routine that creates files dynamically is the best way to do this in a single job. Aggregate/collect all the records for each customer in a Transformer using stage variables (appending a newline character to each record), use a Remove Duplicates stage, and pass each customer's record to a final Transformer stage that calls the parallel routine. You can pass the file name dynamically as well. See if this POST helps.
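(As a rough illustration of the kind of routine JoshGeorge describes: a tiny C++ function the final Transformer could call once per aggregated customer record. The function name, argument list, and integer return convention here are assumptions for the sketch, not the actual DataStage parallel-routine interface.)

// append_row.cpp - hypothetical sketch of a dynamic-file parallel routine.
#include <fstream>

// Appends one (already newline-terminated) record to the named file,
// creating the file if needed. Returns 0 on success, -1 on failure.
extern "C" int AppendRow(const char* filename, const char* record) {
    std::ofstream out(filename, std::ios::app);
    if (!out) return -1;
    out << record;
    return 0;
}

The final Transformer would build the first argument dynamically, e.g. "EXT/CUST/" concatenated with the customer number from the row.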

Posted: Fri Feb 22, 2008 2:41 am
by wjfitzgerald
Thanks, that looks very promising.

Posted: Wed Mar 05, 2008 5:26 pm
by jdmiceli
Depending on your scripting skills, this would be a breeze in Perl. Extract the data to a sequential file ordered by customer number, then split the rows into separate files based on the customer number field. It would be wicked fast and could be fired off from a DS job if you wanted.

Just a thought!