Dynamic file dynamic column

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Dynamic file dynamic column

Post by pkothana »

Hi,
I need to dynamically pick columns (headers) from a particular file based on some condition. The file name is also dynamic ("xyz" + ID + MMDDYY), and ID and MMDDYY keep changing. I have to pick the ID from the file name and attach it to the selected columns.

For example :
File Name : xyz_123_102803 (ID = 123 and date 10/28/03)
Columns : abc_I, sdf_I, sde_D (headers are dynamic, i.e. the next file will have different headers)
I have to select abc_I and sdf_I columns.

Final output will be a file with following header names :
123abc_I, 123sdf_I (123 being the ID)


I was wondering if this is possible in DataStage PX, and if yes, how we can achieve it. There are several questions here. How do we pick up a file with a dynamic name (the file path is fixed)? How can we specify dynamic column names in the file format? How can we select columns based on the column-name format? And how can we pick the ID from the file name and attach it to the columns?

Any help in this regard is highly appreciated.

Thanks a Lot in advance

Best Regards
Pinkesh
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I might have an idea, but you'll have to try it to see if it works.

Since you're talking dynamic names, you might get this to work if the number of columns is fixed, or at least there is a finite (preferably small) number of possibilities.

1. Use a parameter for the file name.
2. Build a routine that builds the target field names from your file name, in the format "abc_I, sdf_I, sde_D" (see the routine sketch a little further down).

* The above is done in a separate job, which passes the string of column names as a parameter to the next one.

3. Build a standard table definition with col1, col2, ... to correspond to the number of columns you have, and use a user-defined query that does the insert itself using the parameter column names,
i.e. insert into #TNAME#(#COLLIST#) values (?,?....)

Since there is no real need for the fixed table definition to carry the final column names, this should work.
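To give a rough idea of the routine in step 2, here is a minimal DS BASIC sketch. The routine name, its arguments and the "keep columns ending in _I" rule are assumptions on my part; replace them with your real selection condition.

* Hypothetical server routine BuildColumnList(FilePath, FileName).
* Reads the header row of the file, keeps the wanted columns and
* returns them, prefixed with the ID taken from the file name
* (xyz_ID_MMDDYY), as a comma-separated string in Ans.

      FileID = FIELD(FileName, "_", 2)        ;* e.g. 123
      ColList = ""

      OpenSeq FilePath : "/" : FileName To FileVar Then
         ReadSeq HeaderRow From FileVar Then
            NumCols = DCOUNT(HeaderRow, ",")
            For i = 1 To NumCols
               ColName = TRIM(FIELD(HeaderRow, ",", i))
               * assumed selection rule: keep columns ending in "_I"
               If ColName[LEN(ColName) - 1, 2] = "_I" Then
                  If ColList = "" Then
                     ColList = FileID : ColName
                  End Else
                     ColList = ColList : "," : FileID : ColName
                  End
               End
            Next i
         End
         CloseSeq FileVar
      End

      Ans = ColList

The returned string is what you would pass on as the column-list parameter to the loading job.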

The only downside is that you need a job for every number of columns you might have, and you use the proper one for each run (this could be done with a control job using DS BASIC).
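A few lines of job-control code are enough for that; something like the sketch below, where the LoadFileNCols naming convention and the parameter names are just placeholders of mine:

* Hypothetical job-control snippet: pick the pre-built job that
* matches the number of columns and run it with the parameters
* produced by the first job.
      $INCLUDE DSINCLUDE JOBCONTROL.H

      JobName = "LoadFile" : NumCols : "Cols"   ;* e.g. LoadFile3Cols

      hJob = DSAttachJob(JobName, DSJ.ERRFATAL)
      ErrCode = DSSetParam(hJob, "FileName", FileName)
      ErrCode = DSSetParam(hJob, "ColList", ColList)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob)
      Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      ErrCode = DSDetachJob(hJob)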

You did mention parallel, so worst case you might need to enclose the job in a shared container.

There might be some other issues, but at least it's an option (not a simple one, but an option).

IHTH (I Hope This Helps)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Dynamic file dynamic column

Post by bigpoppa »

Another option is to use an outside UNIX script to pick out the ID and generate an Orchestrate schema file prior to record processing. Orchestrate schema files are PX-specific metadata files that specify record- and field-related layout and type properties. Schema files can be imported into DataStage as Table Definitions. Also, I believe you can still call PX stages by the old Orchestrate name/properties style, similar to what you see in score dumps and the generated osh.

E.g.

import
-schemafile GeneratedFile
-file NameOfInput
|
..downstream processing..
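For what it's worth, the GeneratedFile that -schemafile points at can be very small; something along these lines, though the field names and string types here are only guessed from the example file and would be written out by the script for each run:

record
(
  abc_I: string[max=255];
  sdf_I: string[max=255];
  sde_D: string[max=255];
)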

-BP
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I suggest you start with a sequence job that runs a routine that searches the directory for a file and returns the file name as Ans back to the sequence job. The sequence job then calls a server job and passes it three job parameters:

Code: Select all

PARAMETER   DERIVATION                                   EXAMPLE
FileName    GetFileName.ReturnedValue                    xyz_123_102803
FileID      FIELD(GetFileName.ReturnedValue, "_", 2)     123
FileDate    FIELD(GetFileName.ReturnedValue, "_", 3)     102803
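The GetFileName routine itself only needs to shell out and list the directory. A rough DS BASIC sketch follows; the routine name, its arguments and the ls pattern are placeholders of mine, not anything built in:

* Hypothetical routine GetFileName(FilePath, FilePattern), e.g.
* called with "/data/in" and "xyz_*". Lists the directory and
* returns the bare name of the first matching file in Ans.
      $INCLUDE DSINCLUDE JOBCONTROL.H

      Cmd = "ls " : FilePath : "/" : FilePattern
      Call DSExecute("UNIX", Cmd, Output, SysRet)

      If SysRet = 0 Then
         FullName = TRIM(FIELD(Output, @FM, 1))           ;* first line of ls output
         Ans = FIELD(FullName, "/", DCOUNT(FullName, "/")) ;* strip the path
      End Else
         Ans = ""                                          ;* no matching file found
      End

The sequence job then derives FileID and FileDate from the returned value exactly as in the table above.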
The server job uses #FileName# as the input file. You can build any output file name you want using a combination of job parameters.
Now when it comes to column headings, you can just treat the first row (the heading row) as a normal record and output it in the transformer with the ID added, e.g.
IF @INROWNUM=1 THEN FileID:FieldName ELSE FieldName
This derivation will set a field to 123abc_I if it is the first row being processed (a heading), or leave it unchanged if it is not a heading row. This approach could get a bit tricky depending on what format your data is in. You may need to process your headings separately; it really depends on what your data looks like.

If you are reading and writing text files and you turn off "First row is column headings", you do not need to define accurate column names within the DataStage column definitions. You can call your columns Larry, Curly and Moe (although abc_I and sdf_I may make more sense). You can then transform your column names.