Reading variable number of columns in a file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
craviraj
Participant
Posts: 17
Joined: Mon Jun 12, 2006 5:17 am
Location: Secunderabad

Reading variable number of columns in a file

Post by craviraj »

Hi,
I have a requirement to read a file in the format as below:

A,F1,F2,F3 (Comma delimited and column A's position is fixed with 3 columns following column A). 3 is an arbitrary number. It can vary from 1 to 25.

So, I do not know the number of fields that will come in the input file.

1) How can I read such a file?
-- Assumption Runtime Column Propogation may work to read the data, but how would I know the number of columns read?

2) Second scenario is the fields will not be in the specific order. For eg. I get the file with 4 fields one time as A,F1,F2,F3 then second time we may get records as A,F3,F1,F2. In such case, how can we identify the fields.

Note: First line will have the column names, but if the number of columns are not fixed and the order is not fixed - is there a way to read these files?

The first requirement is to read this file and Pivot it with the values we receive to read the file as
A1 F1 X1
A1 F2 X2
A1 F3 X3

assuming that the file contained only one record as shown below:
A,F1,F2,F3
A1,X1,X2,X3

Any help is highly appreciated.

Thanks,
Raviraj
-----------------------
R-Square
[Ravi Raj]
robjones
Participant
Posts: 24
Joined: Tue Nov 18, 2008 3:12 pm
Location: London

Post by robjones »

To split the data into columns, you can read the file as if it contained only one column, with all data for each record in that single column.

You can then pass this into a transform stage and use the field function (with the comma character as delimiter) in 26 stage variables to split the data into separate columns.

eg.

svA = Field(link.inputcolumn, ",", 1)
svData1 = Field(link.inputcolumn, "," , 2)
svData2 = Field(link.inputcolumn, "," , 3)
...
svData25 = Field(link.inputcolumn, "," , 26)

I do however think that you're going to need to identify a rule with which to determine how the split columns are then mapped to your F column outputs.

Can you provide examples of the data which you need to process?

It would be useful to see some rows like "A, F1, F2, F3" and "A, F3, F5, F7" to see if there's anything distinctive about each data column that you could base your decision upon.

Once you've finally got the data mapped into the correct columns, the action you wish to perform is known as a vertical pivot and is quite straight forward.

If you search this forum for "vertical pivot" you'll find many examples of how to do this.


Rob
craviraj
Participant
Posts: 17
Joined: Mon Jun 12, 2006 5:17 am
Location: Secunderabad

Reading variable number of columns in a file

Post by craviraj »

Thanks Rob for your response.

Some file formats would be: (First Line in each file is column name - please note that all the files will have the similar patterned name and the name will not help in identifying how many columns each file has)

1st example File:
A,F1,F2,F3 (Column Names)
A1,X1,X2,X3
A2,X4,X5,X6

2nd example file:
A,F1,F5,F6,F7
A1,X1,X2,X3,X4
A2,X5,X6,X7,X8
A3,X9,X0,X10,X11

3rd example file:
A,F5,F6,F2,F1,F3
A4,X6,X7,X4,X8
A5,X8,X9,X7,X3
A6, X1,X2,X3,X4

The Values starting with A and X are arbitrary values and can be replaced with any text. If the values have repeated it does not mean that same value would come in. The first line is column name and the names given can be treated as repeated if the same value is repeated.
-----------------------
R-Square
[Ravi Raj]
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I would use a column import.
Mamu Kim
craviraj
Participant
Posts: 17
Joined: Mon Jun 12, 2006 5:17 am
Location: Secunderabad

Reading variable number of columns in a file

Post by craviraj »

As far as I know, Column import stage also expects the columns in a fixed position.
-----------------------
R-Square
[Ravi Raj]
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Use Schema files. It would help in selecting the target columns dynamically but they should be defined in the schema file.

Regards
Sreeni
Post Reply