Random order of column names in file

Pravenai · Post by **Pravenai** » Thu Jun 02, 2016 11:29 am

Hi All,
We have a requirement wherein we load data from a flat file (csv) into a table.
However, the format of this source file is not fixed. Columns may not come in the same order always and we can also get additional columns in future.

Eg:
1st day my file will be:
EmpId, BirthDate, HireDate, DeptName, TerminationDate

2nd Day:
HireDate,DeptName,TerminationDate,BirthDate, EmpID, Salary

Is it possible to design a job that can handle such cases.

Regards,
PR

chulett · Post by **chulett** » Thu Jun 02, 2016 11:43 am

Wow, that's kind of... stupid.

What I would attempt in your shoes would be to NOT skip the first / header record as one would normally do but instead read it as data and then squirrel away in stage variables what column holds what data this run. And then try to unravel things in the transformer for all of the subsequent data records. Sounds a bit painful but doable, me thinks.

Pravenai · Post by **Pravenai** » Thu Jun 02, 2016 12:00 pm

RCP cannot handle column ordering?

chulett · Post by **chulett** » Thu Jun 02, 2016 12:17 pm

RCP? You can use RCP as long as all you have to do is move the data from the source to the target. If you need transformations or validations then RCP becomes an issue.

However, it seems to me that you could interrogate the file and use the first record to build the appropriate schema file to pass in run over run.

Pravenai · Post by **Pravenai** » Thu Jun 02, 2016 12:30 pm

Thanks Craig.

Please let me know if you see any issues for the below approach:

Job 1 -> Read the header information and create a schema file using it.
Job 2 -> Use this schema file and read data from the file using sequential file stage.

The challenge that I see here is, we may not be aware of all column's datatype when we create schema file. Do you foresee any issues if we define all columns as varchar and process it?

Regards,
PR

Pravenai · Post by **Pravenai** » Thu Jun 02, 2016 2:56 pm

It worked :D

I created a script which would fetch header information from the file and create a schema file out of it. Then I used this schema file to read the input file using sequential stage and RCP. Luckily, we do not need to do any transformation on these columns, so I guess weare good with RCP.

Thanks Craig.

Regards,
PR

chulett · Post by **chulett** » Thu Jun 02, 2016 3:31 pm