Random order of column names in file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Pravenai
Participant
Posts: 49
Joined: Mon Apr 06, 2009 1:32 am
Location: USA

Random order of column names in file

Post by Pravenai »

Hi All,
We have a requirement wherein we load data from a flat file (csv) into a table.
However, the format of this source file is not fixed. Columns may not come in the same order always and we can also get additional columns in future.

Eg:
1st day my file will be:
EmpId, BirthDate, HireDate, DeptName, TerminationDate

2nd Day:
HireDate,DeptName,TerminationDate,BirthDate, EmpID, Salary

Is it possible to design a job that can handle such cases.

Regards,
PR
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wow, that's kind of... stupid. :?

What I would attempt in your shoes would be to NOT skip the first / header record as one would normally do but instead read it as data and then squirrel away in stage variables what column holds what data this run. And then try to unravel things in the transformer for all of the subsequent data records. Sounds a bit painful but doable, me thinks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Pravenai
Participant
Posts: 49
Joined: Mon Apr 06, 2009 1:32 am
Location: USA

Post by Pravenai »

RCP cannot handle column ordering?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

RCP? You can use RCP as long as all you have to do is move the data from the source to the target. If you need transformations or validations then RCP becomes an issue.

However, it seems to me that you could interrogate the file and use the first record to build the appropriate schema file to pass in run over run.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Pravenai
Participant
Posts: 49
Joined: Mon Apr 06, 2009 1:32 am
Location: USA

Post by Pravenai »

Thanks Craig.

Please let me know if you see any issues for the below approach:

Job 1 -> Read the header information and create a schema file using it.
Job 2 -> Use this schema file and read data from the file using sequential file stage.

The challenge that I see here is, we may not be aware of all column's datatype when we create schema file. Do you foresee any issues if we define all columns as varchar and process it?

Regards,
PR
Pravenai
Participant
Posts: 49
Joined: Mon Apr 06, 2009 1:32 am
Location: USA

Post by Pravenai »

It worked :D

I created a script which would fetch header information from the file and create a schema file out of it. Then I used this schema file to read the input file using sequential stage and RCP. Luckily, we do not need to do any transformation on these columns, so I guess weare good with RCP.

Thanks Craig.

Regards,
PR
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply