Hi All,
We have a requirement wherein we load data from a flat file (csv) into a table.
However, the format of this source file is not fixed. Columns may not come in the same order always and we can also get additional columns in future.
Eg:
1st day my file will be:
EmpId, BirthDate, HireDate, DeptName, TerminationDate
2nd Day:
HireDate,DeptName,TerminationDate,BirthDate, EmpID, Salary
Is it possible to design a job that can handle such cases.
Regards,
PR
Random order of column names in file
Moderators: chulett, rschirm, roy
Wow, that's kind of... stupid.
What I would attempt in your shoes would be to NOT skip the first / header record as one would normally do but instead read it as data and then squirrel away in stage variables what column holds what data this run. And then try to unravel things in the transformer for all of the subsequent data records. Sounds a bit painful but doable, me thinks.
What I would attempt in your shoes would be to NOT skip the first / header record as one would normally do but instead read it as data and then squirrel away in stage variables what column holds what data this run. And then try to unravel things in the transformer for all of the subsequent data records. Sounds a bit painful but doable, me thinks.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
RCP? You can use RCP as long as all you have to do is move the data from the source to the target. If you need transformations or validations then RCP becomes an issue.
However, it seems to me that you could interrogate the file and use the first record to build the appropriate schema file to pass in run over run.
However, it seems to me that you could interrogate the file and use the first record to build the appropriate schema file to pass in run over run.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Thanks Craig.
Please let me know if you see any issues for the below approach:
Job 1 -> Read the header information and create a schema file using it.
Job 2 -> Use this schema file and read data from the file using sequential file stage.
The challenge that I see here is, we may not be aware of all column's datatype when we create schema file. Do you foresee any issues if we define all columns as varchar and process it?
Regards,
PR
Please let me know if you see any issues for the below approach:
Job 1 -> Read the header information and create a schema file using it.
Job 2 -> Use this schema file and read data from the file using sequential file stage.
The challenge that I see here is, we may not be aware of all column's datatype when we create schema file. Do you foresee any issues if we define all columns as varchar and process it?
Regards,
PR
It worked :D
I created a script which would fetch header information from the file and create a schema file out of it. Then I used this schema file to read the input file using sequential stage and RCP. Luckily, we do not need to do any transformation on these columns, so I guess weare good with RCP.
Thanks Craig.
Regards,
PR
I created a script which would fetch header information from the file and create a schema file out of it. Then I used this schema file to read the input file using sequential stage and RCP. Luckily, we do not need to do any transformation on these columns, so I guess weare good with RCP.
Thanks Craig.
Regards,
PR