Variable Column Set in File

Posted: Wed Jun 04, 2014 9:35 am
by eli.nawas_AUS
I have a file coming in which may have different column sets on different days. It has a header row, so it is possible to determine which columns exist on a given day, but I have not been able to find a way of getting the incoming file stage (HDFS stage) to understand the header and map the input columns to job columns. Is this possible?

Posted: Wed Jun 04, 2014 5:26 pm
by ssnegi
You can have a separate job for each day. In the sequencer, you can use the Nested Condition stage to call the appropriate job based on the day.

Posted: Wed Jun 04, 2014 7:38 pm
by ray.wurlod
Create a preliminary job that reads the file's header row and creates a schema file (so that the Sequential File stage can use RCP, runtime column propagation, to read the file). The same job can also create a Modify stage specification to translate from today's columns to the "official" job columns. Provide the Modify stage specification to the real job as a job parameter, perhaps from the User Status area of the preliminary job.
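
As a rough illustration of what the preliminary step has to produce, here is a minimal Python sketch that turns a header row into schema-file text and a Modify specification. It is not DataStage code and not necessarily how the poster would build it. The column names, the OFFICIAL_NAMES mapping, and the helper function names are all hypothetical. It also assumes a comma-delimited file, that every column can be read as a nullable string, and that the header names are already valid column identifiers.

```python
import csv
import io

# Hypothetical mapping from header names that may appear on a given day
# to the "official" job column names. Not taken from the original thread.
OFFICIAL_NAMES = {"cust_id": "CUSTOMER_ID", "amt": "AMOUNT"}


def read_header(line, delimiter=","):
    """Split one header line into a list of column names."""
    return next(csv.reader(io.StringIO(line.strip()), delimiter=delimiter))


def build_schema(columns, delimiter=","):
    """Build schema-file text that declares every column as a nullable string.

    Assumption: reading everything as string[max=255] is acceptable;
    real jobs would likely need per-column types.
    """
    fields = "\n".join(f"  {name}: nullable string[max=255];" for name in columns)
    return (
        "record\n"
        "{final_delim=end, delim='" + delimiter + "', quote=double}\n"
        "(\n" + fields + "\n)"
    )


def build_modify_spec(columns, mapping):
    """Build a rename specification of the form NEW=old;NEW2=old2.

    Only columns present today and listed in the mapping are renamed.
    """
    return ";".join(
        f"{mapping[c]}={c}" for c in columns if c in mapping and mapping[c] != c
    )


if __name__ == "__main__":
    todays_columns = read_header("cust_id,amt,region\n")
    print(build_schema(todays_columns))
    print(build_modify_spec(todays_columns, OFFICIAL_NAMES))
```

The schema text would be written to a file that the reading stage references, and the specification string would be passed to the main job as a parameter. The header row itself still has to be skipped when the data is read, and how that is done depends on the stage used.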