load sequential file with varying number of columns

roblew
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

Hello,

We've got a problem loading a tab-delimited sequential file whose records have a varying number of columns.

For example, the target table has 10 columns. The sequential file may have 10 columns in the first row, 5 columns in the second, and 8 columns in the third row.

Sequential_File_0,0: Delimiter for field "Field007" not found; input: {2 0 0 6 0 4 3 0}, at offset: 42

After seeing the above warning, we realized that some rows are being rejected. Is there a way to load this type of sequential file so that a row with 5 columns loads into the first five mapped columns in the target and leaves the rest at their defaults?

Yes, we have confirmed that the load works successfully if we simply append the appropriate number of tab characters to the rows that have fewer than 10 columns. But we are looking for another solution. Any suggestions?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

How could this possibly work? Data is positional: with DML you're preparing a cursor and feeding it values, and with bulk loading you're loading a fixed-format file. You can't do what you're asking. You need to separate your output data into different subsets of rows, each with a like record type. Each load will then require the specific column definitions for that record type.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roblew
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

Thanks for the reply, kcbland. To do what I'm asking, I would need some functionality to either place nulls (or a default value) into the missing columns or ignore the missing columns entirely. The reason I think this could work is that each column that is present still correlates to the correct mapping in the resulting target record type.

For example, if we're loading into this:
col1 varchar2(6)
col2 varchar2(6)
col3 varchar2(6)

And the 2 rows I want to load are:
value1<tab>value2<tab>value3<tab>
value1<tab>value2<tab>

If the mapping were set to load straight through (1st column->col1, 2nd column->col2, 3rd column->col3), I would want some way to have the second row load successfully: either have DataStage automatically add the missing tab delimiters (insert into table(col1,col2,col3) values('value1','value2',NULL)), or remove the missing columns from the DML for the records that lack them (insert into table(col1,col2) values('value1','value2')).

I know I'm asking for something I don't think DataStage can do, but I just wanted to see if anyone knows of a solution short of "separate your output data into different subsets of rows, each with like record type".
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Run a shell script on the file that checks the number of tabs and, if a row has fewer than 10 columns, fills in the missing ones. It's simple if the missing columns are towards the end; if they are in between, more manipulation is required.
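
For example, something along these lines should do it for the trailing-column case (a sketch only: the 10-field width and file names are assumptions to adapt to your layout):

# pad every row out to 10 tab-separated fields, adding empty trailing fields
awk -F'\t' 'BEGIN { OFS = FS } { for (i = NF + 1; i <= 10; i++) $i = ""; print }' input.txt > padded.txt

Assigning the empty fields up to field 10 forces awk to rebuild the record with tab as the output separator, so rows that are already complete pass through unchanged.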
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
roblew
Charter Member
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

OK, thanks. Yes, that's what I was trying to explain: all of the missing columns are always at the end, and the columns that are present are always in the correct order, so they will load properly according to the mapping.

Are there any other ways to do this using any DataStage functionality?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It can, but it won't be pretty. What I can think of is to read the record as a single column, count the number of tabs using Count() or DCount(), and concatenate the remaining tabs accordingly. Then write it out as a single column again.
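
In a Transformer, the derivation for that single output column could look something like this (a sketch only: it assumes the input link column is called RecLine, the target has 10 columns, and a complete row carries 9 tabs with no trailing delimiter, so adjust the constants to your layout):

OutLink.RecLine derivation:
InLink.RecLine : Str(Char(9), 9 - Count(InLink.RecLine, Char(9)))

Str() repeats the tab character once for each missing delimiter, so a short row comes out padded to the full column count and a complete row passes through unchanged.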
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It would be simple in a Server job... does this really need to be PX? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The message stated "delimiter not found". Check that the field delimiter in the table definition (record schema) matches that found in the file. You need to get that right before trying anything else.
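
For a tab-delimited file, the record schema would look something along these lines (the field names, types, and property values here are placeholders; the point is that delim must match what is actually in the file):

record {final_delim=end, delim='\t', quote=none} (
    Field001: nullable string;
    Field002: nullable string;
    Field003: nullable string;
)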

Is there any information in the first part of the record indicating its record type (format, number of columns)? If so, you may be able to read the record as a single string, then direct rows into different processing streams, parsing them with a Column Import stage or a Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.