
load sequential file with varying number of columns

Posted: Thu Jan 04, 2007 1:51 pm
by roblew
Hello,

We've got a problem loading a tab-delimited sequential file whose records contain a varying number of columns.

For example, the target table has 10 columns. The sequential file may have 10 columns in the first row, 5 columns in the second, and 8 columns in the third row.

Sequential_File_0,0: Delimiter for field "Field007" not found; input: {2 0 0 6 0 4 3 0}, at offset: 42

After seeing the above warning, we realized that some rows are being rejected. Is there a way to load this type of sequential file, so that a row with 5 columns would just load into the first five appropriate column mappings in the target and leave the rest to their defaults?

Yes, we have confirmed that the load works successfully if we just add the appropriate number of tab characters at the end of the rows that have fewer than 10 columns. But we are looking for any other solution. Any suggestions?

Posted: Thu Jan 04, 2007 2:00 pm
by kcbland
How could this possibly work? Data is positional: with DML you're preparing a cursor and feeding values to it; with bulk loading, you're loading a fixed-format file. You can't do what you're asking. You need to separate your output data into different subsets of rows, each with the same record type. Each load will require the specific column definitions for that record type.

Posted: Thu Jan 04, 2007 2:15 pm
by roblew
Thanks for the reply, kcbland. To do what I'm asking, I would need some functionality to either place nulls (or a default value) into the missing columns, or ignore the missing columns completely. The reason I think this could work is that the load would still assume each column correlates to the correct mapping in the resulting target record type.

For example, if we're loading into this:
col1 varchar2(6)
col2 varchar2(6)
col3 varchar2(6)

And the 2 rows I want to load are:
value1<tab>value2<tab>value3<tab>
value1<tab>value2<tab>

If the mapping were set to load straight through (1st column -> col1, 2nd column -> col2, 3rd column -> col3), I would want some way for the second row to load successfully, either by having DataStage automatically add the missing tab delimiters (insert into table(col1,col2,col3) values('value1','value2',NULL)) or by removing the missing columns from the DML for the records that lack them (insert into table(col1,col2) values('value1','value2')).

I know I'm asking for something that I don't think DataStage can do, but I just wanted to see if anyone knows of a solution short of "separate your output data into different subsets of rows, each with like record type".

Posted: Thu Jan 04, 2007 2:18 pm
by DSguru2B
Run a shell script on the file that checks the number of tabs in each record and, if there are fewer than needed for 10 columns, fills in the missing ones. It would be simple if the missing columns are towards the end; if they are in between, more manipulation is required.
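For illustration, a minimal awk sketch of that pre-processing step, assuming a tab-delimited file with 10 target columns and the missing columns always at the end (the file names are placeholders):

    awk -F'\t' '
    BEGIN { OFS = "\t" }
    {
        # pad short records out to 10 tab-separated fields
        for (i = NF + 1; i <= 10; i++) $i = ""
        print
    }' input.txt > padded.txt

The padded file then has a fixed 10-column layout that the existing Sequential File stage definition can read unchanged.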

Posted: Thu Jan 04, 2007 2:21 pm
by roblew
OK, thanks. Yes, that's what I was trying to explain: all of the missing columns are always at the end. All other columns are always in the correct order, so they will load properly according to the mapping.

Are there any other ways to do this using any DataStage functionality?

Posted: Thu Jan 04, 2007 2:29 pm
by DSguru2B
It can, but it won't be pretty. What I can think of is: read the record as a single column, count the number of tabs using Count() or DCount(), and depending on that, concatenate the missing tabs onto the end. Write it out as a single column again.
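As a rough sketch of that approach, assuming a Server job Transformer with a single input column named InLink.record and 10 target columns (i.e. 9 tab delimiters), the output derivation might look something like:

    If Count(InLink.record, Char(9)) < 9
    Then InLink.record : Str(Char(9), 9 - Count(InLink.record, Char(9)))
    Else InLink.record

The link and column names here are just placeholders; the padded single column can then be written out and re-read with the full 10-column definition.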

Posted: Thu Jan 04, 2007 2:48 pm
by chulett
It would be simple in a Server job... does this really need to be PX? :?

Posted: Thu Jan 04, 2007 4:03 pm
by ray.wurlod
The message stated "delimiter not found". Check that the field delimiter in the table definition (record schema) matches that found in the file. You need to get that right before trying anything else.

Is there any information in the first part of the record indicating its record type (format, number of columns)? If so, you may be able to read the record as a single string, then direct rows into different processing streams, parsing them with a Column Import stage or a Transformer stage.