load sequential file with varying number of columns

roblew
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

Hello,

We've got a problem loading a tab-delimited sequential file whose records have a varying number of columns.

For example, the target table has 10 columns. The sequential file may have 10 columns in the first row, 5 columns in the second, and 8 columns in the third row.

Sequential_File_0,0: Delimiter for field "Field007" not found; input: {2 0 0 6 0 4 3 0}, at offset: 42

After seeing the above warning, we realized that some rows are being rejected. Is there a way to load this type of sequential file so that a row with 5 columns loads into the first five mapped columns in the target and leaves the rest at their defaults?

Yes, we have confirmed that the load works successfully if we simply append the appropriate number of tab characters to the rows that have fewer than 10 columns. But we are looking for another solution. Any suggestions?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

How could this possibly work? Data is positional: with DML you're preparing a cursor and feeding it values, and with bulk loading you're loading a fixed-format file. You can't do what you're asking. You need to separate your output data into different subsets of rows, each with a like record type. Each load will then require the specific column definitions for that record type.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
roblew
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

Thanks for the reply, kcbland. To do what I'm asking, I would need some functionality to either place nulls (or a default value) into the missing columns or ignore the missing columns entirely. The reason I think this could work is that each column that is present still correlates to the correct mapping in the resulting target record type.

For example, if we're loading into this:
col1 varchar2(6)
col2 varchar2(6)
col3 varchar2(6)

And the 2 rows I want to load are:
value1<tab>value2<tab>value3<tab>
value1<tab>value2<tab>

If the mapping were set to load straight through (1st column->col1, 2nd column->col2, 3rd column->col3), I would want some way to have the second row load successfully: either have DataStage automatically add the missing tab delimiters (insert into table(col1,col2,col3) values('value1','value2',NULL)), or remove the missing columns from the DML for the records that lack them (insert into table(col1,col2) values('value1','value2')).

I know I'm asking for something I don't think DataStage can do, but I just wanted to see if anyone knows of a solution short of "separate your output data into different subsets of rows, each with like record type".
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Run a shell script on the file that checks the number of tabs and, if a row has fewer than 10 columns, fills in the missing ones. It's simple if the missing columns are towards the end; if they are in between, more manipulation is required.
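
For example, something along these lines should do it for the trailing-column case (a sketch only: the 10-field width and file names are assumptions to adapt to your layout):

# pad every row out to 10 tab-separated fields, adding empty trailing fields
awk -F'\t' 'BEGIN { OFS = FS } { for (i = NF + 1; i <= 10; i++) $i = ""; print }' input.txt > padded.txt

Assigning the empty fields up to field 10 forces awk to rebuild the record with tab as the output separator, so rows that are already complete pass through unchanged.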
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
roblew
Charter Member
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

OK, thanks. Yes, that's what I was trying to explain: all of the missing columns are always at the end, and the columns that are present are always in the correct order, so they will load properly according to the mapping.

Are there any other ways to do this using any DataStage functionality?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It can, but it won't be pretty. What I can think of is to read the record as a single column, count the number of tabs using Count() or DCount(), and concatenate the remaining tabs accordingly. Then write it out as a single column again.
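
In a Transformer, the derivation for that single output column could look something like this (a sketch only: it assumes the input link column is called RecLine, the target has 10 columns, and a complete row carries 9 tabs with no trailing delimiter, so adjust the constants to your layout):

OutLink.RecLine derivation:
InLink.RecLine : Str(Char(9), 9 - Count(InLink.RecLine, Char(9)))

Str() repeats the tab character once for each missing delimiter, so a short row comes out padded to the full column count and a complete row passes through unchanged.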
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It would be simple in a Server job... does this really need to be PX? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The message stated "delimiter not found". Check that the field delimiter in the table definition (record schema) matches that found in the file. You need to get that right before trying anything else.
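
For a tab-delimited file, the record schema would look something along these lines (the field names, types, and property values here are placeholders; the point is that delim must match what is actually in the file):

record {final_delim=end, delim='\t', quote=none} (
    Field001: nullable string;
    Field002: nullable string;
    Field003: nullable string;
)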

Is there any information in the first part of the record indicating its record type (format, number of columns)? If so, you may be able to read the record as a single string, then direct rows into different processing streams, parsing them with a Column Import stage or a Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.