Problems reading a file with large columns

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Problems reading a file with large columns

Post by rwierdsm »

All,

I'm having all kinds of fun with my first EE job.

I'm trying to load an input file with the following structure:

Code:

NUMBER               char(10)
CATEGORY             varchar(25)
MISC_ARRAY1          varchar(5000)
RB_APPROVALS         varchar(100)
RB_APPROVALS_BY      varchar(500)
RBC_LOCATIONS        varchar(200)
RBC_LOCATIONS_TASK   varchar(500)
RBC_EC_START_DATE    varchar(100)
RBC_EC_END_DATE      varchar(100)
RBC_EC_EVENT         varchar(100)
RBC_RISKQUESTION     varchar(2500)
BACKOUT_METHOD       long varchar(32700)
DESCRIPTION          long varchar(32700)
and am getting the following errors:

Code:

Input_File,0: Failure during execution of operator logic.
Input_File,0: Internal Error: (length <= APT_PMcontrolService::maxMessageSize()):processmgr\errorroute.C: 354: length = 99806, maxMessageSize = 65520
node_node1: Player 1 terminated unexpectedly.
main_program: Unexpected exit status 1
main_program: Step execution finished with status = FAILED.
Since the long varchar fields have embedded terminators and other fun stuff, the job has been built like this:

Seq >>> XFM >>> Column Import >>> Seq

The Sequential File stage reads the whole record as a single column, and the Column Import stage splits the record into its component columns.
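
Conceptually, the Column Import step is doing something like this rough sketch in plain Python (illustration only, not DataStage; the function name is mine, and the delimiter shown is the multi-character field delimiter our source system emits):

Code:

# Illustration only; plain Python, not DataStage.
# FIELD_DELIM is the multi-character field delimiter the source system emits.
FIELD_DELIM = "{~^|^~}"

def import_columns(raw_record: str) -> list[str]:
    # One long single-column record in, component columns out;
    # roughly what the Column Import stage does with a delimiter string.
    return raw_record.split(FIELD_DELIM)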

Is the job bombing out because the input record is too long?

Can someone point me in the right direction?

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Re: Problems reading a file with large columns

Post by rwierdsm »

It appears that there were two things going on.

In the initial read, I was reading the whole input as a single column. I had defined a character string as the row delimiter and then the same string as the column delimiter; this is what DS was unhappy about.

When the job aborted, the data was too big for the message handler, which gave me the messages above. Once I cut the incoming data down to a more manageable size, I started getting the error that DS was REALLY unhappy about, debugged that, and bumped the incoming data back up again. Now DS is happy (until the next thing...).

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What exactly was the next error you got after you ran the job with a manageable size of data?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Hi, Guru,

Here it is.

Code:

Copy_of_Input_File,0: Error reading on import: 
Copy_of_Input_File,0: Consumed more than 100000 bytes looking for record delimiter; aborting
Copy_of_Input_File,0: Import error at record 1.
Copy_of_Input_File,0: Operator's runLocally() failed.
APT_CombinedOperatorController,0: Operator's runLocally() failed.
APT_CombinedOperatorController,0: Operator terminated abnormally: runLocally did not return APT_StatusOk
main_program: Step execution finished with status = FAILED.
It seems that setting the record delimiter and the single column's delimiter to the same string made DS unhappy.

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Typically, with delimited files, the record delimiter is the line terminator. Did you set the "final delimiter" property and, if so, to what?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Ray,

A brief history....

This is the file I posted about earlier in the Server forum (when I was trying to solve our design issue in Server!).

In short, we have a file with huge columns (up to 32K) and embedded weirdness, including terminators of all flavours and potentially any character you can think of.

The source guys have cooked up a field delimiter, {~^|^~}. We are trying to settle on a record delimiter and are currently experimenting with ||^^||.

The Sequential File stage is reading the whole record (there will be multiple record types in the extract we receive), looking for ||^^||. We have set the record-level delimiter string property to ||^^||. I have not set the final delimiter property for the record. When I received the error above, I also had the column-level delimiter string property set to ||^^||. Once I removed the column property, DS was much happier.

The idea here is to read the whole file with the sequential file stage, read the first 3 characters to determine record type, split the stream into the 40+ record types and then deal with each record type individually, using the {~^|^~} delimiter to break records down to fields. In this prototype we were able to break down to fields with the column import stage.
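
To make the intent concrete, here is a rough sketch of that logic in plain Python (illustration only, not the DataStage implementation; the function names are mine, and the delimiters are the ones described above):

Code:

# Assumes ||^^|| separates records and {~^|^~} separates fields,
# with the first 3 characters of each record naming its record type.
RECORD_DELIM = "||^^||"
FIELD_DELIM = "{~^|^~}"

def split_records(raw: str) -> list[str]:
    # Stand-in for the Sequential File stage reading whole records
    # using the record delimiter string.
    return [rec for rec in raw.split(RECORD_DELIM) if rec]

def route_and_import(records: list[str]) -> dict[str, list[list[str]]]:
    # Stand-in for the record-type split followed by a Column Import
    # per record type.
    by_type: dict[str, list[list[str]]] = {}
    for rec in records:
        rec_type = rec[:3]
        by_type.setdefault(rec_type, []).append(rec.split(FIELD_DELIM))
    return by_type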

Now on to see if the EE Teradata MultiLoad stage deals with the data as smoothly as the one in Server did!

Sounds like fun! :)

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I believe only single-character delimiters are handled. Have you tried a non-printing delimiter, such as Ctrl-Y (025)?
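
Illustratively (plain Python, and taking Ctrl-Y as ASCII 25 / 0x19), a single non-printing delimiter round-trips cleanly as long as the data itself never contains that byte:

Code:

# Illustration only: one non-printing character as the field delimiter.
CTRL_Y = "\x19"  # Ctrl-Y, ASCII 25

fields = ["0000012345", "CHANGE", "free text with | pipes, commas, etc."]
record = CTRL_Y.join(fields)
assert record.split(CTRL_Y) == fields  # holds only if the data never contains 0x19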
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

ray.wurlod wrote: I believe only single-character delimiters are handled.
In the end, it was OK with the six-character record delimiter string ||^^||, so long as I didn't also set the column delimiter to the same characters (at this point of the flow I'm treating the whole input record as a single column). Later on, it is also OK with the column delimiter {~^|^~} when I break the record up into its individual columns.

The initial error I talked about at the top of this thread happened because I had both the record delimiter and the field delimiter set to ||^^||. The job bombed out, but the failing record was too large for the message handler, so I got the maxMessageSize error instead of the real one. When I removed the big columns, the real error was revealed.

I get the feeling I'm being as clear as mud on this. If anyone is interested in clarification, I will post more info.

Don't forget to wear Orange on Wednesday! Holland vs Argentina!

Rob W.
Rob Wierdsma
Toronto, Canada
bartonbishop.com