Problems reading a file with large columns

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Problems reading a file with large columns

Post by rwierdsm »

All,

I'm having all kinds of fun with my first EE job.

I'm trying to load an input file with the following structure:

Code:

NUMBER               char(10)
CATEGORY             varchar(25)
MISC_ARRAY1          varchar(5000)
RB_APPROVALS         varchar(100)
RB_APPROVALS_BY      varchar(500)
RBC_LOCATIONS        varchar(200)
RBC_LOCATIONS_TASK   varchar(500)
RBC_EC_START_DATE    varchar(100)
RBC_EC_END_DATE      varchar(100)
RBC_EC_EVENT         varchar(100)
RBC_RISKQUESTION     varchar(2500)
BACKOUT_METHOD       long varchar(32700)
DESCRIPTION          long varchar(32700)
and am getting the following errors:

Code:

Input_File,0: Failure during execution of operator logic.
Input_File,0: Internal Error: (length <= APT_PMcontrolService::maxMessageSize()):processmgr\errorroute.C: 354: length = 99806, maxMessageSize = 65520
node_node1: Player 1 terminated unexpectedly.
main_program: Unexpected exit status 1
main_program: Step execution finished with status = FAILED.
Since the long varchar fields have embedded terminators and other fun stuff, the job has been built like this:

Seq >>> XFM >>> Column Import >>> Seq

The Sequential File stage reads the whole record as a single column, and the Column Import stage splits the record into its component columns.
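
Conceptually, the Column Import step is doing something like this rough sketch in plain Python (illustration only, not DataStage; the function name is mine, and the delimiter shown is the multi-character field delimiter our source system emits):

Code:

# Illustration only; plain Python, not DataStage.
# FIELD_DELIM is the multi-character field delimiter the source system emits.
FIELD_DELIM = "{~^|^~}"

def import_columns(raw_record: str) -> list[str]:
    # One long single-column record in, component columns out;
    # roughly what the Column Import stage does with a delimiter string.
    return raw_record.split(FIELD_DELIM)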

Is the job bombing out because the input record is too long?

Can someone point me in the right direction?

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Re: Problems reading a file with large columns

Post by rwierdsm »

It appears that there were two things going on.

In the initial read, I was reading the whole input as a single column. I had defined a character string as the row delimiter and then the same string as the column delimiter; this is what DS was unhappy about.

When the job aborted, the data was too big for the message handler, which gave me the messages above. Once I cut the incoming data down to a more manageable size, I started getting the error that DS was REALLY unhappy about, debugged that, and bumped the incoming data back up again. Now DS is happy (until the next thing...).

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What exactly was the next error you got after you ran the job with a manageable size of data?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Hi, Guru,

Here it is.

Code:

Copy_of_Input_File,0: Error reading on import: 
Copy_of_Input_File,0: Consumed more than 100000 bytes looking for record delimiter; aborting
Copy_of_Input_File,0: Import error at record 1.
Copy_of_Input_File,0: Operator's runLocally() failed.
APT_CombinedOperatorController,0: Operator's runLocally() failed.
APT_CombinedOperatorController,0: Operator terminated abnormally: runLocally did not return APT_StatusOk
main_program: Step execution finished with status = FAILED.
It seems that setting the record delimiter and the single column's delimiter to the same string made DS unhappy.

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Typically, with delimited files, the record delimiter is the line terminator. Did you set the "final delimiter" property and, if so, to what?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Ray,

A brief history....

This is the file I posted about earlier in the Server forum (when I was trying to solve our design issue in Server!).

In short, we have a file with huge columns (up to 32K) and embedded weirdness, including terminators of all flavours and potentially any character you can think of.

The source guys have cooked up a field delimiter, {~^|^~}. We are trying to settle on a record delimiter and are currently experimenting with ||^^||.

The Sequential File stage is reading the whole record (there will be multiple record types in the extract we receive), looking for ||^^||. We have set the record-level delimiter string property to ||^^||. I have not set the final delimiter property for the record. When I received the error above, I also had the column-level delimiter string property set to ||^^||. Once I removed the column property, DS was much happier.

The idea here is to read the whole file with the sequential file stage, read the first 3 characters to determine record type, split the stream into the 40+ record types and then deal with each record type individually, using the {~^|^~} delimiter to break records down to fields. In this prototype we were able to break down to fields with the column import stage.
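
To make the intent concrete, here is a rough sketch of that logic in plain Python (illustration only, not the DataStage implementation; the function names are mine, and the delimiters are the ones described above):

Code:

# Assumes ||^^|| separates records and {~^|^~} separates fields,
# with the first 3 characters of each record naming its record type.
RECORD_DELIM = "||^^||"
FIELD_DELIM = "{~^|^~}"

def split_records(raw: str) -> list[str]:
    # Stand-in for the Sequential File stage reading whole records
    # using the record delimiter string.
    return [rec for rec in raw.split(RECORD_DELIM) if rec]

def route_and_import(records: list[str]) -> dict[str, list[list[str]]]:
    # Stand-in for the record-type split followed by a Column Import
    # per record type.
    by_type: dict[str, list[list[str]]] = {}
    for rec in records:
        rec_type = rec[:3]
        by_type.setdefault(rec_type, []).append(rec.split(FIELD_DELIM))
    return by_type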

Now on to see if the EE Teradata MultiLoad stage deals with the data as smoothly as the one in Server did!

Sounds like fun! :)

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I believe only single-character delimiters are handled. Have you tried a non-printing delimiter, such as Ctrl-Y (025)?
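
Illustratively (plain Python, and taking Ctrl-Y as ASCII 25 / 0x19), a single non-printing delimiter round-trips cleanly as long as the data itself never contains that byte:

Code:

# Illustration only: one non-printing character as the field delimiter.
CTRL_Y = "\x19"  # Ctrl-Y, ASCII 25

fields = ["0000012345", "CHANGE", "free text with | pipes, commas, etc."]
record = CTRL_Y.join(fields)
assert record.split(CTRL_Y) == fields  # holds only if the data never contains 0x19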
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

ray.wurlod wrote: I believe only single-character delimiters are handled.
In the end, it was OK with the six-character record delimiter string ||^^||, so long as I didn't also set the column delimiter to the same characters (at this point of the flow I'm treating the whole input record as a single column). Later on, it is also OK with the column delimiter {~^|^~} when I break the record up into its individual columns.

The initial error I talked about at the top of this thread happened because I had both the record delimiter and the field delimiter set to ||^^||. The job bombed out, but the failing record was too large for the message handler, so I got the maxMessageSize error instead of the real one. When I removed the big columns, the real error was revealed.

I get the feeling I'm being as clear as mud on this. If anyone is interested in clarification, I will post more info.

Don't forget to wear Orange on Wednesday! Holland vs Argentina!

Rob W.
Rob Wierdsma
Toronto, Canada
bartonbishop.com