Error in PX

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennessee

Error in PX

Post by samsuf2002 »

Hi,

I am running a parallel job and it is aborting with this fatal error: "Consumed more than 100000 bytes looking for record delimiter; aborting". The source is SQL Server, and the job contains a Change Capture stage and a Lookup.

Please help,
sam
hi sam here
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennessee

Post by samsuf2002 »

The delimiter I am using is the pipe character (|).
hi sam here
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you sure that you've specified the delimiter character correctly in the job design? (Remember that there are three delimiters: field delimiter, final delimiter and record delimiter.)

The message suggests that the job has read 100000 bytes without even finding one delimiter character. I'd give up too! Do you have extremely large VarChar data types?
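
One way to check this, outside DataStage, is to scan the start of the source file and see how far it goes before the record delimiter you believe is in place actually appears. A minimal Python sketch (the file path and delimiter are placeholders, not taken from the job):

Code: Select all

# How far into the file is the first record delimiter?
# If it never appears within the limit, the import operator gives up
# with exactly the "Consumed more than 100000 bytes" message above.
def bytes_to_first_delimiter(path, record_delim=b"\n", limit=100000):
    with open(path, "rb") as f:
        chunk = f.read(limit + 1)
    pos = chunk.find(record_delim)
    if pos < 0:
        print(f"No record delimiter within the first {limit} bytes")
    else:
        print(f"First record delimiter at byte {pos}")
    return pos

bytes_to_first_delimiter("/tmp/source_extract.dat", record_delim=b"\n")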
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Is anyone aware of a setting that will raise the 100000-byte limit?

I need to import records that may have several 32k fields, possibly going over the 100000 byte limit.

Yikes :!:
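
For a sense of scale, a handful of 32k fields is already past the default; rough arithmetic only (the field count here is illustrative):

Code: Select all

# Four 32 KB fields plus the pipe delimiters between them already
# exceed the default 100000-byte limit.
fields = 4
max_field_bytes = 32 * 1024              # 32768
worst_case = fields * max_field_bytes + (fields - 1)
print(worst_case, worst_case > 100000)   # 131075 True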
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you get the delimiter right it shouldn't matter, unless one field itself will exceed that limit. Double-eek!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

I'm reading in the whole record as one string and breaking it up into many fields later on. It is entirely possible that one field (i.e. the whole record) will be longer than 100000 bytes.

I've found an environment variable that is supposed to allow a larger limit:

APT_MAX_DELIMITED_READ_SIZE

I'll let you know how it works.
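
One way to pick a value with some headroom, rather than guessing, is to measure the longest record in the file first. A small Python sketch (the path is a placeholder and a newline record delimiter is assumed):

Code: Select all

# Longest record in the source file, in bytes, so the new
# APT_MAX_DELIMITED_READ_SIZE value can be set comfortably above it.
def longest_record_bytes(path):
    longest = 0
    with open(path, "rb") as f:
        for record in f:                  # assumes newline-delimited records
            longest = max(longest, len(record.rstrip(b"\r\n")))
    return longest

print(longest_record_bytes("/tmp/source_extract.dat"))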

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Reading only one field means it won't be searching for a field delimiter, so you should be fine.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

We set APT_MAX_DELIMITED_READ_SIZE to 150000 from its default of 100000 and the job read in our single-column file.
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Thanks for letting us know.

My reaction (yuk) is unchanged.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

ray.wurlod wrote:Thanks for letting us know.

My reaction (yuk) is unchanged.
Can't disagree, but it's what I'm stuck with!

Rob W.
Rob Wierdsma
Toronto, Canada
bartonbishop.com
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Further to this thread....

We are now approaching the end of our testing phase and have needed to revisit the values for the APT environment variables mentioned above, so I'd like to document our findings.

To briefly summarize, we have a file that we read in as a single wide column, splitting it into component columns later in processing. As some of these component columns can be quite wide, up to 32k, the single wide column approaches 200k bytes in length.

To accommodate this file, we are reading it with a Sequential File stage. The DS job needed to have two environment variables set:
- APT_DEFAULT_TRANSPORT_BLOCK_SIZE
- APT_MAX_DELIMITED_READ_SIZE
During our latest tests, the source application sent us a file where all fields are filled to the maximum size. Our initial setting for APT_MAX_DELIMITED_READ_SIZE of 200000 was insufficient; we received the following log message:

Code: Select all

Input_File,3: Consumed more than 100000 bytes looking for record delimiter; aborting
We don't know why the log message still referred to 100000 bytes when our setting was 200000; this actually caused us to suspect that DS was ignoring the setting. However, we continued to experiment with the values.

After many runs we found that we needed to increase this value to 550000 (much bigger than our actual row size; we don't know why!).

When the message above was cleared, we found we were getting the following log entry:

Code: Select all

Input_File,3: Fatal Error: Virtual dataset; output of "Input_File": Record too big to fit in a block; length requested: 133173
This prompted us to add an entry for APT_DEFAULT_TRANSPORT_BLOCK_SIZE and set it to 262144 (256k).
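
The arithmetic behind that choice: the failing record needed 133173 bytes, and 262144 is the next power of two above it. (Whether the block size strictly has to be a power of two is an assumption here; it is simply the value that worked for us.) A small check:

Code: Select all

# Round the largest observed record length up to the next power of two.
def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

needed = 133173                      # "length requested" from the log above
print(next_power_of_two(needed))     # 262144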

This seems to have made DataStage quite happy.

Rob W.
Rob Wierdsma
Toronto, Canada
bartonbishop.com
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

Very recently I came across the same issue regarding dataset record size and increased APT_DEFAULT_TRANSPORT_BLOCK_SIZE to resolve it. What I found was that even when the test data didn't contain records that long (in anticipation of the actual data I had specified VarChars with large lengths), DataStage allocates the full declared length in memory. So a VarChar is not actually a VarChar, since internally it is all C data types. Even worse, the job execution time varies drastically, because with longer data lengths the job runs into buffering, which slows things down.
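
A rough illustration of that point, assuming a prefix-plus-full-allocation layout for bounded VarChars (the column widths and 4-byte prefix below are illustrative assumptions, not measured DataStage internals):

Code: Select all

# With bounded VarChars carried as a length prefix plus the full declared
# width, the carried record is as wide as the declared maximums,
# regardless of how short the actual data is.
declared_varchar_widths = [32000, 32000, 255, 50]   # hypothetical schema
prefix_bytes = 4

carried = sum(w + prefix_bytes for w in declared_varchar_widths)
typical_actual = sum([120, 300, 40, 10])            # typical real lengths

print(f"Carried per record:  {carried} bytes")         # 64321
print(f"Actual data content: {typical_actual} bytes")  # 470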
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Transport blocks apply only to fixed-length records, and only between player processes. I guess by the time you're in a virtual Data Set you effectively do have fixed-length records, because a bounded VarChar is stored as prefix bytes plus a full allocation of character space.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.