the record is too big to fit in a block

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

the record is too big to fit in a block

Post by dodda »

Hello

I have a job that reads a sequential file, treating each record as a single line, breaks each record into multiple columns with a Column Import stage, builds an XML chunk for every record, and finally joins all those chunks to produce one big XML file.
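
Roughly, the logic looks like this (a sketch in Python just to show the shape of the job, not DataStage code; the delimiter, column names and XML element names are made up, only CustomerNumber comes from the real job):

# Illustration only -- not DataStage code; input assumed pipe-delimited.
chunks = []
for line in open("input.txt"):                            # Sequential File stage: one record per line
    cust, amount, pay_date = line.rstrip("\n").split("|")  # Column Import stage: split into columns
    chunk = ("<Payment>"
             "<CustomerNumber>" + cust + "</CustomerNumber>"
             "<Amount>" + amount + "</Amount>"
             "<Date>" + pay_date + "</Date>"
             "</Payment>")                                  # first XML Output stage: one chunk per record
    chunks.append((cust, chunk))

# second XML Output stage: sort on the key and aggregate every chunk
# under a single root -- this is where one output record becomes very large
document = "<Payments>" + "".join(c for _, c in sorted(chunks)) + "</Payments>"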

When I run the job with a small volume of data it works fine, but with more data DataStage aborts. The error from the DataStage log is below:


APT_CombinedOperatorController(14),0: Internal Error: (!(stat.statusBits() & APT_DMStatus::eRecordTooBig)):api/dataset_rep1.C: 1685: Virtual data set.; output of "inserted tsort operator {key={value=CustomerNumber, subArgs={asc, cs}}}": the record is too big to fit in a block;
the length requested is: 142471.
Traceback: msgAssertion__13APT_FatalPathFPCcRC11APT_UStringPCci() at 0xd47ffd70
putRecordToPartition_grow__14APT_DataSetRepFUi() at 0xd6d28a38
putRecord_nonCombined__14APT_DataSetRepFb() at 0xd6d25d94
putRecord__16APT_OutputCursorFv() at 0xd6f1a208
writeOutputRecord__17APT_TSortOperatorFv() at 0xd4c0c46c
runLocally__30APT_CombinedOperatorControllerFv() at 0xd6f49c28
run__15APT_OperatorRepFv() at 0xd6e8720c
runLocally__14APT_OperatorSCFv() at 0xd6e73bbc
runLocally__Q2_6APT_SC8OperatorFUi() at 0xd6efea44
runLocally__Q2_6APT_IR7ProcessFv() at 0xd6f818c8

Is there any setting that I need to add? While building the XML chunks I used LongVarChar as the datatype with the length left empty.

I have gone through the forums, and it was suggested that the APT_DEFAULT_TRANSPORT_BLOCK_SIZE variable needs to be set.

If so, where should I define this environment variable in Administrator? There are Reporting, Operator-specific, Compiler-specific and User Defined environment variables.

In which section should I define it, and what value should I set it to?

Thanks for your help
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It doesn't matter where you set it, though the Administrator client only lets you create environment variables in the User Defined folder.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Hello Ray,

When I tried to add the APT_DEFAULT_TRANSPORT_BLOCK_SIZE environment variable through Administrator, it said the variable already exists, but when I looked in the list of variables it was not there. Is there a way this variable can be configured?

Thanks
dodda
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look more carefully.

It's in the Parallel folder (not in any of its sub folders).

The default value is 131072.
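
For reference, 131072 bytes is 128 KB, and your log is asking for 142471 bytes, so - if transport blocks really are the limiting factor - any value of at least 142471 should do; the next power of two, 262144, would be the obvious first thing to try.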
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

I've reviewed the following posts :
http://dsxchange.com/viewtopic.php?p=23680046
viewtopic.php?t=109868
viewtopic.php?t=109896
viewtopic.php?t=126730

The job is creating XML using parallel XML Output stages. The first stage creates individual XML records (chunks), and the stage where it's failing is an XML Output stage that aggregates all rows based on a key.

I've tried setting the following in the JOB parameters of a parallel job (DS 7.5.1, on AIX) :

$APT_MAX_TRANSPORT_BLOCK_SIZE 268435456
$APT_MIN_TRANSPORT_BLOCK_SIZE 268435456
$APT_DEFAULT_TRANSPORT_BLOCK_SIZE 268435456

For the max transport block size, the help text indicates that the maximum is 1,048,576, which I would think would also be the maximum for the default block size. But if I code 300000000 for the default block size, Director issues a WARNING that it is setting the default to 268435456, which is the max. I don't know what the REAL max is for these environment variables. No warning is issued when the value 268435456 is used for all three variables.
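
(For what it's worth, 268435456 bytes is 256 MB, i.e. 256 times the 1,048,576-byte (1 MB) limit the help text quotes, so the documented maximum and the enforced ceiling clearly disagree.)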

But I still get a fatal error :
APT_CombinedOperatorController(1),0: Fatal Error: File data set, file "{0}".; output of "xmoPymntAggr": the record is too big to fit in a block; the length requested is: 500048.

I can take out the MIN and MAX block sizes, but I still get the same error.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It may not be transport blocks you need to change. These are only used for internal transmission of fixed length records. It may be buffer sizes you need to tune. Be very, very careful with these - tuning them through environment variables affects every buffer (link) in the job. It may be better, if possible, to tune buffering per-link.
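
For example, on the link into the failing stage (Input or Output tab > Advanced) you can switch Buffering mode away from the automatic setting and adjust the individual properties - roughly these, with what I believe are the defaults:

Maximum memory buffer size = 3145728 bytes
Buffer free run = 50%
Queue upper bound size = 0 (no limit)
Disk write increment = 1048576 bytes

Treat any specific numbers as a starting point for experimentation, not a recommendation.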
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

Thanks Ray.

In the XML Output stage, on the Output tab under Advanced, I changed buffering from automatic and increased the maximum memory buffer size tenfold, from 3145728 to 31457280. I did the same for the disk write increment, changing it from 1048576 to 10485760. The job dies in exactly the same place, with the same message.

Will try more tuning tomorrow.
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

Ran into the low watermark message as per viewtopic.php?t=128506.

Is there any way for me to see what values DS is actually using at run time?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are a few more reporting/tracing hooks (some enabled via environment variables) than are documented. Your official support provider should be able to guide you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

No parameter setting seemed to work. I was building detail XML chunks in one XML Output stage and aggregating in the next (with a Copy stage in between). I've done this before to control the way the XML is formed and haven't encountered this problem until now.

I'm still verifying that the generated XML conforms to the schema (it requires several more chunks to be built first), but it appears that I can work around this by aggregating in the first stage. I also set everything back to default: all block size environment variables were deleted and the link buffer parameters were reset.

I don't know if this applies to the original poster, but so far, this works for me.
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

The problem returns when I try to send the output to another stage or to a Data Set stage. Sending the output to a Sequential File stage is what solved the problem. What APPEARED to be the solution in the previous post was coincidental, as that output was written to a sequential file. As soon as I changed it back to a data set, the problem reappeared.

So far, adjusting the max/min and default transport block sizes didn't seem to help. Neither did adjusting the buffer parameters on the link. I even tried using NO BUFFER on the link, but this didn't help.
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

You cannot write records to a data set if they are longer than the data set's block size, which is 128 KB by default. You can change that, though, by setting APT_PHYSICAL_DATASET_BLOCK_SIZE.
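
For example, the failing write above asked for 500048 bytes against the default 131072-byte (128 KB) block, so adding the variable as a job parameter - the same way the transport block variables were added earlier - with something like

$APT_PHYSICAL_DATASET_BLOCK_SIZE 1048576

(1 MB, comfortably larger than the biggest record seen so far) ought to be enough; the exact figure is only a suggestion.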
rcanaran
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

Thanks. That seemed to work.