Job using dataset files is slower than sequential files

ArndW · Post by **ArndW** » Fri Feb 09, 2007 3:21 am

splayer wrote: ...Both versions output the file to the same folder...

No, probably only your dataset descriptor and sequential file are in the same directory; the dataset's data files are written to the resource disk directories. You should try writing your sequential file to the directory pointed to by the resource disk setting in your configuration file (the one named by APT_CONFIG_FILE). That way you are comparing performance on the same disk partition and removing potential I/O differences from the equation.
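A quick way to confirm whether two directories really sit on the same disk partition is to compare their device IDs. The following is only a minimal sketch, assuming a Unix-like server; the paths in the comment are hypothetical placeholders, not values from this thread.

```python
import os

def same_device(path_a, path_b):
    """Return True if both paths are on the same device (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Hypothetical usage: substitute the sequential file's directory and the
# directory named by "resource disk" in your configuration file.
# same_device("/path/to/seqfiles", "/path/to/resource_disk")
```

If this returns False, the two jobs are writing to different devices and the timings are not directly comparable.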

patonp · Post by **patonp** » Fri Feb 09, 2007 8:13 am

I've read in another post that datasets containing bounded-length varchar fields can grow to be quite large, as they allocate almost the full amount of space defined even when only a few characters of the varchar field are actually used.

Is the total size of your datasets much larger than your sequential file? (i.e. could the I/O involved writing out a larger set of files be the cause of your performance discrepancy?)
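One rough way to answer the size question is to add up the dataset's data files and compare the total with the sequential file. This is a sketch only: the file names are throwaway stand-ins created for the demo, and finding which data files belong to a given dataset is left to you.

```python
import os
import tempfile

def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)

if __name__ == "__main__":
    # Demo with throwaway files; replace with the real sequential file
    # and the dataset's data files from the resource disk directory.
    with tempfile.TemporaryDirectory() as tmp:
        seq_file = os.path.join(tmp, "target.txt")   # hypothetical name
        data_file = os.path.join(tmp, "part0")       # hypothetical name
        with open(seq_file, "wb") as f:
            f.write(b"x" * 100)
        with open(data_file, "wb") as f:
            f.write(b"x" * 250)
        ratio = total_size([data_file]) / total_size([seq_file])
        print("dataset is", ratio, "times the size of the sequential file")
```

A ratio well above 1 would support the idea that the extra I/O volume accounts for part of the difference.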

splayer · Post by **splayer** » Fri Feb 09, 2007 11:32 am

ArndW, when I changed my resource disk path to the Datasets folder where datasets are created, the time taken for the sequential file was the same as for the dataset files: 69 seconds. So performance varies depending on the folder. Do you have any idea why that might be?

I would think that the dataset version should be at least a few seconds faster.

ArndW · Post by **ArndW** » Fri Feb 09, 2007 11:52 am

splayer - look at the filesystems and options used for your two partitions.
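To see which filesystem type and mount options apply to a directory, you can look up its entry in the mount table. The sketch below assumes a Linux-style mount table (the format of /proc/mounts); on other Unix systems the `mount` or `df` output will look different. The sample entries are made up for illustration.

```python
import posixpath

def find_mount(path, mounts_text):
    """Return (device, mount_point, fs_type, options) for the longest
    mount point that contains path, or None if nothing matches."""
    path = posixpath.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mp = mount_point.rstrip("/") or "/"
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and (best is None or len(mp) > len(best[1])):
            best = (device, mp, fs_type, options)
    return best

# Made-up sample in /proc/mounts format; on Linux you could instead
# pass open("/proc/mounts").read().
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "nas01:/export /data nfs rw,vers=3 0 0\n"
)
print(find_mount("/data/Datasets", sample))
```

Running it for both the sequential file directory and the resource disk directory shows whether one is local and the other is network-mounted, and whether their mount options differ.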

splayer · Post by **splayer** » Fri Feb 09, 2007 1:33 pm

This is my job:
SeqFile --> Modify --> SurrogateKeyGenerator --> Transformer --> TargetFile(Dataset stage)

To answer balajisr's question, partitioning is set to Auto throughout.

ray.wurlod · Post by **ray.wurlod** » Fri Feb 09, 2007 3:45 pm

Arnd means to look at the hardware. For example, is one directory on local disk and the other in a SAN?

kumar_s · Post by **kumar_s** » Fri Feb 09, 2007 11:03 pm

That particular directory may be on a different mount point, which might suffer from network congestion.

splayer · Post by **splayer** » Sat Feb 10, 2007 10:31 pm

Ray, pardon my ignorance about hardware but what does SAN stand for?

DSguru2B · Post by **DSguru2B** » Sat Feb 10, 2007 10:40 pm

Storage Area Network

chulett · Post by **chulett** » Sat Feb 10, 2007 10:53 pm

Not to be confused with NAS.

ray.wurlod · Post by **ray.wurlod** » Sun Feb 11, 2007 4:54 am

DSguruji wrote: Storage Area Network

More usually Storage ARRAY Network - an array of storage devices (disks) connected with intelligent controllers so that they can be managed as a single entity or partitioned as required.