PX with Sequential File

Post by **Cr.Cezon** » Thu Mar 29, 2007 8:30 am

If I have a parallel job that uses a Sequential File stage both to read and to write data, I get no better performance than with a server job, do I?

regards,
Cristina

Post by **DSguru2B** » Thu Mar 29, 2007 9:24 am

How many nodes are you using, and on how many CPUs? What about disk utilization? There are a lot of factors involved.

Post by **ray.wurlod** » Thu Mar 29, 2007 3:55 pm

What's happening in between reading and writing? Parallel processing may help if some heavy-duty transformation is occurring.

You can allocate multiple readers to the file and so achieve parallelism in reading. This is particularly effective if the file has fixed-width records.
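Fixed-width records help because each reader can compute its starting offset directly (record index × record length) and seek there, without scanning for record delimiters. A minimal Python sketch of that idea, assuming a file of 10-byte records; `RECORD_LEN`, `byte_ranges`, `read_range`, and `parallel_read` are hypothetical names, and this illustrates the principle rather than how the Sequential File stage is implemented.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD_LEN = 10  # hypothetical fixed record width in bytes, including the newline


def byte_ranges(file_size: int, record_len: int, readers: int) -> list[tuple[int, int]]:
    """Split a fixed-width file into contiguous, record-aligned byte ranges.

    Any trailing partial record (file_size not a multiple of record_len) is ignored.
    """
    total_records = file_size // record_len
    base, extra = divmod(total_records, readers)
    ranges = []
    start_rec = 0
    for i in range(readers):
        count = base + (1 if i < extra else 0)  # spread leftover records over the first readers
        end_rec = start_rec + count
        ranges.append((start_rec * record_len, end_rec * record_len))
        start_rec = end_rec
    return ranges


def read_range(path: str, start: int, end: int, record_len: int) -> list[bytes]:
    """One reader: seek straight to its offset; no delimiter scanning is needed."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]


def parallel_read(path: str, record_len: int, readers: int) -> list[bytes]:
    """Run one reader per range concurrently, then concatenate results in file order."""
    ranges = byte_ranges(os.path.getsize(path), record_len, readers)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        parts = list(pool.map(lambda r: read_range(path, r[0], r[1], record_len), ranges))
    return [rec for part in parts for rec in part]


if __name__ == "__main__":
    # Demo: 7 records of exactly 10 bytes each ("rec" + 6 digits + newline).
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        for i in range(7):
            tmp.write(f"rec{i:06d}\n".encode())
    records = parallel_read(tmp.name, RECORD_LEN, readers=3)
    print(len(records), records[0], records[-1])
    os.remove(tmp.name)
```

With variable-length records, each reader would instead have to scan forward from its offset to the next record delimiter, which is why fixed-width files split more cleanly.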

Unfortunately, the operating system limits what we can do at the other end: "one file, one writer" is the rule here.
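The "one file, one writer" rule amounts to a many-producers, single-consumer pattern: partitions can compute rows concurrently, but only one writer appends them to the file. A minimal Python sketch, assuming threads stand in for partitions; `producer`, `single_writer`, and `run` are hypothetical names, and this is not a description of how DataStage collects partitions internally.

```python
import os
import queue
import tempfile
import threading

SENTINEL = None  # marks the end of one producer's output


def producer(worker_id: int, rows: int, out_q: queue.Queue) -> None:
    """Hypothetical stand-in for one parallel partition generating output rows."""
    for i in range(rows):
        out_q.put(f"partition={worker_id},row={i}\n")
    out_q.put(SENTINEL)


def single_writer(path: str, in_q: queue.Queue, producers: int) -> None:
    """The only code that touches the file: rows from every partition funnel through here."""
    finished = 0
    with open(path, "w") as f:
        while finished < producers:
            item = in_q.get()
            if item is SENTINEL:
                finished += 1
            else:
                f.write(item)


def run(path: str, producers: int = 4, rows_each: int = 5) -> None:
    # Bounded queue: producers block if the single writer falls behind.
    q: queue.Queue = queue.Queue(maxsize=100)
    writer = threading.Thread(target=single_writer, args=(path, q, producers))
    writer.start()
    workers = [
        threading.Thread(target=producer, args=(w, rows_each, q)) for w in range(producers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    writer.join()


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
    run(out_path)
    with open(out_path) as f:
        print(sum(1 for _ in f), "rows written by one writer")
```

However many producers there are, the write end stays serial, so the output side cannot speed up beyond what one writer can sustain.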

Post by **Cr.Cezon** » Fri Mar 30, 2007 4:52 am

I have 2 nodes in configuration file.
I have 4 CPUs.

If I don't do a lot of transformations, is it better to use parallel than server?

I think that if the job imports data into memory, does the transforms, and writes the data to a sequential file, there is no performance gain, because the writing works in sequential mode.

regards,
Cristina

Post by **vmcburney** » Fri Mar 30, 2007 5:25 am

If you don't mind supporting both server and parallel jobs on your site, the server job will be the low-fuss method: fewer warning messages for sequential file data. A parallel job becomes better if you have a sort requirement or more than a couple of stages between the input and the output.

2 nodes and 4 CPUs? Shouldn't you at least have one node per CPU?

Post by **ray.wurlod** » Fri Mar 30, 2007 6:09 am

Two nodes is exactly right for a development environment. If it works on two, it works on more. The number of CPUs is irrelevant: in an SMP ("shared everything") environment, sufficiently complex jobs will use all of them even with two nodes.

Use two readers per node when reading the file. For larger files you will notice (a) parallelism in the Sequential File stage (check that this occurs) and (b) faster completion time.
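With the two-node configuration above, two readers per node means four concurrent readers. A small Python sketch of how four readers could divide a fixed-width file's records; the contiguous, node-ordered assignment and the names `ReaderAssignment` and `plan_readers` are hypothetical, not how the Sequential File stage actually distributes the work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReaderAssignment:
    node: str          # logical node name (hypothetical)
    reader: int        # reader index within that node
    first_record: int  # inclusive
    last_record: int   # exclusive


def plan_readers(nodes: list[str], readers_per_node: int,
                 total_records: int) -> list[ReaderAssignment]:
    """Spread records over len(nodes) * readers_per_node readers, as evenly as possible."""
    total_readers = len(nodes) * readers_per_node
    base, extra = divmod(total_records, total_readers)
    plan = []
    start = 0
    for idx in range(total_readers):
        count = base + (1 if idx < extra else 0)  # first `extra` readers take one more record
        node = nodes[idx // readers_per_node]
        plan.append(ReaderAssignment(node, idx % readers_per_node, start, start + count))
        start += count
    return plan


if __name__ == "__main__":
    for a in plan_readers(["node1", "node2"], readers_per_node=2, total_records=10):
        print(a)
```

Doubling the readers per node doubles the number of independent read streams without changing the configuration file, which is why it is worth checking that the stage really runs in parallel.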

Post by **Cr.Cezon** » Fri Mar 30, 2007 6:48 am

What exactly does "node" refer to?
I think it is for partitioning the work into n processes in the operating system, and for using different memory zones.
So it doesn't matter whether you have more or fewer CPUs, does it?

Post by **ray.wurlod** » Fri Mar 30, 2007 5:02 pm

A "processing node" is a logical concept - a subset of your available processing power and resources such as memory and disk. The degree of parallelism is determined by the number of nodes defined in the current configuration file.

It is unrelated to the number of CPUs: the number of nodes may be smaller, or it may be slightly larger. The number you choose will be a function of the resources demanded by the composed version of your job design, and of the number of jobs that you might want to run at once.
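Since the degree of parallelism comes from the nodes defined in the current configuration file (the file named by the `APT_CONFIG_FILE` environment variable), counting the `node` entries gives the default number of partitions. A minimal Python sketch, assuming the simplified configuration text shown; the host name and paths are hypothetical, both logical nodes sit on one SMP machine (same `fastname`), and the regex ignores comments and node-pool constraints that a real file may contain.

```python
import re

# Simplified, illustrative configuration text. Host name and paths are hypothetical.
SAMPLE_CONFIG = """
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/node2" {pools ""}
  }
}
"""

# Matches lines such as:  node "node1"
NODE_RE = re.compile(r'^\s*node\s+"([^"]+)"', re.MULTILINE)


def node_names(config_text: str) -> list[str]:
    """Return the logical node names declared in the configuration text."""
    return NODE_RE.findall(config_text)


def degree_of_parallelism(config_text: str) -> int:
    """Default degree of parallelism: one partition per declared node."""
    return len(node_names(config_text))


if __name__ == "__main__":
    print(node_names(SAMPLE_CONFIG), degree_of_parallelism(SAMPLE_CONFIG))
```

Nothing in the file mentions CPUs: two logical nodes on a four-CPU machine is a perfectly valid layout.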

Post by **Cr.Cezon** » Tue Apr 17, 2007 5:01 am

Please, Ray,
I can't see your entire message.
Could you please tell me again?

Regards,
Cristina

ray.wurlod wrote: A "processing node" is a logical concept - a subset of your available processing power and resources such as memory and disk. The degree of parallelism is determined by the number of nodes defined in ...

Post by **ray.wurlod** » Tue Apr 17, 2007 5:08 am

For only a few cents per day you can purchase a premium membership that allows you to see the premium posts in full, and helps to fund the bandwidth required to sustain DSXchange. Maybe your employer would buy it for you. There is a link on the home page to a page on which corporate discounts can be found.