PX with Sequential File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

If I have a parallel job that uses a Sequential File stage both to read and to write data, I get no better performance than with a server job, do I?

regards,
Cristina
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How many nodes are you using, and on how many CPUs? What about disk utilization? A lot of factors are involved.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What's happening in between reading and writing? Parallel processing may assist if some heavy duty transformation is occurring.

You can allocate multiple readers to the file - particularly effective if the file has fixed width records - and achieve parallelism in reading.

Unfortunately the operating system limits what we can do at the other end - "one file, one writer" is the rule here.
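
To illustrate why fixed width records suit multiple readers: every record boundary can be computed from a byte offset, so each reader can seek straight to its own slice without scanning for delimiters. The following is only a rough Python sketch of that idea, not how the Sequential File stage is implemented; `RECORD_LEN`, `split_ranges`, `read_range` and `parallel_read` are hypothetical names chosen for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

RECORD_LEN = 10  # assumed fixed record width in bytes, including the newline


def split_ranges(total_records, readers):
    """Divide the records into contiguous (start, count) ranges, one per reader."""
    base, extra = divmod(total_records, readers)
    ranges, start = [], 0
    for i in range(readers):
        count = base + (1 if i < extra else 0)  # spread any remainder over the first readers
        ranges.append((start, count))
        start += count
    return ranges


def read_range(path, start, count):
    """Open the file independently, seek to the first record, read `count` records."""
    with open(path, "rb") as f:
        f.seek(start * RECORD_LEN)  # offset is known because the width is fixed
        data = f.read(count * RECORD_LEN)
    return [data[i:i + RECORD_LEN] for i in range(0, len(data), RECORD_LEN)]


def parallel_read(path, readers):
    """Read one file with several concurrent readers; returns one list of records per reader."""
    total = os.path.getsize(path) // RECORD_LEN
    ranges = split_ranges(total, readers)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        return list(pool.map(lambda r: read_range(path, *r), ranges))
```

With delimited, variable-length records the offsets are not known in advance, so each reader would first have to search for a record boundary near its starting point.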
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

I have 2 nodes in the configuration file.
I have 4 CPUs.

If I don't have to do a lot of transformations, is it better to use parallel than server?

I think that if the job imports data into memory, does the transforms, and writes the data to a sequential file, there is no performance gain, because the file stages work in sequential mode.

regards,
Cristina
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

If you don't mind supporting both server and parallel jobs on your site, the server one will be a low-fuss method: fewer warning messages for sequential file data. A parallel job becomes better if you have a sort requirement or more than a couple of stages between the input and output.

2 nodes and 4 CPUs? Shouldn't you at least have one node per CPU?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Two nodes is exactly right for a development environment. If it works on two it works on more. Number of CPUs is irrelevant - sufficiently complex jobs will use all of them even with two nodes, in an SMP ("shared everything") environment.

Use two readers per node when reading the file. For larger files you will notice (a) parallelism in the Sequential File stage (check that this occurs) and (b) faster completion time.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

What exactly does "nodes" refer to?
I think it is for partitioning the work into n operating system processes that use different memory areas.
If you have more or fewer CPUs it doesn't matter, does it?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A "processing node" is a logical concept - a subset of your available processing power and resources such as memory and disk. The degree of parallelism is determined by the number of nodes defined in the current configuration file.

It is unrelated to the number of CPUs - it may be less, it may be slightly more. The number you choose will be a function of the resources demanded by the composed version of your job design, and the number of jobs that you might want to run at once.
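
As a loose analogy only (this is not DataStage code, and `partition_round_robin` and `run_on_nodes` are hypothetical names), the sketch below shows the degree of parallelism being set by a configured node count rather than by the CPU count: the rows are dealt into `num_nodes` partitions and each partition is processed by its own worker.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, num_nodes):
    """Deal rows out to `num_nodes` logical partitions, one row at a time."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def run_on_nodes(rows, num_nodes, transform):
    """Apply `transform` to every partition concurrently, one worker per logical node."""
    partitions = partition_round_robin(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(lambda part: [transform(r) for r in part], partitions))


# num_nodes is a configuration choice; nothing here consults the CPU count.
print(run_on_nodes(range(5), 2, lambda x: x * 10))
```

Changing `num_nodes` from 2 to 4 changes how many partitions exist without touching the job logic, which mirrors swapping one configuration file for another.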
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Please, Ray,
I can't see your entire message.
Could you please tell me again?

Regards,
Cristina
ray.wurlod wrote:A "processing node" is a logical concept - a subset of your available processing power and resources such as memory and disk. The degree of parallelism is determined by the number of nodes defined in ...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For only a few cents per day you can purchase a premium membership that allows you to see the premium posts in full, and helps to fund the bandwidth required to sustain DSXchange. Maybe your employer would buy it for you. There is a link on the home page to a page on which corporate discounts can be found.