Difference between Cluster & Grid Architecture

Posted: Sat Jan 08, 2011 9:24 am
by ppgoml
Hi all,

Maybe this is a stupid question; however, I am a little confused about the DataStage cluster environment and the DataStage grid environment. Could someone tell me whether they are the same concept, or what the differences between them are?

I know the DataStage engine can be distributed to other machines to build an MPP environment. Is that a DataStage cluster or a grid?

Thanks for clarifying it to me.

Posted: Sat Jan 08, 2011 3:37 pm
by ray.wurlod
As far as DataStage is concerned, MPP = cluster, which has a fixed number of machines. A grid has a dynamically variable number of machines, with available resources being managed by some form of grid management software.

Posted: Sat Jan 08, 2011 4:10 pm
by daignault
A cluster is a group of DataStage servers where the admin/designer writes the APT file to use the compute node resources available.

A grid uses software to auto-generate the APT file so that it uses a subset of resources, based on compute node utilisation.

If you have a group of 25 compute nodes, the designer might write the APT file to use 4 of them. If those nodes are busy, DataStage will still try to use them. Grid software instead looks at the utilisation of the compute nodes on the grid, picks the machines that are underused, and dispatches the job to those nodes.
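To make the contrast concrete, here is a minimal Python sketch of the grid idea described above: pick the least-busy compute nodes and generate a parallel configuration (APT) file for them. It is only an illustration, not how any particular grid manager works. The host names, utilisation figures, and disk paths are made up, and the function names `pick_nodes` and `render_apt_config` are hypothetical.

```python
def pick_nodes(utilisation, count):
    """Return the `count` least-utilised host names (ties broken by name)."""
    ranked = sorted(utilisation.items(), key=lambda item: (item[1], item[0]))
    return [host for host, _ in ranked[:count]]


def render_apt_config(hosts, disk="/data/ds", scratch="/scratch/ds"):
    """Render a configuration file with one logical node per chosen host.

    The disk and scratch paths are placeholder assumptions.
    """
    blocks = []
    for index, host in enumerate(hosts, start=1):
        blocks.append(
            f'  node "node{index}"\n'
            "  {\n"
            f'    fastname "{host}"\n'
            '    pools ""\n'
            f'    resource disk "{disk}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
            "  }\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


# Hypothetical utilisation (fraction busy) reported for a few compute nodes.
utilisation = {"cn01": 0.90, "cn02": 0.10, "cn03": 0.55, "cn04": 0.05, "cn05": 0.30}

chosen = pick_nodes(utilisation, 2)
print(render_apt_config(chosen))
```

In a cluster, the equivalent of `chosen` is hard-coded by whoever writes the file, so the same hosts are used regardless of how busy they are.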

In addition, you can have different software running on the same grid compute nodes, such as SAS, DataStage, etc. So a SAS batch job could be dispatched to 2 of the compute nodes, which would remove them from contention for a DataStage job.

Cheers,

Ray D

Posted: Sat Jan 08, 2011 4:24 pm
by lstsaur
With DataStage in a grid environment, the APT configuration file is generated dynamically. That means a job may have run on node2 and node4 yesterday, but you don't know which nodes the same job will run on today. The grid enablement toolkit wouldn't work in a clustered environment, since nothing is shared.

Posted: Sat Jan 08, 2011 9:33 pm
by ppgoml
Thanks for all your input. It's much clearer to me now.

Posted: Mon Mar 28, 2011 3:35 pm
by Terala
lstsaur wrote: With DataStage in a grid environment, the APT configuration file is generated dynamically. That means a job may have run on node2 and node4 yesterday, but you don't know which nodes the same job will run on today.
lstsaur: how are Data Sets and File Sets managed when a job runs on different nodes every time, depending on the available compute nodes?

Posted: Mon Mar 28, 2011 4:35 pm
by ray.wurlod
Data Set and File Set descriptor files include the configuration with which they were written. When it comes time to read them, a virtual "read only" descriptor is created from this. (You can accomplish the same thing using the -x option with the orchadmin command.)
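As a rough illustration of that idea, the toy Python model below keeps the write-time hosts inside a descriptor object and uses them to locate the data at read time, whatever configuration today's job was given. This is a conceptual sketch only. It does not reflect the real descriptor file format or the engine's internals, and the class, function, and host names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Descriptor:
    """Toy stand-in for a Data Set descriptor."""
    name: str
    written_on: Tuple[str, ...]  # hosts from the configuration used at write time


def read_plan(descriptor, current_hosts):
    """Describe where data is read from versus where today's job runs.

    The data location comes from the descriptor's stored configuration,
    not from the configuration generated for the current run.
    """
    return {
        "read_from": list(descriptor.written_on),
        "process_on": list(current_hosts),
    }


# Written yesterday on two nodes; today the grid handed out different ones.
customers = Descriptor("customers.ds", ("cn02", "cn04"))
plan = read_plan(customers, ["cn01", "cn05"])
print(plan)
```

The point of the model is simply that the descriptor, not the current run's configuration, answers the question of where the data segments live.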

Posted: Mon Mar 28, 2011 5:41 pm
by lstsaur
Also remember in a grid (grid computing), EVERYTHING is shared.