Grid Configuration

sudeepmantri
Participant
Posts: 54
Joined: Wed Oct 25, 2006 11:07 pm
Location: Hyderabad

Grid Configuration

Post by sudeepmantri »

Hi,

Can anyone tell me what grid configuration is in DataStage, and how it helps improve the overall performance of a DS job (Parallel Extender)?

As far as I know, it enables DSENV to select a configuration file dynamically, but does it really affect the performance of the job?

Thanks in advance,
Sudeep
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No real performance difference over an equivalently configured cluster of machines. However, the grid management software means that you can get more processors if they are free of other tasks, and can also manage events such as losing one of the machines completely - it simply does not schedule tasks onto that machine. In a cluster you have a fixed configuration file and would get fatal errors at startup if you lost one of the machines.
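
To illustrate the difference, here is a minimal Python sketch of the idea behind a dynamically generated configuration file: the node list is built from whichever hosts are currently available, so a lost machine is simply left out. The host names (`etl01` and so on), the disk paths, and the `build_config` helper are all hypothetical, and a real grid relies on its resource manager to decide which hosts are free rather than a hard-coded set. This is not how the grid software itself is implemented.

```python
def build_config(hosts, disk="/data/datasets", scratch="/data/scratch"):
    """Render a parallel configuration file with one node per available host.

    The disk and scratch paths are placeholder assumptions.
    """
    blocks = []
    for i, host in enumerate(hosts, start=1):
        blocks.append(
            f'  node "node{i}"\n'
            "  {\n"
            f'    fastname "{host}"\n'
            '    pools ""\n'
            f'    resource disk "{disk}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
            "  }\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


# Hypothetical host inventory; in a grid the resource manager supplies this.
all_hosts = ["etl01", "etl02", "etl03"]
down = {"etl02"}  # pretend this machine has been lost

# Only schedule onto machines that are still up.
available = [h for h in all_hosts if h not in down]
config_text = build_config(available)
print(config_text)
```

With a fixed cluster configuration file, the entry for the lost machine would still be present and the job would fail at startup. Here the generated file just has two nodes instead of three.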
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Something from IOD 2006 here. Note this is a direct link to a pdf document rather than a web page.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi Craig,
I went through the doc.
There is a tab for Grid in DataStage Administrator. Is there a separate plug-in needed to enable it?
Regards
Sreeni
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

A cluster has to maintain a heartbeat between machines too. This can cause issues if the machines are paging heavily. A grid is much easier to expand, whereas a cluster is very expensive to expand. So even though they look similar, a grid has advantages over a cluster for ETL. The opposite may be true for database servers.
Mamu Kim
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Ok - I understand the heartbeat issue. But why is one more expensive than another? Isn't it basically a license of IIS in either case?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Kim,
You are absolutely right. A grid is easy to expand and provides horizontal scalability. It's the lowest-cost solution for high performance.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

A cluster is a tightly coupled group of servers, if you will. A grid is usually a bunch of PCs running Linux. Bang for the buck is always on the PC side because PCs are manufactured for millions of people. Clusters sell in much lower volume, so the cost of R&D has to be spread across fewer sales. Clusters are also more complex in both hardware and software.

There is a trade-off though. In theory it takes more admins to run a grid, but most companies do not keep the same ratio of admins to servers on a grid. It does not make sense either economically or physically. Grids are usually in a rack. When one machine fails, it goes offline until it can be replaced. They are just plug and play at this point: you unplug the bad CPU and plug in a good one. They sort of slide in and out. I have not worked on a grid; this is all from listening to others explain how they work. The software to manage a grid is very complex, so in some ways a grid is a much more complex solution. The software hides this complexity, so it is not hard to admin. IBM is sort of shooting themselves in the foot, because a grid is so much more cost-effective that it will eventually hurt their own sales. Right now most companies are afraid of grids. There are lots more computers to manage, so the admin group has to buy into the solution. It looks like more work for them, so most say no. When companies can afford clusters, they usually buy them. At least that is my perception of the problem.
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Andy, are you talking about the cost of DataStage or the overall cost?
Mamu Kim