Grid Configuration

Posted: Wed Jun 23, 2010 12:31 am
by sudeepmantri
Hi,

Can anyone tell me what GRID configuration is in DataStage, and how it helps to improve the overall performance of a DS job (Parallel Extender)?

As far as I know, it enables DSENV to select a configuration file dynamically, but does it really have an impact on the performance of the job?

Thanks in advance,
Sudeep

Posted: Wed Jun 23, 2010 1:20 am
by ray.wurlod
No real performance difference over an equivalently configured cluster of machines. However, the grid management software means that you can get more processors if they are free of other tasks, and can also manage events such as losing one of the machines completely - it simply does not schedule tasks onto that machine. In a cluster you have a fixed configuration file and would get fatal errors at startup if you lost one of the machines.
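
To make the fixed-versus-dynamic point concrete, here is a minimal sketch in Python of building a parallel configuration file at job start from whichever machines are currently usable. It is not how the actual grid management software works internally. The function names, host names, disk paths and the `is_free` callback are all hypothetical stand-ins for what a real resource manager would report. Only the node / fastname / pools / resource layout follows the usual parallel configuration file format.

```python
from typing import Callable, Iterable, List


def build_config(hosts: Iterable[str],
                 disk: str = "/tmp/ds/data",          # hypothetical path
                 scratch: str = "/tmp/ds/scratch"     # hypothetical path
                 ) -> str:
    """Render one logical node per host in configuration-file syntax."""
    nodes: List[str] = []
    for n, host in enumerate(hosts, start=1):
        nodes.append(
            f'  node "node{n}"\n'
            "  {\n"
            f'    fastname "{host}"\n'
            '    pools ""\n'
            f'    resource disk "{disk}" {{pools ""}}\n'
            f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
            "  }\n"
        )
    if not nodes:
        raise ValueError("no compute hosts available")
    return "{\n" + "".join(nodes) + "}\n"


def grid_config(all_hosts: Iterable[str],
                is_free: Callable[[str], bool],
                wanted: int) -> str:
    """Grid-style: use up to `wanted` hosts that are reported free right now.

    `is_free` stands in for the resource manager; a lost or busy machine
    is simply never scheduled.
    """
    free = [h for h in all_hosts if is_free(h)]
    return build_config(free[:wanted])


if __name__ == "__main__":
    hosts = ["compute1", "compute2", "compute3"]   # hypothetical host names
    down = {"compute2"}                            # pretend this machine is lost
    print(grid_config(hosts, lambda h: h not in down, wanted=2))
```

A cluster corresponds to calling `build_config` once with a hard-coded host list and reusing that file for every run, which is why losing one of the listed machines is fatal at startup.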

Posted: Wed Jun 23, 2010 7:14 am
by chulett
Something from IOD 2006 here. Note that this is a direct link to a PDF document rather than a web page.

Posted: Wed Jun 23, 2010 9:23 am
by Sreenivasulu
Hi Craig,
I went through the doc.
There is a tab for GRID in DataStage Administrator. Is there a separate plugin needed to enable it?
Regards
Sreeni

Posted: Wed Jun 23, 2010 9:57 am
by kduke
A cluster has to maintain a heartbeat between machines too. This can cause issues if the machines are paging heavily. A grid is much easier to expand, while a cluster is very expensive to expand. So even though they look similar, a grid has advantages over a cluster in ETL. The opposite may be true for database servers.

Posted: Wed Jun 23, 2010 12:21 pm
by asorrell
Ok - I understand the heartbeat issue. But why is one more expensive than another? Isn't it basically a license of IIS in either case?

Posted: Wed Jun 23, 2010 2:00 pm
by lstsaur
Kim,
You are absolutely right. GRID is so easy to expand, and it provides horizontal scalability. It's the lowest-cost solution for high performance.

Posted: Wed Jun 23, 2010 3:05 pm
by kduke
A cluster is a tightly coupled group of servers, if you will. A grid is usually a bunch of PCs running Linux. Bang for the buck is always on the PC side because PCs are manufactured for millions of people. Clusters sell in much lower volume, so the cost of R&D for clusters has to be spread across fewer sales. Clusters are also more complex in both hardware and software.

There is a trade-off, though. In theory it takes more admins to run a grid. Most companies do not have the same ratio of admins to servers on a grid. It does not make sense either economically or physically. Grids are usually in a rack. If one machine fails, it goes offline until it can be replaced. They are just plug and play at this point: you unplug the bad CPU and plug in a good one. They sort of slide in and out. I have not worked on a grid; this is all from listening to others explain how they work. The software to manage a grid is very complex, so in some ways a grid is a much more complex solution. The software hides this complexity, though, so it is not hard to admin. IBM is sort of shooting themselves in the foot, because a grid is so much more cost-effective that it will eventually hurt their own sales. Right now most companies are afraid of grids. There are a lot more computers to manage, so the admin group has to buy into the solution. It looks like more work for them, so most say no. When companies can afford clusters, they usually buy them. At least that is my perception of the problem.

Posted: Wed Jun 23, 2010 3:10 pm
by kduke
Andy, are you talking about the cost of DataStage or the overall cost?