Going from cluster environment to Grid

Druid_Elf
Participant
Posts: 32
Joined: Thu Aug 28, 2008 5:53 am


Post by Druid_Elf »

Hi all,
We are starting a project to move to a completely new InfoSphere environment. Our current environment is 8.1 running in a cluster (on Solaris), and our goal is to end up with a grid implementation running on 8.7 (on RHEL, including new hardware).

I've watched the webinar and read the Redbook on grid implementation (sg247625), but I still have some questions and was hoping someone could provide answers:

1. There is a Grid Enablement Toolkit, which coordinates all activities, yet what I gather from the Redbook is that you also need a resource manager. Is Ganglia enough to act as a resource manager, or can it only act as a resource monitor? If it can only be used as a resource monitor, what is a good open source resource manager? SGE?

2. While the webinar and certain parts of the Redbook led me to believe there is no impact on existing jobs (for the grid part), another section explains how every job needs to be changed to include three APT parameters (APT_GRID_ENABLE, APT_GRID_COMPUTENODES, APT_GRID_PARTITIONS). Can anyone confirm whether there is an impact on existing jobs, and to what extent?

If anyone knows of good guides on migrating to grid (other than the webinar and the Redbook), any help will be greatly appreciated.

And thanks to all who can provide some insight :)
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Hi,

I'm in an 8.1 / 8.5 Grid environment. We use Platform LSF, not SGE or LoadLeveler.

I believe Ganglia is your grid monitor, not the engine to submit jobs.

The Grid Enablement Toolkit provides the link between DataStage and the grid. All you are basically doing is dynamically creating an APT configuration file in a load-balanced fashion.

Your compute nodes need all of the tools your environment needs. Informix CLI, Oracle client tools, Teradata, DB2, etc...

SSH keys if you use them.

Don't skimp on the core count of your head node. Enforce standards up front with your dev team to avoid using the head node as a number cruncher (zip) or data mover (ftp). Farm that off onto the grid.

All of your compute nodes and head node need to have access to the same file system (Network Attached Storage).

You want to set up your job submission to span servers when you request more than one APT_GRID_COMPUTENODES, meaning a value of 2 should be spread across 2 servers. Otherwise you would just increase the partitions.
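To illustrate the point above, here is a minimal sketch of what "dynamically creating an APT file" amounts to. It is not the Grid Enablement Toolkit's actual code; the function name, host names, and disk paths are hypothetical, and it assumes the resource manager has already handed back a list of compute hosts. Each host gets one logical node per requested partition, so 2 compute nodes with 2 partitions each yields a 4-way config spread across 2 servers.

```python
def build_apt_config(hosts, partitions,
                     disk="/shared/datasets", scratch="/shared/scratch"):
    """Return parallel-engine config text with `partitions` logical nodes per host.

    `hosts` stands in for the servers granted by the resource manager;
    the disk and scratch paths are placeholders for shared (NAS) locations.
    """
    blocks = []
    node_no = 1
    for host in hosts:
        for _ in range(partitions):
            blocks.append(
                f'  node "node{node_no}"\n'
                "  {\n"
                f'    fastname "{host}"\n'
                '    pools ""\n'
                f'    resource disk "{disk}" {{pools ""}}\n'
                f'    resource scratchdisk "{scratch}" {{pools ""}}\n'
                "  }\n"
            )
            node_no += 1
    return "{\n" + "".join(blocks) + "}\n"


# Example: the equivalent of COMPUTENODES=2 and PARTITIONS=2 (hypothetical hosts).
config = build_apt_config(["compute01", "compute02"], partitions=2)
print(config)
```

Raising `partitions` adds logical nodes on the same servers, while adding hosts spreads the work across more machines, which is the distinction between the two parameters.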

If you have more questions, send me a note. I don't mind helping out.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

1) Ganglia is simply a resource monitor, as indicated by Paul. SGE is supported as a resource manager, plus a few others which should be listed in the Redbook.

2) Adding the three parameters listed allows you to request the required grid resources at the job level rather than relying solely on project-level settings. This is all part of proper computing resource management. The parameters do not affect job operation other than being used to generate the config file used to run the job. As you are currently using a cluster for Information Server, you and your developers should already be aware of how job design and requested cluster resources affect job performance. In this regard, a grid is very similar to a cluster (in fact, it is built on top of a cluster).
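As a rough sketch of a job-level request, the snippet below only builds (and does not run) a `dsjob -run` command line that overrides the three parameters for a single run. The project and job names are hypothetical, and both the `$` prefix on the parameter names and the `YES`/`NO` values are assumptions to check against the Redbook and your own job definitions.

```python
import shlex


def grid_run_command(project, job, compute_nodes, partitions, enable=True):
    """Build an argument list for running one job with grid parameters.

    Assumes the job already defines the three grid parameters as
    environment-variable job parameters (hence the "$" prefix).
    """
    params = {
        "$APT_GRID_ENABLE": "YES" if enable else "NO",   # assumed value format
        "$APT_GRID_COMPUTENODES": str(compute_nodes),
        "$APT_GRID_PARTITIONS": str(partitions),
    }
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


# Hypothetical project and job names, for illustration only.
cmd = grid_run_command("MyProject", "LoadCustomers", compute_nodes=2, partitions=4)
print(shlex.join(cmd))
```

The job design itself is unchanged; only the resources requested for that run differ.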

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

In a grid environment, EVERYTHING is shared. Cluster env. shares NOTHING.
The APT_CONFIG file is generated dynamically in a grid env. In a cluster env., you are back to using a static APT_CONFIG file.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

To clarify, in a cluster environment you CAN share almost everything, although not everyone sets up an IS cluster that way. It is probably more accurate to say that an IS grid is a form or type of cluster.

Regards,
- james wiles


Druid_Elf
Participant
Posts: 32
Joined: Thu Aug 28, 2008 5:53 am

Post by Druid_Elf »

Thank you all for the information. Does anyone know of a good open source resource manager?

We currently don't have any experience with the ones listed in the Redbook. The Redbook mentions that every tool has some additional options, but that the core functionality for running jobs in a grid is the same for all.
Is this true?
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You don't just want any open source product.

I would recommend LoadLeveler, Platform LSF, or SGE, because I know the Grid Enablement Toolkit has special code incorporated for them.

To be honest, I'm probably the only shop using Platform at the moment. IBM LoadLeveler is of course the popular choice since everyone likes the taste of that Kool-Aid. :O
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Those resource managers listed in the Redbook are the only ones supported by the Grid Toolkit. Core functionality != user/code interface. Each resource manager has a unique interface (functions/commands) which must be supported.

LSF is just another flavor of the Kool-Aid ;)

Regards,
- james wiles


PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Now it is, since you bought it.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Back in the Ascential days, PBS Pro was the resource manager Ascential recommended. Now that IBM has bought Ascential, IBM LoadLeveler is the one IBM recommends. Personally I like PBS Pro because it's very intuitive and because its accounting feature is exactly what I need for cost charge-back work. I learned how to use the product within 3 days.

Which product is the best? That depends a lot on who has the best salesperson.