DataStage 11.3 with GRID on RHEL Virtual Machines

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bgeddam
Participant
Posts: 3
Joined: Mon Feb 29, 2016 1:57 pm

DataStage 11.3 with GRID on RHEL Virtual Machines

Post by bgeddam »

Experts,
Have any of you implemented a grid using RHEL virtual servers as both your conductor and compute nodes?
1. What are the downsides of using VMs for compute nodes?
2. Do we need a dedicated network for the compute nodes?
3. What kind of NFS have you implemented? NAS, GPFS, etc.?

Thanks in advance

[Note - Subject corrected to 11.3 - Andy]
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

I will be implementing a similar topology some time in the next three months, though a smaller installation so compute and conductor nodes will probably be on the same VM. We're still designing this.

Please note that there is no version 11.4. There is 11.3 and 11.5.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bgeddam
Participant
Posts: 3
Joined: Mon Feb 29, 2016 1:57 pm

Post by bgeddam »

Yes, it's DS version 11.3.

Are you going to have multiple compute nodes, with one compute node also acting as the Conductor node?

Any cautions while implementing Grid? We are migrating thousands of jobs from SMP to Grid.

Thanks
Bhupal
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

1000s of jobs means nothing to grid if they are all run sequentially. ;)


Just because you have a lot of jobs doesn't mean you need to go grid (1,000 is not a lot in my book).


Concurrency and growth are the two big reasons for GRID.

How many compute nodes, and what size are they?

Have you thought of 1 big host instead of a grid of smaller ones?

Few people here will tell you what you need or do not need. They will mostly ask you to ask yourself some questions, because we do not know your environment or SLA requirements.


We can recommend settings that are obvious, like "use a 10 Gb network, since it's faster than a 1 Gb network and you will be pumping data left and right," and "add lots of RAM to your boxes, since it's cheap and you don't have to PVU-license it."

Obviously dedicated network/hw is best. A big glass pipe between your ETL box and the database would always be nice. But cost of setting up the perfect environment is a big tradeoff.

I'm not a big fan of VMs on an ETL environment. That's just me. I like VMs for many other things, but speed ain't one of them.

VMs will also pose a problem on your grid if you shift a compute node to a new IP address (yet the hostname stays the same). Platform LSF caches IP addresses to speed things up. So... compute nodes on VMs might mess you up if they change IP addresses under the covers.

But... that's not to say that a big VM grid can't be made.

I know of folks running a DataStage grid off of VMs. It works.
Head node is hardware, compute nodes are VMs.
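One cheap guard against that stale-IP problem is to verify, before (re)starting the grid, that each compute node's forward and reverse DNS still agree. A minimal sketch using only standard tools; the node names are placeholders, not from this thread:

```shell
#!/bin/sh
# Print OK when a host's forward and reverse DNS lookups agree.
# Note: compare against the exact name the resource manager uses
# (short name vs FQDN matters here).
check_host() {
    h=$1
    ip=$(getent hosts "$h" | awk '{print $1; exit}')
    if [ -z "$ip" ]; then
        echo "$h: no forward DNS entry"
        return 1
    fi
    back=$(getent hosts "$ip" | awk '{print $2; exit}')
    if [ "$back" = "$h" ]; then
        echo "$h -> $ip: OK"
    else
        echo "$h -> $ip, but $ip reverses to $back: MISMATCH"
    fi
}

# Placeholder compute-node names -- substitute your own.
for h in compute01 compute02; do
    check_host "$h" || true
done
```

Running this from cron (or just before a planned VM move) surfaces a silent IP change before the cached address strands any jobs.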
bgeddam
Participant
Posts: 3
Joined: Mon Feb 29, 2016 1:57 pm

Post by bgeddam »

One big server: that is what we have now. One heavy job caused file system issues that eventually brought down the entire Engine, impacting all Business Units (BUs) using this environment.

Now the BUs want their own Engines so that they will not be impacted by a runaway job from another BU.

If not grid, how do we manage Engine resources effectively?
Bhupal
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No problem. Give them their own Engine, managed from the same Services tier as the other Engine(s).

Use Workload Management to prevent overload on any machine (whether you stay with one, or choose multiples).

I don't believe that a grid implementation will solve the particular problem that you have just indicated.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I'm not a big fan of virtual machines. You are going to pay a performance penalty.

Check out this white paper.

Performance Characteristics of IBM InfoSphere Information Server 8.7 Running on VMware vSphere 5.0 on Intel Xeon Processors

http://www.intel.in/content/dam/www/pub ... ere5-paper
Choose a job you love, and you will never have to work a day in your life. - Confucius
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You have to understand how a Grid + DataStage works and then see if that setup would solve your issue (as Ray said, it might not).

In a grid there are one (or more) Head Nodes that initiate jobs and build an APT configuration file on the fly. The hosts listed in the APT file run the OSH code and crunch the numbers, but the job is still coordinated from the Engine.

If you have multiple projects, they still execute on the Head Node and as such may still impact each other.

Setting up a NEW environment and thus separating the kids is not grid.

Shared compute nodes still run the risk of impacting anyone who is currently on that host. If a host goes down, simply rerunning the job will assign it to a new host, because the down box will not be a candidate.

If your fundamental error is a file system issue, then you need to debug that further, because GRID is not a cure-all.
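For readers new to this: the configuration file a grid resource manager generates on the fly is just an ordinary parallel engine config. A minimal two-node sketch might look like the following; the hostnames and paths are placeholders, not anything from this thread:

```
{
  node "conductor" {
    fastname "headnode01"
    pools "conductor"
    resource disk "/ds/datasets" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
  node "node1" {
    fastname "compute01"
    pools ""
    resource disk "/ds/datasets" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
}
```

In a grid setup, only the `fastname` entries change from run to run as the resource manager picks hosts; the disk and scratch paths are why shared storage has to be visible on every candidate node.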
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

1) There's a performance penalty for using VMs, somewhere between 5% and 10% on average. Historically it used to be 10%, but now I think it's closer to 5%. What that means is that you might need more hardware to get the same amount of processing power. In terms of hardware costs, that's not a big deal. In terms of software licensing, it can be millions of dollars to add PVUs. It may or may not be a big deal depending on your licensing and how much "headroom" you have.

2) Be careful when configuring your VMs. Some VMs have the ability to "shift" resources around dynamically when required. According to IBM's licensing, if there is the potential for all CPUs to be dynamically added to the VM, then you owe licensing for the whole server, whether you've ever used it or not. Again, this could cost you big-time in terms of licensing and penalties after an audit.

3) I just worked with a customer to put in this "Business Unit" separation for the very same reason. The solution gets very complex very quickly. For true isolation they need to have:
- separate head nodes
- separate compute nodes
- separate file systems
- separate project locations
- Security ironed out - do any of your developers cross Business Units? If so, what are the implications of them having access to both platforms?

Per what was said earlier, "just going grid" isn't going to be enough. Almost all shared resources need to be examined and configured for isolation.
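To make the filesystem part of that checklist concrete, here is a sketch of a per-BU directory layout. The BU names and base path are hypothetical; in practice each BU's tree would sit on its own filesystem (its own mount), not just its own directory:

```shell
#!/bin/sh
# Hypothetical per-BU layout: separate project, dataset, and scratch
# trees per Business Unit, each intended to live on its own filesystem
# so one BU's runaway job cannot fill another BU's disk.
BASE=/tmp/bu_isolation_demo   # stand-in for a real mount root
for bu in finance marketing; do
    mkdir -p "$BASE/$bu/Projects" \
             "$BASE/$bu/Datasets" \
             "$BASE/$bu/Scratch"
done
find "$BASE" -type d | sort
```

The point of the layout is that a full `Scratch` filesystem under one BU leaves the other BU's engine untouched, which is exactly the failure mode described earlier in the thread.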
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
sendmkpk
Premium Member
Posts: 97
Joined: Mon Apr 02, 2007 2:47 am

Post by sendmkpk »

Just wanted to share what we have decided to go with -- any thoughts on this are much appreciated.

1) Currently we have 2 HNs and 6 CNs (8 cores each) on physical servers, with GTK and LSF 9.1.3 on 8.7.
2) HN2 acts as the NFS server, sharing SAN storage with the rest of the cluster. HN2 also hosts non-DataStage tools like IA/BG/MWB.
3) Running on HP-UX, with HP Serviceguard providing failover.

We have decided to go with the following architecture:

1) All VMs on the latest blades with Intel X5 v3: 1 HN with 10 cores and 4 CNs with 18 cores each, where each CN VM is located on a different physical host.
2) A total of 5 physical blades will have one VM each (1 HN and 4 CNs), all on RHEL.
3) The XMETA database will be on a separate Solaris VM.
4) All of the above is for DataStage alone.
5) For the IGC tools we will create a totally separate VM with only those tools; it will share the same physical host as the head node VM and will also run RHEL.

Storage-wise, we are debating whether to go with NAS or SAN. NAS is being suggested; since there is a need for NFS, we can use a NAS appliance for that.

Also, we are not building any active/passive setup, as our VMware team suggests that if the HN VM goes down they can fire up a new one within a very short time.

Also, we have 15k jobs running at any given time, just to give an idea of the load.

Questions:
1) Is NAS or SAN better suited to this architecture?
2) Is VMware failover reliable in an InfoSphere grid context?
3) What are the best practices for directory isolation? Should projects/datasets be on separate physical drives for better I/O, etc.?

Really appreciate any thoughts overall.

Thanks
Sri
Praveen
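On the NAS-versus-SAN question, one cheap data point is to measure sequential write throughput on each candidate mount before committing. A rough sketch with `dd`; the mount points are placeholders, and `conv=fsync` makes sure the page cache doesn't flatter the numbers:

```shell
#!/bin/sh
# Rough sequential-write comparison across candidate mount points.
seq_write_test() {
    mnt=$1
    if [ ! -d "$mnt" ]; then
        echo "$mnt: not mounted, skipping"
        return 1
    fi
    f="$mnt/dd_throughput_test.$$"
    # 64 MiB sequential write, fsync'd; dd prints the throughput summary
    dd if=/dev/zero of="$f" bs=1M count=64 conv=fsync 2>&1 | tail -n 1
    rm -f "$f"
}

# Placeholder paths -- substitute your NAS and SAN candidate mounts.
for mnt in /mnt/nas_candidate /mnt/san_candidate; do
    echo "== $mnt =="
    seq_write_test "$mnt" || true
done
```

A single `dd` run is only a smoke test; repeat it a few times and at different block sizes before drawing conclusions, since NFS tuning and SAN multipathing both move the numbers considerably.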
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Other than the fact that I have NEVER seen a DataStage head node execute 15K jobs concurrently... 15K per day is more like it, I would imagine.

So how did you address your important issue of separating business assets, given that the kids didn't play nice together?
sendmkpk
Premium Member
Posts: 97
Joined: Mon Apr 02, 2007 2:47 am

Post by sendmkpk »

It's not just the head node; we have 6 compute nodes where these jobs run.

Thanks
Sri
Praveen