Tuesday, February 17, 2015

Anglia Ruskin University - Our VDI - what we want to do next

As I said in my earlier blog, we have VDI aspirations.  (Hosted Virtual Desktop is the term Gartner tend to use)

This particular blog talks about our intention to push our VDI use cases further.  This means we need more hardware and software - which is discussed in outline below.

Our current VDI delivers a large range of applications, but what we do not do terribly well at the moment is provide graphics-heavy and processor-intensive applications and content.

As a consequence, there are a number of niche and not-so-niche applications that we can't provide to our students (or staff) on our VDI environment.  This means we are stuck with thick clients for these applications, so students need to visit specific physical locations to access them.  We want to fix this if we can.  Being able to access applications from almost any device at any time is a significant gain.

We're also the Entrepreneurial University of the Year, and maybe we can put a package of software together that might help budding business entrepreneurs - website software, maybe some free hosting? What else?

Specifically, we're talking about applications like the following (I'll add a more complete list at the end later):

Google Earth, CryEngine, 3D Studio Max, Autodesk Maya, Adobe After Effects CS6, Adobe Photoshop CS6, Epic, Adobe Premiere Pro CC, Autodesk AutoCAD Architecture 2014, Autodesk Structural Detailing 2015, Autodesk Inventor, ArcGIS 10, WorldWide Telescope, Adobe Audition CC, Adobe Edge Animate CC 2014.1, Adobe Fireworks CS6 (included with CC), Adobe Flash Professional CC 2014, Adobe Illustrator CC 2014.

We probably need more fast storage to run these new applications too but that'll be the topic of a further blog post. 

So, we think we need to complement our environment with NVIDIA GRID graphics cards.  With the new vGPU profiles supported by VMware we can hopefully use both GRID K1 and K2 cards.  Particular users will be assigned particular vGPU profiles, meaning we can support a terrific range of graphics-heavy applications.  However, the bulk of our users will run on the lesser K1 cards, with the K2 cards dedicated to heavy-use and specialist users.
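To make the profile idea a bit more concrete, here's a rough sketch in Python (purely illustrative - the user classes, and which profile each one gets, are our working assumptions rather than a finished design; the frame buffer sizes are NVIDIA's published figures for these profiles):

GRID_PROFILES = {
    # profile: (card, frame buffer per user in MB)
    "K120Q": ("K1", 512),    # likely default for the bulk of users
    "K140Q": ("K1", 1024),   # heavier K1 option if 512MB proves too tight
    "K220Q": ("K2", 512),    # K2 entry profile for certified apps
    "K260Q": ("K2", 2048),   # 'Designer/Power User' territory
}

# Hypothetical user classes - the real split will come out of a proof of concept.
USER_CLASS_TO_PROFILE = {
    "standard graphics user": "K120Q",
    "specialist (e.g. GIS)": "K220Q",
    "designer/power user": "K260Q",
}

def profile_for(user_class):
    """Return the vGPU profile we would assign to a given class of user."""
    return USER_CLASS_TO_PROFILE[user_class]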

The relevant NVIDIA graphics boards are the GRID K1 and K2.

A list of certified applications for the GRID cards is here.

Some capacity planning is obviously vital. 

Should we host everything in our main Cambridge data centre?

If we did this then we'd cut down on hardware and complexity.  The disadvantage is clearly around disaster recovery and perhaps end-user performance.  But we do have 10Gbit links between the campuses with about 10ms latency.  It might be fine.

We would engineer our environment in Cambridge to have no single point of failure.  If we lost both of our resilient 10Gbit comms links we would be in trouble - but only for a limited number of users at Chelmsford.  This might be a reasonable risk to accept.

The Capacity Requirements

We think the total user community who will benefit from this new graphics-capable technology is pretty small: in Cambridge, probably about 100 concurrent users; in Chelmsford, probably 50-60 concurrent users.  Even this is likely to be a high estimate.

Of the total number who could use the new graphics capability, perhaps 20 users in Cambridge and 10 users in Chelmsford would make use of the more top-end 'Designer/Power User' capability - i.e. the GRID K2.

What do we need to do to verify these estimates?  Clearly identify the applications that need the capacity.  I imagine we will not really be able to do this until we create a working proof of concept.  But applications like ArcGIS are certified for the shared K2 but not the K1.

What would the NVIDIA GRID capacity look like?

Each K2 running the 512MB K220Q profile supports 16 users per card; two cards would give us 32.  We can probably get by with two or three K2s, as long as we have at least two separate servers to host them.

Each K1 running the 512MB K120Q profile supports 32 users per card.  If we need to support about 120 concurrent users then we need four K1s, with at least three of them in separate servers.

You'll notice I've assumed we only need 512MB of graphics memory.  This is a guess.

That means we need at least five extra servers to support the new cards.
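As a sanity check on those numbers, here's the arithmetic as a small Python sketch (the per-card figures are the ones quoted above, and the 120 K1 users and 30 K2 users are simply the estimates from the capacity section - assumptions rather than measurements):

import math

USERS_PER_K2 = 16    # K2 at 512MB per user (K220Q profile)
USERS_PER_K1 = 32    # K1 at 512MB per user

def cards_needed(concurrent_users, users_per_card):
    return math.ceil(concurrent_users / users_per_card)

k2_users = 20 + 10   # Cambridge + Chelmsford 'Designer/Power User' estimates
k1_users = 120       # roughly the remaining concurrent graphics users

k2_cards = cards_needed(k2_users, USERS_PER_K2)   # 2 (we'd likely take 2-3 for headroom)
k1_cards = cards_needed(k1_users, USERS_PER_K1)   # 4

# Spreading the cards for resilience - at least two K2 hosts and three or
# four K1 hosts - is where the 'at least five extra servers' figure comes from.
print(k2_cards, k1_cards)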

Our ideal architecture (if it works) would also see an Apex 2800 card inserted into each of these servers - to get extra PCoIP acceleration.

What about the servers?   

The question is - can we still use our existing chassis based HP Gen 8 (perhaps Gen 9) blades? 

Our belief is that we would at least need a PCIe expansion blade to fit the K1/K2s and Apex 2800 alongside our existing Gen 8 BL460c blades.  However, the BL460c is not supported by NVIDIA.   More VMware View K1/K2 information is here.

The likelihood is the BL460c is a dead end.  We ought to plan on that basis - running an unsupported VDI environment is not a terribly good idea.

This is where we need help from HP and NVIDIA - can we do this, or do we need to move to another server architecture?

The NVIDIA server compatibility list (click here) suggests the only suitable chassis-based option is the specialist workstation blade, the WS460c (I'll discount the SL250 as an HPC-focused solution).  VMware certification for the WS460c is here.  This would be a pity, as it would undermine our 'standard blade' approach.  But they're still blades and we could still utilise our c7000s - so perhaps not so bad.  The Apex 2800s are also supported in the WS460c - so this is looking feasible.

But, will a K2 (and/or K1) and an Apex fit in the same server?

So, if it all works out this would mean five new WS460s.

So, HP and NVidia (and helpful solution providers) what's the best approach?

What about our VMware Desktop Virtualisation Software?

All of this will be dependent on an upgrade of our existing VMware Horizon View 5.2 environment to the latest version, Horizon 6.


What have we discovered?

13/3/15 - Good news

Well, happily, we have had the above approach informally validated by a helpful supplier.  Even better, they have a hosted POC environment we can use to validate that some of our applications work well.  Testing will take place in the next couple of weeks.

Also, there looks to be more good news and further improvements on the graphics front.

So, firstly, the Apex 2800 card can sit alongside a K1 or K2 in the WS460c.  Hurrah.  We get the benefits of PCoIP acceleration along with graphics from the K cards.  But we appear to need vSphere ESXi 6 (due out soon).  Apparently the associated new vGPU driver claims to double the density of users on the K1 and K2 GRID cards, which would be good news.

Anglia Ruskin University - Onwards and Upwards for our VDI - What we do now

So, Virtual Desktop Infrastructure.  We've been a pretty early adopter, with an initial implementation for our students and staff starting about four years ago, in 2011.

This blog is intended to provide a basis for discussion within Anglia Ruskin IT Services (and more widely), but also to give information to suppliers who might be able to help us with our VDI aspirations - more of this to follow.

I've outlined, at a high level, our technology in use.  The next blog entry will detail what we want to do next with VDI - future use cases.

Our technology is centred on VMware View (Horizon), HP blade servers, Violin storage and zero clients, augmented by App-V for application virtualisation and ProfileUnity for easy profile and application-setting management.

It's been tremendously successful, with an appreciable uplift in our student experience.  Essentially, it means that students (and our staff) get a good, consistent experience across our various locations and devices.  Our total combined staff and student user population is about 35,000.

To get a bit more specific:

End devices.

Our end devices are mainly zero clients, with about 950 at our Cambridge Campus and 840 in Chelmsford - roughly 1,800 in total.   They're a mixture of Tera1 and Tera2 devices supporting our main protocol, PCoIP.

We've recently been looking at the generation-three LG 23" All-in-One V Series as a possible replacement for some of our older end devices.

Servers

Our Cambridge Campus

We're using a combination of HP BL460c G7 and Gen 8 servers spread over two HP c7000 blade chassis - 18 blades in total (all with 192GB of memory), 6 Gen 8 and 12 G7.

Our Gen 8 blades also have a Teradici Apex 2800 Tera2-based offload card installed - to provide better PCoIP performance.

We used a rough-and-ready capacity planning rule of thumb of 50 VMs per host.  That means we ought to have capacity for about 900 concurrent VMs in Cambridge, putting aside any performance uplift we might get from our newest Gen 8 blades.  We've also installed Apex 2800 PCoIP acceleration cards in all the Gen 8s.  This ought to mean 50 per blade is a very safe performance bet, and this has mostly been true.   We do get the occasional slowdown in specific VMs, but it has been hard to establish where the bottleneck might be (despite investigation).  Saying that, the other 99% of the time we have great performance.
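For what it's worth, the back-of-envelope arithmetic behind that figure looks like this (a quick Python sketch; the 4GB per-VM allocation is the one described further down, and the overcommit comment is an observation rather than a measured result):

VMS_PER_HOST = 50          # our rough-and-ready rule of thumb
BLADE_MEMORY_GB = 192
VM_MEMORY_GB = 4           # the Windows 7 allocation described below

cambridge_desktop_blades = 18   # 6 Gen 8 + 12 G7 (management cluster excluded)
cambridge_capacity = cambridge_desktop_blades * VMS_PER_HOST   # 900 concurrent VMs

# Memory sanity check: 50 VMs x 4GB = 200GB assigned against 192GB physical,
# i.e. a modest overcommit that leans on ESXi's memory management.
assigned_per_blade = VMS_PER_HOST * VM_MEMORY_GB               # 200GB
print(cambridge_capacity, assigned_per_blade)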

On top of that, we have four servers in our VDI management cluster - all G7s with 192GB of memory.

Our two c7000 Cambridge chassis have 5 spare slots in each - 10 total.

Our Chelmsford Campus

Chelmsford is pretty similar - but with a slightly lower capacity.

14 servers in total - 5 Gen 8 (with Teradici Apex 2800 offload cards) and 9 G7, all HP BL460c blades with 192GB of memory.

Our VDI management cluster comprises three G7s.

Theoretically, using our rough rule of thumb explained earlier, this means we can support 700 concurrent users in Chelmsford.

Our two c7000 Chelmsford chassis also have 5 spare slots in each - 10 in total.

Our client VMs.

We run Windows 7 Enterprise SP1 VMs with 4GB of memory, delivering over 100 applications, mostly streamed using App-V.

Other software versions.

VMware vSphere 5.1, VMware Horizon View 5.2.

A note on end user experience.

It's worth saying upfront that our emphasis is on providing a PC-like end-user experience.  Our rationale isn't about stuffing our servers with as many VMs as possible to maximise value for money; our objectives are far more focused on delivering the best possible experience for our students and staff.  Of course, there are limits to this.  But if we compromise the end-user experience too much then we'd be better off providing PCs.