The Great Exchange 2016 Crash

When facing I.T. outages (e.g. home broadband down), I often get frustrated by the lack of detail in the information put out. So this is an attempt to provide an overview of the recent Exchange outage, to give an idea of the scale of the problem and why it has been taking so long to restore service.

Background

ISS I.T. Services have been trying to migrate off our aging Exchange 2010 environment onto a new Exchange 2016 environment, with the eventual aim of moving the majority of users to Office 365. To do this, new HPE server hardware was purchased in a “Best Practice” configuration for the new Exchange 2016 environment. These servers were set up and added to the Exchange environment, and migration of students proceeded without any problems. Migration of the staff accounts then followed and reached about 15% of staff before the problems started.

The Crash

On 23rd October, one of the 4 servers crashed and rebooted with a “Blue Screen of Death”.  The reboot triggered an automatic “repair” of the file systems on the server – normal for servers these days.  However, a combination of this repair and the reboot resulted in damage to some of the database mailbox stores which prevented them mounting.

Normally, when this happens the Exchange system flips over to a secondary copy of the database (for fault tolerance we keep two copies and a nightly backup). However, it seems that the database damage somehow replicated to the copy, so both “live” copies became unavailable. This left a large number of mailboxes unavailable to end users.

Whilst we were trying to recover the databases from backups, other servers in the 4-server cluster also “Blue Screened”, damaging further databases. We also ran into issues when trying to restore databases from backups…

[Image: Schrödinger’s backup]

Our 3rd Party Support company, having been pulled in to help, did a “Root Cause Analysis” and concluded that the disk systems on the servers could not keep up. This lack of performance was picked up by Exchange’s “Health Service” as a problem, which triggered the “Blue Screen” in order to protect itself. This resulted in the disk corruption and hence the database corruption. Once recovery has completed we will be looking at the systems to determine the exact cause of the disk performance problem.

The Recovery

Having now got a number of damaged databases, all on servers which could best be described as “delicate” when hit with lots of disk traffic, we and our support company looked to recover the data to our old 2010 service – a “rollback”.

Several methods have been used, involving backups, database repairs, restores, mailbox moves etc. Some of these have proven extremely slow – for example, two restores of student mailboxes (approx. 4000 mailboxes each) have taken over a week and are still going – and these are the users who will be missing a week or two of data. Some of the newer methods have had more success and, since they work with the recovered damaged databases, are recovering all data. The disk I/O bottleneck limits the number of activities we can progress at any one time.

Given the time taken to get the old data back, affected users will have a “dial tone” mailbox – i.e. an empty one – which enables them to send and receive mail straight away, and into which recovered mail will be added as and when we can.

The Future

Once everything is rolled back to Exchange 2010, we will be looking at options going forward, such as a revised 2016 rebuild and Office 365, and engaging some hardware-independent Microsoft consultants to help us make the right decisions.

Posted in Uncategorized

Setting up iPads for kids

So I finally gave in to the inevitable and replaced the cheapo Android tablets with iPad minis for the kids. Now at least the grown-ups get their I.T. back!

Update: since this was written, Apple has created a “Family Sharing” solution, which enables the creation of real kids’ accounts and the sharing of purchases – a good step in the right direction. There are also now suggestions that Amazon will do the same for Kindle!

Then there was the issue of setting things up. Issues to work around include:

Apple (and everyone else) don’t allow users to be under 13 – U.S. Federal rules?

Don’t want kids on my account as FaceTime, Game Center, iTunes, etc. all get shared and potentially confused.

So I eventually set up separate Apple accounts for each of them as a parent – i.e. using my details, etc. I used the trick in Gmail which allows adding an additional string after a + sign, e.g. Paul+string@gmailaccount.com. This still sends all the mail to paul@gmailaccount.com, but is a different address from an Apple ID point of view. These accounts have no credit card info; they are topped up as required by gifting iTunes items, apps etc. from my iTunes accounts, and since the kids don’t have the passwords, they are just my additional accounts.
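To illustrate the + trick, here is a minimal Python sketch. The function name and the two example addresses are made up for illustration, and it only models the behaviour described above (everything from the + to the @ is ignored for delivery); it is not a full description of Gmail’s address handling.

```python
def gmail_inbox(address: str) -> str:
    """Return the inbox a plus-addressed Gmail address is delivered to.

    Simplified model: drop everything from the first '+' up to the '@',
    and lower-case the result.
    """
    local, _, domain = address.rpartition("@")
    base, _, _tag = local.partition("+")
    return f"{base}@{domain}".lower()


# Hypothetical addresses, one per child's Apple ID.
kids = ["Paul+kid1@gmailaccount.com", "Paul+kid2@gmailaccount.com"]

# Two distinct account names, but both deliver to a single inbox.
inboxes = {gmail_inbox(a) for a in kids}
print(inboxes)
```

So each child gets a distinct address to register with, while every verification mail and receipt still lands in the one parent inbox.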

I’ve hooked them up to meraki.cisco.com, a mobile device management site which allows me to remotely lock down the device – disable FaceTime, Safari, force password for purchases etc., as required. Thx Cisco!

So I now turn FaceTime on and off remotely, and let them play with it between themselves. Pity there’s no whitelist option. And at least the kids’ headphones work to keep the noise down!

Posted in Uncategorized

vSphere 5.1 VSA some observations

I’ve been looking into the free Virtual Storage Appliance (VSA) in vSphere 5.1 Essentials Plus. First, a few gotchas: it requires four NICs before it’ll install, and it seems to take over all spare storage, leaving no local storage for VMs which you don’t want to run with HA – e.g. if you have two DNS servers, HA is irrelevant, as you have fault tolerance at the application layer.

Haven’t tried this, but I suspect you’d just need to divide your local storage into two chunks – I can do this using my HP Smart Array Controller, so the single disk set presents as multiple virtual disks – one for VSA and one for normal use. Then you install ESXi as normal, create a VMFS partition on the VSA disk before installing the VSA. Once the install is completed, you should be able to put VMFS on the “normal” disk.

I’ve also taken a brief look inside the VSA – thinking it was likely a DRBD-based solution, but it actually seems to be a SUSE 11 VM using MD on top of iSCSI to provide the mirrored volumes. So if you’ve got some Linux skills then creating a homebrew version which you have more control over is going to be fairly easy.

Posted in Virtualisation, VMware

Microsoft Campus Licences Renewed

We just renewed our Microsoft Campus licence. In particular we have now licensed each node in our vSphere cluster for Windows Server and System Centre Suite in their respective Data Centre Editions. This means we are allowed to install any of the following on these nodes:

  • Windows Server (any version, any edition, 2003, 2008, 2012, Standard, Enterprise, etc.)
  • Hyper-V
  • System Centre Operations Manager 2012
  • System Centre Virtual Machine Manager 2012
  • System Centre Data Protection Manager 2012
  • System Centre App Controller 2012
  • System Centre Configuration Manager 2012
  • System Centre Service Manager 2012

Also, we have licensed the Campus for the Education Desktop (OS + Office) with the Enterprise CALs. So CALs for all the above products are covered for client use on our PC estate. Finally, we have licensed 6 nodes for SQL Server Enterprise Edition, which allows us to mint any number of SQL Server VMs on those nodes – subject to physical memory and CPU constraints, obviously!

This leaves us with some interesting quandaries. Specifically, how much should we pay for products which are better in some ways than the Microsoft equivalents listed above? Currently we pay extra for the following (Microsoft equivalent after the colon):

  • Novell Zenworks: App Controller + Configuration Manager
  • RMS Helpdesk: Service Manager
  • VMware vSphere Enterprise: Hyper-V + Virtual Machine Manager
  • Veeam (backup for vSphere): Data Protection Manager
  • VMware View:  Included in Windows Server 2012
  • VMware ThinApp:  App Controller

Together this duplication of licensing runs to in excess of £30,000 a year. Add to this the fact that we are reaching the levels of virtualisation that would require us to purchase VMware Operations Manager, Chargeback and vCloud – all of which appear to be covered already by Virtual Machine Manager and Operations Manager.

So how much should we pay for “best of breed”, if indeed these additional products are best of breed any more?

Posted in Microsoft, Uncategorized, VMware, Windows Server

iPad Arrived

So I have joined iPad land, having been an Android phone user. Will I find it annoying, useful, or both? Well I’m using the WordPress App to create this blog with a wireless keyboard – so it’s a laptop today!
