When fleshing out their departments' business continuity and disaster recovery plans, colleagues often ask what those plans should include for services that run on the server and data storage infrastructure that OIT manages. University administration and OIT staff consider it an integral part of OIT's mission to provide for these contingencies in a robust, scalable, and cost-effective manner. As a result, OIT continuously invests significant labor and capital in maintaining a number of data protection and disaster recovery capabilities for all services that operate on OIT-managed infrastructure. This document provides a brief overview of how those capabilities apply to various IT services at UAH.
OIT Help Desk
The OIT Help Desk will operate, either in an on-campus or remote capacity, to the fullest extent possible while ensuring the safety of employees, subject to the specific constraints imposed by a disaster or other emergency situation. Efforts to restore UAH IT infrastructure from a destructive event will prioritize the communication and other supporting technologies that allow the Help Desk to provide technical assistance and guidance to the campus community throughout the process of returning to normal operations.
Email
UAH uses Google Mail, which operates entirely on Google's worldwide server infrastructure. Natural disasters or similar destructive events localized to the UAH campus or even a larger portion of the surrounding area should not significantly affect email functionality for the university.
Single Sign-On
Single Sign-On (SSO) serves as the "gatekeeper" to a wide array of cloud-hosted services and operates in a redundant manner between UAH's on-campus infrastructure and secondary site, discussed in further detail below. Natural disasters or similar destructive events affecting the UAH campus will not ordinarily affect SSO functionality.
Canvas and Other Cloud-Hosted Services
Canvas and most other cloud-hosted "software as a service" products run on publicly available infrastructure built and maintained by Amazon (AWS), Microsoft (Azure), and other large multinational corporations. Natural disasters or similar destructive events affecting the UAH campus will not ordinarily affect the operation of these services. Their level of resilience does, however, depend on a number of technical factors and design decisions made by the companies that own the product (e.g. Canvas) and the infrastructure on which it operates (e.g. Microsoft's Azure cloud). Outages affecting vast swaths of a public cloud provider (and thus all the services that run on it) are not unheard of, and planning for such a contingency merits some consideration.
File Sharing: Google Drive
Google Drive operates on substantially the same global infrastructure as Google Mail, so it will generally show similar resilience against destructive events affecting the UAH campus.
File Sharing: Windows File Shares
OIT operates Windows file shares on university-owned servers that physically reside on the UAH campus. Several layers of protection apply to these files:
- Built-in "Previous Versions" functionality: If someone accidentally deletes a file on a Windows file share, they can generally recover it quickly and easily on their own using the "Previous Versions" functionality built into Windows. OIT staff will gladly assist with this process too, of course.
- Traditional backups: OIT's enterprise backup system takes backups of most files stored on Windows file shares every weeknight, with retention going back approximately a month.
- Storage snapshots: OIT's Storage Area Network (SAN) creates hourly point-in-time snapshots of all production data and keeps them for two days, then retains one snapshot per day for another week after that.
- All capabilities described in "Other On-Premises Services" below also apply to Windows file shares.
Banner
Banner, UAH's ERP system, consists primarily of a single large database and more than 20 interconnected applications that interface with it. Given the importance of its role and of the underlying data, OIT maintains additional, more robust protections for Banner:
- The database behind Banner is periodically synchronized to a separate copy that resides at UAH's disaster recovery site. This database copy supports most institutional reporting, which results in effectively continuous integrity checking.
- Banner application servers are backed up nightly and backups are retained for at least two weeks. The Banner database performs its own independent nightly backups and retains them for a month. OIT's storage systems then further retain those backups off-site for a longer duration.
- All capabilities described in "Other On-Premises Services" below also apply to Banner.
Other On-Premises Services
OIT provides the following data protection and disaster recovery mechanisms for all production services that run on OIT-operated server and storage infrastructure. If you have entered into an agreement with OIT to operate a service for your department and have not explicitly discussed different mechanisms, all of them apply to the service in question:
- Traditional backups: OIT's enterprise backup system performs nightly backups for all production servers. Retention periods vary according to operational needs, service criticality, and availability of other data protection mechanisms.
- Data replication: OIT's Storage Area Network (SAN) continuously replicates all production data to the university's disaster recovery site and retains snapshots at 5-minute intervals for the last hour.
- Storage snapshots: OIT's SAN creates hourly point-in-time snapshots of all production data and keeps them for two days, then retains one snapshot per day for another week after that.
In the event of a natural disaster or similar destructive event that renders on-campus IT infrastructure inoperable, OIT staff will restore production services as quickly as possible at the disaster recovery site. The exact nature of the specific failure will dictate how long this work takes, but a full recovery should generally take 4-6 hours.
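For departments estimating how much recent work could be lost in a recovery, the replication and snapshot schedules listed above can be sketched as a small lookup. This is only an illustration, not an OIT tool: the function name `snapshot_interval_for_age` and the table `SNAPSHOT_TIERS` are hypothetical, and the sketch assumes the tiers stack end to end (5-minute snapshots for the last hour, hourly snapshots out to two days, daily snapshots for the week after that) with inclusive boundaries.

```python
from datetime import timedelta
from typing import Optional

# Tiers taken from the schedule described above: (maximum age, snapshot interval).
# Treating the boundaries as inclusive and stacked end to end is an assumption.
SNAPSHOT_TIERS = [
    (timedelta(hours=1), timedelta(minutes=5)),  # replication snapshots, last hour
    (timedelta(days=2), timedelta(hours=1)),     # hourly SAN snapshots, two days
    (timedelta(days=2 + 7), timedelta(days=1)),  # daily SAN snapshots, another week
]


def snapshot_interval_for_age(age: timedelta) -> Optional[timedelta]:
    """Return the finest snapshot interval expected to cover data of this age.

    Returns None when the age falls outside every snapshot tier, in which
    case the nightly backups would be the remaining option.
    """
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    for max_age, interval in SNAPSHOT_TIERS:
        if age <= max_age:
            return interval
    return None


if __name__ == "__main__":
    # Print the expected snapshot spacing for a few sample ages.
    for age in (
        timedelta(minutes=30),
        timedelta(hours=20),
        timedelta(days=5),
        timedelta(days=30),
    ):
        print(age, "->", snapshot_interval_for_age(age))
```

Under these assumptions, the returned interval is a rough upper bound on the gap between a desired restore point and the nearest available snapshot. Actual retention for a given service may differ, as noted above for traditional backups.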