Disaster Recovery Process
Information Technology Statement of Intent
This document delineates our policies and procedures for technology disaster recovery, as well as our process-level plans for recovering critical technology platforms and the telecommunications infrastructure. It summarizes our recommended procedures. In an actual emergency, the procedures described here may be modified to ensure the safety of our people, our systems, and our data.
Our mission is to ensure information system uptime, data integrity and availability, and business continuity.
Corporate management has approved the following policy statement:
- The company shall develop a comprehensive IT disaster recovery plan.
- A formal risk assessment shall be undertaken to determine the requirements for the disaster recovery plan.
- The disaster recovery plan should cover all essential and critical infrastructure elements, systems and networks, in accordance with key business activities.
- The disaster recovery plan should be periodically tested in a simulated environment to ensure that it can be implemented in emergency situations and that the management and staff understand how it is to be executed.
- All staff must be made aware of the disaster recovery plan and their own respective roles.
- The disaster recovery plan is to be kept up to date to take into account changing circumstances.
The principal objective of the disaster recovery program is to develop, test and document a well-structured and easily understood plan which will help the company recover as quickly and effectively as possible from an unforeseen disaster or emergency which interrupts information systems and business operations. Additional objectives include the following:
- The need to ensure that all employees fully understand their duties in implementing such a plan
- The need to ensure that operational policies are adhered to within all planned activities
- The need to ensure that proposed contingency arrangements are cost-effective
- The need to consider implications on other company sites
- The need to consider disaster recovery capabilities as applicable to key customers, vendors and others
Disaster Recovery Plan for QIS Risk System
The goal of this Disaster Recovery (DR) plan is to maintain IT Operations and Support services for maximum uptime of the QIS Risk system. This document describes the main practices used to ensure availability. Where relevant, the strategy entails maintaining data backups and fully mirrored software sites that enable instantaneous switching between the live site and the backup site. Copies of this Plan have been distributed to QIS Risk employees who have a role in the Disaster Recovery Process, and copies are available to Clients upon request.
The DR Team includes several highly skilled technical staff members with in-depth knowledge of the system. Team members are distributed across different urban centers.
QIS Risk infrastructure runs primarily on Amazon Web Services (AWS) servers in Virginia, US. Many of the DR processes described here implement AWS best practices for application uptime.
Monitoring & Client Notification
The QIS Risk system is continuously monitored to ensure uptime. Downtime on multiple subsystems is tracked by a third-party service, and the DR Team is notified of outages by email within 5 minutes.
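As an illustration of the 5-minute notification window, the following is a minimal sketch and not the third-party service's actual logic. The `Check` record, the subsystem names, and both function names are hypothetical. It finds when the current outage on a subsystem began and computes the latest time by which the DR Team should have been emailed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the plan: the DR Team is notified by email within 5 minutes.
NOTIFY_WITHIN = timedelta(minutes=5)


@dataclass
class Check:
    """One health-check result (hypothetical record layout)."""
    subsystem: str
    at: datetime
    healthy: bool


def outage_started(checks, subsystem):
    """Return when the current run of failed checks began, or None if the
    most recent check for the subsystem is healthy (or there are none)."""
    start = None
    relevant = (c for c in checks if c.subsystem == subsystem)
    for c in sorted(relevant, key=lambda c: c.at):
        if c.healthy:
            start = None      # outage ended; reset
        elif start is None:
            start = c.at      # first failure of a new outage
    return start


def notification_deadline(checks, subsystem):
    """Latest time by which the outage email should have been sent."""
    start = outage_started(checks, subsystem)
    return None if start is None else start + NOTIFY_WITHIN
```

Measuring the deadline from the first failed check, rather than the latest one, keeps the window from sliding while an outage persists.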
The QIS Risk system also includes extensive internal logging capabilities that capture relevant details of its execution state. Logs determined to indicate failure states are reported immediately via Telegram to the DR Team members and dealt with on a best-effort basis.
These monitoring capabilities allow QIS Risk Clients to experience high application uptime. Clients are encouraged to report any outages they experience to firstname.lastname@example.org and/or their dedicated Telegram group for prompt resolution. Consult your Master Service Agreement for further details on Support Services.
For the purpose of Disaster Recovery, database backups are taken daily and retained for the appropriate retention period. Backups are tested regularly to confirm that they can restore the system to its state at the time of the snapshot. Backups are kept on AWS infrastructure, following AWS best practices for redundancy across server locations.
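The daily backup and retention policy can be sketched as a simple date check. This is an illustration under stated assumptions, not the actual backup tooling: the plan does not state the retention period, so `retention_days` is a caller-supplied parameter, and both function names are hypothetical.

```python
from datetime import date, timedelta


def expired_backups(snapshot_dates, today, retention_days):
    """Snapshots older than the retention window (candidates for deletion)."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


def missing_daily_backups(snapshot_dates, today, retention_days):
    """Days inside the retention window that have no snapshot at all."""
    have = set(snapshot_dates)
    expected = (today - timedelta(days=n) for n in range(retention_days + 1))
    return sorted(d for d in expected if d not in have)
```

A non-empty result from `missing_daily_backups` would indicate a gap in the daily schedule that is worth investigating before a restore is ever needed.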
Alternate Recovery Infrastructure
The QIS Risk system runs on commodity hardware using a highly componentized architecture. QIS Risk runs multiple simultaneous application environments that can be switched over at the DNS level. Spinning up a new environment is also a documented process that takes no more than 60 minutes to execute. In the event of an outage requiring a complete reset of the application environment, the DR Team has the technical capability and the discretion to resolve the outage in whatever way is most effective given the root cause of the incident.
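The switchover decision described above can be sketched as follows. This is a minimal illustration, not the DR Team's actual procedure: the environment names and the `choose_target` function are hypothetical, and the DNS update itself is left out because it depends on the DNS provider in use.

```python
def choose_target(current, environments, health):
    """Pick the environment that DNS should point to.

    current:      name of the environment currently serving traffic
    environments: environment names in order of preference
    health:       mapping of environment name -> True if healthy
    Returns None when no environment is healthy, in which case a new
    environment has to be spun up following the documented process.
    """
    # Stay on the current environment while it is healthy, to avoid
    # switching back and forth unnecessarily.
    if health.get(current, False):
        return current
    for env in environments:
        if health.get(env, False):
            return env
    return None
```

For example, with environments `["blue", "green"]`, an unhealthy `blue`, and a healthy `green`, the function returns `"green"`.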
Disruptions of Data Suppliers
QIS Risk relies on external providers for Market Data and Client Portfolio Information. The QIS Risk system is designed for resiliency in the event of upstream data feed outages, with the intent of minimizing disruptions to Users. For those external data sources, extensive monitoring is in place to identify outages and to verify that SLAs are met. The QIS Risk system UI includes several metadata indicators that show Clients when upstream vendor feeds are delayed. QIS Risk will, at its discretion, notify Clients of potentially disruptive interruptions in upstream feeds.
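A delay indicator of the kind described above could be derived from the time of the last successful feed update. The sketch below is an assumption about how such metadata might be computed, not a description of the QIS Risk implementation: the `feed_status` function, its field names, and the `max_age` threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def feed_status(last_update, now, max_age):
    """Describe how stale an upstream feed is.

    last_update: time of the last successful update from the vendor
    now:         current time
    max_age:     hypothetical threshold beyond which the feed counts as delayed
    """
    age = now - last_update
    return {
        "delayed": age > max_age,
        "age_seconds": int(age.total_seconds()),
    }
```

Exposing the age alongside the boolean flag lets the UI show both that a feed is delayed and by how much.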