Communications of the ACM

Trusted Recovery

Recent exploits by hackers have drawn attention to the importance of defending against potential information warfare. Defense and civil institutions rely so heavily on their information systems and networks that attacks that disable them could be devastating. Yet, as hacker attacks have demonstrated, protective mechanisms are fallible. Features and services that must be in place to carry out needed, legitimate functions can be abused by being used in unexpected ways to provide an avenue of attack. Further, an attacker who penetrates one system can use its relationships with other systems on the network to compromise them as well. Experiences of actual attacks have led to the recognition of the need to detect and react to attacks that succeed in breaching a system's protective mechanisms.

To protect a system against information warfare, it is of course necessary to take steps to prevent attacks from succeeding. At the same time, however, it is important to recognize that not all attacks can be averted at the outset. Attacks that succeed to some degree are unavoidable, and comprehensive support for identifying and responding to attacks is required [1]. Information warfare defense must consider the whole process of attack, response, and recovery. This requires a recognition of the multiple phases of the information warfare process. Prevention is just one phase; we explain others and then focus on the oft-neglected recovery phase. The goal of defense is to keep available as many of the critical system elements as possible in the face of information warfare attacks. It is undesirable to use recovery techniques that require halting system operations for repair, for denial of service may be the attacker's objective, especially if it occurs at a critical time. Once a bad system element has been detected, it is essential to proceed quickly with repairs while allowing applications to continue operating even if some of the elements have been damaged by an attack.

Phases of Information Defense

Information warfare attack and defense are continuous processes, and defensive approaches must consider the entire process. From the attacker's point of view, a classic military cycle of intelligence gathering, planning, and execution is apt. The attacker observes the system and gathers data from any available sources to determine the system's vulnerabilities and find the most critical functions or data to target—this information is used to plan the means of attack and the resulting plan is carried out. The attacker then gathers further information from any new vantage points established (such as system information available once initial access has occurred), assesses the impact of the attack on the system so far, and plans further actions. As part of this cycle, an attacker may also attempt to anticipate the responses that will be made by defenders and either act to counter them or even take actions specifically designed to instigate a defensive response that would have side effects damaging to the system's operational function. For example, the attacker might supply a counterfeit source IP address in packets that carry out a noticeable attack in order to provoke a response that shuts down service to the host at that address.

The defender must also attempt to anticipate and block possible means of attack, detect those that occur, and respond in a way that limits damage, maintains system availability for its critical functions, and allows recovery of full operating capabilities to proceed. The defender's cycle of activities can be divided into the following phases:

  • Prevention. The defender puts protective measures into place.
  • Attack detection. The defender observes symptoms of a problem and determines that an attack may have taken place or may be in progress. The defender gathers further information to diagnose whether the symptoms are due to unusual but legitimate system activity or to an attack, and if there is an attack, what type. The defender can gather information by changing monitoring thresholds, deploying additional sensors, or using specialized analytical tools.
  • Damage assessment and containment. The defender examines the system to determine the extent of any damage the attack may have caused, including failed functions and corrupted data. The defender takes immediate action to try to eliminate the attacker's access to the system and to isolate or contain the problem to prevent further spread.
  • Recovery. The defender may reconfigure to allow operation to continue in a degraded mode while recovery proceeds. This may involve cutting back on noncritical services to maximize the ability to continue critical services, for example. The defender then recovers corrupted or lost data and repairs or reinstalls failed system functions to reestablish a normal level of operation.
  • Fault treatment. To the extent possible, the weaknesses exploited in the attack are identified, and steps are taken to prevent a recurrence.

These phases correspond loosely to a typical protect-detect-react cycle. We have broken down reaction into two phases, and identified fault treatment explicitly as a phase, as the fault tolerance literature does. Fault treatment relates closely both to reaction and to prevention. Considerable effort is devoted specifically to the prevention phase when a system is first developed and put into place, and when new releases or other significant changes occur. During times when the system is operating in a steady state, the fault treatment and prevention phases can be viewed as taking place simultaneously.

Reaction might also include some form of counterattack, which would involve a loop similar to that of the attacker. We do not consider that further here, however, since we are concerned only with the defense aspect of information warfare in this article.

Trusted Recovery

Recovery methods have been studied extensively by researchers in the fault tolerance and database areas. In the fault tolerance area, two types of errors are considered: errors that are anticipated and those that are unanticipated [5]. In the case of anticipated errors, an accurate prediction or assessment of the damages can be made; if this is not possible, errors are said to be unanticipated. To recover from anticipated errors, forward recovery methods are used. Since the errors have been foreseen, either contingency update instructions can be specified or a means of deriving an acceptably correct value can be formulated. Forward recovery methods have two limitations. First, these methods are usually very system specific. Second, the success of these methods depends on how accurately damage from faults can be predicted and assessed. To recover from unanticipated errors, backward recovery is considered to be the only viable approach. This requires that the entire state be replaced by a prior state that is consistent. Clearly, this approach is less than optimal because it requires that the system be temporarily halted. As observed earlier, denial of service may be the attacker's objective, particularly if the attacker can cause stoppage to occur at a critical time.

Database management systems (DBMS) provide a rich set of recovery facilities [4]; however, they mostly rely on backward recovery methods to restore the database to a consistent state. There are several limitations to the backward recovery methods used in DBMS, especially in the face of malicious attacks. First, if a transaction is aborted, the transaction isolation property supports recovery, in a sense, by ensuring that it can be backed out [4] without affecting other transactions. The isolation property does not help, however, in the case of malicious transactions, because they appear to the DBMS to be ordinary transactions and complete normally. Undo/redo logs support recovery when the system fails with a number of uncompleted transactions in progress, but such recovery methods do not apply when transactions complete successfully but create bad data. Now, suppose that some time after a malicious transaction has been committed, the bad data it created is discovered through some means (perhaps a user has noticed it). Meanwhile, other innocent transactions may have read the bad data, based their computations on it, and unwittingly then written bad data of their own to other items. The only general mechanism available to remove the effects of one or more prior, successfully committed transactions is backward recovery, which rolls the database back to a previously established checkpoint. However, the use of this mechanism poses a dilemma, because the penalty for doing so is that all other, valid work that has been accomplished since the checkpoint was taken is also lost.
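The way damage spreads from a committed malicious transaction to innocent ones can be traced if the log records what each transaction read as well as what it wrote. The following is a minimal sketch under that assumption, not a description of any particular DBMS; the `Txn` record and the `trace_damage` function are hypothetical names introduced for illustration, and the log is assumed to list committed transactions in commit order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txn:
    """One committed transaction in a hypothetical read/write log."""
    tid: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def trace_damage(log, malicious):
    """Return (ids of affected transactions, items still suspect).

    Assumes `log` is in commit order and that read sets are complete.
    A transaction is affected if it is malicious or if it read an item
    that was suspect at the time it ran.
    """
    dirty = set()
    affected = set()
    for txn in log:
        if txn.tid in malicious or txn.reads & dirty:
            affected.add(txn.tid)
            dirty |= txn.writes      # everything it wrote is now suspect
        else:
            dirty -= txn.writes      # a clean overwrite refreshes the item
    return affected, dirty


log = [
    Txn("T1", frozenset({"a"}), frozenset({"b"})),  # malicious
    Txn("T2", frozenset({"b"}), frozenset({"c"})),  # innocent, but read bad b
    Txn("T3", frozenset({"d"}), frozenset({"e"})),  # unrelated valid work
    Txn("T4", frozenset(), frozenset({"b"})),       # blind overwrite of b
]
affected, dirty = trace_damage(log, malicious={"T1"})
print(sorted(affected), sorted(dirty))
```

In this example T2 is drawn into the damage even though it is innocent, while T3 is untouched; a rollback to a checkpoint taken before T1 would nevertheless discard T3's work as well.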

Types of recovery. Recovery methods can be formalized around three recovery models: HotStart, WarmStart, and ColdStart. HotStart is primarily a forward error recovery method, and ColdStart is primarily a backward error recovery method, but each of the three models incorporates both forward and backward error recovery to some degree.

The HotStart model is appropriate for attacks in which the system can or must respond transparently to the user. Suppose an attacker introduces a corrupt binary executable at a particular site and uses that executable to launch an availability, trust, or integrity attack. The attack can be handled with a HotStart model if two conditions hold. First, the attack must be detected early enough that damage is confined to the executable. Second, a hot standby of the executable—an uncorrupted standby, preferably at a different location—must be available to take over. The hot standby effects a recovery transparent to the user, even though the system is in a degraded state. It is still necessary to identify the path by which the adversary introduced the corrupt binary, disable that path, and restore the proper binary from a back-up store.
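The two HotStart conditions can be illustrated with a small sketch. It assumes, beyond what the text states, that corruption is detected by comparing a SHA-256 digest of each copy against a digest recorded while the executable was known to be good; the function `select_executable` and the sample byte strings are hypothetical.

```python
import hashlib


def digest(image: bytes) -> str:
    """SHA-256 digest of an executable image, as a hex string."""
    return hashlib.sha256(image).hexdigest()


def select_executable(primary: bytes, standby: bytes, trusted_digest: str):
    """Pick the copy to run: the primary if intact, else the hot standby.

    Raises RuntimeError if neither copy matches the trusted digest, in
    which case a HotStart is not possible and a less transparent
    recovery model is needed.
    """
    if digest(primary) == trusted_digest:
        return "primary", primary
    if digest(standby) == trusted_digest:
        return "standby", standby
    raise RuntimeError("no uncorrupted copy available")


good_image = b"original program image"
trusted = digest(good_image)            # recorded before any attack
tampered = good_image + b" + attacker payload"

choice, image = select_executable(tampered, good_image, trusted)
print(choice)
```

The switch to the standby is invisible to the user, but the primary still has to be restored and the path used to corrupt it closed.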

Sometimes it is not possible to hide the effects of an attack from the users, and in these cases a WarmStart model is desirable. Damage can be confined such that key services are available, trustworthy, and reliable. Nonetheless, the users are aware of the attack because the system is visibly degraded. The exact level of service depends on the extent of the attack. Some functionality may be missing, untrustworthy, and/or based on incorrect information. Key mechanisms for managing WarmStarts are checkpoints for quick recovery and audit trails for intercepting the attacker.

A WarmStart response to an availability attack results in nontransparent but automated recovery from confined damage. A WarmStart response to a trust attack means that some system operations—but not others—can be trusted while the response to the attack is under way. A WarmStart response to an integrity attack means that some system functionality—but not all—is enabled.

The ColdStart model is appropriate for the most severe attacks. The chief difference from the WarmStart model is that the attacker succeeds in halting the delivery of system services. The goal of the ColdStart recovery is to bring the system back up as quickly as possible to a usable, trustworthy, and consistent state. Policies and algorithms are required to support efficient ColdStarts. Compensation for unrecoverable components—for example, leaked information—is also crucial.

Recovery methods. In this section, we list several methods that could be used to deal with some aspect of recovery. Each of these methods can be investigated in relation to the three recovery models: HotStart, WarmStart, and ColdStart.

Redundancy: The most fundamental technique for recovery is redundancy. This means that either an information element is stored redundantly somewhere in the system or it can be reconstructed from some other elements stored in the system. Such redundancy might take the form of backups at geographically distributed locations, alternative algorithms, compensation methods for unrecoverable objects, and audit trails for tracking system access and usage.

Redundancy can be useful for all three types of recovery. For an example of HotStart recovery, suppose an attack has been detected that has damaged an executable. A hot standby of the executable—an uncorrupted standby, possibly at a different location—can take over. Derived data attributes provide an example of WarmStart. Unlike "normal" attributes, derived attributes have attribute evaluation rules attached to them; evaluation rules describe how the values of these attributes are to be derived from other values. These other values do not have to be in the system; they could come from the outside. Recovery logs provide an example of ColdStart.
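Recovery of a derived attribute from its evaluation rule might look like the following sketch. The record layout, the attribute names, and the `recover_derived` function are hypothetical, and the sketch assumes that the base values the rules depend on are themselves trustworthy and that no rule depends on another derived attribute.

```python
def recover_derived(record, rules):
    """Return a copy of `record` with every derived attribute recomputed.

    `rules` maps a derived attribute name to a function that computes it
    from the record's base attributes.
    """
    repaired = dict(record)
    for name, rule in rules.items():
        repaired[name] = rule(repaired)
    return repaired


# Evaluation rule: fuel needed is derived from two base attributes.
rules = {
    "fuel_needed": lambda r: r["aircraft_count"] * r["fuel_per_aircraft"],
}

record = {
    "aircraft_count": 4,
    "fuel_per_aircraft": 1500,
    "fuel_needed": 99999,       # corrupted by an attack
}
repaired = recover_derived(record, rules)
print(repaired["fuel_needed"])
```

The stored value is redundant with respect to its rule, so the damaged copy can simply be discarded and rebuilt while the rest of the system keeps running.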

Backward recovery: In the case of errors for which no corrective compensating action can be determined or where the extent of damage cannot be determined, backward error recovery must be done. Backward recovery uses database mechanisms such as the undo/redo log to erase recent transactions and restore the database to a prior state [4].
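As an illustration of the mechanism, the sketch below keeps an undo log of before-images and rolls an in-memory dictionary back to a marked position. It is a toy model rather than the logging scheme of any real DBMS; `write`, `roll_back`, and the `_MISSING` sentinel are hypothetical names.

```python
_MISSING = object()   # marks "item did not exist before this write"


def write(db, undo_log, item, value):
    """Update `item`, first saving its before-image in the undo log."""
    undo_log.append((item, db.get(item, _MISSING)))
    db[item] = value


def roll_back(db, undo_log, mark):
    """Restore `db` to the state it had when the log had `mark` entries."""
    for item, before in reversed(undo_log[mark:]):
        if before is _MISSING:
            db.pop(item, None)
        else:
            db[item] = before
    del undo_log[mark:]


db = {"x": 1, "y": 2}
undo_log = []
write(db, undo_log, "x", 10)     # valid work before the mark
mark = len(undo_log)             # consistent state we can return to
write(db, undo_log, "x", 666)    # bad update
write(db, undo_log, "z", 7)      # bad insert
write(db, undo_log, "y", 3)      # valid work, but it is lost as well
roll_back(db, undo_log, mark)
print(db)
```

The last write shows the cost of the approach: valid updates made after the mark are erased together with the bad ones.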

Backward recovery methods can be used to achieve not only ColdStarts, but HotStarts and WarmStarts as well. Suppose we have determined a collection of transactions to be malicious (for example, all of them may have been generated at a particular site or executed by a single suspicious user). If we can identify the extent of damage caused by these malicious transactions, we can take immediate steps to confine the damage (see the following discussion). We use the log to undo the changes made by the malicious transactions, and redo the changes made by the normal transactions. This would require augmenting the database log to capture the data elements read by transactions; exactly how this is to be accomplished remains to be investigated.
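One simple way to picture selective undo and redo is to rebuild the state from a checkpoint while replaying only the logged writes of transactions that are not known to be bad. The sketch below does this under strong assumptions: the set of bad transactions (malicious ones plus any that read their output) has already been identified, and the log holds after-images in commit order. The function `selective_repair` and the log layout are hypothetical.

```python
def selective_repair(checkpoint, log, bad):
    """Rebuild the database state, skipping the writes of bad transactions.

    checkpoint: dict holding a consistent state taken before the attack
    log:        list of (transaction id, {item: new value}) in commit order
    bad:        ids of malicious transactions and of those that read
                their output (assumed to be known already)
    """
    state = dict(checkpoint)
    for tid, writes in log:
        if tid not in bad:
            state.update(writes)     # redo valid work
        # writes of bad transactions are dropped, which undoes them
    return state


checkpoint = {"a": 1, "b": 2, "c": 3}
log = [
    ("T1", {"b": 999}),    # malicious
    ("T2", {"c": 1002}),   # innocent, but computed from bad b
    ("T3", {"a": 5}),      # independent valid work
]
repaired = selective_repair(checkpoint, log, bad={"T1", "T2"})
print(repaired)
```

Unlike a plain rollback to the checkpoint, T3's update survives. T2 is innocent but its result cannot be trusted, so in practice it would have to be rerun against the repaired state.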

Static partitioning of information elements: Designing the database and its applications so that transactions can touch data only in a single region limits the extent to which damage can spread and allows applications that use other partitions to proceed normally while one is under repair. Since this may be impractical for many databases, a more flexible alternative is to define boundaries of regions, identify triggers or propagated updates that cross those boundaries, and limit the bandwidth or conditions under which data may flow across.
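The single-region rule can be sketched as a check made before a transaction's updates are applied. The `PartitionedStore` class, its region names, and its error types are hypothetical, and a real system would enforce the rule in the database design rather than at run time.

```python
class PartitionedStore:
    """Toy store whose items are statically assigned to regions."""

    def __init__(self, regions):
        # regions: mapping of region name -> {item: value}
        self.regions = {name: dict(items) for name, items in regions.items()}
        self.under_repair = set()

    def region_of(self, item):
        for name, items in self.regions.items():
            if item in items:
                return name
        raise KeyError(item)

    def run(self, updates):
        """Apply {item: value} updates if they touch exactly one region
        and that region is not currently under repair."""
        touched = {self.region_of(item) for item in updates}
        if len(touched) != 1:
            raise ValueError("a transaction must touch exactly one region")
        (region,) = touched
        if region in self.under_repair:
            raise RuntimeError(f"region {region!r} is under repair")
        self.regions[region].update(updates)


store = PartitionedStore({
    "logistics": {"fuel": 10},
    "personnel": {"staff": 40},
})
store.under_repair.add("logistics")   # damage found in one region
store.run({"staff": 41})              # work in the other region proceeds
print(store.regions["personnel"])
```

Because no transaction can span regions, bad data written in "logistics" cannot have been read by transactions in "personnel", which is what allows that region to stay in service.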

Forward recovery: In some cases, detected errors can be corrected through forward error recovery. These are cases in which either the particular type of error has been foreseen and contingency update instructions specified or a means of deriving an acceptably correct value is known. If the semantics of the application support forward recovery, compensating transactions can anticipate error scenarios [2, 3]. For items that are replaced periodically through normal processing, the error may be corrected merely by waiting until the next replacement occurs.
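A compensating transaction can be sketched as a new forward update that semantically cancels an earlier one without touching work done in between. The account names and the `transfer` function below are hypothetical, and the example assumes the application's semantics (commutative increments and decrements) make such compensation valid.

```python
def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst and return the compensating
    transaction, a callable that performs the reverse transfer."""
    accounts[src] -= amount
    accounts[dst] += amount
    return lambda: transfer(accounts, dst, src, amount)


accounts = {"ops": 100, "vendor": 0, "payroll": 50}

compensate = transfer(accounts, "ops", "vendor", 30)   # later found erroneous
transfer(accounts, "payroll", "ops", 10)               # valid work afterwards
compensate()                                           # forward recovery

print(accounts)
```

The valid transfer from "payroll" is preserved, which a rollback to a point before the erroneous transfer would not have achieved.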

Versioning: In a concept borrowed from concurrent engineering, it is possible that maintaining trees of versions, in which versions are inter-transaction checkpoints, would allow more graceful restoration of a consistent state. If the current database state were found to be unsound, a different branch could be followed. This type of versioning would be tied closely to states of the database applications. Further exploration is needed to determine whether it offers advantages in an information warfare context.
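Since the idea is only outlined here, the following is a speculative sketch of what a version tree might look like: each node is a checkpoint, a node can be marked unsound, and work resumes on a new branch from the nearest sound ancestor. The `Version` class and `nearest_sound` function are hypothetical.

```python
class Version:
    """A checkpoint in a tree of database versions."""

    def __init__(self, state, parent=None, label=""):
        self.state = dict(state)
        self.parent = parent
        self.children = []
        self.label = label
        self.sound = True
        if parent is not None:
            parent.children.append(self)

    def commit(self, updates, label=""):
        """Create a child version with `updates` applied."""
        new_state = dict(self.state)
        new_state.update(updates)
        return Version(new_state, parent=self, label=label)


def nearest_sound(version):
    """Walk toward the root until a version not marked unsound is found."""
    while version is not None and not version.sound:
        version = version.parent
    return version


v0 = Version({"x": 1}, label="v0")
v1 = v0.commit({"x": 2}, label="v1")
v2 = v1.commit({"y": 9}, label="v2")

v2.sound = False                           # current state found unsound
base = nearest_sound(v2)                   # fall back to v1
branch = base.commit({"y": 3}, label="v2b")  # follow a different branch
print(base.label, branch.state)
```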

Dynamic partitioning of information elements: The goal in dynamic partitioning is to use recovery methods to identify information elements that can be taken out of use, repaired, and reintegrated for use dynamically. This technique is essential for HotStarts.
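The take-out, repair, and reintegrate cycle can be sketched as follows. The `QuarantineStore` class and its method names are hypothetical, and the sketch assumes the damaged items have already been identified by some other means.

```python
class QuarantineStore:
    """Toy store in which individual items can be taken out of use."""

    def __init__(self, data):
        self.data = dict(data)
        self.quarantined = set()

    def quarantine(self, items):
        """Take suspect items out of use while everything else stays up."""
        self.quarantined |= set(items)

    def read(self, item):
        if item in self.quarantined:
            raise LookupError(f"{item!r} is out of use pending repair")
        return self.data[item]

    def repair(self, item, value):
        """Install a repaired value and put the item back into use."""
        self.data[item] = value
        self.quarantined.discard(item)


store = QuarantineStore({"route": "A", "fuel": 10})
store.quarantine({"fuel"})            # damage detected on one item

route = store.read("route")           # undamaged data remains available
try:
    store.read("fuel")
    blocked = False
except LookupError:
    blocked = True

store.repair("fuel", 12)              # repaired and reintegrated
print(route, blocked, store.read("fuel"))
```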

Countermeasure transactions: Countermeasure transactions are transactions specifically designed to detect and/or repair damage. An attack might be detected by a large variety of means. Some are internal to the database, such as an integrity constraint violation detected by the firing of an action rule in an active database. Others are external to the database, such as an alert officer noticing that an abnormally high number of aircraft are scheduled to refuel at a particular tanker. Likewise, damage might be repaired by a drastic action, such as resetting the entire database to a prior state, or by a simple approach, such as merely waiting for good data to overwrite bad data. Many of these countermeasures can be modeled as transactions on the information system. The benefit of doing so is that the power of the transaction model can be used to implement fault tolerance across the system as a whole.
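To make the idea concrete, the sketch below models the tanker example as a countermeasure transaction that checks a constraint and, as one all-or-nothing step, places any offending tanker on a review list. The schema, the limit of six aircraft, and the function names are assumptions made for illustration.

```python
import copy


def detect_overbooked(db, max_aircraft):
    """Detection: tankers with more scheduled aircraft than the limit."""
    return sorted(
        tanker
        for tanker, aircraft in db["refuel_schedule"].items()
        if len(aircraft) > max_aircraft
    )


def countermeasure_txn(db, max_aircraft=6):
    """Detect and contain in one step; returns the new database state.

    Works on a private copy so that the original state is untouched
    unless the caller "commits" by adopting the returned value.
    """
    work = copy.deepcopy(db)
    held = work.setdefault("held_for_review", [])
    for tanker in detect_overbooked(work, max_aircraft):
        if tanker not in held:
            held.append(tanker)
    return work


db = {
    "refuel_schedule": {
        "tanker-1": ["a1", "a2"],
        "tanker-2": [f"b{i}" for i in range(9)],   # abnormally many
    }
}
new_db = countermeasure_txn(db)
print(new_db["held_for_review"])
```

Expressing the check as a transaction means it can be scheduled, logged, and recovered like any other work on the information system.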

Conclusion

In this article we have identified the phases of activity with which information warfare defenders must be concerned. While much work in detection and reaction focuses on catching illegal entry into the system, concentrating on the operating system and networking levels, we have focused on defending and repairing damage to the information maintained within the system. This focus responds to the need to defend against subtle corruption of information intended to degrade a system's ability to perform its mission in a manner unknown to its users and operators. It also responds to the need for defense against attacks by real and apparent insiders (the latter being, for example, attackers who have succeeded in co-opting the identities of authorized users). To this end, we have presented several defense and recovery strategies. A number of techniques for performing recovery or for structuring systems to facilitate containment and recovery have been described here. Further research is needed in all of these areas, particularly with regard to incorporating application-specific knowledge into detection and recovery from attacks.

References

1. Ammann, P., Jajodia, S., McCollum, C.D., and Blaustein, B.T. Surviving information warfare attacks on databases. In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, 1997), 164–174.

2. Ammann, P., Jajodia, S., and Ray, I. Applying formal methods to semantic-based decomposition of transactions. TODS 22, 2 (June 1997), 215–254.

3. Garcia-Molina, H. and Salem, K. Sagas. In Proceedings of the ACM SIGMOD International Conference on Management of Data (San Francisco, 1987), 249–259.

4. Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, San Mateo, CA, 1993.

5. Lee, P.A. and Anderson, T. Fault Tolerance: Principles and Practice, 2nd ed., 1990.

Authors

Sushil Jajodia is a principal scientist at the MITRE Corporation and a professor and chair of the Department of Information and Software Engineering and the director of the Center for Secure Information Systems at George Mason University in Fairfax, VA.

Catherine D. McCollum is a principal engineer at the MITRE Corporation in McLean, VA.

Paul Ammann is an associate professor of Information and Software Engineering at George Mason University in Fairfax, VA.

©1999 ACM  0002-0782/99/0700  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 1999 ACM, Inc.

