Malicious Data and Computer Security

W. Olin Sibert

InterTrust Technologies Corporation
460 Oakmead Parkway
Sunnyvale, CA 94086


Traditionally, computer security has focused on containing the effects of malicious users or malicious programs. However, as programs become more complex, an additional threat emerges: malicious data. The threat exists because apparently benign programs can be made malicious, or subverted, by the introduction of an attacker's data--data that the program interprets as instructions to perform activities that the computer's operator would find undesirable. A variety of software features, some intentional and some unwitting, combine to create a software environment that is highly vulnerable to malicious data. This paper catalogs those features, discusses their effects, and examines potential countermeasures. In general, the outlook is depressing: as the economic incentives increase, these vulnerabilities are likely to be exploited more frequently; yet effective countermeasures are costly and complex.

1. Introduction

This paper addresses the increasing vulnerability of computer systems, particularly personal computers (PCs), to attacks based on malicious data: that is, attacks employing information that appears to represent input for an application program such as a word processor or spreadsheet, but that actually represents instructions that will be carried out by the computer without the knowledge or approval of the computer's operator. This vulnerability comes from two sources: program features that intentionally treat data as instructions and program flaws that allow data to act as instructions despite the program designer's intentions.

A system that has been subverted by such an attack is, in effect, under the control of a malicious program. Protection against such programs has been the focus of traditional computer security measures: file access control, user/supervisor state, etc. Such measures permit a program's activities to be contained to a limited set of computer resources for which the program's operator is authorized. However, as computers (particularly PCs) are used more and more as extensions of their operators (i.e., as agents), the scope of authorization is greatly increased: a malicious program might, for example, cause a financial transaction using electronic commerce software that is indistinguishable by any automated means from a transaction the operator would have authorized--except that there was no such authorization. This increasing difficulty of identifying which computer activities are permissible and which are not increases the risk from all types of attacks.

The potential scope of malicious program activity in the PC environment is enormous. On one end of the spectrum are traditional "damage" attacks: virus propagation, destruction of data, compromise of other systems on a network. Another familiar attack involves disclosure, not only of passwords and personal data but also of non-computer data such as credit card account numbers; see [1] and [2] for a detailed discussion of such a scenario and of how the disclosed data can be returned untraceably to the attacker. On the other end of the spectrum are "agency" attacks, in which a computer is made to perform actions of which its operator is wholly unaware, such as electronic purchases, transfers of "digital cash," forged E-mail, and so on.

The two types of vulnerability from malicious data--intentional and unwitting--are quite different and require different approaches to remedy. The unwitting flaws can be fixed (although fixing them is rarely simple), but the intentional mechanisms represent a tension between a system designer's desire to provide features and a user's need for safety.

Furthermore, it is fundamentally difficult to distinguish between data and programs. Although many of the vulnerabilities discussed here rely on supplying actual machine instructions to be executed by hardware, others employ instructions that a program intentionally interprets (such as the PostScript language). Drawing a strict line between data and programs is not sufficient.
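The point can be illustrated with a minimal sketch in Python. The document format, the "!run" directive, and the render function are hypothetical, invented purely for this illustration; they do not describe any real product. The sketch only records the requested action in a list rather than carrying anything out.

```python
# Hypothetical document format: ordinary lines are displayed, but a line
# beginning with "!run " is treated by the viewer as an instruction.
def render(document, actions):
    """Return the visible text; append any embedded instructions to `actions`.

    `actions` stands in for the user's environment. A real viewer would
    carry these requests out with the user's full authority.
    """
    visible = []
    for line in document.splitlines():
        if line.startswith("!run "):
            # This is the point at which data is treated as a program.
            actions.append(line[len("!run "):])
        else:
            visible.append(line)
    return "\n".join(visible)


letter = "Dear colleague,\n!run send-file address-book\nSee you Monday."
performed = []
shown = render(letter, performed)
print(shown)       # the reader sees only the two lines of text
print(performed)   # yet an instruction was extracted from the "data"
```

Nothing in the displayed text reveals that the letter carried an instruction, which is why a strict line between data and programs is so hard to enforce from the outside.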

Section 1 of this paper introduces the concepts and discusses some potential effects. Section 2 catalogs a variety of intentional mechanisms that can be exploited using malicious data; section 3 describes unwitting mechanisms (i.e., flaws) with the same effect. Finally, section 4 discusses some solution approaches, of which disappointingly few seem to be effective.

2. Intentional Vulnerabilities

With the best of intentions, software developers are responsible for blurring the distinctions between programs and data. Most of the mechanisms cataloged in this section share a common characteristic: they provide a useful capability when used in a benign environment, but they were designed with little or no consideration as to how they might be employed by a hostile party. (Sun's Java language is a notable exception; see also section 4.2.)

These mechanisms either permit arbitrary files to be modified, or allow arbitrary programs to be executed, or both. The fundamental property they share is an assumption that the operations that are performed should be performed just as if the user had entered them directly at the keyboard: that is, they are executed within a "user environment" that is shared by all other activities the user performs. The difference is that these packages perform the operations without the user's explicit consent, and often without the user's knowledge. Although some of these features are undocumented, documentation is not the issue: it is simply unreasonable to expect a user to scour a 500-page manual looking for potential security risks before using a program.

Some of the risks posed by these mechanisms can be reduced or eliminated by isolation techniques or by requiring user confirmation. Such solutions, however, reduce the utility of the features and increase complexity for the user. Always requesting a confirmation is little better than never doing so: none but the most paranoid of users will think about it before answering "OK."
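One way to keep confirmations meaningful is to ask only for operations that matter. The following Python sketch assumes a hypothetical application in which every embedded operation has a name; the operation names, the two sets, and the authorize function are all assumptions made for illustration.

```python
# Hypothetical operation names; a real application would define its own.
SAFE_OPERATIONS = {"set-font", "insert-text"}
SENSITIVE_OPERATIONS = {"write-file", "run-program"}


def authorize(operation, confirm):
    """Decide whether an operation embedded in a document may proceed.

    `confirm` is a callback that asks the user and returns True or False.
    It is consulted only for sensitive operations, so that prompts stay
    rare enough to be read rather than dismissed by reflex.
    """
    if operation in SAFE_OPERATIONS:
        return True
    if operation in SENSITIVE_OPERATIONS:
        return bool(confirm(operation))
    # Anything unrecognized is refused without asking.
    return False


prompts = []


def cautious_user(operation):
    prompts.append(operation)
    return False  # this user declines


results = [authorize(op, cautious_user)
           for op in ("set-font", "write-file", "format-disk")]
print(results)  # [True, False, False]
print(prompts)  # ['write-file']
```

Even this selective scheme only narrows the problem: it cannot tell whether a user who answers "OK" understood what was being asked.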

2.1. Examples

The following list identifies some of the intentional risks posed by common computer systems and applications:

3. Unwitting Vulnerabilities

The previous section dealt with purposeful software features that provide an opportunity to introduce malicious programs. On the one hand, it is unfortunate that those features were designed with little attention to risk; on the other hand, it is good that they can be identified, for it is possible to imagine countermeasures that would contain them.

There is another class of attacks that does not have those properties: attacks based on program flaws or inadequate design. Here, the designers did not intentionally create a problem; rather, by failing to provide sufficiently robust software, they unintentionally enabled the problem to occur.

Such unintended risks depend on the same basic properties as the intentional ones: programs run in a user environment that is shared by other programs. To date, the exploitations of these risks have involved primarily multiuser systems, where the environment being attacked is privileged. However, privilege is not necessary for these attacks to be useful; they can introduce malicious software into the environment of an unprivileged user just as effectively.

3.1. Examples

A few examples of these attacks include:

3.2. Scope of Vulnerability

These examples represent the tip of the iceberg. What sort of programs are vulnerable to such attacks? Any program that misbehaves when given bad input data is a potential victim. If it crashes or dumps core when given bad input, it can probably be made to misbehave in a predictable manner, too. If a program's internal data structures can be damaged by invalid input, this often indicates that its control flow can be affected as well--potentially leading to the ability to execute caller-supplied instructions.
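A simple way to look for such misbehavior is to feed a program random input and note every failure that is not a clean rejection. The Python sketch below assumes a hypothetical parser (fragile_parse) with a deliberately planted flaw; the parser, the MalformedInput exception, and the probe function are illustrative names, not part of any real package.

```python
import random


class MalformedInput(Exception):
    """The one failure a well-behaved parser is allowed to report."""


def fragile_parse(data):
    """Hypothetical parser: the first byte is a length, the rest is payload.

    It trusts the length byte, so short input raises IndexError instead of
    the documented MalformedInput.
    """
    if not data:
        raise MalformedInput("empty input")
    length = data[0]
    return bytes(data[1 + i] for i in range(length))


def probe(parser, trials=200, seed=0):
    """Feed random byte strings to `parser`; collect undocumented failures."""
    rng = random.Random(seed)
    surprises = []
    for _ in range(trials):
        sample = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
        try:
            parser(sample)
        except MalformedInput:
            pass                      # documented rejection: acceptable
        except Exception as exc:      # anything else deserves investigation
            surprises.append((sample, type(exc).__name__))
    return surprises


found = probe(fragile_parse)
print(len(found), "inputs caused undocumented failures")
```

In Python the flaw surfaces as a harmless exception; in a language without bounds checking, the same trusted length byte could instead lead to reads or writes outside the intended storage.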

Indeed, software developers typically make no claims that any application programs are bulletproof when faced with invalid input data, because such misbehavior is seen only as an inconvenience to users--after all, "garbage in, garbage out." The risk that it would serve as a way to introduce malicious software into the user environment is rarely, if ever, considered.

Examples of such program misbehaviors include:

Although none of these program behaviors is known to the author to have been exploited, the possibility clearly is present, and further investigation is warranted.

The basic problem is that increasingly complex and ill-defined data semantics are difficult to process, so it is no surprise that application software fails when presented with bogus input data. Software that responds correctly to all incorrect input is far harder to create than software that simply responds correctly to correct input.

Application software development contrasts with the design philosophy of network protocols, where a basic assumption is that all possible bit sequences will be encountered, so all must be handled reasonably. It is partly for this reason that implementations of complex network protocols often have a long development period before they are truly robust.
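The protocol designer's habit of checking every field before use can be sketched as follows. The record layout (a one-byte type, a one-byte length, then the payload), the size limit, and the function name are assumptions chosen for this illustration.

```python
MAX_PAYLOAD = 64  # assumed limit for this illustration


def parse_record(data):
    """Parse [type: 1 byte][length: 1 byte][payload] or raise ValueError.

    Every field is checked against what was actually received before use.
    """
    if len(data) < 2:
        raise ValueError("truncated header")
    record_type, length = data[0], data[1]
    if record_type not in (1, 2):
        raise ValueError("unknown record type")
    if length > MAX_PAYLOAD:
        raise ValueError("declared length exceeds limit")
    if len(data) != 2 + length:
        raise ValueError("declared length does not match data received")
    return record_type, bytes(data[2:])


print(parse_record(b"\x01\x03abc"))   # (1, b'abc')
for bad in (b"", b"\x01", b"\x09\x00", b"\x01\xffabc", b"\x01\x03ab"):
    try:
        parse_record(bad)
    except ValueError as err:
        print("rejected:", err)
```

Note that four of the five statements in the parser exist only to reject bad input, which gives some sense of why robust software is so much more expensive to write.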

3.3. Exploitation Techniques

Exploitations of invalid input data are known primarily because they were used to breach system integrity in multiuser systems. These attacks are more difficult to construct than those that exploit known software features. They require constructing executable programs "by hand," tailored to run in a largely unknown environment. Although doing so is awkward, it is by no means beyond the abilities of a moderately sophisticated attacker.

The most fruitful exploitation technique seems to be buffer overflow: provide more data than a program expects, so that it will overwrite internal storage for program variables or addresses, and the program will misbehave in a deterministic--and possibly controllable--manner. Another technique involves providing data with out-of-range values. Such inputs can cause calculated branches to go to unintended destinations, or can cause values to be stored outside of array bounds. All these vulnerabilities offer the potential to cause a transfer to the attacker's supplied executable code, from which point the attacker can do anything that the attacked program can do.
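The mechanics of an overflow can be modeled safely in Python by treating a bytearray as a region of memory. In this sketch, which is a simplified model and not a description of any particular system, an 8-byte name buffer is followed directly by a one-byte "handler" slot that selects which routine runs next; the layout, the handler table, and the function names are all hypothetical.

```python
NAME_SIZE = 8
HANDLERS = {0: "print-greeting", 7: "attacker-chosen-routine"}


def make_frame():
    """Model memory: an 8-byte name buffer followed by a 1-byte handler slot.

    A handler value of 0 selects the routine the programmer intended.
    """
    return bytearray(NAME_SIZE + 1)


def unchecked_store(frame, data):
    # Copies however many bytes arrive, like a copy with no length check.
    for i, byte in enumerate(data):
        frame[i] = byte


def checked_store(frame, data):
    # Refuses input that would spill past the end of the name buffer.
    if len(data) > NAME_SIZE:
        raise ValueError("input longer than buffer")
    frame[0:len(data)] = data


oversized = b"AAAAAAAA\x07"   # nine bytes: one more than the buffer holds

victim = make_frame()
unchecked_store(victim, oversized)
print(HANDLERS[victim[NAME_SIZE]])      # attacker-chosen-routine

protected = make_frame()
try:
    checked_store(protected, oversized)
except ValueError as err:
    print("rejected:", err)
print(HANDLERS[protected[NAME_SIZE]])   # print-greeting
```

The ninth byte of the input lands in the handler slot, so the choice of the next routine passes from the program's author to whoever supplied the data.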

It is important to note that malicious data representing machine instructions does not require arbitrary binary values. For example, the Intel 80x86 opcode set and the MS-DOS executable file format permit a valid executable program to be constructed entirely from printable ASCII characters. Such a program can perform arbitrary actions when executed, yet it requires no special transfer mechanism--it can be delivered as ordinary unformatted E-mail. The first known example of such a program[9] contains a small executable header that decodes the rest of the program text--transferred in UNIX uuencode format--into a memory buffer, then transfers to it.
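Both halves of this observation can be checked with a few lines of Python. The sketch first lists four single-byte 8086 instructions whose encodings happen to be printable characters, then shows arbitrary byte values passing through a text-only channel in uuencode form using the standard binascii module. The payload here is an arbitrary stand-in pattern that is never executed.

```python
import binascii
import string

PRINTABLE = set(string.printable.encode("ascii"))

# A few one-byte 8086 instructions whose encodings are printable characters.
PRINTABLE_OPCODES = {
    0x40: "INC AX",
    0x48: "DEC AX",
    0x50: "PUSH AX",
    0x58: "POP AX",
}
for opcode, mnemonic in PRINTABLE_OPCODES.items():
    print(f"{opcode:#04x} {chr(opcode)!r} {mnemonic}")

# Arbitrary binary values survive a text-only channel once uuencoded.
payload = bytes(range(0, 256, 17))   # stand-in bytes, including 0x00 and 0xFF
line = binascii.b2a_uu(payload)      # one uuencoded line (at most 45 bytes in)
print(all(b in PRINTABLE for b in line))
print(binascii.a2b_uu(line) == payload)
```

A filter that merely rejects non-text attachments therefore offers no protection: the encoded form is indistinguishable from ordinary text until something decodes it.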

Of course, a successful exploitation is quite difficult. It is necessary first to understand how the program is misbehaving, then to determine what input data will create predictable misbehavior, then to craft input data that contains an appropriate attacking program. The analysis stages require an understanding of the software that comes most readily from source code, but as most personal computer applications are not distributed in source code form, techniques such as disassembly and emulation are required. Experimentation also plays a critical role.

4. Solutions

Solving the problems posed by unsafe or malicious data requires fundamentally different techniques from traditional computer security approaches, because the objective is different. Traditional approaches focus on isolation and protection of resources: that is, on preventing activity whose nature is known in advance. Protection from malicious data, on the other hand, requires distinguishing between program activities that are in accord with the operator's intent and those that the operator would not want to occur. This problem--of divining the operator's intent--seems unlikely to be solved.

Addressing the malicious data problem seems instead to require a return to fundamentals:

Aside from these techniques--which would represent a fundamental change in commercial software development--there are relatively few external, system-level techniques that offer much hope for improvement. Safe execution of mutually suspicious programs remains a difficult problem in computer system design [10]. Even if such solutions were readily available, it is unclear whether users could be expected to exercise the necessary discipline to protect themselves. After all, it is not unreasonable to expect that computer systems, like other complex appliances, should be safe to use without detailed understanding of their internal operations.

4.1. External Solutions

This section briefly discusses some of the solution techniques that can be applied externally to contain or reduce the effects of malicious data:

4.2. Internal Solutions

In the long term, internal solutions seem to offer more hope for addressing these problems:

5. Conclusions

The general outlook for malicious data as a computer security problem is unclear. The potential vulnerabilities are legion, but exploitation poses great practical difficulties. Unfortunately, defense also poses great difficulties, and as the economic incentive for creating malicious software increases, it seems likely that attackers will attempt to exploit these vulnerabilities.

The most effective technical solutions appear to require pervasive change in the way that computer software is built. The near-term alternatives all involve giving up many of the "general-purpose tool" properties that make personal computers so effective in the first place.

6. References

[1] Garfinkel, Simson, "Program shows ease of stealing credit card information," San Jose Mercury News, 29 January 1996

[2] Sibert, Olin, "Risks (and lack thereof) of typing credit card numbers," Risks-Forum Digest, volume 17, issue 69, 7 February 1996, available by anonymous FTP from in /risks/17/risks-17.69

[3] Computer Emergency Response Team, CERT Advisory CA-95:10, 31 August 1995, available by anonymous FTP from in /pub/cert_advisories/CA-95:10.ghostscript

[4] Hoffman, Lance J., Rogue Programs: Viruses, Worms, and Trojan Horses, Van Nostrand Reinhold, 1990

[5] Ferbrache, David, A Pathology of Computer Viruses, Springer-Verlag, 1992

[6] Computer Incident Advisory Capability (United States Department of Energy), CIAC Alert G10a: Winword Macro Viruses, available from bulletins/g-10a.shtml

[7] Eichin, Mark W., and Rochlis, Jon A., "With Microscope and Tweezers: An Analysis of the Internet Virus of November 1988," in Proceedings, 1989 IEEE Computer Society Symposium on Security and Privacy, pages 326-343, 1-3 May 1989, Oakland, California

[8] Computer Emergency Response Team, CERT Advisory CA-95:13, 19 October 1995, available by anonymous FTP from in /pub/cert_advisories/CA-95:13.syslog.vul

[9] Peterson, A. Padgett, personal communication, 15 February 1996. Mr. Peterson reports, "I described it in an internal Martin Marietta memo on security threats presented in 1988 as `theoretically possible' but did not construct a working prototype [until 1994]."

[10] Rotenberg, Leo J., Making Computers Keep Secrets, Ph.D. Thesis, Massachusetts Institute of Technology, 1973, published as MIT Project MAC Technical Report TR-115, February 1974

[11] Levy, H. M., Capability-based Computer Systems, Digital Press, Maynard, Massachusetts, 1984

[12] Organick, Elliott I., A Programmer's View of the Intel 432 System, McGraw-Hill, New York, 1985

[13] Hardy, Norman, "KeyKOS Architecture," ACM Operating Systems Review, Volume 19, Number 4, October 1985

[14] Rajunas, Susan, et al., "Security in KeyKOS," in Proceedings, 1986 IEEE Computer Society Symposium on Security and Privacy, 7-9 April 1986, Oakland, California

[15] Pozzo, Maria, and Gray, Terrence, "Managing Exposure to Potentially Malicious Programs," in Proceedings of 1986 National Computer Security Conference, 15-18 September 1986, National Bureau of Standards, Gaithersburg, Maryland, pages 75-80

[16] Sun Microsystems, Inc., Java Language Specification, available as

[17] van Rossum, Guido, Python Reference Manual, Dept. AA, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands, available as

[18] Computer Emergency Response Team, CERT Advisory CA-96:07, 29 March 1996, available by anonymous FTP from in pub/cert_advisories/CA-96.07.java_bytecode_verifier

[19] Dean, Drew; Felten, Edward; and Wallach, Dan, "Java Security: From HotJava to Netscape and Beyond," in Proceedings, 1996 IEEE Computer Society Symposium on Security and Privacy, 6-8 May 1996, Oakland, California