40. Virtual Machines

Part of CS:2820 Object Oriented Software Development Notes, Spring 2016
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Where Are We

The term virtual machine refers to any set of hardware and software resources that, taken together, create an environment in which applications can be developed. Of course, you can develop applications directly on top of a real machine, but that means you have nothing but hardware to use as a starting point. Generally, we develop our applications on top of a combination of hardware and pre-existing software. We might use:

a software-defined virtual machine such as the Java Virtual Machine (JVM), which executes the bytecode produced by a Java compiler. Or we might use a MIPS emulator to develop code for the MIPS CPU on a machine that has some other instruction set. Generally, there is a significant performance penalty for using this type of virtual machine because it typically takes from 10 to 100 machine instructions on the underlying physical machine to execute each instruction on the virtual machine.
a virtual machine monitor such as VMware on the Intel architecture or VM on the IBM Enterprise Systems Architecture. These provide extremely secure execution environments, secure enough that cloud-computing service providers can offer services to the public without fear that the applications run by one customer will interfere with those run by a competing customer. Generally, these virtual machines run at close to the speed of the real machine, using software to implement potentially dangerous instructions while using the native hardware to run the safe parts of the code. To do this, the native hardware must include provisions to detect potentially unsafe instructions and trap to the monitor so that its software can execute them.
a real machine augmented by a library of software. To a programmer, writing a+b to operate on objects of built-in types, versus writing a.add(b) or add(a,b) to operate on objects implemented in the library, makes it clear which operations are built in and which come from the library, but this is just surface syntax. To a programmer who thinks abstractly, both are simply operations on values of some underlying type.
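As a small illustration of the last item, here is a Java sketch contrasting a built-in type with a library type. The primitive type int uses the + operator, while java.math.BigInteger, a standard library class that implements arbitrary-precision integers in software, uses the add method. The class Operations and its method names are hypothetical, chosen only for this example.

```java
import java.math.BigInteger;

// Hypothetical example class: two ways of writing "add two numbers".
class Operations {
    // Built-in type: + on int is carried out directly by the (virtual) machine.
    static int addBuiltIn(int a, int b) {
        return a + b;
    }

    // Library type: BigInteger implements its arithmetic in software,
    // so the same abstract operation is written as a method call.
    static BigInteger addLibrary(BigInteger a, BigInteger b) {
        return a.add(b);
    }

    public static void main(String[] args) {
        System.out.println(addBuiltIn(2, 3)); // prints 5
        System.out.println(addLibrary(BigInteger.valueOf(2), BigInteger.valueOf(3))); // prints 5
    }
}
```

The notation differs, but both methods perform the same abstract operation: adding two values of some type.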

An Example

Consider an application running on a server in the cloud (Amazon, Google, Microsoft, it hardly matters). This is probably running on a multi-layered stack of virtual machines:

On top of the physical machine, probably a high-end Intel processor, the cloud computing service is running something like VMware in order to safely isolate their customers from each other.
On top of VMware (or equivalent secure environment), the customer is running some operating system, for example, Linux. Linux extends the bare machine it thinks it is running on with a number of resources created by software such as files, timers, and processes.
On top of Linux, the customer might be running a language like Java that relies on a virtual machine such as the JVM.
On top of the bare JVM environment, the customer is almost certainly using some subset of the Java standard library, for example, classes String and File. These library classes are largely implemented in Java itself, so the user could, in principle, have ignored the library and worked directly with the resources of the underlying layers.
On top of the Java library, if the customer is running our neuron network simulation, there are classes ErrorMessage and Simulator. Both of these use classes from the Java library as well as pure Java code.
On top of those classes, our code included class ScanSupport, which rests on class ErrorMessage, so this can be viewed as yet another layer.
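The layering in the last two items can be sketched in Java. The classes ErrorLog and LineParser below are hypothetical stand-ins, not the actual ErrorMessage and ScanSupport classes from our project; they only show the shape of such a stack. ErrorLog rests on the library's System.err, LineParser rests on java.util.Scanner and ErrorLog, and the application code rests on LineParser.

```java
import java.util.Scanner;

// Hypothetical layer 1: error reporting built on the Java library's System.err.
class ErrorLog {
    private static int count = 0;

    static void report(String message) {
        count = count + 1;
        System.err.println("Error: " + message);
    }

    static int count() {
        return count;
    }
}

// Hypothetical layer 2: input parsing built on java.util.Scanner and ErrorLog.
class LineParser {
    private final Scanner in;

    LineParser(Scanner in) {
        this.in = in;
    }

    // Returns the next integer, or defaultValue after reporting an error.
    int nextInt(int defaultValue) {
        if (in.hasNextInt()) {
            return in.nextInt();
        }
        // Consume the offending token, if any, so parsing can continue.
        String bad = in.hasNext() ? in.next() : "end of input";
        ErrorLog.report("expected an integer, found " + bad);
        return defaultValue;
    }
}

// Application layer: uses LineParser without calling ErrorLog directly.
class LayersDemo {
    public static void main(String[] args) {
        LineParser p = new LineParser(new Scanner("12 abc 7"));
        System.out.println(p.nextInt(0)); // prints 12
        System.out.println(p.nextInt(0)); // prints 0, after an error report
        System.out.println(p.nextInt(0)); // prints 7
    }
}
```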

The actual neuron-network simulation code we wrote can be considered to rest on top of this stack of virtual machine layers. Almost all large software systems involve such stacks of virtual machines. Many of the layers described above can be further broken down into layers. For example, the internal code of many operating systems contains a kernel that implements low-level services, which are in turn used to implement higher-level services that sit beneath application programs. From the outside, we consider the operating system to be a single layer, but to a system programmer working on developing the system, it has many layers.

Differences Between Virtual Machines

The virtual machine layers in the above hierarchy differ in some important ways. Software implementations of instruction sets carry a significant performance penalty because it typically takes from 10 to 100 physical machine instructions to implement each virtual machine instruction. At the same time, a software implementation of an instruction set can completely hide the physical machine from the user; this provides portability, and it also has the potential to be very secure.
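To see where the overhead comes from, consider a minimal software implementation of an instruction set. The sketch below interprets a hypothetical three-instruction stack machine; the opcodes PUSH, ADD and HALT and the class name TinyVM are invented for this example. Each virtual instruction requires a fetch, a decode through the switch statement, and several array accesses and index updates on the host, so a single virtual ADD costs many host operations. The interpreted program can reach only the stack and code arrays, which is how such a machine can hide the physical machine completely. (Since this interpreter itself runs on the JVM, it is also one virtual machine stacked on another.)

```java
// A hypothetical three-instruction stack machine, interpreted in software.
class TinyVM {
    static final int PUSH = 0; // PUSH n: push the constant n
    static final int ADD = 1;  // ADD: pop two values, push their sum
    static final int HALT = 2; // HALT: stop and return the top of the stack

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            // Every virtual instruction costs a fetch, a decode (the switch),
            // array accesses, and index updates on the host machine.
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD:
                    sp--;
                    stack[sp - 1] = stack[sp - 1] + stack[sp];
                    break;
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        int[] program = {PUSH, 2, PUSH, 3, ADD, HALT};
        System.out.println(run(program)); // prints 5
    }
}
```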

In contrast, with a virtual machine consisting of a software library plus some lower level tools, the user is not obligated to use the library, so the user has full access to the lower level machine at its native speed. With full access comes danger, if some of the features of that machine are insecure or unsafe.

Transparency

Virtual machines in a hierarchy may be transparent or opaque; these terms were originally defined by Parnas in the early 1970s.

Transparent virtual machines: A virtual machine is transparent if it does not prevent its user from inspecting and making arbitrary changes to the state of underlying virtual (or physical) machines.
Opaque virtual machines: A virtual machine is opaque if it prevents its user from inspecting or making changes to the underlying state.

In general, opaque virtual machines offer security benefits, preventing user code from accessing or manipulating things that are dangerous, while transparent virtual machines offer greater flexibility and potentially better performance, since the user can bypass a layer and work directly with the machine beneath it.
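In Java, the difference shows up directly in how a class exposes its state. In this sketch, assuming the intended rule is that a counter never goes negative, TransparentCounter lets its user overwrite the count with any value, while OpaqueCounter permits only increments. Both class names are hypothetical.

```java
// Transparent: the user can inspect and overwrite the underlying state.
class TransparentCounter {
    public int count = 0;

    void increment() {
        count++;
    }
}

// Opaque: the state is private and reachable only through safe operations.
class OpaqueCounter {
    private int count = 0;

    void increment() {
        count++;
    }

    int value() {
        return count;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        TransparentCounter t = new TransparentCounter();
        t.increment();
        t.count = -42; // allowed: flexible, but breaks the "never negative" rule
        System.out.println(t.count); // prints -42

        OpaqueCounter o = new OpaqueCounter();
        o.increment();
        // o.count = -42; // would not compile: count is private
        System.out.println(o.value()); // prints 1
    }
}
```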

Parnas's original illustration for this concept was a four-wheeled vehicle, perhaps an automobile. The low-level virtual machine has two fixed wheels at the rear and two steerable wheels at the front. The front wheels can be steered independently (imagine two steering levers, one held in each hand).

The low level vehicle is very flexible. It can follow any path a vehicle might want to follow, and it can even turn on a dime. Just position the rear axle over the dime, and then turn the two front wheels so that the lines of their axles also pass over the dime. This would make parallel parking incredibly easy, but this vehicle is extremely unsafe. If you are driving at any significant speed and you turn the front wheels so they are not parallel, you risk tumbling the vehicle tail over head. I would hate to drive this vehicle on I-80.

So, we build a higher-level virtual machine on top of the low-level one by linking the two front wheels to a steering wheel so that they always turn (approximately) in parallel. Our new vehicle is far safer, but it is less flexible because our new virtual machine is opaque. Parallel parking is now a difficult skill involving much reversing and rocking of the vehicle.
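Parnas's vehicle can be modeled loosely as two Java classes. In this sketch, FourWheeler and Automobile are hypothetical names, and the model treats the linked wheels as exactly parallel, ignoring the small angle difference that real steering linkages introduce. The low-level class exposes independent control of each front wheel; the higher-level class keeps that machine private and offers only a single steer operation.

```java
// Low-level, transparent vehicle: each front wheel is set independently.
class FourWheeler {
    double leftAngle;  // front-left wheel angle, in degrees
    double rightAngle; // front-right wheel angle, in degrees

    void setLeftWheel(double degrees) {
        leftAngle = degrees;
    }

    void setRightWheel(double degrees) {
        rightAngle = degrees;
    }

    boolean wheelsParallel() {
        return leftAngle == rightAngle;
    }
}

// Higher-level, opaque vehicle: one steering wheel drives both front wheels.
class Automobile {
    private final FourWheeler base = new FourWheeler();

    void steer(double degrees) {
        base.setLeftWheel(degrees);
        base.setRightWheel(degrees);
    }

    boolean wheelsParallel() {
        return base.wheelsParallel();
    }
}

class VehicleDemo {
    public static void main(String[] args) {
        FourWheeler raw = new FourWheeler();
        raw.setLeftWheel(30);
        raw.setRightWheel(-30); // legal here, and dangerous at speed
        System.out.println(raw.wheelsParallel()); // prints false

        Automobile car = new Automobile();
        car.steer(30);
        System.out.println(car.wheelsParallel()); // prints true
    }
}
```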

Most virtual machines are a mix of transparent and opaque parts. Security problems such as the ability to install rootkits in operating systems tend to involve virtual machines that were intended to be opaque but had small (and typically difficult to find) transparent spots.

In Java, the primary tool we have to control the transparency of virtual machine layers in large programming projects is the ability to declare components of objects private. If we are careless in designing the methods of an abstraction, though, we can end up providing the user with a backdoor that allows some method or combination of methods to set a private field to an arbitrary and unsafe value.
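Here is one common way such a backdoor arises in Java, sketched for a hypothetical class that must keep every delay positive. LeakyDelays declares its array private, but its constructor keeps the caller's array and its getter returns the private array itself, so any code holding that reference can write an arbitrary value into it. SafeDelays closes both holes with defensive copies made by clone(). The class names are hypothetical.

```java
import java.util.Arrays;

// Intended rule for both classes: every delay is positive.
class LeakyDelays {
    private final double[] delays;

    LeakyDelays(double[] initial) {
        for (double d : initial) {
            if (d <= 0) throw new IllegalArgumentException("delay must be positive");
        }
        delays = initial; // backdoor 1: keeps a reference to the caller's array
    }

    double[] getDelays() {
        return delays; // backdoor 2: hands out the private array itself
    }
}

class SafeDelays {
    private final double[] delays;

    SafeDelays(double[] initial) {
        for (double d : initial) {
            if (d <= 0) throw new IllegalArgumentException("delay must be positive");
        }
        delays = initial.clone(); // defensive copy on the way in
    }

    double[] getDelays() {
        return delays.clone(); // defensive copy on the way out
    }
}

class BackdoorDemo {
    public static void main(String[] args) {
        LeakyDelays leaky = new LeakyDelays(new double[] {1.0, 2.0});
        leaky.getDelays()[0] = -5.0; // private field changed to an unsafe value
        System.out.println(Arrays.toString(leaky.getDelays())); // prints [-5.0, 2.0]

        SafeDelays safe = new SafeDelays(new double[] {1.0, 2.0});
        safe.getDelays()[0] = -5.0; // modifies only a copy
        System.out.println(Arrays.toString(safe.getDelays())); // prints [1.0, 2.0]
    }
}
```

Copying costs time and memory, so whether it is worth it depends on how much we trust the code that uses the class.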