Chapter 24 The Java Virtual Machine

Introducing the Virtual Machine
The Basic Parts of the Java Virtual Machine
Examining Bytecode
A Walk-through of the Instruction Set
Breaking Down the Class File Format
Signatures
Summary

Throughout this book, I have referenced the Java Virtual Machine and the mysterious bytecodes it reads. Without the Java Virtual Machine, Java would not be architecture neutral and you would not be able to run Java programs on platforms running different operating systems. To understand the Java Virtual Machine, you need to examine its basic parts, the objects it uses, and the bytecodes that make it work.

Introducing the Virtual Machine

The Java compiler translates every Java program into an intermediate form called bytecode. You can run the compiled bytecode on any computer with the Java runtime environment installed on it. The runtime environment consists of the virtual machine and its supporting code.

The Java interpreter translates bytecode into sets of instructions the computer can understand. Because the bytecode is in an intermediate form, there is only a slight delay caused by the translation.

The difficult part of creating Java bytecode is that the source code is compiled for a machine that does not exist. This machine is called the Java Virtual Machine, and it exists only in the memory of your computer. Fooling the Java compiler into creating bytecode for a nonexistent machine is only one-half of the ingenious process that makes Java architecture neutral. The Java interpreter must also make your computer and the bytecode file believe they are running on a real machine. It does this by acting as the intermediary between the Virtual Machine and your real machine.

The Basic Parts of the Java Virtual Machine

Creating a Virtual Machine within your computer's memory banks requires building every major function of a real computer down to the very environment within which programs operate. For our purposes here, we'll break down these functions into seven basic parts:

A set of registers
A stack
An execution environment
A garbage-collected heap
A constant pool
A method storage area
An instruction set

Registers

The registers of the Java Virtual Machine are similar to the registers in your computer. However, because the Virtual Machine is stack based, its registers are not used for passing or receiving arguments. In Java, registers hold the machine's state and are updated after each line of bytecode is executed to maintain that state. The following four registers hold the state of the virtual machine:

frame, the reference frame, contains a pointer to the execution environment of the current method.
optop, the operand top, contains a pointer to the top of the operand stack and is used to evaluate arithmetic expressions.
pc, the program counter, contains the address of the next bytecode to be executed.
vars, the variable register, contains a pointer to local variables.

All these registers are 32 bits wide and are allocated immediately. This is possible because the compiler knows the size of the local variables and operand stack and because the interpreter knows the size of the execution environment.

The Stack

The Java Virtual Machine uses an operand stack to supply parameters to methods and operations, and to receive results back from them. All bytecode instructions take operands from the stack, operate on them, and return results to the stack. Like registers in the virtual machine, the operand stack is 32 bits wide.

The operand stack follows the last-in first-out (LIFO) methodology and expects the operands on the stack to be in a specific order. For example, the isub bytecode instruction expects two integers to be stored on the top of the stack, which means that the operands must have been pushed there by the previous set of instructions. isub pops the operands off the stack, subtracts them, and then pushes the results back onto the stack.
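To make the stack discipline concrete, here is a minimal sketch of an operand stack with an isub-style operation. The class name OperandStackSketch and its methods are hypothetical names invented for this illustration; a real virtual machine is not implemented this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; the names here are not part of any Java API.
class OperandStackSketch {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int value) {
        stack.push(value);
    }

    int pop() {
        return stack.pop();
    }

    // Mimics isub: the operand pushed last is popped first, so it is the
    // right-hand side of the subtraction. The result is pushed back.
    void isub() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left - right);
    }

    public static void main(String[] args) {
        OperandStackSketch s = new OperandStackSketch();
        s.push(7);   // pushed by an earlier instruction
        s.push(3);   // pushed by an earlier instruction
        s.isub();    // leaves 7 - 3 on the stack
        System.out.println(s.pop());
    }
}
```

Notice that the order of the pushes matters: swapping them would leave 3 - 7 on the stack instead.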

In Java, integers are a primitive data type. Each primitive data type has its own instructions that operate on operands of that type. For example, the lsub bytecode is used to perform long integer subtraction, the fsub bytecode is used to perform single-precision floating-point subtraction, and the dsub bytecode is used to perform double-precision floating-point subtraction. Because of this, it is illegal to push two integers onto the stack and then treat them as a single long integer. However, it is legal to push a 64-bit long integer onto the stack and have it occupy two 32-bit slots. Aren't you glad the Java compiler checks the rules for you as it compiles your program?
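The idea of one 64-bit value occupying two 32-bit slots can be sketched in ordinary Java. The class LongSlotsSketch is a hypothetical name, and storing the high word first is an assumption made for this example only; the text does not say how the virtual machine orders the two slots.

```java
// Hypothetical illustration; word order (high word first) is an assumption.
class LongSlotsSketch {
    // Split a 64-bit long into two 32-bit words.
    static int[] split(long value) {
        return new int[] { (int) (value >>> 32), (int) value };
    }

    // Rebuild the long. The low word is masked so that its sign bit
    // does not spill into the high word when it is widened.
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long original = 0x123456789ABCDEF0L;
        int[] words = split(original);
        System.out.println(join(words[0], words[1]) == original);
    }
}
```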

Each method in your Java program has a stack frame associated with it. The stack frame holds the state of the method with three sets of data: the method's local variables, the method's execution environment, and the method's operand stack. Although the sizes of the local variable and execution environment data sets are always fixed at the start of the method call, the size of the operand stack changes as the method's bytecode instructions are executed. Because the Java stack is 32 bits wide, 64-bit numbers are not guaranteed to be 64-bit aligned.

The Execution Environment

The execution environment is maintained within the stack as a data set and is used to handle dynamic linking, normal method returns, and exception generation. To handle dynamic linking, the execution environment contains symbolic references to methods and variables for the current method and current class. These symbolic calls are translated into actual method calls through dynamic linking to a symbol table.

Whenever a method completes normally, a value is returned to the calling method. The execution environment handles normal method returns by restoring the registers of the caller and incrementing the program counter of the caller to skip the method call instruction. Execution of the program then continues in the calling method's execution environment.

A normal return occurs when the called method executes a return instruction appropriate to its return type.

If the method cannot complete normally, it throws an exception or an error. Errors that can occur include dynamic linkage failures, such as a failure to find a class file, and runtime errors, such as a reference outside the bounds of an array. When errors occur, the execution environment generates an exception. (See Chapter 8, "Tying It All Together: Threads, Exceptions, and More," for a discussion of exception handling.)

The Garbage-Collected Heap

Each program running in the Java runtime environment has a garbage-collected heap assigned to it. Because instances of class objects are allocated from this heap, another word for the heap is the memory allocation pool. By default, the heap size is set to 1MB on most systems.

Although the heap is set to a specific size when you start a program, it can grow, for example, when new objects are allocated. To ensure that the heap does not get too large, objects that are no longer in use are automatically deallocated or garbage-collected by the Java Virtual Machine.

Java performs automatic garbage collection as a background thread. Each thread running in the Java runtime environment has two stacks associated with it: The first stack is used for Java code; the second is used for C code. Memory used by these stacks draws from the total system memory pool. Whenever a new thread starts execution, it is assigned a maximum stack size for Java code and for C code. By default on most systems, the maximum size of the Java code stack is 400KB and the maximum size of the C code stack is 128KB.

If your system has memory limitations, you can force Java to perform more aggressive cleanup and thus reduce the total amount of memory used. To do this, reduce the maximum size of the Java and C code stacks. If your system has lots of memory, you can force Java to perform less aggressive cleanup, thus reducing the amount of background processing. To do this, increase the maximum size of the Java and C code stacks.

The Constant Pool

Each class in the heap has a constant pool associated with it. Because constants do not change, they are usually created at compile time. Items in the constant pool encode all the names used by any method in a particular class. The class contains a count of how many constants exist and an offset that specifies where a particular listing of constants begins within the class description.

All information associated with a constant follows a specific format based on the type of the constant. For example, class-level constants are used to represent a class or an interface and have the following format:

CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}

where tag is the value of CONSTANT_Class and the name_index provides the string name of the class. The class name for int[][] is [[I. The class name for Thread[] is [Ljava.lang.Thread;.
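You can see these encoded array class names from ordinary Java code, because Class.getName() reports array classes in the same form. The following is a small sketch; the class name ArrayClassNames and the helper nameOf are hypothetical names used only for this example.

```java
// Hypothetical illustration; only Class.getName() is a real API call here.
class ArrayClassNames {
    // Returns the runtime name of a class, including the encoded form for arrays.
    static String nameOf(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(int[][].class));   // [[I
        System.out.println(nameOf(Thread[].class));  // [Ljava.lang.Thread;
    }
}
```

Each leading [ stands for one array dimension, I stands for int, and L...; wraps the name of a class.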

The Method Area

Java's method area is similar to the compiled code areas of the runtime environments used by other programming languages. It stores bytecode instructions that are associated with methods in the compiled code and the symbol table the execution environment needs for dynamic linking. Any debugging or additional information that might need to be associated with a method is stored in this area as well.

The Bytecode Instruction Set

Although programmers prefer to write code in a high-level format, your computer cannot execute this code directly, which is why you must compile Java programs before you can run them. Generally, compiled code is either in a machine-readable format called machine language or in an intermediate-level format such as assembly language or Java bytecode.

The bytecode instructions used by the Java Virtual Machine resemble Assembler instructions. If you have ever used Assembler, you know that the instruction set is streamlined to a minimum for the sake of efficiency and that tasks, such as printing to the screen, are accomplished using a series of instructions. For example, the Java language allows you to print to the screen using a single line of code, such as

System.out.println("Hello world!");

At compile time, the Java compiler converts the single-line print statement to the following bytecode:

0 getstatic #6 <Field java.lang.System.out Ljava/io/PrintStream;>
3 ldc #1 <String "Hello world!">
5 invokevirtual #7 <Method java.io.PrintStream.println(Ljava/lang/String;)V>
8 return

The JDK provides a tool for examining bytecode called the Java class file disassembler. As you will see later in this chapter, you can run the disassembler by typing javap at the command line or by starting a graphical disassembler that you got with a third-party toolkit.

Because the bytecode instructions are in such a low-level format, your programs execute at nearly the speed of programs compiled to machine language. All instructions in machine language are represented by byte streams of 0s and 1s. In a low-level language, byte streams of 0s and 1s are replaced by suitable mnemonics, such as the bytecode instruction isub. As with assembly language, the basic format of a bytecode instruction is

<operation> <operand(s)>

Therefore, an instruction in the bytecode instruction set consists of a 1-byte opcode specifying the operation to be performed, and zero or more operands that supply parameters or data that will be used by the operation.
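The opcode-plus-operands layout can be sketched as a tiny decoder that walks a byte array. This is only an illustration: the class InstructionFormatSketch is a hypothetical name, it handles just three instructions, and the numeric opcode values (0x10 for bipush, 0x64 for isub, 0xB1 for return) come from the published Java Virtual Machine specification rather than from this chapter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration covering only three opcodes.
class InstructionFormatSketch {
    // Opcode values assumed from the JVM specification.
    static final int BIPUSH = 0x10;
    static final int ISUB = 0x64;
    static final int RETURN = 0xB1;

    // Produces one line per instruction: byte offset, mnemonic, and operand if any.
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;       // the opcode is a single unsigned byte
            switch (opcode) {
                case BIPUSH:
                    lines.add(pc + " bipush " + code[pc + 1]);
                    pc += 2;                    // opcode plus one operand byte
                    break;
                case ISUB:
                    lines.add(pc + " isub");
                    pc += 1;                    // no operand bytes
                    break;
                case RETURN:
                    lines.add(pc + " return");
                    pc += 1;                    // no operand bytes
                    break;
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] code = { 0x10, 7, 0x10, 3, 0x64, (byte) 0xB1 };
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```

The offsets printed at the start of each line grow by the length of the previous instruction, which is why the numbers in disassembler output are not consecutive.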

Examining Bytecode

If you are familiar with how intermediate-level languages like Assembler are used, you should see the true beauty of Java. Here's a language that allows you to compile the source code to a machine-independent, intermediate form that will execute nearly as quickly as if it were fully compiled.

To allow you to examine bytecode instructions, the JDK includes a tool called the Java class file disassembler. The name of this tool is somewhat deceptive: You can use it to look at the internals of Java class files, but you cannot disassemble a compiled class file to create a Java source file.

You can use the disassembler for several purposes:

To gain quick insight into how a class works.
To see how a class uses system resources.
To check your import statements and dependencies.

Two versions of the disassembler are included in the JDK. The first version, javap, is optimized for normal use and has only limited debugging capabilities. The second version, javap_g, is optimized for debugging and is intended for use with the Java debugger.

You run the Java disassembler from the command line and pass it the name of your Java class file without the .class extension. If you invoke the disassembler with no options, it outputs the name of the file you compiled the class file from and abbreviated declarations for public variables, methods, and classes. If you created the FirstApplet program in Chapter 23, "Advanced Debugging and Troubleshooting," you can use javap to disassemble the applet by changing to the directory with the compiled class file and typing the following:

javap FirstApplet

The output to your screen should be similar to this:

Compiled from FirstApplet.java
public class FirstApplet extends java.applet.Applet {
    java.awt.Image NewImage;
    public void init();
    public void paint(java.awt.Graphics);
    public FirstApplet();
}

Although the output looks rather terse, it is extremely useful. From this output, you can see all public fields and the basic structure of the program at a glance. This gives you a fair idea of how the program actually works. You can also see exactly where Java needs fully qualified paths to classes and what those paths are. The previous example shows that FirstApplet uses three classes in the Java API:

java.applet.Applet
java.awt.Image
java.awt.Graphics

Beginning Java programmers can use the output of javap as a guide to see if they are importing too many classes. To check this, compare which classes are actually used to the classes you are importing. You can also use javap to clean up your code. Using the fully qualified class names, you could rewrite the FirstApplet program so that it does not need import statements. Because the Java compiler no longer has to search the namespace for specific class instances, this new version of FirstApplet compiles slightly faster than the old version. The new version of FirstApplet follows:

public class FirstApplet extends java.applet.Applet {
    java.awt.Image NewImage;

    public void init() {
        resize(300,300);
        NewImage = getImage(getCodeBase(),"New.gif");
    }

    public void paint(java.awt.Graphics g) {
        g.drawImage(NewImage,50,50,this);
        play(getCodeBase(),"New.au");
    }
}

You can use the -v option to get verbose output from the disassembler. Verbose output gives you some idea of the stacks, local variables, and arguments used by the program. If you use this option on the FirstApplet program by typing

javap -v FirstApplet

the output to your screen should be similar to this:

Compiled from FirstApplet.java
public class FirstApplet extends java.applet.Applet {
    java.awt.Image NewImage;
    public void init();
        /* Stack=4, Locals=1, Args_size=1 */
    public void paint(java.awt.Graphics);
        /* Stack=5, Locals=2, Args_size=2 */
    public FirstApplet();
        /* Stack=1, Locals=1, Args_size=1 */
}

You can use the -p and -c options to get more information about a program. The -p option specifies that you want javap to print out private and protected variables, methods, and classes in addition to the public ones. You use the -c option to disassemble compiled code to bytecode instructions.

Note

You can use the Java disassembler to examine the bytecode instructions of any Java class file.

Bytecode instructions are useful when you want to see exactly what the Java Virtual Machine is doing when it runs the bytecode. However, bytecode instructions do not resemble your original source code. In fact, they more closely resemble an assembler program. For example, the four-line paint() method of FirstApplet is displayed in bytecode as the following:

Method void paint(java.awt.Graphics)
   0 aload_1
   1 aload_0
   2 getfield #12 <Field FirstApplet.NewImage Ljava/awt/Image;>
   5 bipush 50
   7 bipush 50
   9 aload_0
  10 invokevirtual #8 <Method java.awt.Graphics.drawImage(Ljava/awt/Image;IILjava/awt/image/ImageObserver;)Z>
  13 pop
  14 aload_0
  15 aload_0
  16 invokevirtual #9 <Method java.applet.Applet.getCodeBase()Ljava/net/URL;>
  19 ldc #2 <String "New.au">
  21 invokevirtual #7 <Method java.applet.Applet.play(Ljava/net/URL;Ljava/lang/String;)V>
  24 return

The entire listing of bytecode instructions for the FirstApplet.class file is shown in Listing 24.1.

Listing 24.1. Bytecode for FirstApplet.

Compiled from FirstApplet.java
public class FirstApplet extends java.applet.Applet {
    java.awt.Image NewImage;
    public void init();
    public void paint(java.awt.Graphics);
    public FirstApplet();

Method void init()
   0 aload_0
   1 sipush 300
   4 sipush 300
   7 invokevirtual #11 <Method java.applet.Applet.resize(II)V>
  10 aload_0
  11 aload_0
  12 aload_0
  13 invokevirtual #9 <Method java.applet.Applet.getCodeBase()Ljava/net/URL;>
  16 ldc #1 <String "New.gif">
  18 invokevirtual #10 <Method java.applet.Applet.getImage(Ljava/net/URL;Ljava/lang/String;)Ljava/awt/Image;>
  21 putfield #12 <Field FirstApplet.NewImage Ljava/awt/Image;>
  24 return

Method void paint(java.awt.Graphics)
   0 aload_1
   1 aload_0
   2 getfield #12 <Field FirstApplet.NewImage Ljava/awt/Image;>
   5 bipush 50
   7 bipush 50
   9 aload_0
  10 invokevirtual #8 <Method java.awt.Graphics.drawImage(Ljava/awt/Image;IILjava/awt/image/ImageObserver;)Z>
  13 pop
  14 aload_0
  15 aload_0
  16 invokevirtual #9 <Method java.applet.Applet.getCodeBase()Ljava/net/URL;>
  19 ldc #2 <String "New.au">
  21 invokevirtual #7 <Method java.applet.Applet.play(Ljava/net/URL;Ljava/lang/String;)V>
  24 return

Method FirstApplet()
   0 aload_0
   1 invokenonvirtual #6 <Method java.applet.Applet.<init>()V>
   4 return

}

As you can see from the previous examples, when you compile a Java program the compiler translates each line of high-level code into multiple lines of low-level instructions. These instructions are organized by the key class-level and method objects.

In the sections that follow, I examine the bytecode instruction set to give you a firm understanding of both the Java Virtual Machine and Java bytecode.

You can also use the disassembler to create minimal C header files. To do this, you use the -h option. If you generate a header file for FirstApplet using javap, you will find that the resulting output is very different from the output generated by javah. For this reason, you should really use the javap tool for what it was designed for and not to generate C header files.

Other useful options are -classpath and -verify. The -classpath option lets you override the default and current CLASSPATH setting. If the program you are disassembling makes use of any classes that are not stored in the current directory, you should either set the CLASSPATH environment variable or use the -classpath option. The -verify option lets you validate the Java class file. The general message you will get if the class file is valid is

Class classname succeeds

The command-line syntax for the disassembler is

javap [options] classname

javap [options] classname1 classname2 classname3 …

The disassembler takes these options:

Option	Description
`-c`	Disassembles compiled code to bytecode instructions
`-classpath`	Overrides the default or current `CLASSPATH` environment variable setting
`-h`	Creates a minimal C header file
`-p`	Displays private and protected variables, methods, and classes in addition to the public ones
`-v`	Displays verbose output and gives you some idea of the stacks, local variables, and arguments used by the program
`-verify`	Validates the Java class file

A Walk-through of the Instruction Set

The instruction set walkthrough that follows should help you understand how the Java Virtual Machine uses bytecode instructions. As you read this section, keep in mind that each instruction in the bytecode instruction set consists of a 1-byte opcode specifying the operation to be performed, and zero or more operands that supply parameters or data that will be used by the operation. Most bytecode instructions take as an operand a numeric value, an object name, or both, such as

bipush 50

getfield #12 <Field FirstApplet.NewImage Ljava/awt/Image;>

Knowing this, you can key in on the most important aspect of the bytecode instruction set: understanding how the instruction affects the stack. For this reason, along with a brief description of what a bytecode instruction does, I present a text picture of the operand stack before and after the operation associated with a bytecode instruction is performed, such as the following example:

bipush: ... => ..., value

iadd: ..., v1, v2 => ..., result

The first stack drawing shows that the bipush instruction expects no operands to be on the top of the stack but does push a value onto the stack. The second drawing shows that the iadd instruction expects two operands, v1 and v2, to be on the top of the stack. These values are in turn popped off the stack and operated on to produce the result, and the result is pushed back onto the top of the stack.

Note

After v1 and v2 are operated on, only the result is pushed back on the stack. The variables have served their purpose.

Bytecode instructions that do not affect control flow simply execute and then advance the program counter register past the opcode and any operand bytes, so that it points to the address of the next bytecode instruction in sequence. Instructions that do affect control flow, such as branches, set the program counter to their target address instead. In the sections that follow, references to byte1, byte2, and so on, refer to the bytes following the opcode.

Tip

To better understand the walkthrough of the bytecode instruction set that follows, you might want to disassemble some of the class files you created as you read this book. You can also disassemble any of the class files on the CD-ROM.

Pushing Constants onto the Stack

One of the most basic tasks of the virtual machine is to push constants onto the stack. The next section looks at instructions that are used to do this.

`bipush`

Meaning: Push a 1-byte signed integer.

Description: The first byte following the opcode, byte1, is interpreted as a signed 8-bit value. This value is expanded to a 32-bit integer and pushed onto the operand stack.

`sipush`

Meaning: Push a 2-byte signed integer.

Description: The first and second byte following the opcode, byte1 and byte2, are assembled into a signed 16-bit value. This value is expanded to a 32-bit integer and pushed onto the operand stack.
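The sign extension performed by bipush and sipush can be sketched with plain Java casts. The class PushOperandSketch and its method names are hypothetical; the high-byte-first ordering of byte1 and byte2 follows the big-endian layout that class files use.

```java
// Hypothetical illustration of how the operand bytes become 32-bit values.
class PushOperandSketch {
    // bipush: one operand byte, interpreted as signed and widened to int.
    static int bipushValue(byte byte1) {
        return byte1; // Java widens byte to int with sign extension
    }

    // sipush: two operand bytes assembled into a signed 16-bit value,
    // high byte first, then widened to int.
    static int sipushValue(byte byte1, byte byte2) {
        return (short) ((byte1 << 8) | (byte2 & 0xFF));
    }

    public static void main(String[] args) {
        System.out.println(bipushValue((byte) 0x32));               // 50
        System.out.println(bipushValue((byte) 0xFF));               // -1
        System.out.println(sipushValue((byte) 0x01, (byte) 0x2C));  // 300
        System.out.println(sipushValue((byte) 0xFF, (byte) 0x38));  // -200
    }
}
```

This is why the FirstApplet listing uses bipush for 50 but needs sipush for 300: the value 300 does not fit in a signed 8-bit operand.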

`ldc1`

Meaning: Push item from constant pool.

Description: byte1 is used as an unsigned 8-bit index into the constant pool of the current class. After the item at that index is resolved, it is pushed onto the stack.

Note

This bytecode instruction throws an OutOfMemoryError if a String object is being pushed and there is not enough memory to allocate space for it.

`ldc2`

Meaning: Push 2-byte item from constant pool.

Description: byte1 and byte2 are used to construct an unsigned 16-bit index into the constant pool of the current class. After the item at that index is resolved, it is pushed onto the stack.

Note

This bytecode instruction throws an OutOfMemoryError if a String object is being pushed and there is not enough memory to allocate space for it.
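Unlike the operands of bipush and sipush, constant pool indexes are unsigned, so the operand bytes must be masked before they are combined. The following sketch shows the difference; the class ConstantPoolIndexSketch and its method names are hypothetical.

```java
// Hypothetical illustration of unsigned index construction.
class ConstantPoolIndexSketch {
    // One operand byte treated as an unsigned 8-bit index (0 to 255).
    static int index8(byte byte1) {
        return byte1 & 0xFF;
    }

    // Two operand bytes combined, high byte first, into an unsigned
    // 16-bit index (0 to 65535).
    static int index16(byte byte1, byte byte2) {
        return ((byte1 & 0xFF) << 8) | (byte2 & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(index8((byte) 0xFF));                // 255
        System.out.println(index16((byte) 0x01, (byte) 0x00));  // 256
        System.out.println(index16((byte) 0xFF, (byte) 0x38));  // 65336
    }
}
```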

`ldc2w`

Meaning: Push long or double from constant pool.

Description: byte1 and byte2 are used to construct an unsigned 16-bit index into the constant pool of the current class. After the two-word constant at that index is resolved, it is pushed onto the stack.

Note

This bytecode instruction throws an OutOfMemoryError if a String object is being pushed and there is not enough memory to allocate space for it.

`aconst_null`

Meaning: Push a null object reference.

Description: This bytecode pushes a null object reference onto the stack.

`iconst_m1`

Meaning: Push the integer constant -1.

Description: This bytecode pushes the integer constant -1 onto the stack.

`iconst_<C>`

Meaning: Push the integer constant <C>.

Description: This bytecode pushes an integer constant onto the stack. There are six associated bytecodes, one for each of the integers 0-5: iconst_0, iconst_1, iconst_2, iconst_3, iconst_4, and iconst_5.

`lconst_<LC>`

Meaning: Push the long integer constant <LC>.

Description: This bytecode pushes a long integer constant onto the stack. There are two associated bytecodes, one for each of the integers 0-1: lconst_0 and lconst_1.

`fconst_<F>`

Meaning: Push the single-precision floating-point number <F>.

Description: This bytecode pushes a single-precision floating-point number onto the stack. There are three associated bytecodes, one for each of the integers 0-2: fconst_0, fconst_1, and fconst_2.

`dconst_<D>`

Meaning: Push the double-precision floating-point number <D>.

Description: This bytecode pushes a double-precision floating-point number onto the stack. There are two associated bytecodes, one for each of the integers 0-1: dconst_0 and dconst_1.

Loading Local Variables onto the Stack

`iload`

Meaning: Load integer from local variable.

Description: The value of the local variable in the current Java frame is pushed onto the operand stack. This value must be of type integer.

`iload_<l>`

Meaning: Load integer from local variable with specific index.

Description: The value of the local variable <l> in the current Java frame is pushed onto the operand stack. This value must be of type integer. There are four of these bytecodes, one for each of the integers 0-3: iload_0, iload_1, iload_2, and iload_3.

`lload`

Meaning: Load long integer from local variable.

Description: The values of the local variables word1 and word2 in the current Java frame are pushed onto the operand stack. These values together must form a long integer.

`lload_<l>`

Meaning: Load long integer from local variable with specific index.

Description: The values of the local variables <l> and <l>+1 in the current Java frame are pushed onto the operand stack. These values together must form a long integer. There are four of these bytecodes, one for each of the integers 0-3: lload_0, lload_1, lload_2, and lload_3.

`fload`

Meaning: Load single float from local variable.

Description: The value of the local variable in the current Java frame is pushed onto the operand stack. This value must be a single-precision floating-point number.

`fload_<l>`

Meaning: Load single float from local variable with specific index.

Description: The value of the local variable <l> in the current Java frame is pushed onto the operand stack. This value must be a single-precision floating-point number. There are four of these bytecodes, one for each of the integers 0-3: fload_0, fload_1, fload_2, and fload_3.

`dload`

Meaning: Load double float from local variable.

Description: The values of the local variables word1 and word2 in the current Java frame are pushed onto the operand stack. These values together must form a double-precision floating-point number.

`dload_<l>`

Meaning: Load double float from local variable with specific index.

Description: The values of the local variables <l> and <l>+1 in the current Java frame are pushed onto the operand stack. These values together must form a double-precision floating-point number. There are four of these bytecodes, one for each of the integers 0-3: dload_0, dload_1, dload_2, and dload_3.

`aload`

Meaning: Load object reference from local variable.

Description: The value of the local variable in the current Java frame is pushed onto the operand stack. This value must contain a return address or be a reference to an object or array.

`aload_<l>`

Meaning: Load object reference from local variable with specific index.

Description: The value of the local variable <l> in the current Java frame is pushed onto the operand stack. This value must contain a return address or be a reference to an object or array. There are four of these bytecodes, one for each of the integers 0-3: aload_0, aload_1, aload_2, and aload_3.

Storing Stack Values into Local Variables

`istore`

Meaning: Store integer into local variable.

Description: Local variable in the current Java frame is set to a value. This value must be an integer.

`istore_<l>`

Meaning: Store integer into local variable with specific index.

Description: Local variable <l> in the current Java frame is set to a value. This value must be an integer. There are four of these bytecodes, one for each of the integers 0-3: istore_0, istore_1, istore_2, and istore_3.

`lstore`

Meaning: Store long integer into local variable.

Description: Local variables word1 and word2 in the current Java frame are set to a value. This value must be a long integer.

`lstore_<l>`

Meaning: Store long integer into local variable with specific index.

Description: Local variables <l> and <l>+1 in the current Java frame are set to a value. This value must be a long integer. There are four of these bytecodes, one for each of the integers 0-3: lstore_0, lstore_1, lstore_2, and lstore_3.

`fstore`

Meaning: Store single float into local variable.

Description: Local variable in the current Java frame is set to a value. This value must be a single-precision floating-point number.

`fstore_<l>`

Meaning: Store single float into local variable with specific index.

Description: Local variable <l> in the current Java frame is set to a value. This value must be a single-precision floating-point number. There are four of these bytecodes, one for each of the integers 0-3: fstore_0, fstore_1, fstore_2, and fstore_3.

`dstore`

Meaning: Store double-precision floating-point number into local variable.

Description: Local variables word1 and word2 in the current Java frame are set to a value. The value must be a double-precision floating-point number.

`dstore_<l>`

Meaning: Store double float into local variable with specific index.

Description: Local variables <l> and <l>+1 in the current Java frame are set to a value. The value must be a double-precision floating-point number. There are four of these bytecodes, one for each of the integers 0-3: dstore_0, dstore_1, dstore_2, and dstore_3.

`astore`

Meaning: Store object reference into local variable.

Description: Local variable at the index in the current Java frame is set to a value. The value must be a return address or a reference to an object.

`astore_<l>`

Meaning: Store object reference into local variable with specific index.

Description: Local variable <l> in the current Java frame is set to a value. The value must be a return address or a reference to an object. There are four of these bytecodes, one for each of the integers 0-3: astore_0, astore_1, astore_2, and astore_3.

`iinc`

Meaning: Increment local variable by constant.

Description: Local variable at byte1 in the current Java frame must contain an integer. Its value is incremented by the value byte2, where byte2 is treated as a signed 8-bit quantity.
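A minimal sketch of iinc follows. The int array stands in for the local variables of a frame, and the class IincSketch is a hypothetical name. Treating byte1 as an unsigned index is an assumption taken from the published virtual machine specification; the description above states only that byte2 is signed.

```java
// Hypothetical illustration; locals stands in for a frame's local variables.
class IincSketch {
    // byte1 selects the local variable (assumed unsigned);
    // byte2 is the signed amount to add.
    static void iinc(int[] locals, byte byte1, byte byte2) {
        locals[byte1 & 0xFF] += byte2;
    }

    public static void main(String[] args) {
        int[] locals = { 10, 20 };
        iinc(locals, (byte) 0, (byte) 5);    // local 0 becomes 15
        iinc(locals, (byte) 1, (byte) -1);   // local 1 becomes 19
        System.out.println(locals[0] + " " + locals[1]);
    }
}
```

Because the increment is applied directly to the local variable, nothing is pushed onto or popped off the operand stack.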

Handling Arrays

`newarray`

Meaning: Allocate new array.

Description: A new array of a specific array type, capable of holding size elements, is allocated. The result is a reference to the new object. Allocation of an array large enough to contain size items of the specific array type is attempted and all elements of the array are initialized to 0.

size represents the number of elements in the new array and must be an integer. The array type is given as an internal code that indicates the type of array to allocate. Possible values for the type of array are as follows: T_BOOLEAN(4), T_CHAR(5), T_FLOAT(6), T_DOUBLE(7), T_BYTE(8), T_SHORT(9), T_INT(10), and T_LONG(11).

Note

A NegativeArraySizeException is thrown if size is less than 0. An OutOfMemoryError is thrown if there is not enough memory to allocate the array.
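Both behaviors are easy to observe from Java source code, because allocating a primitive array in source code is what leads the compiler to emit newarray. The class NewArraySketch and its helper methods are hypothetical names used only for this example.

```java
// Hypothetical illustration of zero initialization and the negative-size check.
class NewArraySketch {
    // True when every element of a freshly allocated int array is 0.
    static boolean freshArrayIsZeroed(int size) {
        int[] fresh = new int[size];
        for (int element : fresh) {
            if (element != 0) {
                return false;
            }
        }
        return true;
    }

    // True when allocating an array of the given size throws
    // NegativeArraySizeException.
    static boolean sizeIsRejected(int size) {
        try {
            int[] ignored = new int[size];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(freshArrayIsZeroed(5));  // true
        System.out.println(sizeIsRejected(-1));     // true
        System.out.println(sizeIsRejected(0));      // false
    }
}
```

A size of 0 is legal and simply produces an empty array.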

`anewarray`

Meaning: Allocate new array of objects.

Description: A new array of the indicated class type and capable of holding size elements is allocated. The result is a reference to the new object. Allocation of an array large enough to contain size elements of the given class type is attempted and all elements of the array are initialized to null.

size represents the number of elements in the new array and must be an integer. byte1 and byte2 are used to construct an index into the constant pool of the current class. When the item at that index is resolved, the resulting entry must be a class.

Note

A NegativeArraySizeException is thrown if size is less than 0. An OutOfMemoryError is thrown if there is not enough memory to allocate the array.

The anewarray instruction is used to create a single-dimension array. For example, the declaration new Thread[7] generates the following bytecode instructions:

bipush 7
anewarray <Class "java.lang.Thread">

The anewarray instruction can also be used to create the outermost dimension of a multidimensional array. For example, the array declaration new int[6][] generates the following bytecode instructions:

bipush 6
anewarray <Class "[I">
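The two declarations above can be tried directly. This is a small sketch, with the class name ANewArrayDemo invented for the example; it shows that every element of the new array starts out as null, including the inner arrays of new int[6][].

```java
// Hypothetical demo class; the names are illustrative only.
class ANewArrayDemo {
    // Single-dimension array of object references.
    static Thread[] threads() {
        return new Thread[7];
    }

    // Outermost dimension only; the six inner int[] references are null.
    static int[][] rows() {
        return new int[6][];
    }

    public static void main(String[] args) {
        System.out.println(threads().length + " threads, first = " + threads()[0]);
        System.out.println(rows().length + " rows, first = " + rows()[0]);
    }
}
```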

`multianewarray`

Meaning: Allocate new multidimensional array.

Description: A new multidimensional array of a specific array type is allocated. One size value is supplied on the stack for each dimension being created, size1 through sizeN; each size must be an integer and gives the number of elements in that dimension. The number of dimensions, N, is given by a third operand byte. byte1 and byte2 are used to construct an index into the constant pool of the current class. The item at that index is resolved and the resulting entry must be an array class of one or more dimensions.

Note

A NegativeArraySizeException is thrown if sizeN is less than 0. An OutOfMemoryError is thrown if there is not enough memory to allocate the array.
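As a short sketch of the source-level counterpart (the class MultiArrayDemo is a name made up for this example), an expression that gives a size for more than one dimension allocates all of those dimensions at once:

```java
// Hypothetical demo class; the names are illustrative only.
class MultiArrayDemo {
    // Both dimensions are sized, so a compiler is expected to use
    // multianewarray with two dimensions here.
    static int[][] grid(int rows, int cols) {
        return new int[rows][cols];
    }

    // Returns true if the allocation is rejected because a size is negative.
    static boolean rejects(int rows, int cols) {
        try {
            grid(rows, cols);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[][] g = grid(2, 3);
        System.out.println(g.length + " rows of " + g[0].length + " columns");
        System.out.println("negative column count rejected: " + rejects(2, -1));
    }
}
```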

`arraylength`

Meaning: Get length of array.

Description: The length of the array is determined and replaces aref on the top of the stack. The aref must be a reference to an array object.

Note

A NullPointerException is thrown if the aref is null.

`iaload`

Meaning: Load integer from array.

Description: The integer value at the array index is retrieved and pushed onto the top of the stack. The aref must be a reference to an array of integers; likewise, the index into the array must be an integer.

Note

A NullPointerException is thrown if aref is null. An ArrayIndexOutOfBoundsException is thrown if the array index is not within the bounds of the array.
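The two exceptions in this note are easy to trigger from Java source. The sketch below uses an invented class, ArrayLoadDemo, whose single method reports what happens when an element is read.

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayLoadDemo {
    // Reads a[index] and describes the outcome as a string.
    static String load(int[] a, int index) {
        try {
            return "value " + a[index];
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        int[] data = { 10, 20 };
        System.out.println(load(data, 1));
        System.out.println(load(null, 0));
        System.out.println(load(data, 2));
    }
}
```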

`laload`

Meaning: Load long integer from array.

Description: The long integer value at the array index is retrieved and pushed onto the top of the stack. The aref must be a reference to an array of long integers; likewise, the index into the array must be an integer.

`faload`

Meaning: Load single float from array.

Description: The single-precision floating-point number at the array index is retrieved and pushed onto the top of the stack. The aref must be a reference to an array of single-precision floating-point numbers, and the index into the array must be an integer.

`daload`

Meaning: Load double float from array.

Description: The double-precision floating-point number at the array index is retrieved and pushed onto the top of the stack. The aref must be a reference to an array of double-precision floating-point numbers, and the index into the array must be an integer.

`aaload`

Meaning: Load object reference from array.

Description: The object reference at the array index is retrieved and pushed onto the top of the stack. The aref must be a reference to an array of object references, and the index into the array must be an integer.

`baload`

Meaning: Load signed byte from array.

Description: The signed byte value at the array index is retrieved, expanded to an integer, and pushed onto the top of the stack. The aref must be a reference to an array of signed byte values, and the index into the array must be an integer.

`caload`

Meaning: Load character from array.

Description: The character value at the array index is retrieved, expanded to an integer, and pushed onto the top of the stack. The aref must be a reference to an array of character values, and the index into the array must be an integer.

`saload`

Meaning: Load short integer from array.

Description: The short integer value at the array index is retrieved, expanded to an integer, and pushed onto the top of the stack. The aref must be a reference to an array of short integer values and the index into the array must be an integer.

`iastore`

Meaning: Store into integer array.

Description: An integer value is popped off the stack and stored in the array at the index. The aref must be a reference to an array of integer values, and the index into the array must be an integer as well.

`lastore`

Meaning: Store into long integer array.

Description: A long integer value is popped off the stack and stored in the array at the index. The arrayref must be a reference to an array of long integer values, and the index into the array must be an integer.

`fastore`

Meaning: Store into single float array.

Description: A single-precision floating-point number is popped off the stack and stored in the array at the index. The arrayref must be a reference to an array of single-precision floating-point numbers, and the index into the array must be an integer.

`dastore`

Meaning: Store into double float array.

Description: A double-precision floating-point number is popped off the stack and stored in the array at the index. The arrayref must be a reference to an array of double-precision floating-point numbers, and the index into the array must be an integer.

`aastore`

Meaning: Store into object reference array.

Description: An object reference is popped off the stack and stored in the array at the index. The arrayref must be a reference to an array of objects, and the index into the array must be an integer.

`bastore`

Meaning: Store into signed byte array.

Description: An integer is popped off the stack, converted to a signed byte, and stored in the array at the index. The arrayref must be a reference to an array of signed bytes, and the index into the array must be an integer. If the integer value is too large to be a signed byte, it is truncated.

`castore`

Meaning: Store into character array.

Description: An integer is popped off the stack, converted to a character, and stored in the array at the index. The arrayref must be a reference to an array of characters and the index into the array must be an integer. If the integer value is too large to be a character, it is truncated.

`sastore`

Meaning: Store into short array.

Description: An integer is popped off the stack, converted to a short integer, and stored in the array at the index. The aref must be a reference to an array of short integers, and the index into the array must be an integer. If the integer value is too large to be a short integer, it is truncated.
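The truncation performed by bastore, castore, and sastore matches what you see when you cast an integer before storing it in a narrow array. The following sketch uses an invented class, NarrowingStoreDemo, to show the effect; only the low-order 8 or 16 bits of the integer survive.

```java
// Hypothetical demo class; the names are illustrative only.
class NarrowingStoreDemo {
    // Stores an int into a byte array; only the low 8 bits are kept.
    static byte storeByte(int v) {
        byte[] a = new byte[1];
        a[0] = (byte) v;
        return a[0];
    }

    // Stores an int into a char array; only the low 16 bits are kept.
    static char storeChar(int v) {
        char[] a = new char[1];
        a[0] = (char) v;
        return a[0];
    }

    // Stores an int into a short array; only the low 16 bits are kept.
    static short storeShort(int v) {
        short[] a = new short[1];
        a[0] = (short) v;
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println("300 as byte: " + storeByte(300));
        System.out.println("65601 as char: " + storeChar(65601));
        System.out.println("70000 as short: " + storeShort(70000));
    }
}
```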

Note

Now that you are familiar with bytecode instructions, I will only provide descriptions as necessary.

Handling the Stack

`nop`

Meaning: Do nothing.

`pop`

Meaning: Pop the top word from the stack.

`pop2`

Meaning: Pop the top two words from the stack.

`dup`

Meaning: Duplicate the top word on the stack.

`dup2`

Meaning: Duplicate the top two words on the stack.

`dup_x1`

Meaning: Duplicate the top word on the stack and insert a copy two words down in the stack.

`dup2_x1`

Meaning: Duplicate the top two words on the stack and insert the copies two words down in the stack.

`dup_x2`

Meaning: Duplicate the top word on the stack and insert the copy three words down in the stack.

`dup2_x2`

Meaning: Duplicate the top two words on the stack and insert the copies three words down in the stack.

`swap`

Meaning: Swap the top two elements on the stack.
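To make the stack instructions concrete, here is a toy model of a word stack. It is only a sketch: the class WordStack and its method names are invented for the example, each word is modeled as an int, and only dup, dup_x1, and swap are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an operand stack; not the Virtual Machine's own code.
class WordStack {
    private final List<Integer> words = new ArrayList<>(); // last element is the top

    void push(int w) { words.add(w); }

    int pop() { return words.remove(words.size() - 1); }

    // dup: ..., a  becomes  ..., a, a
    void dup() { push(words.get(words.size() - 1)); }

    // dup_x1: ..., b, a  becomes  ..., a, b, a
    void dupX1() {
        int a = pop();
        int b = pop();
        push(a);
        push(b);
        push(a);
    }

    // swap: ..., b, a  becomes  ..., a, b
    void swap() {
        int a = pop();
        int b = pop();
        push(a);
        push(b);
    }

    // Copy of the stack contents, bottom first.
    List<Integer> contents() { return new ArrayList<>(words); }

    public static void main(String[] args) {
        WordStack s = new WordStack();
        s.push(1);
        s.push(2);
        s.dupX1();
        System.out.println("after dup_x1: " + s.contents());
        s.swap();
        System.out.println("after swap: " + s.contents());
    }
}
```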

Performing Arithmetic

Bytecode arithmetic is performed at a very basic level. In general, values are popped off the stack, an arithmetic function is performed, and the result of the operation is placed back on the stack, which effectively replaces the original values. To perform the bytecode arithmetic correctly, both values must be of the same type.

Addition

In bytecode addition, two values are popped off the stack and added; then the sum is placed back on the stack.

`iadd`

Meaning: Integer add.

`ladd`

Meaning: Long integer add.

`fadd`

Meaning: Single-precision floating-point add.

`dadd`

Meaning: Double-precision floating-point add.

Subtraction

In bytecode subtraction, two values are popped off the stack, the second value is subtracted from the first, and the result is placed back on the stack.

`isub`

Meaning: Integer subtract.

`lsub`

Meaning: Long integer subtract.

`fsub`

Meaning: Single-precision floating-point subtract.

`dsub`

Meaning: Double-precision floating-point subtract.

Multiplication

In bytecode multiplication, the values are popped off the stack and multiplied; then the product is placed back on the stack.

`imul`

Meaning: Integer multiply.

`lmul`

Meaning: Long integer multiply.

`fmul`

Meaning: Single-precision floating-point multiply.

`dmul`

Meaning: Double-precision floating-point multiply.

Division

In bytecode division, two values are popped off the stack, the first value is divided by the second value, and the quotient is placed back on the stack. For integers and long integers, the result is truncated toward zero.

Note

For integers, shorts, and long integers, an attempt to divide by zero results in an ArithmeticException. For floating-point numbers, no exception is thrown: dividing a nonzero value by zero produces positive or negative infinity, and dividing zero by zero produces the special value not a number (NaN). A NaN is not equal to any value, including itself, and it carries through any later arithmetic, so it will distort your results. The Virtual Machine can check for NaN values using bytecodes that perform comparisons.

`idiv`

Meaning: Integer divide.

`ldiv`

Meaning: Long integer divide.

`fdiv`

Meaning: Single-precision floating-point divide.

`ddiv`

Meaning: Double-precision floating-point divide.
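The differences between integer and floating-point division show up directly in Java source. The sketch below, built around an invented class named DivisionDemo, exercises truncation toward zero, the ArithmeticException for integer division by zero, and the infinity and NaN results of floating-point division by zero.

```java
// Hypothetical demo class; the names are illustrative only.
class DivisionDemo {
    static int intDiv(int a, int b) {
        return a / b;            // idiv expected; quotient truncates toward zero
    }

    static double doubleDiv(double a, double b) {
        return a / b;            // ddiv expected; never throws
    }

    // Returns true if dividing the integer by zero throws ArithmeticException.
    static boolean throwsOnZero(int a) {
        try {
            intDiv(a, 0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("7 / 2 = " + intDiv(7, 2));
        System.out.println("-7 / 2 = " + intDiv(-7, 2));
        System.out.println("1 / 0 throws: " + throwsOnZero(1));
        System.out.println("1.0 / 0.0 = " + doubleDiv(1.0, 0.0));
        System.out.println("0.0 / 0.0 = " + doubleDiv(0.0, 0.0));
    }
}
```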

Remainders

To get a remainder for division operations on integers and long integers, two values are popped off the stack, the first value is divided by the second value, and the remainder is placed back on the stack. Because the quotient is truncated toward zero, the remainder has the same sign as the first value. Each instruction produces only one result, so to get both a quotient and a remainder, the compiler generates two instructions: a division and a remainder.

Note

For integers, shorts, and long integers, an attempt to divide by zero results in an ArithmeticException.

`irem`

Meaning: Integer remainder.

`lrem`

Meaning: Long integer remainder.

To get a remainder for division operations on floating-point numbers, two values are popped off the stack. The first value is divided by the second value, the quotient is truncated toward zero to a whole number, and that number is multiplied by the second value. The product is subtracted from the first value, and the result is placed back on the stack. This is not the same as the IEEE 754 remainder operation, which rounds the quotient to the nearest integer, with a tie going to the even number.

Note

For floating-point numbers, an attempt to divide by zero results in the quotient being not a number.

`frem`

Meaning: Single-precision floating-point remainder.

`drem`

Meaning: Double-precision floating-point remainder.
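The following sketch contrasts the remainder computed by the % operator with the IEEE 754 remainder available through Math.IEEEremainder. The class RemainderDemo and the helper manualRemainder are invented for the example; the helper spells out the divide, truncate, multiply, and subtract steps and assumes the quotient fits in a long.

```java
// Hypothetical demo class; the names are illustrative only.
class RemainderDemo {
    static int intRem(int a, int b) {
        return a % b;                      // irem expected
    }

    static double doubleRem(double a, double b) {
        return a % b;                      // drem expected
    }

    // The same steps written out by hand (assumes a / b fits in a long).
    static double manualRemainder(double a, double b) {
        long truncatedQuotient = (long) (a / b);   // the cast truncates toward zero
        return a - truncatedQuotient * b;
    }

    public static void main(String[] args) {
        System.out.println("-7 % 2 = " + intRem(-7, 2));
        System.out.println("5.5 % 2.0 = " + doubleRem(5.5, 2.0));
        System.out.println("by hand: " + manualRemainder(5.5, 2.0));
        System.out.println("IEEE remainder: " + Math.IEEEremainder(5.5, 2.0));
    }
}
```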

Negation

In bytecode negation, a value is popped off the stack and negated, and the result is placed back on the stack.

`ineg`

Meaning: Integer negate.

`lneg`

Meaning: Long integer negate.

`fneg`

Meaning: Single-precision floating-point negate.

`dneg`

Meaning: Double-precision floating-point negate.

Logical Instructions

Logical instructions include operations to shift values and perform logical AND, logical OR, or logical XOR. To perform logical functions, both values must be of the same type.

Shifting Values

When values are left-shifted, the vacated low-order bits are filled with zeros. When values are right-shifted, the vacated high-order bits must be filled in a way that either preserves or discards the sign of the value. For this reason, values are right-shifted in one of two ways: with the sign extension (called arithmetic shifting) or without the sign extension (called logical or unsigned shifting).

In every case, the first value is shifted by the amount indicated by the low-order bits of the second value: the low 5 bits for integer shifts and the low 6 bits for long integer shifts.

`ishl`

Meaning: Integer shift left.

`ishr`

Meaning: Integer arithmetic shift right.

`iushr`

Meaning: Integer logical shift right.

`lshl`

Meaning: Long integer shift left.

`lshr`

Meaning: Long integer arithmetic shift right.

`lushr`

Meaning: Long integer logical shift right.
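A few shifts written in Java source illustrate these instructions. This is a sketch with an invented class, ShiftDemo; it relies on the fact that in Java only the low 5 bits of the shift distance are used for an int and the low 6 bits for a long.

```java
// Hypothetical demo class; the names are illustrative only.
class ShiftDemo {
    static int shl(int v, int n)   { return v << n; }    // ishl expected
    static int shr(int v, int n)   { return v >> n; }    // ishr: sign is kept
    static int ushr(int v, int n)  { return v >>> n; }   // iushr: zeros shifted in
    static long lshl(long v, int n) { return v << n; }   // lshl expected

    public static void main(String[] args) {
        System.out.println("1 << 33 (int)   = " + shl(1, 33));
        System.out.println("1L << 33 (long) = " + lshl(1L, 33));
        System.out.println("-8 >> 1  = " + shr(-8, 1));
        System.out.println("-8 >>> 1 = " + ushr(-8, 1));
    }
}
```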

Logical `AND`

For logical AND, two values are popped off the stack and replaced on the stack by their bitwise logical AND. A logical AND is also called the conjunction of the values.

`iand`

Meaning: Integer boolean AND.

`land`

Meaning: Long integer boolean AND.

Logical `OR`

For logical OR, two values are popped off the stack and replaced on the stack by their bitwise logical OR. A logical OR is also called the disjunction of the values.

`ior`

Meaning: Integer boolean OR.

`lor`

Meaning: Long integer boolean OR.

Logical `XOR`

For logical XOR, two values are popped off the stack and replaced on the stack by their bitwise logical XOR. A logical XOR is also called an exclusive disjunction of the values.

`ixor`

Meaning: Integer boolean XOR.

`lxor`

Meaning: Long integer boolean XOR.

Handling Conversions

Conversions in the Java Virtual Machine are handled by a specific set of bytecode instructions. As you know, Java allows you to convert from one type of value to another either implicitly or explicitly. When a conversion occurs, such as an integer to single-precision floating-point, one of these bytecodes is used.

Note

If the conversion is to a type of smaller bit width, truncation may occur. There is no notification when truncation occurs.

`i2l`

Meaning: Integer-to-long integer conversion.

`i2f`

Meaning: Integer to single float.

`i2d`

Meaning: Integer to double float.

`l2i`

Meaning: Long integer to integer.

`l2f`

Meaning: Long integer to single float.

`l2d`

Meaning: Long integer to double float.

`f2i`

Meaning: Single float to integer.

`f2l`

Meaning: Single float to long integer.

`f2d`

Meaning: Single float to double float.

`d2i`

Meaning: Double float to integer.

`d2l`

Meaning: Double float to long integer.

`d2f`

Meaning: Double float to single float.

`int2byte`

Meaning: Integer to signed byte.

`int2char`

Meaning: Integer to char.

`int2short`

Meaning: Integer to short.
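The silent loss of information during narrowing conversions is easiest to see with a few casts. The sketch below uses an invented class, ConversionDemo, whose method names simply echo the bytecodes a compiler is expected to use for each cast.

```java
// Hypothetical demo class; method names echo the bytecodes they illustrate.
class ConversionDemo {
    static int d2i(double v)    { return (int) v; }    // fraction dropped, range clamped
    static int l2i(long v)      { return (int) v; }    // high 32 bits dropped
    static float i2f(int v)     { return (float) v; }  // may lose low-order digits
    static byte int2byte(int v) { return (byte) v; }   // high 24 bits dropped

    public static void main(String[] args) {
        System.out.println("(int) 3.99 = " + d2i(3.99));
        System.out.println("(int) 1e20 = " + d2i(1e20));
        System.out.println("(int) 0x100000001L = " + l2i(0x100000001L));
        System.out.println("(float) 16777217 = " + i2f(16777217));
        System.out.println("(byte) 200 = " + int2byte(200));
    }
}
```

Notice that none of these conversions throws an exception; the caller has no way to tell from the result alone that information was lost.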

Control Transfer Instructions

Instructions that transfer control are a big part of any language structure. In the Java bytecode instruction set, control transfer is handled by instructions that perform branching, comparisons, and movement to and from subroutines.

Unconditional Branching

One way to transfer control is with unconditional branching. The Virtual Machine handles this type of branching with the bytecodes discussed in this section.

`goto`

Meaning: Branch.

Description: Execution proceeds at the signed 16-bit offset from the address of this instruction. The offset is constructed from byte1 and byte2.

`goto_w`

Meaning: Branch always for wide index.

Description: Execution proceeds at the signed 32-bit offset from the address of this instruction. The offset is constructed from byte1, byte2, byte3, and byte4.

Handling Subroutines and Breakpoints

Instructions that jump to a subroutine push the return address onto the stack. The subroutine normally saves this address in a local variable, and the return instruction later retrieves it from that local variable.

`jsr`

Meaning: Jump subroutine.

Description: Execution proceeds at the signed 16-bit offset from the address of this instruction. The offset is constructed from byte1 and byte2. The address of the instruction immediately following the current instruction is pushed onto the stack to provide a return address from the subroutine.

`jsr_w`

Meaning: Jump subroutine for wide index.

Description: Execution proceeds at the signed 32-bit offset from the address of this instruction. The offset is constructed from byte1, byte2, byte3, and byte4. The address of the instruction immediately following the current instruction is pushed onto the stack to provide a return address from the subroutine.

`ret`

Meaning: Return from subroutine.

Description: Local variable in the current Java frame must contain a return address. The contents of the local variable are written into the program counter.

`ret_w`

Meaning: Return from subroutine for wide index.

Description: Local variable in the current Java frame must contain a return address. The contents of the local variable are written into the program counter.

`breakpoint`

Meaning: Breakpoint.

Description: Stop and pass control to breakpoint handler.

Conditional Branching

Branching instructions check for a specific value. If the value is as expected, execution proceeds at an offset from the address of the current instruction. Otherwise, execution proceeds to the next instruction.

`ifeq`

Meaning: Branch if equal.

`ifnull`

Meaning: Branch if null.

`iflt`

Meaning: Branch if less than.

`ifle`

Meaning: Branch if less than or equal.

`ifne`

Meaning: Branch if not equal.

`ifnonnull`

Meaning: Branch if not null.

`ifgt`

Meaning: Branch if greater than.

`ifge`

Meaning: Branch if greater than or equal.

`if_icmpeq`

Meaning: Branch if integers v1 and v2 are equal.

`if_icmpne`

Meaning: Branch if integers v1 and v2 are not equal.

`if_icmplt`

Meaning: Branch if integer v1 is less than integer v2.

`if_icmpgt`

Meaning: Branch if integer v1 is greater than integer v2.

`if_icmple`

Meaning: Branch if integer v1 less than or equal to integer v2.

`if_icmpge`

Meaning: Branch if integer v1 greater than or equal to integer v2.

`if_acmpeq`

Meaning: Branch if object references equal.

`if_acmpne`

Meaning: Branch if object references not equal.

Comparisons

Comparison instructions compare two values. If v1 is less than v2, the value -1 is pushed onto the stack. If the values are equal, the value 0 is pushed onto the stack. If v1 is greater than v2, the value +1 is pushed onto the stack.

For floating-point numbers, if either v1 or v2 is not a number, the value -1 is pushed onto the stack for the first pair of bytecodes and the value +1 is pushed onto the stack for the second pair of bytecodes. The process of checking for the infamous not-a-number problem is handled by performing two comparisons. The first comparison checks for the value -1, and the second checks for the value +1.

`lcmp`

Meaning: Long integer compare.

`fcmpl`

Meaning: Single-precision floating-point number compare; return -1 if v1 or v2 is not a number.

`dcmpl`

Meaning: Double-precision floating-point number compare; return -1 if v1 or v2 is not a number.

`fcmpg`

Meaning: Single-precision floating-point number compare; return +1 if v1 or v2 is not a number.

`dcmpg`

Meaning: Double-precision floating-point number compare; return +1 if v1 or v2 is not a number.
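The difference between the two pairs of floating-point comparison bytecodes can be modeled in a few lines of Java. This is a sketch of the behavior described above, not the Virtual Machine's implementation; the class CompareDemo and the helper isUnordered are invented for the example.

```java
// Hypothetical model of the comparison bytecodes; names are illustrative only.
class CompareDemo {
    static int lcmp(long v1, long v2) {
        return v1 < v2 ? -1 : (v1 == v2 ? 0 : 1);
    }

    // Pushes -1 when either operand is not a number.
    static int fcmpl(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) return -1;
        return v1 < v2 ? -1 : (v1 == v2 ? 0 : 1);
    }

    // Pushes +1 when either operand is not a number.
    static int fcmpg(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) return 1;
        return v1 < v2 ? -1 : (v1 == v2 ? 0 : 1);
    }

    // Two comparisons reveal a NaN: only then do the results disagree.
    static boolean isUnordered(float v1, float v2) {
        return fcmpl(v1, v2) == -1 && fcmpg(v1, v2) == 1;
    }

    public static void main(String[] args) {
        System.out.println("fcmpl(NaN, 1) = " + fcmpl(Float.NaN, 1f));
        System.out.println("fcmpg(NaN, 1) = " + fcmpg(Float.NaN, 1f));
        System.out.println("unordered: " + isUnordered(Float.NaN, 1f));
    }
}
```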

Returning from Methods

Returns are handled in one of two ways by bytecode instructions: return (void) or return (normal). Void returns are used to back out cleanly from the previous method. Thus, all values on the operand stack are discarded and the interpreter then returns control to its caller. Normal returns are used to proceed normally with execution and push a value associated with the previous method onto the stack. After the return value is pushed onto the stack, any other values on the operand stack are discarded and the interpreter then returns control to its caller.

Note

Java's operand stack is not contiguous like the operand stacks you may be familiar with from programming in other languages. Because each method has its own section of the operand stack, when the operand stack for the method is discarded, only the section related to the method is cleared out.

`return`

Meaning: Return (void) from method.

`ireturn`

Meaning: Return integer from function.

`lreturn`

Meaning: Return long integer from function.

`freturn`

Meaning: Return single float from function.

`dreturn`

Meaning: Return double float from function.

`areturn`

Meaning: Return object reference from function.

Table Jumping

In Java, complex addressing is handled using jump tables. These jump tables are accessed with index switches or key lookups.

`tableswitch`

Meaning: Access jump table by index and jump.

Description: Immediately after the tableswitch instruction, padding consisting of 0-3 zeros is inserted so that the next byte begins at an address that is a multiple of four. A series of offsets follows the padding. These offsets are signed 4-byte quantities and consist of a default offset, a low offset, and a high offset, followed by additional high - low + 1 offsets. These additional offsets are treated as a 0-based jump table.

The index, which is taken from the top of the stack, must be an integer. If the index is less than the low offset or greater than the high offset, the default offset is added to the address of the current instruction. Otherwise, the low offset is subtracted from the index, and the element at the position index - low offset in the jump table is extracted and added to the address of the current instruction.

`lookupswitch`

Meaning: Access jump table by key match and jump.

Description: Immediately after the lookupswitch instruction, padding consisting of 0-3 zeros is inserted so that the next byte begins at an address that is a multiple of four. A series of pairs of offsets follows the padding. These offsets are signed 4-byte quantities. The first item in the pair is the default offset, and the second item in the pair gives the number of pairs that follow. The additional pairs consist of a match and an offset.

The integer key on the stack is then compared against each of the matches. If the key is equal to one of the matches, the offset is added to the address of the current instruction. If the key does not match any of the matches, the default offset is added to the address of the current instruction.
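The address arithmetic for both instructions can be sketched in Java. The class SwitchDemo below is a model written for this illustration only: pc stands for the address of the switch instruction, and the table contents are passed in as plain arrays rather than decoded from real bytecode.

```java
// Hypothetical model of the two switch instructions; names are illustrative only.
class SwitchDemo {
    // Zero bytes needed after an opcode at the given address so that the
    // next byte begins at an address that is a multiple of four.
    static int padding(int opcodeAddress) {
        return (4 - (opcodeAddress + 1) % 4) % 4;
    }

    // tableswitch: jump through a 0-based table indexed by (index - low).
    static int tableSwitch(int pc, int index, int defaultOffset,
                           int low, int high, int[] offsets) {
        if (index < low || index > high) {
            return pc + defaultOffset;
        }
        return pc + offsets[index - low];
    }

    // lookupswitch: compare the key against each match in turn.
    static int lookupSwitch(int pc, int key, int defaultOffset,
                            int[] matches, int[] offsets) {
        for (int i = 0; i < matches.length; i++) {
            if (matches[i] == key) {
                return pc + offsets[i];
            }
        }
        return pc + defaultOffset;
    }

    public static void main(String[] args) {
        int[] table = { 10, 20, 30 };
        System.out.println("tableswitch target: " + tableSwitch(100, 5, 40, 4, 6, table));
        System.out.println("padding after opcode at address 1: " + padding(1));
    }
}
```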

Manipulating Object Fields

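Java manipulates two types of fields: dynamic and static. Dynamic fields belong to individual objects, so each object has its own copy, whereas static fields belong to the class itself and are shared by all of its objects. These fields are manipulated using simple get and put mechanisms.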

Putting Fields

The put mechanism uses byte1 and byte2 to construct an index into the constant pool of the current class. The item at the index is a field reference to a class name and a field name, which is resolved to a field block pointer that has both the field width and the field offset. The field at the offset from the start of the object referenced is set to the value on the top of the stack.

Note

A NullPointerException is generated if the referenced object is null. For putfield, an IncompatibleClassChangeError is thrown if the specified field is a static field.

`putfield`

Meaning: Set 32-bit field in object.

`putfield`

Meaning: Set 64-bit field in object.

`putstatic`

Meaning: Set 32-bit static field in class.

`putstatic`

Meaning: Set 64-bit static field in class.

Getting Fields

The get mechanism uses byte1 and byte2 to construct an index into the constant pool of the current class. The item at the index is a field reference to a class name and a field name, which is resolved to a field block pointer that has both the field width and the field offset. The value at the offset replaces the object reference on the top of the stack.

Note

A NullPointerException is generated if the referenced object is null. For getfield, an IncompatibleClassChangeError is thrown if the specified field is a static field.

`getfield`

Meaning: Fetch 32-bit field from object.

`getfield`

Meaning: Fetch 64-bit field from object.

`getstatic`

Meaning: Get 32-bit static field from class.

`getstatic`

Meaning: Get 64-bit static field from class.
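The two kinds of fields look like this in Java source. The class FieldDemo is invented for the example, and the comments about which bytecodes are used reflect what a compiler is expected to generate.

```java
// Hypothetical demo class; the names are illustrative only.
class FieldDemo {
    static int created;   // static field: a single copy kept with the class
    int id;               // dynamic field: one copy in every object

    FieldDemo() {
        created = created + 1;   // getstatic, then putstatic expected
        this.id = created;       // putfield expected
    }

    // Creates two objects and returns { first id, second id, created }.
    static int[] demo() {
        created = 0;
        FieldDemo a = new FieldDemo();
        FieldDemo b = new FieldDemo();
        return new int[] { a.id, b.id, created };   // getfield and getstatic expected
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("ids " + r[0] + " and " + r[1] + ", objects created: " + r[2]);
    }
}
```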

Invoking Methods

Method invocation is a complex process. To invoke a method, the operand stack must contain a reference to an object and some number of arguments. The object reference is used as a pointer to the object's method table, which contains the method signature. The method signature is guaranteed to exactly match one of the method signatures in the table. The arguments byte1 and byte2 are used to construct an index into the constant pool of the current class, which contains the complete method signature.

The result of the lookup is an index into the method table of the named class. This index is used with the referenced object's dynamic type to look in the method table of that type, where a pointer to the method block for the matched method is found. The method block indicates the type of method, such as native or synchronized, and the number of arguments expected on the operand stack.

The object reference and arguments are popped off the method's operand stack and become the initial values of the local variables of the new method. Execution continues with the first instruction of the new method.

Note

The monitor associated with the referenced object is entered if the method is marked as synchronized. A NullPointerException is thrown if the object reference on the operand stack is null. A StackOverflowError is thrown if a stack overflow is detected during the method invocation.

`invokevirtual`

Meaning: Invoke method based on runtime type.

`invokenonvirtual`

Meaning: Invoke method based on compile-time type.

`invokestatic`

Meaning: Invoke a class (static) method.

`invokeinterface`

Meaning: Invoke interface method.

Note

Unlike invokevirtual and invokenonvirtual, with invokeinterface the method block does not indicate the number of available arguments. This number is taken from the bytecode.
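The four kinds of invocation correspond to four kinds of calls in Java source. The sketch below uses invented names (InvokeDemo, Named, Shape, and Circle); the comments name the instruction a compiler is expected to choose, and invokenonvirtual is the instruction that later versions of the Virtual Machine specification call invokespecial.

```java
// Hypothetical demo class; the names are illustrative only.
class InvokeDemo {
    interface Named {
        String name();
    }

    static class Shape implements Named {
        public String name() { return "shape"; }
        static String kind() { return "static shape"; }
    }

    static class Circle extends Shape {
        @Override
        public String name() { return "circle"; }
        String parentName() { return super.name(); }   // invokenonvirtual expected
    }

    // invokevirtual: the runtime type (Circle) decides which method runs.
    static String virtualCall() {
        Shape s = new Circle();
        return s.name();
    }

    // The call inside parentName() is bound to Shape at compile time.
    static String nonVirtualCall() {
        return new Circle().parentName();
    }

    // invokestatic: no object reference is involved.
    static String staticCall() {
        return Shape.kind();
    }

    // invokeinterface: the call is made through an interface type.
    static String interfaceCall() {
        Named n = new Circle();
        return n.name();
    }

    public static void main(String[] args) {
        System.out.println(virtualCall() + ", " + nonVirtualCall() + ", "
                + staticCall() + ", " + interfaceCall());
    }
}
```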

Exception Handling

Because exception handling is a major feature of the Java programming language, you might be surprised to learn that exceptions are handled with a single bytecode instruction. When exceptions occur, the object thrown must be a reference to an object of the class Throwable or one of its subclasses, and the current Java stack frame is searched for the most recent catch clause that catches exceptions of this class or a superclass of this class. When a matching catch clause is found, the program counter is reset to the address indicated by the catch clause and execution continues from there. When no appropriate catch clause is found, that frame is popped and the object is rethrown. If a catch clause is then found, the clause will contain the location of the code for the exception, the program counter is reset to that location, and execution continues. Otherwise, the frame is popped and the object is rethrown.

Note

A NullPointerException is thrown instead if the referenced object is null.

`athrow`

Meaning: Throw exception or error.
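The frame-by-frame search for a catch clause can be followed in a small example. This is a sketch with invented names (ThrowDemo, run, middle, and inner): the exception passes through a method whose catch clause does not match and is caught one frame further up by a clause for a superclass.

```java
// Hypothetical demo class; the names are illustrative only.
class ThrowDemo {
    static void inner() {
        throw new IllegalStateException("boom");   // athrow expected
    }

    static void middle(StringBuilder trace) {
        try {
            inner();
        } catch (ArithmeticException e) {          // wrong class: no match here
            trace.append("caught in middle; ");
        } finally {
            trace.append("leaving middle; ");
        }
    }

    // Returns a trace of what happened while the exception propagated.
    static String run() {
        StringBuilder trace = new StringBuilder();
        try {
            middle(trace);
        } catch (RuntimeException e) {             // superclass of the thrown class
            trace.append("caught in run: ").append(e.getMessage());
        }
        return trace.toString();
    }

    // Throwing a null reference produces a NullPointerException instead.
    static boolean throwingNullGivesNpe() {
        RuntimeException none = null;
        try {
            throw none;
        } catch (NullPointerException e) {
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println("null throw gives NullPointerException: " + throwingNullGivesNpe());
    }
}
```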

Miscellaneous Object Operations

Several miscellaneous object operations are grouped together here.

`new`

Meaning: Create new object.

Description: byte1 and byte2 are used to construct an index into the constant pool of the current class, which must be a class name that can be resolved to a class pointer. A new instance of that class is then created, and a reference to the object is pushed on the stack.

`checkcast`

Meaning: Make sure object is of given type.

Description: Determines whether the referenced object can be cast to be a reference to an object of another class. A null object reference can be cast to any class. Otherwise, the referenced object must be an instance of the expected class or one of its subclasses. byte1 and byte2 are used to construct an index into the constant pool of the current class, which is presumed to be a class name that can be resolved to a class pointer.

Note

A ClassCastException is thrown if the referenced object cannot be cast to the expected class.

`instanceof`

Meaning: Determine if an object is of given type and return result.

Description: Determines whether the referenced object can be cast to be a reference to an object of the expected class. This instruction will overwrite the object reference with 1 if the object is an instance of the expected class or one of its subclasses. Otherwise, including when the object reference is null, the object reference is overwritten by 0.
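Casts and instanceof tests in Java source behave the way these two instructions are described. In the following sketch the class CastDemo is invented for the example, and Number stands in for the expected class; an Integer passes both checks because Integer is a subclass of Number.

```java
// Hypothetical demo class; the names are illustrative only.
class CastDemo {
    // instanceof expected: true only for a non-null Number or subclass instance.
    static boolean isNumber(Object o) {
        return o instanceof Number;
    }

    // checkcast expected: null passes, a mismatched object throws.
    static String castToNumber(Object o) {
        try {
            Number n = (Number) o;
            return "ok: " + n;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(isNumber(Integer.valueOf(5)) + " " + isNumber("five") + " " + isNumber(null));
        System.out.println(castToNumber(Integer.valueOf(5)));
        System.out.println(castToNumber(null));
        System.out.println(castToNumber("five"));
    }
}
```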

Monitors

Monitors are used to obtain exclusive access to a referenced object using a lock. Because a single thread can have multiple locks on a single object, careful checks are performed before granting an exclusive lock. Likewise, checks are done before releasing a lock. The locking and unlocking process is handled with monitors.

To lock an object, the monitor checks the object's status. When the object is not locked by another thread, an exclusive lock is obtained. When another thread already has the object locked, the current thread waits until the object is unlocked.

When the lock on the object is released, the monitor checks to see if this is the last lock that this thread has on the object. If it is, then other threads waiting for the object are allowed to gain access to and possibly lock the object.

Note

A NullPointerException is thrown instead if the referenced object is null.

`monitorenter`

Meaning: Enter monitored region of code.

`monitorexit`

Meaning: Exit monitored region of code.
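The statement that a single thread can hold multiple locks on one object can be checked with nested synchronized blocks. The sketch below uses an invented class, MonitorDemo, and the standard method Thread.holdsLock to record whether the current thread owns the monitor at each step.

```java
// Hypothetical demo class; the names are illustrative only.
class MonitorDemo {
    // Returns whether the lock was held before, inside the inner block,
    // after the inner block, and after the outer block.
    static boolean[] nestedLocks() {
        Object lock = new Object();
        boolean[] held = new boolean[4];
        held[0] = Thread.holdsLock(lock);
        synchronized (lock) {                     // monitorenter expected
            synchronized (lock) {                 // the same thread locks again
                held[1] = Thread.holdsLock(lock);
            }                                     // monitorexit: not the last lock
            held[2] = Thread.holdsLock(lock);
        }                                         // monitorexit: last lock released
        held[3] = Thread.holdsLock(lock);
        return held;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(nestedLocks()));
    }
}
```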

Breaking Down the Class File Format

Source files are organized by object, and so are compiled source files in bytecode. When you compile source code, the Java compiler places each class in its own file. This class file represents a single object that is in turn made up of smaller objects.

By breaking down the objects used in the compiled class files, you can gain a better understanding of how the Java Virtual Machine works. For this reason, this section examines the class file format and formats for related objects, including methods, method signatures, fields, and attributes.

To ensure that Java programs are portable to any computer platform, the compiled files with the .class extension must follow a specific format. This format is known as the .class file format. Because Java interfaces are essentially abstract classes, the .class file format is used for both Java classes and Java interfaces.

At its most basic level, the .class file format is represented by streams of 8-bit bytes, which means that all 16-bit and 32-bit values are constructed by reading in two or four 8-bit bytes, respectively. As with assembly language, the byte order of 16-bit and 32-bit values is extremely important. Therefore, in order to accurately reconstruct a 16-bit or 32-bit value, bytes must be stored either in low-byte order or high-byte order.

Low-byte order stores the lowest 8 bits of a 16-bit or 32-bit value first, followed by the higher bytes. Although some computers store bytes in low-byte order, not all computers store bytes in this manner. In fact, many computers follow the high-byte order, where the highest 8 bits are stored first.

In assembly language it is perfectly acceptable to read assembly code in high-byte order on one platform and low-byte order on another platform. However, because Java can be used across multiple platforms with disparate operating systems and architecture, it was not enough to simply use the byte order specific to the local machine. For this reason, all byte streams are stored in high-byte order.

Note

Other terms for high-byte order are network order and big-endian order. If you want to read or write files in this format, you can use the java.io.DataInput and java.io.DataOutput interfaces.
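To see high-byte order in practice, you can write a 32-bit value with java.io.DataOutputStream, which implements DataOutput, and inspect the bytes. This is a minimal sketch; the class ByteOrderDemo and its methods are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class ByteOrderDemo {
    // Writes a 32-bit value the way DataOutput does: highest byte first.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the 32-bit value from four bytes stored in high-byte order.
    static int readU4(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] b = writeInt(0x12345678);
        for (byte x : b) {
            System.out.print(Integer.toHexString(x & 0xFF) + " ");
        }
        System.out.println();
        System.out.println(Integer.toHexString(readU4(b)));
    }
}
```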

Listing 24.2 shows the top-level format of class files. C programmers may recognize this format as being similar to the structures used in C. However, unlike a C struct, each field in the structure is represented without padding or alignment, and arrays may contain elements of various sizes. The types u1, u2, and u4 represent an unsigned 1-, 2-, or 4-byte value, respectively.

Listing 24.2. The .class file structure format.

ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count - 1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
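Reading the first four items of this structure takes only a few calls on java.io.DataInputStream, because the stream already assumes high-byte order. The following sketch uses an invented class, ClassHeaderDemo, and in main it parses a hand-made 10-byte sample rather than a real class file; everything after constant_pool_count is left unread.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
class ClassHeaderDemo {
    // Returns { magic, minor_version, major_version, constant_pool_count }.
    static long[] readHeader(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            long magic = in.readInt() & 0xFFFFFFFFL;   // u4, kept unsigned in a long
            int minor = in.readUnsignedShort();        // u2
            int major = in.readUnsignedShort();        // u2
            int poolCount = in.readUnsignedShort();    // u2
            return new long[] { magic, minor, major, poolCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample bytes made up for the demonstration, not taken from a real file.
        byte[] sample = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45, 0, 16 };
        long[] h = readHeader(sample);
        System.out.println("magic = " + Long.toHexString(h[0]));
        System.out.println("version = " + h[2] + "." + h[1]);
        System.out.println("constant_pool_count = " + h[3]);
    }
}
```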

All fields listed in the .class files follow a specific variable-length format. (See Listing 24.3.) The type u2 represents an unsigned 2-byte value.

Listing 24.3. Field formats.

field_info {
    u2 access_flags;
    u2 name_index;
    u2 signature_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

All methods listed in the .class files follow a specific variable-length format as well. (See Listing 24.4.) Again, the type u2 represents an unsigned 2-byte value.

Listing 24.4. Method formats.

method_info {
    u2 access_flags;
    u2 name_index;
    u2 signature_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

The final format for .class files pertains to attributes. All attributes have the format shown in Listing 24.5. The types u1, u2, and u4 represent an unsigned 1-, 2-, or 4-byte value, respectively.

Listing 24.5. Attribute formats.

GenericAttribute_info {
    u2 attribute_name;
    u4 attribute_length;
    u1 info[attribute_length];
}
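Because every attribute carries its own length, a reader can load or skip an attribute without understanding its contents. The sketch below illustrates this under the layout just shown; the class GenericAttribute and its methods are hypothetical names, and the input bytes in the usage note are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical in-memory form of one attribute record.
final class GenericAttribute {
    final int nameIndex; // u2 attribute_name
    final byte[] info;   // u1 info[attribute_length]

    private GenericAttribute(int nameIndex, byte[] info) {
        this.nameIndex = nameIndex;
        this.info = info;
    }

    // Read one attribute from the current position of the stream.
    static GenericAttribute read(DataInputStream in) throws IOException {
        int nameIndex = in.readUnsignedShort();
        int length = in.readInt(); // u4; this sketch rejects values above 2^31 - 1
        if (length < 0) {
            throw new IOException("attribute too large for this sketch");
        }
        byte[] info = new byte[length];
        in.readFully(info); // fails with EOFException if the data is truncated
        return new GenericAttribute(nameIndex, info);
    }

    // Convenience wrapper that parses an attribute from the start of a byte array.
    static GenericAttribute parse(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, parse(new byte[] {0, 7, 0, 0, 0, 2, 10, 11}) yields an attribute whose nameIndex is 7 and whose info array holds the two bytes 10 and 11.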

Signatures

Each object in the class file has a specific signature. Signatures are strings representing a type of method, field, or array.

The signature for a field represents the type of a function's argument or the type of a variable. The following grammar defines the series of characters that form the signature of a field:

<field_signature> ::= <field_type>
<field_type> ::= <base_type>|<object_type>|<array_type>
<base_type> ::= B|C|D|F|I|J|S|Z
<object_type> ::= L<fullclassname>;
<array_type> ::= [<optional_size><field_type>
<optional_size> ::= [0-9]*

The base type value identifies the base type of the field as B (byte), C (character), D (double), F (float), I (integer), J (long integer), S (short integer), or Z (boolean).
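One way to see how the grammar fits together is to decode a field signature back into a readable type name. The following is only a sketch: the class FieldSignatureDecoder is a made-up name, and the optional array size is simply skipped rather than reported.

```java
final class FieldSignatureDecoder {
    // Turn a field signature such as "[[C" into a readable name such as "char[][]".
    static String describe(String signature) {
        int dims = 0;
        int pos = 0;
        // Each '[' adds one array dimension; any size digits after it are skipped.
        while (pos < signature.length() && signature.charAt(pos) == '[') {
            dims++;
            pos++;
            while (pos < signature.length() && Character.isDigit(signature.charAt(pos))) {
                pos++;
            }
        }
        if (pos >= signature.length()) {
            throw new IllegalArgumentException("missing element type: " + signature);
        }
        String base;
        switch (signature.charAt(pos)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':
                int end = signature.indexOf(';', pos);
                if (end < 0) {
                    throw new IllegalArgumentException("missing ';': " + signature);
                }
                base = signature.substring(pos + 1, end);
                pos = end; // the ';' must be the last character
                break;
            default:
                throw new IllegalArgumentException("unknown type: " + signature);
        }
        if (pos != signature.length() - 1) {
            throw new IllegalArgumentException("trailing characters: " + signature);
        }
        StringBuilder name = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            name.append("[]");
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("I"));                  // int
        System.out.println(describe("[[C"));                // char[][]
        System.out.println(describe("Ljava.lang.String;")); // java.lang.String
    }
}
```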

The signature for return type represents the return value from a method. In the following code line, the character V indicates that the method returns no value:

<return_signature> ::= <field_type> | V

The signature for an argument represents an argument passed to a method and is represented by the following:

<argument_signature> ::= <field_type>

Finally, the signature for a method represents the arguments the method expects and the value the method returns. Method signatures are represented by

<method_signature> ::= (<arguments_signature>) <return_signature>
<arguments_signature> ::= <argument_signature>*
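Read in the other direction, these rules say how to take a method signature apart. The sketch below splits a signature into its argument signatures followed by its return signature. MethodSignatureSplitter and its methods are hypothetical names, and error handling is deliberately thin: a badly truncated string may raise StringIndexOutOfBoundsException instead of a friendlier error.

```java
import java.util.ArrayList;
import java.util.List;

final class MethodSignatureSplitter {
    // Return the index just past the field type that starts at pos.
    static int endOfFieldType(String s, int pos) {
        while (s.charAt(pos) == '[') {
            pos++;
            while (Character.isDigit(s.charAt(pos))) {
                pos++; // optional array size
            }
        }
        if (s.charAt(pos) == 'L') {
            int semi = s.indexOf(';', pos);
            if (semi < 0) {
                throw new IllegalArgumentException("missing ';' in " + s);
            }
            return semi + 1;
        }
        if ("BCDFIJSZ".indexOf(s.charAt(pos)) < 0) {
            throw new IllegalArgumentException("unknown type in " + s);
        }
        return pos + 1;
    }

    // Split "(args)ret" into the argument signatures plus the return signature,
    // which is always the last element of the returned list.
    static List<String> split(String signature) {
        if (!signature.startsWith("(")) {
            throw new IllegalArgumentException("expected '(': " + signature);
        }
        List<String> parts = new ArrayList<>();
        int pos = 1;
        while (signature.charAt(pos) != ')') {
            int end = endOfFieldType(signature, pos);
            parts.add(signature.substring(pos, end));
            pos = end;
        }
        String ret = signature.substring(pos + 1);
        if (!ret.equals("V") && endOfFieldType(ret, 0) != ret.length()) {
            throw new IllegalArgumentException("bad return type: " + signature);
        }
        parts.add(ret);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("(IIZ[[C)V")); // [I, I, Z, [[C, V]
    }
}
```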

To put these rules to use, you can build a method signature for an arbitrary method. For example, let's create a method called construct() in the class your.package.constructors. This method takes four arguments (two integers, a boolean, and a two-dimensional array of characters) and returns an object of type your.package.constructors.constructor. The method signature is then

(IIZ[[C)Lyour.package.constructors.constructor;

Complete method signatures are usually prefixed by the name of the method, or by the full package name to the class level followed by a forward slash and the name of the method. Therefore, the complete method signature for the construct() method would look like this:

your_package_constructors/construct(IIZ[[C)Lyour.package.constructors.constructor;
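A signature can also be generated mechanically from Java types. The following is a sketch only: SignatureBuilder and its methods are invented names, class names are written with dots to match the notation used in this chapter, and java.lang.String stands in as the return type purely for illustration. The argument list matches the one described above: two integers, a boolean, and a two-dimensional array of characters.

```java
final class SignatureBuilder {
    // Encode one field type according to the field signature grammar.
    static String fieldSignature(Class<?> type) {
        if (type.isArray()) {
            return "[" + fieldSignature(type.getComponentType());
        }
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == double.class) return "D";
        if (type == float.class) return "F";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == short.class) return "S";
        if (type == boolean.class) return "Z";
        if (type == void.class) {
            throw new IllegalArgumentException("void is only valid as a return type");
        }
        return "L" + type.getName() + ";"; // object type, dotted class name
    }

    // A return signature is a field type or V for "no value".
    static String returnSignature(Class<?> type) {
        return type == void.class ? "V" : fieldSignature(type);
    }

    // Concatenate the argument signatures in parentheses, then the return signature.
    static String methodSignature(Class<?> returnType, Class<?>... argumentTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> argument : argumentTypes) {
            sb.append(fieldSignature(argument));
        }
        return sb.append(')').append(returnSignature(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(methodSignature(
            String.class, int.class, int.class, boolean.class, char[][].class));
        // (IIZ[[C)Ljava.lang.String;
    }
}
```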

Summary

The Java Virtual Machine exists only in the memory of your computer. Reproducing a machine within your computer's memory requires seven key objects: a set of registers, a stack, an execution environment, a garbage-collected heap, a constant pool, a method storage area, and a mechanism to tie it all together. This mechanism is the bytecode instruction set.

To examine bytecode, you can use the Java class file disassembler, javap. By examining bytecode instructions in detail, you gain valuable insight into the inner workings of the Java Virtual Machine and Java itself. Each bytecode instruction performs a specific function of extremely limited scope, such as pushing an object onto the stack or popping an object off the stack. Combinations of these basic functions represent the complex high-level tasks defined as statements in the Java programming language. As amazing as it seems, sometimes dozens of bytecode instructions are used to carry out the operation specified by a single Java statement. When you use these bytecode instructions with the seven key objects of the virtual machine, Java gains its platform independence and becomes the most powerful and versatile programming language in the world.

