On Implementing Debuggers For Interpreted Languages

A research paper for CS322, Fall 1998
by Steven Bytnar and Joshua Parker

Abstract

Languages are becoming more diverse and prevalent in end-user programs these days. This paper traces through the ideas involved in adding a debugger into an interpreter, as well as presenting language independent debugger languages that make interaction between the user and the interpreter simpler. It will also focus on the Actor foundry and how a debugger might be added to it by viewing the goals we want to achieve in such a real-time distributed system.

Introduction

Computer languages are easily becoming one of the largest areas of research for end-user customization of programs. Recent incarnations have been higher level scripting languages, such as AppleScript for the Macintosh, and Visual Basic automation in Windows. Even more recently, Java has been given an interface to communicate with the host it runs on via the Java Native Interface and remote hosts via Remote Method Invocation. These all can be used to automate repetitive steps, change the state of the program, or even be used to intercommunicate between different programs.

Definitions of terms

Debugger-language: a hopefully language neutral language meant explicitly for program control and debugging.

End-user: the user of the interpreted language/runtime system, as well as the debugger.

Language-neutral language: a language dissimilar enough from the native language such that it is distinct in its own respects.

Native-language: the language or system we are attempting to add a debugger to.

Why a debugger?

When given a scripting language, users like to experiment and find out the most useful ways to use the syntax they are presented. To raise the learning curve of these users, it is helpful to give the end-user an interface for understanding how your language interprets and executes the expressions that it is given. You will see that the debugger itself can take on a neutral language of its own, as well as evaluate expressions native to the interpreter itself.

What general direction are we going in?

Most interpreters are based on parsing expressions. We are going to investigate how to integrate some basic functions into a sample parser and proceed on to see how we can make our debugger interpret expressions also.

Goals

When you add a debugger to a language, it is good to define your requirements. The goals of any debugger should be to:

1) Parse native expressions
a) in a new variable scope.
b) in the current variable scope of the executing program.
c) in the root-level scope of the executing program.
d) in whatever alternate contexts the environment provides.
2) Set variable watchpoints. (i.e.. If a variable changes STOP execution.)
3) Create a language-neutral-debugging language.
4) Create commands in the debugger's language for doing repetitive tasks such as stepping and tracing expression evaluation.
 

Reasoning for our goals

Goal 1: If we can parse native expressions in the context of the program, or an isolated context of the debugger, we give the end-user the ability to understand expression evaluation, experiment with the language, and the ability to interactively fix bugs.

Goal 2: Watchpoints are important so that the enduser can see side-effects that might be occurring in ways they did not expect or intend.

Goal 3: If we have a debugging language that mimics the native language, then we lose: 1) reusability and 2) distinction between the debugger and native language parsers. Bugs in either interpreter should not affect development of the other part of the system. In an object-oriented parser, such as used in the JikesOS Java compiler by AlphaWorks of IBM, it would be acceptable to reuse the parser, but not to have the exact same BNF description for the language.

If the languages are too similar, the enduser will have a hard time keeping the syntax of the two languages distinct from each other. This would be especially problematic for us when we deal with allowing the user to be able to parse native expressions within a debugger command

Goal 4: By finding the lowest level tasks that we can factor our debugger commands into, we can make it so that the debugger itself is written in terms of the low-level debugger functions and the debugger's syntax.

What is already out there?

These are our web references:

           A GDB reference page. GDB has key debugging features that we feel our debugger could use:

  • The purpose of a debugger such as GDB is to allow you to see what is going on "inside" another program while it executes--or what another program was doing at the moment it crashed.
  • GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
    • Start your program, specifying anything that might affect its behavior.
    • Make your program stop on specified conditions.
    • Examine what has happened, when your program has stopped.
    • Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

Paper references:

What are our functional requirements?

Every parser and interpreter or runtime system has functions available that we can just use to implement a debugger. These are functions we are looking for:

GetCurrentContext: ---> Context
Could also be named GetContinuationContext.
 
GetPreviousContext: ---> Context
This is so we can view variable stack frames.
 
GetRootContext: ---> Context
Assuming we have a stack based context system, we could just keep using GetPreviousContext to get the root context. The root context would be top-level definitions of functions and/or variables.
 
ExpressionEvaluateOne: Expr, Context --> Context
So we can evaluate native expressions. And so we can single step through each expression as the parser breaks it up into smaller expressions.
 
GetNextExpression: ---> Expr
After calling ExpressionEvaluateOne(...), this gives us the next expression.
 
ExpressionAsText: Expr ---> String
 

We also need functions for getting previously evaluated expressions. If this is unavailable, we can keep an expression evaluation history as expressions are executed by the interpreter. This is analogous to walking the function call stack frame of a microprocessor.

How do you approach this task?

We also need functions to print out

Given the above requirements, we have enough functionality to do the common tasks a debugger requires:

Implementing tracing...

Way 1: Blindly hacking away at an interpreter...

For this attempt, we considered the interpreters as presented in "Essentials of Programming Languages" by Friedman, Ward and Haynes. The first idea we had to extend the interpreter was to create a semi-unique name prefix or postfix attached onto the original function name.

Here is a sample of how to interactively delay evaluation of an expression. It evaluates to "(define y 'DEAD)" and is then executed.

(define settoDEAD (lambda (y)
    (let ((a (list 'define y ''DEAD ) ))
        (begin (display a) (eval a))
    )
))

(define trace
    (lambda (functionName)
        (begin
            (define (cons `traced functionName)
                    (begin (display functionName)
                           (lambda x (functionName x))
                    )
            (define functionName (cons `traced functionName))
        )
    )
)

(define untrace
    (lambda (functionName)
        (define functionName (cdddr functionName))
    )
)

The problem with this is that you wind up with the possibility of clobbering functions defined by the user in their environment. This is not acceptable.

Pros: Simple.
Cons: Destructively modifies the namespace of the native language.

Way 2: Implementing tracepoints

In our first approach at attacking this problem, we considered using the assumption that functions are stored in local and global environments.  Here we are using the idea that TracedFunctionList is an abstract data type that lets us map function names to actual finite functions.  Now, we can define functions for tracing the execution of a program by "overriding" that function with a prologue to do whatever work we want, continued by the original function.

(define trace
   (lambda (functionName)
     (let ((originalFunction = env-functionName->value))
        (begin
            (TracedFunctionList-add functionName originalFunction)
            (cell-set! (env-functionName->value)
                        (lambda x (begin (display-arguments x)
                            (originalFunction))
                        )
            )
        )
     )
   )
)

(define untrace
    (lambda (functionName)
        (cell-set! (env-functionName->value)
                   (TracedFunctionList-remove functionName)
        )
    )
)

For function application then, we have to check the
functionName list to see if a particular function/variable is being
traced. If so, print out debug info. Else, don't.
This has a high overhead rating because we are searching a list of <>
 

Way 3: A more abstract way.

Generalizing the concepts of debugging just a bit farther, we can see some special classifications for debugger functionality that we want to handle.  Let us redefine our variables so that they contain a flag value called debug:

(define-record variable-ref (var value debug))

Its special purpose would be to reflect functionality of these states:

(define dNoDebug 0)              ;  variables     procedures
(define dWatchForAssignment 1)   ;  x = 1;        f(x) = add1(x);
(define dWatchForReference 2)    ;  x = +(x,y);   f(x);

In the case of WatchForAssignment, a BREAK would occur when x is assigned to 1, as well as when the function f(x) is defined to be add1(x). Notice how this is analogous to watching for a Write to a particular place in memory.

In the case of WatchForReference, a BREAK would occur when x and y are referred to in the RHS (right hand side) of the expression. For functions, f(x) would BREAK when f is executed, but after x is evaluated such that x is not being watched in the first place. Additionally, + is a built-in expression, so we can also BREAK for that also. Notice how this is analogous to watching for a Read from a particular place in memory.

Considering time and space, this method of implementing a debugger can become costly for programs that use an excessive amount of variable and procedure names.  However, because we are implementing the debugger in the first place, we realize that the user is willing to suffer some performance loss to facilitate this.

Scheme & Way 3

If we attach an extra variable onto the variant record data type for our variables, each time we access a variable (or function) we check this "flag" to see if it is true. If it is, then we are tracing this value and we want its invocation and possibly the result also to be printed out. Variables can be seen as (lambda () value). Functions can be seen as (lambda (params) value).

In variant-case (literal (datum)... we want to check this flag and print out the value, and then return the datum.

In variant-case (application (operator operands)... we check the flag and print out the arguments to the procedure and then

An interesting analogy

Computer languages are a subset of computer architecture, and as such, we can make this abstract statement: (important ideas are bold face)

The lowest level computer language in computer architecture is probably microcode, but lets just consider assembly language. It maps Instructions (stored in Memory) to Functions. It is desirable to watch writing to pieces of memory. This is called a Watchpoint for writing. It also is desirable to watch for reading Data from memory and Instruction Execution. This classifies a data watchpoint and an instruction breakpoint: both deal with reading from where Data and Instructions are stored. Notice that a microprocessor debugger implements a user interface that is defined in terms of the Instructions of the microprocessor's syntax.

There are four areas that a debugger intrudes upon

Now, let us rewrite that first paragraph: (important parts and changes are bold face)

Let us consider computer languages as a higher level interpretation of a computer's assembly language. High level computer languages map Coded Operations in the syntax of the language (stored in Memory somewhere) to Functions. It is desirable to watch for the Assignment values to pieces of memory. This is called a Watchpoint for writing. It also is desirable to watch for variable references and function references. This classifies a variable watchpoint and an interpreter breakpoint: both deal with reading from where variables and Coded Operations are stored. Notice that an interpreter's debugger implements a user interface that is defined in terms of the Coded Operations of the interpreter's syntax.

Functionality

Parsing native expressions

Why reinvent the wheel? Using the functional requirements we defined, it is trivial to compose the functions in order to execute native expressions in the context of the debugger, the current language environment, or even the global language environment.

Parsing debugger expressions

Well, what good is a generic debugger language? By far, the best example is the language described in the Acid paper by Bell Labs. It is language-neutral and allows us to define common functions of a debugger as expressions in Acid's syntax. However, for the purpose of this paper, the idea can be seen as:

on TraceUntilWatchpoint(variable)
   while (expression != variable)
      Step()
   end while
   Break()
end TraceUntilWatchpoint

to Break
   PrintCurrentContext()
   evaluate `getInput()`
end Break

to Step 
   EvaluateNextExpression()
end Step

We wanted to explore this further, but it is beyond the scope of our ability and resources.

Remember though, a neutral language is desired so that there is no chance of colliding with the contextual definitions of the regular language. Also, different usage could help the user keep the debugger functionality and language functionality (interpreting abilities) separate.

Concurrent languages

In concurrent systems, we seek:

If we were in a concurrent system and two breaks occurred simultaneously, how would we present a user interface such that the user could discern between the debugger contexts for both breaks? The choice here is simplicity and usability.

The High Performance Debugger Forum covers concepts such as this.

The implementations details are highly specific to the system being considered, so we will attempt to briefly analyze the Actor foundry. As before, we need to be able to head-and-tail patch the expression evaluation mechanism. Because Actors are written in Java, they use class Reflection in order to extract functions from a particularly defined Java class. The methods described for "Implementing tracing..." could be applied to the function invocation method. The starting point would be to create your own actor implementation:

public class DebuggableActorImpl extends ActorImpl {....

and override the invokeMethod function in order to do pre-execution context display/modification:

  protected Object invokeMethod(String nextMethod,
                                Object[] methodArgs) throws 
    NoSuchMethodException, InvocationTargetException,
    IllegalAccessException {

so that it interacts with a static global Vector containing methodNames we want to trace. If contained in the Vector, we print out current state information for that Actor, using reflection to print out detailed information about the methodArgs. Clearly, we would have to use Reflection to compare ActorNames so that we only debug function names belonging to a certain Actor type or even Actor instance. This is inherent for a concurrent system, and something you probably would not see in a normal computer system. (Except for microcomputer emulators that have to emulate real-RAM as well as values that emulated hardware could return for memory-mapped RAM.)

For post-execution context display/modification, we could modify:

  protected void processMessage(ActorMsgRequest nextMsg) throws
Exception {

In the final case of watchpoints, it would be very hard to track watch points of Java variables or non-Actor functions. Therefore, we consider that the Actor foundry is a subset of computer languages. Set and Get for Actors deal with the ActorManager, therefore we want to create a class such as:

public class DebuggableActorManager extends ActorManager {

which overrides the functions:

  protected ActorName actorCreate(ActorImpl caller,
                                  ActorCreateRequest request) 
    throws SecurityException, RemoteCodeException {....

  public Object actorInvokeService(ActorImpl caller,
                         ServiceName serviceName, String meth,
                         Object[] serviceArgs) 
    throws ServiceNotFoundException, ServiceException
{....

Here, we must consider our variable space and function space to be the same.

Notice what has happened now.

A possible future assignment for this course thus would be to implement debugging facilities in the Actor foundry. A more advanced problem would be to implement a synchronized DebuggerShell that would be derived off of the ActorShell.

Summary and conclusion

Throughout this paper, you should have noticed that many analogies to debugging in computer architecture are similar to those of a debugger in an interpreter. We tried to analyze common ideas of computer language design and how it fits into the bigger picture of computer architecture and design. Thus, for all systems for which a debugger is implemented at a lower level, a high level debugger can be created upon the same ideals.