COMP4403 - Compilers and Interpreters
Assignment 2
Due date: 3pm (15:00) Friday 23rd May, 2025
This is an individual assignment which involves modifying the LALR assignment 2 compiler for the PL0 language to add record types, operations on records, and procedures with function results.
Assignment Compiler Files
All sources for the assignment PL0 compiler are available as a2.zip (below). Please be sure to use the version for this assignment and not the one used for the tutorials or another assignment. There are differences (like the lexical tokens you need for this assignment are only defined in the assignment version).
· a2.zipMon 5 May 2025 18:49:52 AEST Save this .zip file and follow the instructions for setting up a compiler project in IntelliJ
· Setting up and running PL0 in IntelliJFri 14 Feb 2025 16:55:17 AEST
· Brief documentation on assignment 2 filesFri 14 Feb 2025 16:55:17 AEST
· Here is the documentation for
o Java CUP [HTML]
o JFlex [HTML]
For the most part you will not need these.
Please pay attention to course Blackboard announcements, and ensure you follow the course discussion board (https://edstem.org/) for any updates and further information on the assignment.
· Do not use imports for external packages other than those in java.util.*. Note that IntelliJ may offer the option of importing an external package to resolve an issue; please avoid accepting this option because it will often add an erroneous import that you will not need. Such imports lead to the compilation failing in the environment in which your compiler will be assessed because that environment does not include the external libraries. Please check you are not importing external libraries before submitting.
· You must only modify the files that must be submitted (see below).
· You must not modify any other files because we will be testing your implementation using the existing other files with your submitted files.
· Please do not reformat the files because we would like to just print the differences between the originals and the versions you hand in.
· Please keep the length of lines in your files below 100 characters, so that we can print them sensibly.
· Please avoid using non-standard characters, e.g. Chinese characters, including in the comments. Non-standard characters are not accepted by the Java compiler used to test your assignment and all comments should be readable by the person assessing your assignment.
· Please remove any debugging output before your assignment is submitted because debugging output will cause your program to fail our automated testing of your assignment.
· Either avoid using tabs or set your tabs stops to 4 spaces (this is the default for IntelliJ/Eclipse) so that your files will print sensibly.
Read the fine print in detail before you start! And, most important, when you have finished implementing the assignment, come back and reread the fine print again.
Overview
Records types
Records are similar to C structs (and very simple Java classes). They only have fields (no methods). A record type may be declared in a type definition. In the following declaration
type
Pair = record first: int; second: int end;
List = record value: Pair; next: List end;
Pair is a record type that contains fields first and second, both of type integer, and List is a record type that contains field value of type Pair and next of type List. One may declare a variable to be of a record type:
var
x : Pair;
list: List;
An instance of a record may be created using a new expression. For example,
x := new Pair(1,2);
list := new List(new Pair(3,4), nil);
assigns to variable x a new (dynamically allocated) record of type Pair with fields first and second initialised to values 1 and 2, respectively, and assigns to variable list a new record of type List with field value assigned to a new record (with fields first and second initialised to values 3 and 4) and field next set to nil, where nil is the special record constant for a null record.
Fields of an instance of a record may accessed and assigned appropriate values. For example:
x.first := 11;
x.second := 22;
list.next := new List(nil, nil);
list.next.value := x; // record assignment (list.next.value and x now refer to the same record)
x.first := 100;
x.second := 99;
write list.next.value.first; // writes 100
write list.next.value.second; // writes 99
After the record assignment list.next.value := x both list.next.value and x refer to the same record, and so, for example, after the assignment x.first := 100 both x.first and list.next.value.first will have the same value (100). Records of the same type may also be compared, but only for equality and inequality, so after the above assignments the comparison
if list.next.value = x then
if list.next.next != nil then
write -1
else
write 98
else
write -2;
will write 98. nil can be treated as having the same type as any defined record type, and so may be compared (for equality and inequality) with any defined record type. Two records are only equal if they refer to the same record.
Procedures with function results
Procedures may now optionally specify a result type. We use the term function to refer to a procedure with a result type. Functions can be used as part of an expression. For example, in the code
var
n: int;
// A function fact with result type of int
procedure fact(): int =
var
f: int;
begin
if n = 0 then
return 1
else
begin
n := n-1;
f := (n+1)*fact();
n := n+1; // restore n
return f
end
end;
begin
n := 5;
write fact() // writes 120
end
the function fact returns the value of factorial n, e.g., when the body of the main method is executed, it writes out the value of the factorial of 5.
Syntax Changes
The reserved keywords "record", "new", and "return" have already been added to the lexical analyser as the tokens KW_RECORD, KW_NEW and KW_RETURN and the symbol "." has been added as token PERIOD. They have also been added to the terminals definitions in PL0.cup.
The syntax for type definitions (Type) is extended with an additional alternative for record types. The syntax is given here in EBNF, but you won't be able to use the EBNF directly in the parser generator Java-CUP. (You will need to convert the productions from EBNF to BNF.)
Type → TypeIdentifier | SubrangeType | RecordType
RecordType → "record" Fields "end"
Fields → Field { ";" Field }
Field → IDENTIFIER ":" TypeIdentifier
A procedure may now optionally have a result type. We'll call a procedure with a result type a function. The new syntax for a procedure declaration follows.
ProcedureDef → ProcedureHead "=" Block ";"
ProcedureHead → "procedure" IDENTIFIER "(" ")" [ ":" TypeIdentifier ]
A function's result is returned via a return statement.
Statement → ... | "return" Condition
A reference to a field of a record can be used as an LValue either within an expression or on the left side of an assignment, and both a "new" expression and a function call can now be used as a Factor in an expression.
LValue → IDENTIFIER | LValue "." IDENTIFIER
Factor → ... | "new" TypeIdentifier "(" ExpList ")" | IDENTIFIER "(" ")"
ExpList → Condition { "," Condition }
Note that a field of a record may itself be of type record. Hence the above syntax has an LValue before the "." rather than an IDENTIFIER.
The Parser Generator Java-CUP
The parser specification for the compiler is specified in the file PL0.cup. You will need to add productions (and their associated actions) to the specification and then run the parser generator Java-CUP (manually or automatically) to generate the files CUPParser.java and CUPToken.java. Do not modify these two Java files directly (even if you think you understand them (do you really?)) - remake them from PL0.cup. You should make the compiler before you change anything just to see what forms of messages to expect. When you make the compiler (before you modify it) there will be some warning messages about the terminal symbols like ILLEGAL being declared but never used; these are to be expected at this stage. Any new warning/error messages will be significant. Beware that if there are Java-CUP errors reported, the Java files for the parser will not be generated, so check for Java-CUP errors first. There is HTML documentation for Java-CUP available from the class web page (with the assignment).
The Scanner Generator JFlex
All the lexical tokens for this version of the compiler have already been added to the lexical analyser.
The file Lexer.java is automatically generated by the scanner generator JFlex from PL0.flex; again, do not modify Lexer.java - remake Lexer.java from PL0.flex.
Both Java-CUP and JFlex are available with the assignment files on the course web page, with instructions on how to run them in IntelliJ. Java archives for Java-CUP and JFlex are part of the IntelliJ project for the assignment.
Static Semantic Restrictions
Records
The names of fields of a record must be distinct. All of the type identifiers used for fields must be defined type identifiers. A field of a record may be of any type, including a record type. In Type.java a class RecordType has already been added to represent record types within the compiler. The rule for well formedness of a record type is given below.
∀ i,j ∈ 1..n • i ≠ j ⇒ idi ≠ idj
∀ j ∈ 1..n • syms ⊢ typeof(tj) = Tj
syms ⊢ typeof(record id1: t1; id2: t2; ... idn: tn end) = RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ])
For a new record expression new id(e1, e2, ... en ) its type identifier id must be a record type and it gives the type of the constructed record. The expressions e1, e2, ... en in the new expression must match the fields of the record in the order in which they were declared in the record type declaration. Each expression must be assignment compatible with its corresponding field:
id ∈ dom(syms)
syms(id) = TypeEntry(RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ]))
∀ j ∈ 1..n • syms ⊢ ej : Tj
syms ⊢ new id(e1, e2, ... en ) : RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ])
For a reference to a field of a record, e.idj, LValue expression e must have a record type, and idj must be a field of that record. The type of the field reference e.idj is a reference to the field type of idj in the record:
syms ⊢ e : RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ])
j ∈ 1..n
syms ⊢ e.idj : ref(Tj)
Note that the type of e.idj is ref(Tj) rather than Tj so that a reference to a field of a record can be used as an LValue (i.e. it can be used the left side of an assignment).
Assignments between records are allowed, provided the left and right sides of the assignment are of the same type. Two types are considered equivalent if they are the same type identifier, or if one is a type identifier, B, that is defined as equal to another type identifier, A, i.e.,
type
A = record x: int; y: int end;
B = A;
[This makes the implementation simpler than doing structural equivalence of records.]
Comparison of records is supported for equality and inequality only, provided the types are equivalent. If T is declared as a record type (i.e., some type RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ])) then the equality and inequality operators have the following additional types.
= : T x T → boolean
≠ : T x T → boolean
An exception here is the special constant nil which is a record value that is compatible with all record types. The constant nil has already been defined. It is of the special record type NIL_TYPE, which is compatible with any defined record type. Hence for any type identifier id,
id ∈ dom(syms)
syms(id) = TypeEntry(RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ]))
syms ⊢ nil :RecordType([ (id1, T1), (id2, T2), ... , (idn, Tn) ])
Procedures with function results
Although the keyword "procedure" is used for both, here we'll use the term function to refer to a procedure with a result type and procedure to refer to a procedure without any result type. A function must only be called in an expression (Factor), and the type of a function call is the result type of the function being called. A (non-function) procedure must only be called in a call statement.
The name of a procedure/function must be distinct from all constants, variables, types, and other procedures/functions declared in the declaration scope (as already).
A function returns its result via a return statement of the form.
return exp
The expression exp must be assignment compatible with the result type of the function. A return statement is not valid in a non-function procedure.
Dynamic Semantics
Records
Records are dynamically allocated via a new record expression. As such, the value of a record will be an absolute (i.e. global) address: the address where the values of the fields of the record are stored. The value of a record will be nil until it has been otherwise assigned.
When one record is directly assigned to another record of the same type, such as in
rec2 := rec1
then the value (an absolute address) stored by rec2 will become the same value (an absolute address) stored by rec1. Hence, following the assignment, rec1 and rec2 will refer to the same record object in memory. Two records of the same type are equal only if their value is the same absolute address, and they are not equal otherwise.
The expression new id(e1, e2, ... en ) dynamically allocates space on the heap to store all of the fields of the record, and it stores the given values of those fields in that dynamically allocated space. Objects dynamically allocated via a new record expression have a life time that is not restricted to that of the variable to which they were allocated. For example, a new record may be created within a procedure and assigned to a local variable within that procedure. Provided that variable's value (the absolute address of the allocated record) is assigned to a variable or field that is accessible via variables global to the procedure, before the procedure exits, the object will remain accessible.
Although we dynamically allocate records via the new expression, we won't implement garbage collection of objects that are no longer accessible.
A field reference, e.g. r.x, acts like a variable of the type of the field and hence can be used on either the left or right sides of assignments. If the value of a record is nil when an attempt is made to reference one of its fields, then this is a run time error and the stack machine should be stopped with an exit code of StackMachine.NIL_RECORD.
Variables of a record type are local variables and hence are allocated on the stack just like any other local variable. The main difference from scalar variables is that the value stored in the local variable will be the absolute address of a record. The absolute address of the record (unless it is nil) can then be used to access the fields of the record.
Procedures with function results
For a procedure/function call, the statements in the body of the called procedure/function are executed.
For a function call, the expression in a return statement is evaluated and left on top of the stack after the return from the function.
If the execution of a function (but not procedure) gets to the end of the function without encountering a return statement, a run time error with a code of StackMachine.NO_RETURN occurs. A procedure implicitly returns after the statements in the body of the procedure have been executed.
Object Code
The PL0 compiler generates code for the Stack Machine. A description of the stack machine is available in StackMachineHandout.pdf. See also the file StackMachine.java (near the end) for details of the instruction set.
Dynamic allocation of records
There is an instruction, ALLOC_HEAP, which assumes that there is an integer on the top of the stack that represents the size of the object to be allocated. It pops that value from the stack and replaces it with the absolute address of a newly allocated object of that size. The stack machine does not support disposing objects or garbage collection.
If there is insufficient space then ALLOC_HEAP will fail with a "memory overflow" message. In the implementation of the stack machine there is a single array representing the whole of memory: the stack (the bottom end of the array) and the heap of dynamically allocated objects (the top end). If either pushing onto the stack reaches the lower limit of the heap, or allocating on the heap reaches the top of the stack, then there is insufficient memory and the program aborts (with the same error message in both cases).
You need to be aware that the instructions LOAD_FRAME, STORE_FRAME, LOAD_MULTI and STORE_MULTI expect an address that is an offset from the frame. pointer for the current (procedure) scope. You can use instruction TO_LOCAL to convert an absolute address into an address relative to the current frame. pointer.
Procedures with function results
The Stack Machine provides features to help implement procedure/function calls.
· The CALL instruction assumes that the top of the stack contains the address of the procedure/function to be called and the second top of stack has been set up with the static link for the call. It pops and remembers the address of the procedure/function (but leaves the static link on the stack where it is -- it becomes the first word of the new stack frame). It then pushes the frame. pointer onto the stack to form. the dynamic link, and sets the frame. pointer to the address of the (already set up) static link. Finally it pushes the return address (location of the instruction after the call) onto the stack and branches to the (saved) address of the procedure/function.
· The RETURN instruction exits from a procedure/function. It automatically deallocates all the local variables (not parameters), pops the return address into the program instruction counter (pc), restores the frame. pointer (fp) from the dynamic link, removes the dynamic and static links, and branches to the return address.
It is suggested that you implement a procedure/function call as follows. Your generated code should:
· For a function, allocate space on the stack for the function result. Note that the function result on the stack can be directly accessed from within the function by using negative offsets from the frame. pointer. (The ALLOC_STACK instruction can be used to allocate locations on the runtime stack.)
· Set up the static link -- this is already done in the current code for procedures.
· Call the procedure/function (genCall may be of use for both the previous step and this step).
Reporting runtime errors
Under exceptional circumstances, the STOP instruction can be used to stop execution of the stack machine. The top of the stack is popped to get an exit code.
Types and space
Your implementation should not assume that the space required for all types is one word. The space required for an element of any given type, which has been resolved, can be accessed using the getSpace method of Type.
Additional stack machine instructions
Additional instructions (not in the Stack Machine handout) STORE_STACK and LOAD_STACK have been added to the Stack Machine. You do not have to use these instructions, however you may find them useful. For a precise description of their behaviour, refer to StackMachine.java.
STORE_STACK: The value of the second top of the stack is stored at the (absolute) address given by the stack pointer minus one (sp-1), minus the top of the stack. The two values on the stack are popped.
LOAD_STACK: The top of the stack is replaced with the contents of the memory location whose (absolute) address is given by the stack pointer minus one (sp-1), minus the top of the stack.
Tests
Some tests are available in the test-pgm directory (in a2.zip), and can be used to help you to debug your code. All of the tests can be run together using the Test_LALR configuration. You can also individually run a test using PL0_LALR on a selected PL0 program. The test cases of the form. test-base*.pl0 are useful for regression testing, to make sure that you haven't broken any existing functionality in the compiler, and the other tests can help you find bugs in the new compiler features.
Student Misconduct
Students are reminded of the University's policy on student misconduct, including plagiarism. See the course profile and the School web page https://eecs.uq.edu.au/current-students/student-guidelines/student-conduct
You are expected to protect your files so that they are not readable by other users.
Your assignment is expected to be your own individual work and must not include code copied from other students, current or past. You are also reminded not to post your (partial) solutions to assignments to any place accessible by others (before or after the assignment deadline), including the course discussion board or emailing to other students. If you need that sort of help consult the lecturer and/or tutor. Note that course discussion board allows private posts to the instructors.
This assignment compiler is provided solely for the purposes of doing this assignment and your solutions must never be shared, especially publicly, even after completion of the course. Such publication would be considered both student misconduct and a breach of copyright.
This assessment task evaluates students' abilities, skills and knowledge without the aid of generative Artificial Intelligence (AI) or Machine Translation (MT). Students are advised that the use of AI or MT technologies to develop responses is strictly prohibited and may constitute student misconduct under the Student Code of Conduct.
Late Submission
See the course profile for details.
If there are documented medical or exceptional circumstances that will affect your ability to complete the assignment by the due date, then you can apply for an extension (up to 7 days). Extensions to non-exam assignments must be requested via my.UQ. You can find instructions on how to submit your request online.
If the assignment is submitted after the deadline, without an approved extension, a late penalty will apply. A penalty of 10% of the maximum possible mark will be deducted per 24 hours from time submission is due for up to 7 days. After 7 days, you will receive a mark of 0.
Submission
Please keep the length of lines in your files below 100 characters, so that we can print them sensibly. You should avoid using tabs or set your tabs stops to 4 spaces so that when we print them (with tab stops set to 4 spaces) they will print sensibly. Do not forget to remove any code generating debugging output and any rogue external imports before submission.
You must submit your completed assignment electronically through the assessment section of the course BlackBoard site (the BlackBoard Assessment page rather than the course web pages).
You need to submit the following list of individual files (not a .zip or any other form. of archive file) for evaluation and marking. Note that file names are case-sensitive.
· PL0.cup
· ExpNode.java
· ExpTransform.java
· StatementNode.java
· StatementTransform.java
· StatementVisitor.java
· StaticChecker.java
· CodeGenerator.java
You can submit your assignment multiple times, but only the last copy submitted will be retained for marking.
Assessment
The assignment is marked out of a total of 20 marks.
Marks will be allocated as follows:
· 5 - Syntax analysis and tree building
· 7 - Static semantics checking
· 8 - Code generation
Marks will be awarded for the correctness of the changes to each category. Readability and modular structure will also be criteria. For readability, we expect that you follow good software engineering practice, such as appropriate choices of variable names, consistent indentation, appropriate comments where needed, etc. For modularity we expect you introduce new methods where it makes sense to help structure the program and to avoid unnecessary duplication of code. Use of generic Java utility interfaces (like Set, Map, List, Queue, ...) and their implementations (like HashSet, ..., TreeMap, ..., LinkedList, ...) is encouraged. We expect you to produce well structured programs that are not unnecessarily complex, both structurally (e.g. complex control flow that is hard to follow), and in terms of execution time and space requirements, (e.g. an O(n) algorithm is preferred to an O(n2) algorithm, and a O(log n) algorithm is even better).
We will not be concerned with the quality of syntactic error recovery because the parser generator CUP takes care of that for the most part, but you must handle semantic errors appropriately, including handling the situation when there is a syntax error, i.e., your compiler should not crash because of a syntax error.
Your assignment files will be compiled in the context of the remaining assignment files and put through a sequence of tests. The total mark for this assignment will be limited by the overall success of the development in the following way:
· The program submitted does not compile: Maximum 10/20.
· The program submitted will not correctly handle any test case with the new facilities: Maximum 13/20.
You are not required to correct any bugs that may exist in the original compiler. However, we would appreciate being informed of any such bugs that you detect, so that we can correct them, or any improvements to the compiler you might like to suggest.