3.2.1 Chomsky hierarchy of languages and recognizers, Type checking, type conversions, equivalence of type expressions.
Type checking
• Type synthesis
Names must be declared before use.
The type of an expression is built from the types of its subexpressions.
E.g. If f has type s → t and x has type s, then f(x) has type t.
• Type inference
Determine the type of a construct from the way it is used.
Type variables (α, β, …) can be used by the compiler for types that are not yet known when a name is first encountered.
E.g. If f(x) is an expression, then for some α and β, f has the type α → β and x has type α.
This approach is used in languages with parametric polymorphism.
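The two rules for f(x) can be sketched in a few lines of Python. This is only an illustration: the tuple encoding ("fun", s, t) for s → t, the helper names, and the α0, α1 naming of type variables are assumptions made for the example, not part of any particular compiler.

```python
import itertools

# A type is a string (basic type or type variable) or a tuple
# ("fun", domain, range) standing for the function type domain -> range.

def synthesize_application(f_type, x_type):
    """Type synthesis: if f : s -> t and x : s, then f(x) : t."""
    if isinstance(f_type, tuple) and f_type[0] == "fun" and f_type[1] == x_type:
        return f_type[2]
    return "type_error"

def fresh_variables():
    """Yield fresh type-variable names α0, α1, ... (hypothetical naming)."""
    for n in itertools.count():
        yield f"α{n}"

def infer_application(fresh):
    """Type inference: from the usage f(x) alone, f : α -> β and x : α."""
    alpha, beta = next(fresh), next(fresh)
    return {"f": ("fun", alpha, beta), "x": alpha, "f(x)": beta}

synthesized = synthesize_application(("fun", "integer", "real"), "integer")
inferred = infer_application(fresh_variables())
```

Synthesis needs the declared types of f and x up front, while inference starts from unknowns and only records the constraints that the usage imposes.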
Static and Dynamic Checking of Types:
• The compiler must perform static checking, which ensures that certain kinds of programming errors are detected and reported.
• Some examples of static checks are as follows.
Type checks: A compiler should report an error if an operator is applied to an incompatible operand.
Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. For example, a branch to a nonexistent label is an error.
Uniqueness checks: In many languages, an object must be defined exactly once.
Name related checks: Sometimes, the same name must appear two or more times. For example, in Ada the name of a block must appear both at the beginning of the block and at the end.
• Type information gathered by a type checker may be needed when code is generated. For example, arithmetic operators may be different at the machine level for different types of operands.
• A symbol that can represent different operations in different contexts is said to be overloaded.
• Overloading may be accompanied by coercion of types, where the compiler supplies an operator to convert an operand into the type expected by the context.
• A distinct notion from overloading is polymorphism. The body of a polymorphic function can be executed with arguments of several types.
Specification of a Simple Type Checker:
• Checking done by the compiler is said to be static, while checking done when the target program runs is said to be dynamic.
• A sound type system eliminates the dynamic checking for type errors, because it allows us to determine statically that these errors cannot occur when the target program runs.
• A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors.
• It is important for a type checker to do something reasonable when an error is discovered.
• The compiler must report the nature and location of the error.
• It is desirable for the type checker to recover from errors, so it can check the rest of the input.
Type Conversion:
Consider an expression such as x + i, where x is a real and i is an integer. Since the representations of integers and reals are different within a computer, and different machine instructions are used for operations on integers and reals, the compiler may have to first convert one of the operands of + to ensure that both operands are of the same type when the addition takes place. The language definition determines when conversion is necessary.
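A type checker can insert the conversion while it checks the + operator. The sketch below is illustrative only: the operator names int+, real+ and inttoreal, and the (code, type) pair representation, are assumptions chosen for the example.

```python
def check_add(left, right):
    """Check left + right, where each operand is a (code, type) pair.

    Returns the (code, type) pair for the sum, inserting an inttoreal
    conversion when an integer operand meets a real one.
    """
    (lcode, ltype), (rcode, rtype) = left, right
    if ltype == rtype == "integer":
        return (f"{lcode} int+ {rcode}", "integer")
    if ltype == "integer" and rtype == "real":
        lcode, ltype = f"inttoreal({lcode})", "real"
    elif ltype == "real" and rtype == "integer":
        rcode, rtype = f"inttoreal({rcode})", "real"
    if ltype == rtype == "real":
        return (f"{lcode} real+ {rcode}", "real")
    return ("", "type_error")

mixed = check_add(("x", "real"), ("i", "integer"))
```

The same source symbol + therefore maps to two different machine-level operations, chosen from the operand types.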
Type Coercion:
• Sometimes absolute type equivalence is too strict.
Type equivalence versus type compatibility: in Ada, two types are compatible if one of the following holds.
1. The types are equivalent.
2. One type is a subtype of the other, or both are subtypes of the same base type.
3. Both types are arrays with the same sizes and element types in each dimension.
Pascal extends this slightly and also allows the following.
1. Base and subrange types are cross compatible.
2. Integers may be used where a real is expected.
• Type coercion is an implicit type conversion between compatible, but not necessarily equivalent types.
Coercions:
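Conversion from one type to another is said to be implicit if it is done automatically by the compiler. Implicit type conversions are also called coercions. Conversion is said to be explicit if the programmer must write something to cause the conversion.
Coercions of Expressions
• Consider the following two statements, where x is an array of reals.
for i := 1 to N do x[i] := 1
for i := 1 to N do x[i] := 1.0
In the first statement the integer constant 1 must be coerced to a real before it is stored. Implicit conversion of constants can usually be done at compile time, in which case both statements yield the same code. If the compiler instead emits a run-time conversion, the first loop repeats that conversion on every iteration.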
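The difference between converting a constant once at compile time and converting a value on every execution can be sketched as follows. The function name, the string form of the emitted code, and the inttoreal operator are hypothetical and serve only to illustrate the idea.

```python
def coerce_assignment(target_type, value, value_type, is_constant):
    """Return the right-hand side to emit for `target := value`."""
    if value_type == target_type:
        return value
    if target_type == "real" and value_type == "integer":
        if is_constant:
            # The compiler converts the literal itself, once.
            return repr(float(int(value)))
        # Otherwise a conversion is executed each time the statement runs.
        return f"inttoreal({value})"
    raise TypeError(f"cannot coerce {value_type} to {target_type}")

folded = coerce_assignment("real", "1", "integer", is_constant=True)
deferred = coerce_assignment("real", "i", "integer", is_constant=False)
```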
Type Coercion Issues:
Type coercion is sometimes viewed as a weakening of type security.
1. It allows mixing of types without an explicit indication of intent.
2. At the permissive end of the spectrum are C and Fortran.
• They allow interchangeable use of numeric types.
• In Fortran, arithmetic can be performed on entire arrays.
• In C, arrays and pointers are roughly interchangeable.
• C++ adds programmer-extensible coercion rules.
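#include <cstring>

class ctr {
public:
    // Copy at most 63 characters so the name always fits in s.
    ctr(int i = 0, const char* x = "ctr") : n(i) {
        std::strncpy(s, x, sizeof(s) - 1);
        s[sizeof(s) - 1] = '\0';
    }
    ctr& operator++(int) { n++; return *this; }  // Postfix ++ on the counter
    operator int() { return n; }                 // Coercion to int
    operator char*() { return s; }               // Coercion to char *
private:
    int n;
    char s[64];
};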
Equivalence of type expressions
Type Expressions:
• The type of a language construct will be denoted by a type expression.
• A type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions.
Type expressions can be defined as follows:
1. A basic type is a type expression. A special basic type, type_error, signals an error during type checking. Finally, a basic type void, denoting the absence of a value, allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors include the following.
(a) Arrays: If T is a type expression, then array(I, T) is a type expression denoting the type of an array with elements of type T and index set I.
(b) Products: If T1 and T2 are type expressions, then their Cartesian product T1 × T2 is a type expression.
(c) Records: The type of a record is in a sense the product of the types of its fields. The difference between a record and a product is that the fields of a record have names. Type checking of records can be done using the type expression formed by applying the constructor record to a tuple formed from field names and their associated types.
(d) Pointers: If T is a type expression, then pointer(T) is a type expression denoting the type of a pointer to an object of type T.
(e) Functions: A function maps values in one set, the domain, to values in another set, the range. The type of such a function is denoted by the type expression D → R, where D is the domain type and R is the range type.
4. Type expressions may contain variables whose values are type expressions.
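One simple way to represent such type expressions in a program is as nested tuples, with strings for basic types and type names. The sketch below is an assumption made for illustration; the constructor helpers and the show function are hypothetical names.

```python
# Basic types and type names are strings; constructed types are tagged tuples.

def array(index_set, elem):
    return ("array", index_set, elem)

def product(t1, t2):
    return ("product", t1, t2)

def record(*fields):
    """fields are (name, type) pairs."""
    return ("record", tuple(fields))

def pointer(t):
    return ("pointer", t)

def function(domain, range_):
    return ("function", domain, range_)

def show(t):
    """Render a type expression in the notation used in the text."""
    if isinstance(t, str):
        return t
    tag = t[0]
    if tag == "array":
        return f"array({t[1]}, {show(t[2])})"
    if tag == "product":
        return f"{show(t[1])} × {show(t[2])}"
    if tag == "record":
        return "record(" + " × ".join(f"({n} × {show(ft)})" for n, ft in t[1]) + ")"
    if tag == "pointer":
        return f"pointer({show(t[1])})"
    if tag == "function":
        return f"{show(t[1])} → {show(t[2])}"
    raise ValueError(f"unknown constructor {tag!r}")

f_type = function(product("integer", "integer"), pointer("integer"))
```

Here f_type describes a function that takes a pair of integers and returns a pointer to an integer.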
The System for Expressions
The type system above performs type checking for expressions.
• Note that the synthesized attribute type for E gives the type of the expression assigned by the type system for the expression generated by E.
• The function lookup returns the type of id.
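A checker of this kind can be sketched as a recursive function that computes the synthesized attribute type. The expression forms handled here (character literal, number, identifier, mod, array indexing, pointer dereference) and the tuple encoding are assumptions chosen for the illustration, since the original figure is not reproduced in these notes.

```python
def lookup(symtab, name):
    """Return the declared type of an identifier, or type_error if undeclared."""
    return symtab.get(name, "type_error")

def type_of(e, symtab):
    """Synthesize the type of expression e from the types of its parts."""
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return lookup(symtab, e[1])
    if kind == "mod":
        both_int = (type_of(e[1], symtab) == "integer"
                    and type_of(e[2], symtab) == "integer")
        return "integer" if both_int else "type_error"
    if kind == "index":
        arr, idx = type_of(e[1], symtab), type_of(e[2], symtab)
        if isinstance(arr, tuple) and arr[0] == "array" and idx == "integer":
            return arr[2]          # element type of the array
        return "type_error"
    if kind == "deref":
        ptr = type_of(e[1], symtab)
        if isinstance(ptr, tuple) and ptr[0] == "pointer":
            return ptr[1]          # type of the object pointed to
        return "type_error"
    return "type_error"

symtab = {"a": ("array", "1..10", "integer"), "p": ("pointer", "char"), "i": "integer"}
indexed = type_of(("index", ("id", "a"), ("mod", ("id", "i"), ("num", 3))), symtab)
```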
Algorithm for Structural Equivalence
Equivalence of Type Expressions:
• The key issue is whether a name in a type expression stands for itself or whether it is an abbreviation for another type expression.
• We should remember that the notion of type equivalence needs to be implemented in the compiler in an efficient manner.
Structural Equivalence of Type Expressions:
• Two type expressions are structurally equivalent if they are either the same basic type or are formed by applying the same constructor to structurally equivalent types.
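This definition translates directly into a recursive test. The sketch below assumes type expressions are encoded as strings (basic types) or tagged tuples such as ("pointer", T); both the encoding and the name sequiv are illustrative choices.

```python
def sequiv(s, t):
    """Return True if type expressions s and t are structurally equivalent."""
    # Basic types (strings) are equivalent only to themselves.
    if isinstance(s, str) or isinstance(t, str):
        return s == t
    # Constructed types must use the same constructor with the same arity...
    if s[0] != t[0] or len(s) != len(t):
        return False
    # ...applied to structurally equivalent component types.
    return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))

same = sequiv(("pointer", ("array", "1..10", "integer")),
              ("pointer", ("array", "1..10", "integer")))
```

In this encoding the index set of an array is compared along with its element type; a compiler that ignores array bounds during type checking would relax that comparison.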
Names for Type Expressions:
• In some languages, types can be given names.
• Let us consider the following situation.
type link = ^cell;
var next : link;
last : link;
p : ^cell;
q ,r : ^cell;
• When names are allowed in type expressions, two notions of equivalence of type expressions arise, depending on the treatment of names.
Name equivalence: Each type name is viewed as a distinct type, so two type expressions are name equivalent if and only if they are identical.
Structural equivalence: Names are replaced by the type expressions they define, so two type expressions are structurally equivalent if they represent two structurally equivalent type expressions when all names have been substituted out.
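The two notions can be compared on the declarations of next, last, p, q and r. The sketch assumes that link names the type pointer(cell), that cell is an opaque basic type, and that type names are not recursive; the dictionaries and function names are hypothetical.

```python
TYPE_NAMES = {"link": ("pointer", "cell")}      # type link = ^cell
VARS = {
    "next": "link", "last": "link",
    "p": ("pointer", "cell"),
    "q": ("pointer", "cell"), "r": ("pointer", "cell"),
}

def name_equivalent(s, t):
    """Names stand for themselves: the expressions must be identical."""
    return s == t

def expand(t, names):
    """Substitute out every type name (assumes no recursive type names)."""
    if isinstance(t, str):
        return expand(names[t], names) if t in names else t
    return (t[0],) + tuple(expand(arg, names) for arg in t[1:])

def structurally_equivalent(s, t, names):
    """Names are abbreviations: compare after substituting them out."""
    return expand(s, names) == expand(t, names)
```

Under name equivalence next and last share a type, and so do p, q and r, but next and p do not. Under structural equivalence all five variables have the same type.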
UNIT IV: Storage Organization
Objectives: Focus on various storage allocation schemes
Storage Organization:
Source Language Issues, Storage Allocation, Storage Allocation Strategies, Scope, Access to Nonlocal Names, Parameter Passing, Dynamic Storage Allocation Techniques.
4.1 Storage Organization: Source Language Issues
Storage Organization:
The executing target program runs in its own logical address space in which each program value has a location. The management and organization of this logical address space is shared between the compiler, operating system and target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.
Typical subdivision of run-time memory:
Run-time memory
• Run-time storage comes in blocks of contiguous bytes, where a byte is the smallest unit of addressable memory. Four bytes form a machine word. Multibyte objects are stored in consecutive bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of the target machine. Although a character array of length 10 needs only enough bytes to hold 10 characters, a compiler may allocate 12 bytes to get the proper alignment, leaving 2 bytes unused. This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known at compile time, and these objects may be placed in an area called static.
• The dynamic areas used to maximize the utilization of space at run time are the stack and the heap.
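The padding arithmetic in the character-array example can be written out as a short calculation. The sketch assumes every object is aligned to a 4-byte word boundary; the function names and the field list are hypothetical.

```python
def padded_size(size, alignment=4):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def layout(fields, alignment=4):
    """Assign an offset to each (name, size) field; return (offsets, total size)."""
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset
        offset += padded_size(size, alignment)
    return offsets, offset

array_bytes = padded_size(10)           # 10-character array
padding = array_bytes - 10              # bytes left unused
offsets, total = layout([("buf", 10), ("n", 4)])
```

With these assumptions the 10-character array occupies 12 bytes, 2 of them padding, and the integer that follows it starts at offset 12.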
Activation records:
• Procedure calls and returns are usually managed by a run time stack called the control stack.
• Each live activation has an activation record on the control stack, with the root of the activation tree at the bottom. The most recently entered activation has its record at the top of the stack.
• The contents of the activation record vary with the language being implemented.
Contents of activation record
• Temporary values such as those arising from the evaluation of expressions.
• An access link may be needed to locate data needed by the called procedure but found elsewhere.
• A control link pointing to the activation record of the caller.
• Space for the return value of the called functions, if any. Again, not all called procedures return a value and if one does, we may prefer to place that value in a register for efficiency.
• The actual parameters used by the calling procedure. Commonly, these are placed in registers rather than in the activation record, when possible, for greater efficiency.
Source Language Issues: Procedures
A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name and the statement is the procedure body.
For example, the following is the definition of a procedure named readarray:
procedure readarray;
var i : integer;
begin
for i := 1 to 9 do read(a[i])
end;
When a procedure name appears within an executable statement, the procedure is said to be called at that point.
Activation trees: An activation tree is used to depict the way control enters and leaves activations. An activation tree has the following properties.
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for ‘a’ is the parent of the node for ‘b’ if and only if control flows from activation ‘a’ to ‘b’.
4. The node for ‘a’ is to the left of the node for ‘b’ if and only if the lifetime of ‘a’ occurs before the lifetime of ‘b’.
Control stack:
A control stack is used to keep track of live procedure activations. The idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends. The contents of the control stack are related to paths to the root of the activation tree. When node ‘n’ is at the top of the control stack, the stack contains the nodes along the path from ‘n’ to the root.
The Scope of a Declaration: A declaration is a syntactic construct that associates information with a name. Declarations may be explicit, such as var i : integer, or they may be implicit. For example, in Fortran any variable name starting with ‘I’ is assumed to denote an integer unless it is declared otherwise.
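The relationship between the control stack and the activation tree can be sketched as follows. The class and the activation names s, r, q(1,9), p(1,9) and q(1,3) are illustrative assumptions, standing for a main program that reads an array and then sorts it recursively.

```python
class ControlStack:
    """Tracks live activations and records the activation tree as it grows."""

    def __init__(self):
        self.stack = []   # live activations, root at index 0
        self.edges = []   # (parent, child) edges of the activation tree

    def enter(self, activation):
        """An activation begins: its parent is whatever is on top now."""
        if self.stack:
            self.edges.append((self.stack[-1], activation))
        self.stack.append(activation)

    def leave(self):
        """An activation ends: pop its node."""
        return self.stack.pop()


cs = ControlStack()
cs.enter("s")
cs.enter("r")
cs.leave()              # r ends before q(1,9) begins
cs.enter("q(1,9)")
cs.enter("p(1,9)")
cs.leave()
cs.enter("q(1,3)")
path_to_root = list(reversed(cs.stack))
```

At this point the stack holds exactly the nodes on the path from q(1,3) up to the root s, even though r and p(1,9) also appear in the activation tree.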
The portion of the program to which a declaration applies is called the scope of that declaration.
Binding of names: Even if each name is declared once in a program, the same name may denote different data objects at run time. "Data object" corresponds to a storage location that holds values. The term environment refers to a function that maps a name to a storage location. The term state refers to a function that maps a storage location to the value held there.
Binding of names
When an environment associates storage location ‘s’ with a name ‘x’, we say that ‘x’ is bound to ‘s’. This association is referred to as a binding of ‘x’.
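The two-stage mapping from names to values can be modelled with two dictionaries. The name pi, the location 100 and the helper functions are hypothetical and serve only to show that an assignment changes the state but not the environment.

```python
def assign(environment, state, name, value):
    """Store value in the location that name is bound to."""
    location = environment[name]
    state[location] = value

def value_of(environment, state, name):
    """Follow name -> storage location -> value."""
    return state[environment[name]]

environment = {"pi": 100}      # name -> storage location
state = {100: 0}               # storage location -> value
assign(environment, state, "pi", 3.14)
```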
4.1.1 Storage Allocation, Storage Allocation Strategies
Storage Allocation Strategies:
The different storage allocation strategies are as follows.
1. Static allocation: Lays out storage for all data objects at compile time.
2. Stack allocation: Manages the run time storage as a stack.
3. Heap allocation: Allocates and deallocates storage as needed at run time from a data area known as the heap.
Static Allocation:
• In static allocation, names are bound to storage as the program is compiled, so there is no need for a run time support package.
• Since the bindings do not change at run time, every time a procedure is activated its names are bound to the same storage locations.
• Therefore the values of local names are retained across activations of a procedure. That is, when control returns to a procedure, the values of the locals are the same as they were when control left the last time.
• From the type of a name, the compiler decides the amount of storage for the name and decides where the activation records go. At compile time, we can fill in the addresses at which the target code can find the data it operates on.
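The effect of static allocation on local names can be illustrated with a small simulation. The list standing in for memory, the address assignment and the procedure with a local named count are all assumptions made for the example.

```python
def bind_statically(local_names, base=0):
    """'Compile time': give each local name one fixed address."""
    return {name: base + i for i, name in enumerate(local_names)}

def activate_counter(memory, addresses):
    """One activation of a procedure whose local `count` is never re-bound."""
    memory[addresses["count"]] += 1
    return memory[addresses["count"]]

addresses = bind_statically(["count"])
memory = [0]
first = activate_counter(memory, addresses)
second = activate_counter(memory, addresses)
```

Because count is bound to the same location in every activation, the second activation sees the value left behind by the first.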
Stack Allocation of Space:
• Almost all compilers for languages that use procedures, functions or methods as units of user-defined actions manage at least part of their run-time memory as a stack.
• Each time a procedure is called, space for its local variables is pushed onto a stack, and when the procedure terminates, that space is popped off the stack.
Calling sequences:
• Procedure calls are implemented by what are known as calling sequences, which consist of code that allocates an activation record on the stack and enters information into its fields.
• A return sequence is similar code that restores the state of the machine, so the calling procedure can continue its execution after the call. The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).
• When designing calling sequences and the layout of activation records, the following principles are helpful.
1. Values communicated between caller and callee are generally placed at the beginning of the callee's activation record, so they are as close as possible to the caller's activation record.
2. Fixed-length items are generally placed in the middle. Such items typically include the control link, the access link and the machine status fields.
3. Items whose size may not be known early enough are placed at the end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters determines the length of the array.
4. We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the end of the fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed offsets, known to the intermediate code generator, relative to the top-of-stack pointer.
Division of tasks between caller and callee
• The calling sequence and its division between caller and callee are as follows.
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp so that it points past the caller's local data and temporaries and the callee's parameters and status fields.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
1. The callee places the return value next to the parameters.
2. Using the information in the machine status field, the callee restores top_sp and other registers, and then branches to the return address that the caller placed in the status field.
3. Although top_sp has been decremented, the caller knows where the return value is relative to the current value of top_sp; the caller may therefore use that value.
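The calling and return sequences above can be mimicked with a list of activation records. This is a simplified sketch: records are dictionaries rather than byte ranges, the control link is the index of the caller's record (standing in for the old top_sp), and the field names are assumptions.

```python
def call(stack, params, return_address, num_locals):
    """Calling sequence: build the callee's activation record on the stack."""
    record = {
        "return_value": None,
        "params": list(params),                # caller: evaluated actual parameters
        "return_address": return_address,      # caller: where to resume
        "control_link": len(stack) - 1 if stack else None,  # caller: old top
        "saved_registers": {},                 # callee: machine status
        "locals": [None] * num_locals,         # callee: initialized local data
    }
    stack.append(record)
    return record

def ret(stack, value):
    """Return sequence: leave the value by the parameters, pop, and resume."""
    record = stack.pop()
    record["return_value"] = value
    return record["return_value"], record["return_address"]

stack = []
call(stack, [], "exit", 0)            # main program
call(stack, [1, 9], "L1", 2)          # main calls a procedure with two locals
callee_link = stack[-1]["control_link"]
value, resume_at = ret(stack, 42)
```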
Access to dynamically allocated Arrays
Variable length data on stack:
• The run-time memory management system must deal frequently with the allocation of space for objects whose sizes are not known at compile time, but which are local to a procedure and thus may be allocated on the stack. The reason to prefer placing objects on the stack is that we avoid the expense of garbage collecting their space. The same scheme works for objects of any type, if they are local to the procedure called and have a size that depends on the parameters of the call.
• Suppose procedure ‘p’ has three local arrays whose sizes cannot be determined at compile time. The storage for these arrays is not part of the activation record for ‘p’. Access to the data is through two pointers, top and top_sp. Here, top marks the actual top of the stack; it points to the position at which the next activation record will begin.
• The second pointer, top_sp, is used to find the local, fixed-length fields of the top activation record. The code to reposition top and top_sp can be generated at compile time in terms of sizes that will become known at run time.
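The address arithmetic for such arrays can be sketched as below. The sketch assumes the stack grows toward higher addresses, that the arrays are placed immediately after the fixed-length fields ending at top_sp, and that each element occupies 4 bytes; the function name is hypothetical.

```python
def place_arrays(top_sp, array_lengths, element_size=4):
    """Return (pointers, top): the start of each array and the new stack top.

    The expression is fixed at compile time; the lengths arrive at run time.
    """
    pointers, top = [], top_sp
    for length in array_lengths:
        pointers.append(top)
        top += length * element_size
    return pointers, top

pointers, top = place_arrays(100, [3, 5, 2])
```

Only the pointers need a fixed slot in the activation record; top then marks where the next activation record would begin.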
Heap Allocation:
The stack allocation strategy cannot be used if either of the following is possible.
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects. Pieces may be deallocated in any order, so over time the heap will consist of alternate areas that are free and in use. For example, suppose the record for an activation of procedure ‘r’ is retained when the activation ends. The record for the new activation q(1, 9) then cannot follow that for ‘s’ physically. If the retained activation record for ‘r’ is later deallocated, there will be free space in the heap between the activation records for ‘s’ and ‘q’.
Heap Allocation
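The way free space opens up between records can be reproduced with a very small first-fit allocator. This is a sketch under simplifying assumptions: the heap is modelled as address ranges, adjacent holes are not coalesced, and the record sizes are made up.

```python
class Heap:
    """First-fit allocator over the address range [0, size)."""

    def __init__(self, size):
        self.free = [(0, size)]   # sorted list of (start, length) holes
        self.used = {}            # start address -> length

    def allocate(self, length):
        for i, (start, hole) in enumerate(self.free):
            if hole >= length:
                if hole == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, hole - length)
                self.used[start] = length
                return start
        raise MemoryError("no hole large enough")

    def deallocate(self, start):
        length = self.used.pop(start)
        self.free.append((start, length))
        self.free.sort()          # holes are kept in address order


heap = Heap(100)
s = heap.allocate(20)     # record for s
r = heap.allocate(30)     # record for r, retained after r ends
q = heap.allocate(10)     # record for q(1, 9) cannot reuse r's space
heap.deallocate(r)        # r's record is finally released
```

After the last step the heap has a hole between the records for s and q, in addition to the free space above q.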