Records and Files
Pascal Records; C Structures and Unions (again)
- Recall the syntax of Pascal record declaration. To define a record type, student, we would
have in the type-declaration part:
    student = record
        field1: type1;
        field2: type2;
        ...
    end {record student};
To declare a variable cur_stud of an anonymous record type, the declaration goes in the
variable-declaration part, and its first line reads:
cur_stud: record
and the rest is the same.
- To access a field in a record, we use the “dot operator”, as in cur_stud.name.first. If
new_stud is a pointer to type student, we would use new_stud^.name.first, etc.
- C structures have a quite similar declaration, allowing for (1) C’s custom of putting the type first,
and the name second, (2) the use of braces instead of begin-end, and (3) the possibility of
declaring the type and a number of variables in the same declaration:
    struct type-name {
        type1 field1;
        type2 field2;
        ...
    } id1, id2, ...;
This declares a structure type type-name, and variables id1, id2, ... of that type. Omitting
the type-name results in an anonymous type; omitting the variables results in a type declaration only.
More variables of that type can be created later (whether or not the original declaration declared any variables) using:
struct type-name id3, id4, ...;
- Unions (see below) differ in syntax only by using the keyword union instead of struct.
- In both Pascal and C, multiple fields of the same type can be declared on a single line. This is
particularly useful for recursive pointers.
Record Layout in Memory:
A record declaration in a language like Pascal (or a structure declaration in C) assigns a length to the
record, and an offset to each of its fields. There are two factors involved in computing the offset:
- The amount of space needed to store an element of the given type, and
- The required alignment for new fields (for example, word-alignment, byte-alignment, no alignment).
Layout for simple types:
The amount of space needed for simple types and strings is as follows:
- integer: system-dependent, but typically one word.
- A double integer typically uses twice the space. C also has short and long
integers, which may have other system-dependent lengths. The rule is: short <= integer <= long <=
double; adjacent sizes are allowed to be equal.
- real/float: system-dependent, but typically two words.
- Again, there is often a notion of a double float.
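- boolean: one bit of information, although an unpacked boolean usually occupies a whole byte or word so that it can be addressed.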
- enumerated types: length in bits = ceiling of log-base-2 (lg) of the number of alternatives (so if there
are, for example, 10 alternatives, we would need four bits).
- char: one byte.
- string: length of the string, plus 1, bytes. In Pascal, the extra byte occurs at the start, and contains a
length; in C, it occurs at the end, and is the 0 (NUL) byte.
Type constructors, other than records or unions, work as follows:
- arrays: length of the array times the space needed for the aligned type. Need one extra byte for a
variable-length array, such as a string.
- packed arrays: length of the array times the space needed for the type, ignoring alignment. Strings
are essentially packed arrays of character.
- sets: size of the universal set, in bits. Essentially implemented as a packed array of boolean (these
boolean strings are often called characteristic functions, or bitvectors).
- pointers: always the same (usually one word, sometimes 6 bytes or longer), independent of what is
being pointed at. You may remember the notion of near pointers (near jumps) from assembler; if a
language supported those (unusual at best), they would need one byte.
- subranges: may use the size of the original type, or (more likely) be treated like an enumerated type.
- functions and files: always what is stored is a pointer, so length is as for a pointer.
Alignment:
- In most language/operating system pairs, most values are word-aligned. It is, however, typical to
byte-align strings.
- For this reason, arrays of characters in C are byte-aligned, as is the target of a character pointer,
which is by default a string.
- In Pascal, an array of character would be word-aligned; strings are “packed arrays of character” and
are byte-aligned.
- Some language/operating system pairs may use byte alignment.
Record layout:
- The fields of a record are laid out in consecutive storage, as far as alignment (typically byte or word
alignment) permits.
- The offset of each field is the number of bytes at which the field begins (so the first field starts at
offset 0).
- Embedded records (records whose fields are records, or arrays of records) are handled recursively,
and their fields have offsets with respect to the start of the embedded record.
Examples [to be added]
Union layout:
Unions in C are types which are literally set unions:
    union employee {
        int id;
        char title [10];   /* used instead of char * to limit length */
    } new_emp;
will hold at any time either an integer or a string of up to 9 characters plus the terminating 0 byte.
Since only one of these will ever exist at a time, space needs to be allocated to hold only the longer
(in general, the longest) alternative. In this case, the integer typically takes 4 bytes, while the character
array takes 10, so at least 10 bytes are allocated for the union (a compiler will usually round this up to
12 so that the integer stays word-aligned).
Variant records:
Variant records consist of three parts: (1) a record part, (2) a tag field, and (3) a
union part.
    employee = record
        name: record { note syntax for embedded records }
            first: string [10];
            ... ;
        end {record name};
        department: string [6];
        salary: double integer;
        class: enum of (work, manage); { tag part }
        case class of { union part }
            work: (id: integer);
            manage: (title: string [10]);
        end {case};
    end {record employee};
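The layout is the concatenation of the layouts of the three parts: the record part first, then the tag field, then the union part, whose size is that of its largest alternative.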
Examples [to be added]