Records and Files

Pascal Records; C Structures and Unions (again)
             student = record
                field1: type1;
                field2: type2;
                ...
             end {record student};
To declare a variable cur_stud of an anonymous record type, the declaration would occur in the variable-declaration part, and the first line should read:

             cur_stud: record
and the rest is the same.

               struct type-name {
                  type1 field1;
                  type2 field2;
                  ... } id1, id2, ...;
This declares a structure type type-name, and variables id1, id2, ... of that type. Omitting the type-name results in an anonymous type; omitting the variables results in a type declaration only. Further variables of a named type can be created (regardless of whether the original declaration declared any variables) using:

                  struct type-name id3, id4, ...;
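The three forms of structure declaration can be sketched as follows. This is only an illustration: the type names student and course, the variable names, and the helper fill_student are hypothetical and do not come from the notes.

```c
#include <assert.h>
#include <string.h>

/* Type name and variables together: declares struct student and s1, s2. */
struct student {
    int  id;
    char name[20];
} s1, s2;

/* Anonymous type: only the variable origin can ever have this type. */
struct {
    int x;
    int y;
} origin;

/* Type declaration only: no variables are created here. */
struct course {
    int number;
    int credits;
};

/* Later declarations reuse the type names. */
struct student s3;
struct course  c1;

/* Hypothetical helper: fills s3 and returns the stored id. */
int fill_student(int id, const char *name) {
    assert(name != NULL);
    s3.id = id;
    strncpy(s3.name, name, sizeof s3.name - 1);
    s3.name[sizeof s3.name - 1] = '\0';   /* always terminate the string */
    return s3.id;
}
```

Fields are reached with the dot operator, as in s3.id; file-scope variables such as origin start out zeroed.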
Record Layout in Memory:
A record declaration in a language like Pascal (or a structure declaration in C) assigns a length to the record, and an offset to each of its fields. There are two factors involved in computing the offset:

  1. The amount of space needed to store an element of the given type, and
  2. The required alignment for new fields (for example, word-alignment, byte-alignment, no alignment).
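The two factors can be combined in a short routine. The sketch below is one plausible way to compute offsets, not the method of any particular compiler; the function names align_up and layout_record are hypothetical, and rounding the total length up to the strictest alignment is an assumption modeled on what many C compilers do.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align) {
    assert(align >= 1);
    return (offset + align - 1) / align * align;
}

/*
 * Assign an offset to each of n fields, given each field's size and
 * required alignment in bytes. Returns the record length, rounded up
 * to the strictest field alignment (an assumption, see the lead-in).
 */
size_t layout_record(const size_t size[], const size_t align[],
                     size_t offset_out[], size_t n) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, align[i]);   /* factor 2: alignment */
        offset_out[i] = offset;
        offset += size[i];                     /* factor 1: space     */
        if (align[i] > max_align)
            max_align = align[i];
    }
    return align_up(offset, max_align);
}
```

With sizes 1, 4, 2 and matching alignments, the fields land at offsets 0, 4, 8 and the record length is 12; with every alignment set to 1 (no alignment), the offsets are 0, 1, 5 and the length is 7.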
Layout for simple types:
The amount of space needed for simple types and strings is as follows:

    1. integer: system-dependent, but typically one word.
      • A double integer typically uses twice the space. C also has short and long integers, which may have other system-dependent lengths. The rule is: short <= integer <= long <= double integer (adjacent types may have the same length).
    2. real/float: system-dependent, but typically two words.
      • Again, there is often a notion of a double float.
    3. boolean: one bit.
    4. enumerated types: length in bits = ceiling of log-base-2 (lg) of the number of alternatives (so if there are, for example, 10 alternatives, we would need four bits).
    5. char: one byte.
    6. string: length of the string, plus 1, bytes. In Pascal, the extra byte occurs at the start, and contains a length; in C, it occurs at the end, and is the 0 (NUL) byte.
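A few of these sizes can be checked or computed directly in C. The sketch below assumes a C11 compiler (for static_assert); the function names bits_for_alternatives and string_bytes are hypothetical. Sizes of int and float are left out because they are system-dependent.

```c
#include <assert.h>
#include <stddef.h>

/* These hold on any conforming C11 compiler. */
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof("hello") == 6, "5 characters plus the final 0 byte");

/* Bits needed to tell n alternatives apart: ceiling of lg(n). */
unsigned bits_for_alternatives(unsigned n) {
    unsigned bits = 0;
    /* Count the bits in the largest code, n - 1. */
    for (unsigned v = (n > 0) ? n - 1 : 0; v > 0; v >>= 1)
        bits++;
    return bits;
}

/* Bytes to store a string of the given length; the extra byte is the
   length byte in Pascal or the final 0 byte in C. */
size_t string_bytes(size_t length) {
    return length + 1;
}
```

For the example in the list, bits_for_alternatives(10) gives 4, and string_bytes(10) gives 11.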
Type constructors, other than records or unions, work as follows:

    1. arrays: length of the array times the space needed for the aligned type. One extra byte is needed for a variable-length array, such as a string.
      • packed arrays: length of the array times the space needed for the type, ignoring alignment. Strings are essentially packed arrays of character.
    2. sets: size of the universal set, in bits. Essentially implemented as a packed array of boolean (these boolean strings are often called characteristic functions, or bitvectors).
    3. pointers: always the same (usually one word, sometimes 6 bytes or longer), independent of what is being pointed at. You may remember the notion of near pointers (near jumps) from assembler; if a language supported those (unusual at best), they would need one byte.
    4. subranges: may use the size of the original type, or (more likely) be treated like an enumerated type.
    5. functions and files: always what is stored is a pointer, so length is as for a pointer.
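The bitvector view of sets can be sketched in C, which has no built-in set type. This is an illustration only: it assumes a universal set of the 32 elements 0..31 so that the whole set fits in one uint32_t, and the names small_set, set_add, set_has, set_union, and set_intersection are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* An array of 10 ints occupies exactly 10 times the space of one int. */
static_assert(sizeof(int[10]) == 10 * sizeof(int),
              "array size is count times element size");

/* One bit per possible element: bit i is 1 when i is in the set. */
typedef uint32_t small_set;

small_set set_add(small_set s, unsigned elem) {
    assert(elem < 32);
    return s | (UINT32_C(1) << elem);
}

int set_has(small_set s, unsigned elem) {
    assert(elem < 32);
    return (int)((s >> elem) & 1u);
}

/* Set operations become single bitwise operations on the whole vector. */
small_set set_union(small_set a, small_set b)        { return a | b; }
small_set set_intersection(small_set a, small_set b) { return a & b; }
```

The size of a small_set is fixed by the universal set (32 bits here), not by how many elements are currently in it.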
Alignment:
TABLE: Examples:
Record layout:
  1. The fields of a record are laid out in consecutive storage, as far as alignment (typically byte or word alignment) permits.
  2. The offset of each field is the number of bytes at which the field begins (so the first field starts at offset 0).
  3. Embedded records (records whose fields are records, or arrays of records) are handled recursively, and their fields have offsets with respect to the start of the embedded record.
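In C, the offsets a compiler actually chose can be inspected with the standard offsetof macro. The sketch below assumes a C11 compiler; the types name_rec and person and the helper absolute_offset_of_last are hypothetical. No specific offset is claimed for id, since padding is system-dependent.

```c
#include <assert.h>
#include <stddef.h>

struct name_rec {
    char first[11];
    char last[11];
};

struct person {
    char            initial;
    int             id;      /* may be preceded by padding for alignment */
    struct name_rec name;    /* embedded record */
};

/* The first field of a record always starts at offset 0. */
static_assert(offsetof(struct person, initial) == 0, "first field at 0");
static_assert(offsetof(struct name_rec, first) == 0, "first field at 0");

/* Offset of name.last from the start of the outer record: the offset of
   the embedded record plus the field's offset within that record. */
size_t absolute_offset_of_last(void) {
    return offsetof(struct person, name) + offsetof(struct name_rec, last);
}
```

The offsets inside name_rec are the same wherever a name_rec is embedded, which is what makes the recursive treatment work.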
Examples [to be added]
Union layout:
Unions in C are types which are literally set unions:

           union employee {
              int  id;
              char title [10];    /* used instead of char * */
                                  /* to limit length        */
              } new_emp;
will hold at any time either an integer or a string of up to 9 characters plus the terminating 0 byte. Since only one of these will ever exist, space needs to be allocated to hold only the longer (in general, the longest) alternative. In this case, the integer will typically take 4 bytes, while the array title takes 10, so at least 10 bytes will be allocated for the union (a compiler may round this up, for example to 12, to keep the integer aligned).
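The sharing of storage can be checked with sizeof and offsetof. The sketch below repeats the union from the notes and assumes a C11 compiler; the helper alternatives_overlap is hypothetical. The exact size is not asserted because any rounding is system-dependent.

```c
#include <assert.h>
#include <stddef.h>

union employee {
    int  id;
    char title[10];
};

/* Every alternative starts at offset 0, so they occupy the same bytes. */
static_assert(offsetof(union employee, id) == 0, "id starts the union");
static_assert(offsetof(union employee, title) == 0, "title starts the union");

/* The union is at least as large as its longest alternative. */
static_assert(sizeof(union employee) >= sizeof(char[10]), "room for title");
static_assert(sizeof(union employee) >= sizeof(int), "room for id");

/* Returns 1 when both alternatives begin at the same address. */
int alternatives_overlap(void) {
    union employee e;
    return (void *)&e.id == (void *)&e.title;
}
```

Because the alternatives overlap, storing into id overwrites the first bytes of title, and the program must keep track of which one is currently valid.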

Variant records:
Variant records consist of three parts: (1) a record part, (2) a tag field, and (3) a union part.

          employee = record
             name: record        { note syntax for embedded records }
                first: string [10];
                ... ;
             end {record name};
             department: string [6];
             salary: double integer;
             class: enum of (work, manage);   { tag part   }
             case class of                    { union part }
                work:   (id: integer);
                manage: (title: string [10]);
             end {case};
          end {record employee};
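The layout consists of the concatenation of the layouts for the three parts: the record part, then the tag field, then the union part, which is sized to hold its longest alternative.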
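C has no variant records, but a rough analogue of the employee record is a structure that holds a tag and a union. The sketch below is an approximation under stated assumptions: Pascal string [10] is modeled as char [11], double integer as long long, and the names emp_class, tag, u, and parts_in_order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum emp_class { WORK, MANAGE };

struct employee {
    /* (1) record part */
    struct { char first[11]; } name;
    char      department[7];
    long long salary;
    /* (2) tag field */
    enum emp_class tag;
    /* (3) union part: sized for the longest alternative */
    union {
        int  id;          /* valid when tag == WORK   */
        char title[11];   /* valid when tag == MANAGE */
    } u;
};

static_assert(offsetof(struct employee, name) == 0, "record part comes first");

/* Returns 1 when the three parts appear in declaration order. */
int parts_in_order(void) {
    return offsetof(struct employee, salary) < offsetof(struct employee, tag)
        && offsetof(struct employee, tag)    < offsetof(struct employee, u);
}
```

C lays out structure members in declaration order, so the three parts are concatenated exactly as described, with any padding the alignment rules require between them.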

Examples [to be added]