Notes 5 --- Blocking and Buffering

Idea behind buffering:
Channels:
Actually, this discussion ignores the device driver (channel or direct memory access).

What really happens:

Idea behind blocking:
Buffering has several additional advantages beyond the need to map between bytes and bits:

  1. The time spent in actual transfer of data is typically a small portion of access time, and most of the rest is essentially constant in the amount of data to be transferred.
  2. Instead of on each operation transferring a byte from the middle of some record, operations will deal with semantically meaningful units of data (e.g., a record).
  3. As with memory functions in Architecture, a certain amount of bookkeeping and other processing can occur during secondary memory accesses.
  4. If we can use two buffers for a file, one can be used in processing while the other participates in a secondary memory operation.
If records are small enough, however, access overhead will still greatly dominate transfer time, and will slow the program considerably. The solution is to view the buffer as corresponding to a (fixed) N number of records. N is called the blocking factor for the buffer. The unit of transfer is now intuitively an array of records.

Opening and closing files --- buffers:
Intuitively, thefollowing pseudocode sequence (in linked file) of actions occur when a file is opened (note some of them, particularly in C, require explicit program commands). The record fields in those tuples are used as follows:

Closing a file then has to

  1. Delete the symbol table entry.
  2. Dispose the data associated with the file handle, or mark it as invalid.
  3. Make the file handle available again.
  4. Free the space associated with the buffer.
  5. Delete the file pointer (not necessarily in this order).
The most important things are to free the buffer space, and to make the file handle available, since most systems will provide a limited number of file handles (typically 31, where this includes default files such as stdio, and I/O devices such as sensors, printers, etc.)

Choosing a blocking factor:
The key constraints on blocking factor are:

    1. record size.
    2. the amount of space available for the buffer.
    3. the size of blocks, sectors, or tracks on the secondary memory device.
    4. the file organization and the distribution of activity in the file.
    5. processing time versus I/O time.
  1. (0) and (1) are obvious, since the question of how many objects fit in a sack should take account not just of the size of the sack, but also the sizes of the objects; the value for (0) also tends to be exactly or approximately available. If record size is, say 411 bytes, and buffer space is 16384 bytes, then blocking factor can't be more than
                    floor ( 16384/411 ) = 39.
Time and blocking factor:
Most of our discussions in class on blocking factor have assumed we are dealing with a sequential file, all or most of whose records need to be processed. If we are, and we have a good estimate of the average time needed to do the processing for a record, we can use this, and the estimated access time for the secondary device, as another input to determine blocking factor for better parallelism with double buffers, where total time per block will be the maximum of the access and processing times, rather than the sum.

If, on the other hand, access is not purely sequential, the above considerations do not necessarily result in optimal blocking. For direct access files which don't respect key order, blocking factors are typically small, since a typical request will affect only one of the records in the block. The issue then becomes a tradeoff between a quick search of the file, but with expensive operations, versus a linear search of a block, but with quick operations.

For indexed sequential files, and those random access files which maintain key order, things are a little more complicated. In addition to the search tradeoff, there is an issue of locality of activity: the more likely consecutive requests are to refer to nearby records, the more payoff from a high blocking factor.

Degree of buffering:
Buffer pools:
While most programs can assign a fixed number of buffers to each file opened, a program which has a large number of open files (or wants in general to assign large numbers of buffers to one or more files), or must compete for buffer space with other programs, will not be able in general to satisfy all requests for multiple buffers.

In such cases, the operating system or the language environment may maintain a pool of available buffers (actually, chunks of buffer space), and use a policy to intercept and partially satisfy requests for additional buffers. If later, other buffers are freed, some of the buffers which could not be provided originally can be assigned to the file.