Introduction to File Processing

Notes 1 --- T. Marlowe, Spring 1994

References:

Overview:

1. Where does file processing fit?
The course has essentially five aspects: Architecture (A), Language (L), Operating Systems (O), Algorithms/Data Structures (G), and Databases (D). It is organized in five or six sections, plus incidental material on the C and Pascal languages.

TABLE: A Hierarchical View
Architecture issues:
Access to secondary memory, buffering between secondary storage and main memory, organization of secondary storage media, file organizations supported on various media.
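
As a rough illustration of buffering between secondary storage and main memory, the sketch below copies one stream to another through a program-level buffer and counts the block transfers. The function name copy_blocks and its return convention are assumptions made for this example, not part of the course material, and the C library may add its own internal buffering underneath.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copy `in` to `out` through a buffer of `bufsize` bytes held in main memory.
   Returns the number of non-empty block transfers, or -1 on error.
   (Hypothetical helper, for illustration only.) */
long copy_blocks(FILE *in, FILE *out, size_t bufsize)
{
    char *buf;
    size_t n;
    long transfers = 0;

    if (bufsize == 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    /* Each fread moves up to one buffer's worth of data into memory. */
    while ((n = fread(buf, 1, bufsize, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            free(buf);
            return -1;
        }
        transfers++;
    }

    free(buf);
    return ferror(in) ? -1 : transfers;
}
```

A larger buffer means fewer transfers for the same file, at the cost of more main memory; that trade-off is one of the architecture issues listed above.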

Language issues:
Primitives for structured files, language support for sequential and direct (and possibly indexed sequential) files, support for databases and queries.
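
The difference between sequential and direct access can be sketched in C using the standard stdio calls. The record layout and the helper names (struct record, find_sequential, read_direct, build_demo_file) are assumptions chosen for this example; the sketch also assumes fixed-length records, so that a record number maps directly to a byte offset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-length record; the layout is an assumption. */
struct record {
    int  key;
    char name[28];
};

/* Sequential access: read records from the start until the key matches.
   Returns 0 on success, -1 if the key is absent or on I/O error. */
int find_sequential(FILE *fp, int key, struct record *out)
{
    rewind(fp);
    while (fread(out, sizeof *out, 1, fp) == 1) {
        if (out->key == key)
            return 0;
    }
    return -1;
}

/* Direct access: compute the byte offset of record number `index`
   (counting from 0) and seek straight to it. Returns 0 or -1. */
int read_direct(FILE *fp, long index, struct record *out)
{
    if (index < 0)
        return -1;
    if (fseek(fp, index * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}

/* Build a temporary file of n records with keys 100, 101, ...
   Returns NULL on failure. */
FILE *build_demo_file(int n)
{
    FILE *fp = tmpfile();
    struct record r;
    int i;

    if (fp == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        memset(&r, 0, sizeof r);
        r.key = 100 + i;
        snprintf(r.name, sizeof r.name, "record-%d", i);
        if (fwrite(&r, sizeof r, 1, fp) != 1) {
            fclose(fp);
            return NULL;
        }
    }
    return fp;
}
```

With a file built by build_demo_file(10), read_direct(fp, 7, &r) reaches the eighth record with a single seek, while find_sequential(fp, 107, &r) reads eight records to reach the same one.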

Operating systems issues:
[Management of multiprocess and multiuser systems.] (Asynchronous) device access, time/space/resource allocation, file/directory management, file security, data security/integrity, sequentiality, persistence, error recovery/fault tolerance.

Database issues:
File management/security, key access, query management and optimization, data security/integrity, sequentiality, persistence.

Algorithm/data structure/theory issues:
Measures of performance for devices and file organizations (complexity), determination of properties of indexed sequential files and hash-table alternatives, (inductive) definition of user-defined tree- and list-like structures, inductive proofs, algorithms for access to standard file types.

2. Why bother?

3. Why pay attention to file access?
For safety:
    1. systems programming & operating systems
    2. database program development
    3. real-time, control, and complex systems
For performance:
    1. database query interpreters
    2. parallel computers
    3. compiler optimization & compiler back ends
    4. virtual reality and multimedia applications
For other reasons:
    1. fault tolerance/recovery
    2. persistence