Introduction to File Processing
Notes 1 --- T. Marlowe, Spring 1994
Overview:
1. Where does file processing fit?
The course has essentially five aspects (Architecture (A), Language (L), Operating Systems (O), Algorithms/Data
Structures (G), and Databases (D)) and is organized in five or six sections, plus incidental material on the C and Pascal
languages.
Architecture issues:
Access to secondary memory, buffering between secondary storage and main memory, organization of
secondary storage media, file organizations supported on various media.
Language issues:
Primitives for structured files, language support for sequential and direct (and possibly indexed sequential) files, support
for databases and queries.
Operating systems issues:
[Management of multiprocess and multiuser systems.] (Asynchronous) device access, time/space/resource allocation,
file/directory management, file security, data security/integrity, sequentiality, persistence, error recovery/fault
tolerance.
Database issues:
File management/security, key access, query management and optimization, data security/integrity, sequentiality,
persistence.
Algorithm/data structure/theory issues:
Measures of performance for devices and file organizations (complexity), determination of properties of indexed
sequential files and hash-table alternatives, (inductive) definition of user-defined tree- and list-like structures, inductive
proofs, algorithms for access to standard file types.
2. Why bother?
- Get some idea of hardware considerations in use of secondary memory.
- Extend knowledge of and familiarity with data structures and algorithms.
- Get some idea of what systems programming involves.
- Appreciate need to address secondary memory issues in applications --- particularly databases, operating systems,
and compiler optimization.
- Write larger programs in a higher-level language than in CS I & II, making serious use of data structures and files.
3. Why pay attention to file access?
For safety:
- systems programming & operating systems
- database program development
- real-time, control, and complex systems
For performance:
- database query interpreters
- parallel computers
- compiler optimization & compiler back ends
- virtual reality and multimedia applications
For other reasons:
- fault tolerance/recovery
- persistence