Notes 2
External storage device:
- memory external to the computer
- typically not directly addressable
- consists of a drive unit and a recording/storage medium
VOLUME
- VOLUME = physical unit of storage
- a tape, a disk, etc.
- a volume can be partitioned into multiple file systems
- disks on multiuser systems are typically partitioned, largely for security and fault tolerance
Basic types of files --- an intuition
- sequential (think of linked lists and arrays --- the worst of each)
- direct (hash tables and related approaches)
- indexed sequential (trees)
- multikey (multiply-linked lists)
- see also Notes 4
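To make the intuition above concrete, here is a minimal in-memory sketch of three of the access styles. It is only an analogy: the record data and all function names are made up for illustration, and bisection over a sorted list stands in for the tree a real indexed sequential file would use.

```python
from bisect import bisect_left

# Hypothetical records: (student_number, name), kept sorted by student_number.
records = [(101, "Ada"), (205, "Grace"), (317, "Edsger"), (442, "Barbara")]

def sequential_lookup(key):
    """Sequential file: scan from the start until the key is found."""
    for k, name in records:
        if k == key:
            return name
    return None

# Direct file: a hash table maps the key straight to the record.
hash_index = {k: name for k, name in records}

def direct_lookup(key):
    return hash_index.get(key)

# Indexed sequential file: a sorted index searched by bisection
# (an assumption standing in for a tree-structured index).
sorted_keys = [k for k, _ in records]

def indexed_lookup(key):
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return records[i][1]
    return None

def range_lookup(low, high):
    """All names with low <= key <= high; easy with a sorted index."""
    i = bisect_left(sorted_keys, low)
    found = []
    while i < len(records) and records[i][0] <= high:
        found.append(records[i][1])
        i += 1
    return found
```

The sorted index supports both exact lookups and range scans, while the hash table supports only exact lookups, which is one reason indexed sequential organizations are associated with trees rather than hashing.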
Fields of Data Records/Objects in a File:
- simple (integer, float, boolean) --- data attributes of a record (e.g., name, address, manager)
- structured (array, subrecord) --- table attributes of a record (e.g., courses taken)
- pointer
  - other related records in this file (e.g., peer mentor)
  - related records or tables in other files (e.g., faculty mentor, class list)
- keys
  - primary --- identifier for this record (e.g., student number)
  - secondary --- assist in access to this record (e.g., major, year)
- function/method pointer
  - computed attributes --- things which can be inferred from other fields (e.g., number of credits taken)
  - dynamic attributes --- things which depend on input values (e.g., salary from overtime hours)
  - dependence on attributes of pointer targets
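The field kinds listed above can be sketched as one record type. This is an illustrative assumption, not a prescribed layout: the `Student` and `Course` classes, their field names, and the `tuition_due` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Course:
    code: str
    credits: int

@dataclass
class Student:
    student_number: int                   # primary key
    name: str                             # simple field
    major: Optional[str] = None           # secondary key
    courses_taken: List[Course] = field(default_factory=list)  # structured field
    peer_mentor: Optional[int] = None     # pointer: primary key of a record in this file
    faculty_mentor: Optional[str] = None  # pointer: hypothetical id in another file

    @property
    def credits_taken(self) -> int:
        """Computed attribute: inferred from other fields, never stored."""
        return sum(c.credits for c in self.courses_taken)

    def tuition_due(self, rate_per_credit: float) -> float:
        """Dynamic attribute: depends on an input value as well as stored fields."""
        return self.credits_taken * rate_per_credit

ada = Student(1001, "Ada", major="CS",
              courses_taken=[Course("CS101", 3), Course("MA201", 4)])
```

Storing `credits_taken` as a property rather than a field avoids keeping a redundant value in sync with `courses_taken`.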
Types and Properties of Keys:
- primary key:
  - a field storing a unique identifier for each entry in the file,
  - usually related to the organization of the file
- secondary key:
  - a field storing a classifier present for some entries in the file, for which (efficient) access methods exist
- possible characteristics of secondary keys:
  - unique --- at most one record has any given key value for this key
  - singleton --- each record has at most one key value for this key
  - required --- each record has at least one key value for this key
- files may be:
  - unkeyed
  - keyed but unsorted
  - sorted by primary key
  - multikeyed
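The three secondary-key characteristics above can be checked mechanically. The sketch below assumes a hypothetical representation in which each record's primary key maps to the list of values that record holds for one secondary key; the function name and sample data are invented for illustration.

```python
from collections import Counter

def key_properties(values_by_record):
    """Report whether one secondary key is unique, singleton, and required.

    values_by_record maps each primary key to the list of values that
    record holds for the secondary key (possibly empty).
    """
    # Count how many distinct records carry each key value.
    holders = Counter(v for values in values_by_record.values()
                      for v in set(values))
    return {
        "unique": all(n <= 1 for n in holders.values()),
        "singleton": all(len(set(vs)) <= 1 for vs in values_by_record.values()),
        "required": all(len(vs) >= 1 for vs in values_by_record.values()),
    }

# "major": shared between students, sometimes doubled, sometimes missing.
majors = {101: ["CS"], 205: ["CS", "Math"], 317: []}

# "email": one distinct address per student.
emails = {101: ["ada@example.org"], 205: ["grace@example.org"],
          317: ["edsger@example.org"]}
```

Under these definitions `majors` fails all three tests, while `emails` passes all three and so behaves much like an alternative primary key.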
Types of query:
- (Key.value is the value of Key in the current record, key_value is a constant)
- Simple:
- Range:
  - Key.value relop key_value
  - relop is usually one of (=, >, <, >=, <=, <>, in)
- Boolean:
  - OR (AND (Key_i.value relop key_value_j))
  - multiple terms may use the same key or key value
  - can have NOTs in here as well (but unlimited NOTs are a problem)
- Equational:
  - Boolean plus terms of the form Key_i.value relop Key_j.value
- General:
  - terms can also involve
    - relations of the form R (Key_1.value, Key_2.value, ...)
    - and relations between key values in different records
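A small evaluator shows how the range, Boolean, and equational forms build on one another. It reads the OR (AND (...)) form as a list of conjunctions, each a list of terms. The `KeyRef` class, the tuple encoding of terms, and the sample records are assumptions made for this sketch; NOT and the general relational forms are left out.

```python
import operator

RELOPS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "in": lambda value, choices: value in choices,
}

class KeyRef:
    """Marks an operand as another key of the same record (equational term)."""
    def __init__(self, name):
        self.name = name

def term_holds(record, term):
    """A term is (key, relop, operand); operand is a constant or a KeyRef."""
    key, relop, operand = term
    if isinstance(operand, KeyRef):
        operand = record[operand.name]
    return RELOPS[relop](record[key], operand)

def matches(record, query):
    """query is an OR of ANDs: a list of conjunctions, each a list of terms."""
    return any(all(term_holds(record, t) for t in conj) for conj in query)

def select(records, query):
    return [r["id"] for r in records if matches(r, query)]

students = [
    {"id": 101, "major": "CS", "year": 2, "credits": 30, "required": 60},
    {"id": 205, "major": "Math", "year": 4, "credits": 120, "required": 120},
    {"id": 317, "major": "CS", "year": 4, "credits": 90, "required": 120},
]

range_query = [[("year", ">=", 3)]]
boolean_query = [[("major", "=", "CS"), ("year", "=", 2)],
                 [("major", "=", "Math")]]
equational_query = [[("credits", ">=", KeyRef("required"))]]
```

Here `select(students, range_query)` picks the fourth-year students, `boolean_query` picks second-year CS students or any Math student, and `equational_query` compares two fields of the same record.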
Some software engineering issues (see also Notes 4):
Why strong typing?
- Self-documenting code
- Interfaces between modules
- Use by teams of programmers/maintainers/users across long time spans
- Intelligent use of binary files: blocking and unblocking
- Particularly important for file processing and distributed networks.
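Blocking and unblocking can be sketched with Python's standard `struct` module, where an explicit record format plays the role of the type declaration shared by writer and reader. The record layout, the 64-byte block size, and the function names are assumptions chosen to keep the example small.

```python
import struct

# Typed record layout: student number (uint32), name (20 bytes), year (uint16).
# "<" selects little-endian with no padding, so the record is 26 bytes.
RECORD = struct.Struct("<I20sH")
BLOCK_SIZE = 64                          # hypothetical block size in bytes
PER_BLOCK = BLOCK_SIZE // RECORD.size    # whole records per block

def block(records):
    """Pack records into fixed-size blocks, zero-padding each block."""
    blocks = []
    for i in range(0, len(records), PER_BLOCK):
        chunk = records[i:i + PER_BLOCK]
        data = b"".join(RECORD.pack(num, name.encode("ascii"), year)
                        for num, name, year in chunk)
        blocks.append(data.ljust(BLOCK_SIZE, b"\x00"))
    return blocks

def unblock(blocks, count):
    """Recover `count` records; the count is needed because zero padding
    cannot be told apart from an all-zero record."""
    out = []
    for data in blocks:
        for j in range(PER_BLOCK):
            if len(out) == count:
                return out
            num, name, year = RECORD.unpack_from(data, j * RECORD.size)
            out.append((num, name.rstrip(b"\x00").decode("ascii"), year))
    return out

rows = [(101, "Ada", 2), (205, "Grace", 4), (317, "Edsger", 4)]
blocks = block(rows)
```

Because both sides agree on the `RECORD` format, a reader on another machine can unblock the same bytes without guessing field widths or byte order, which is the point of strong typing for binary files.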