Basic software engineering issues:
In the real world, a large application is typically designed by a team of programmers and other specialists
over a period of months or years. The programming team generally overlaps with neither the people who
want the application designed, nor its eventual user community. In such large projects, the “good
programming practice” you saw in CS I & II becomes even more important, on an even larger scale,
and other issues of scale start to emerge.
Requirements and specification:
- At the beginning of an application design, application designers (experienced programmers and
software engineers) sit down with the community which wants the application (which might be corporate
management, a scientific research project, people with an idea for a video game, whatever), and come up
with the behavior and interaction the software ought to provide. These requirements typically
include:
- constraints based on the laws of physics, strength and properties of materials to be used, human
reaction times, and so on.
- constraints based on the needs or wants of the user, particularly for the input/output interface, such
as time to handle a request, or rate of screen update.
- aesthetic preferences of the designers.
- requirements based on the properties of the parts to be used.
- The requirements for the application are now translated at a high level to specifications for
the software --- at first, for the whole system. The specification states properties of the interface to the
user and to other devices, and of the consistency and timeliness of data in the system.
- The next step is typically decomposition of the problem into modules, with a specification of
the interfaces between them. The design process now proceeds, largely through the familiar process of
top-down refinement.
Top-down design:
- Top-down design uses the above step recursively: at each stage, view the current task as a set of
related tasks, and decide how the tasks will communicate. Eventually, task descriptions will change from
English language and mathematical descriptions to code, and communication and state will change from,
again, an English language description to a typed interface (this task expects two integers and computes a
Boolean) and eventually to compound types, variables, and function parameters.
- Early in the refinement, most of the modules will become separate code or data files. Later,
tasks will often remain in the same file, but as distinct functions. As time goes on, however,
more and more of the pieces will simply be translated into pieces of code within a function/procedure.
- Pseudocode exists largely to handle the translation: to provide an intermediate form for
things which are more precisely specified than an English-language description, but should not yet be
written in a programming language, because we still haven’t decided on some features (such as the
implementation data types, how the task will be divided into subtasks, or even the language itself). We
may also decide to concentrate on the more difficult (algorithm details) or more visible (user interface)
tasks first, and leave some of the rest for later, even if we have a good idea of how to go about it.
- Appropriate use of bottom-up coding: Nonetheless, there are times when bottom-up coding
may be appropriate. The purpose of top-down design is not so much to eliminate bottom-up coding, as to
provide better guidelines for what needs to be done, and to try to ensure that the separately-designed
bottom-up features fit together right.
- One particular situation is in the design of library modules (eventually including many classes in
object-oriented languages). Once we know, or have a reasonable idea of, what the data structures will look
like, we have to add type definitions, and various functions/procedures to create, access, modify, and
destroy data. These will generally be designed simplest-first.
- For any moderately complicated program, it is futile to expect that everything is doable as it was
initially designed. The process of design revision often has a bottom-up element: “At this point, I need
things to be so-and-so, so they must look like this over here ...”.
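One refinement step of the kind described above can be sketched in C. This is only an illustration under assumed names: order_is_affordable, total_cost, and within_budget are hypothetical tasks, not part of any particular project. The typed interfaces are fixed first, the top-level task is written purely in terms of them, and the subtasks are filled in last.

```c
#include <stdbool.h>

/* Step 1: typed interfaces agreed on before any bodies exist.
   (All names here are hypothetical, chosen for illustration.) */
static int  total_cost(const int *prices, int count);  /* subtask A */
static bool within_budget(int cost, int budget);       /* subtask B: two integers in, a Boolean out */

/* Step 2: the top-level task, written only in terms of its subtasks. */
bool order_is_affordable(const int *prices, int count, int budget)
{
    return within_budget(total_cost(prices, count), budget);
}

/* Step 3: each subtask is refined into code. */
static int total_cost(const int *prices, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += prices[i];
    return sum;
}

static bool within_budget(int cost, int budget)
{
    return cost <= budget;
}
```

Because the prototypes in step 1 are settled early, the two subtasks could be written by different people without either seeing the other's code.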
Strong typing and type checking:
- We have already alluded to the importance of getting data types, and the parameter and result types
of functions, right early in the design process. Among other things, a common set of type definitions and
function headers (in C, called “function prototypes”) is absolutely essential if several different people,
often in different locations, or working at different times, are to design a single integrated piece of
software.
- The process of declaring all variables with types, and not allowing a program to be compiled unless
every statement and expression has a type-consistent interpretation is called strong typing.
(Formally, a language is strongly typed or type-safe if it is impossible for a run-time error to be
due to a type inconsistency. Also, for completeness, note that there are languages --- mostly functional
languages such as ML --- which do not declare types, but require a unique type to be inferred for
each expression.) Pascal is (almost completely) strongly typed; K & R C was not; and ANSI C is
partially type-safe.
- Strong typing has two arguments in its favor from a software engineering point of view.
- First, other programmers can often make use of your variables and functions by knowing only their
types and a high-level specification. It is particularly important that the interfaces (parameters and
results) of those functions visible to other programmers be provided.
- Second, typed variables provide documentation of the programmer’s
intent, especially for record types, when the record type itself, and its fields, are given reasonable names.
Type aliasing (using, for example, “dollar” as a synonym for “integer”) can provide additional
documentation.
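As a small sketch of the documentation value of types, consider the following C fragment. The names invoice_line and line_total are assumptions made for the example; only the “dollar” alias comes from the text above. Note that a C typedef is purely an alias, so it documents intent without adding any checking.

```c
/* "dollar" is only an alias for int: it documents intent, but the
   compiler will not object if a dollar is mixed with a plain int. */
typedef int dollar;

/* A record type whose name and field names say what the data means.
   (invoice_line is a hypothetical type for this example.) */
struct invoice_line {
    int    quantity;     /* number of units ordered */
    dollar unit_price;   /* price of one unit, in whole dollars */
};

/* The header alone tells a caller what goes in and what comes out. */
dollar line_total(const struct invoice_line *line)
{
    return line->quantity * line->unit_price;
}
```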
Modularity and interfaces:
- Once interfaces have been specified, the pieces of the project connected by the interface can be
detached as modules, and developed individually, often by different teams of programmers.
- Following this logic to its natural consequence, we would like to have the ability to compile
the modules separately, without needing the code of the other modules.
- However, any checking of expression and function legality, or generation of code to implement this,
requires knowledge of the number of parameters, parameter types, and result type of externally-defined
functions visible in the module, and of the definition of compound data types.
- This is typically provided by dividing modules into two files: a definition section --- containing types
and function prototypes, and perhaps some other information; and an implementation section, containing
the actual code.
- It has become customary to label the definition section as filename.h, where .h
basically stands for “header” (and the code is in filename.c or filename.pas, for
example). This is more-or-less mandatory in C/C++, and is also used in Turbo Pascal and others.
- See example of C .h file [to be added].
- This approach also allows use of library files: the language, or the environment, can reuse standard
functions --- like file manipulation, string operations, and so on. Use of library .h files allows us
to compile modules that use library functions without having to actually import and translate the library
source code. In fact, most software manufacturers don’t provide source code for their libraries, but only
the .h files, and the object code to which the source files have been compiled.
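The split into a definition section and an implementation section might look roughly like the following. This is a hedged sketch, not the example promised above: the module name intstack and all of its functions are hypothetical, and the two files are shown in one listing (separated by comments) so that the fragment compiles on its own.

```c
/* ---- intstack.h: definition section (types and prototypes only) ---- */
#ifndef INTSTACK_H
#define INTSTACK_H

#include <stdbool.h>
#include <stddef.h>

typedef struct intstack intstack;   /* layout is hidden from client code */

intstack *intstack_create(size_t capacity);   /* NULL if allocation fails */
bool intstack_push(intstack *s, int value);   /* false if the stack is full */
bool intstack_pop(intstack *s, int *out);     /* false if the stack is empty */
bool intstack_is_empty(const intstack *s);
void intstack_destroy(intstack *s);

#endif /* INTSTACK_H */

/* ---- intstack.c: implementation section ---- */
/* As a separate file, this would start with: #include "intstack.h" */
#include <stdlib.h>

struct intstack {
    int   *items;
    size_t count;
    size_t capacity;
};

intstack *intstack_create(size_t capacity)
{
    intstack *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->items = malloc(capacity * sizeof *s->items);
    if (s->items == NULL && capacity > 0) {
        free(s);
        return NULL;
    }
    s->count = 0;
    s->capacity = capacity;
    return s;
}

bool intstack_push(intstack *s, int value)
{
    if (s->count == s->capacity)
        return false;
    s->items[s->count++] = value;
    return true;
}

bool intstack_pop(intstack *s, int *out)
{
    if (s->count == 0)
        return false;
    *out = s->items[--s->count];
    return true;
}

bool intstack_is_empty(const intstack *s)
{
    return s->count == 0;
}

void intstack_destroy(intstack *s)
{
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}
```

A client module needs only the first part to be compiled and type-checked; the second part can be supplied later as object code. The functions also follow the create, access, modify, destroy pattern mentioned earlier for library modules.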
Documentation:
- While comments and other documentation are at best helpful in a small, one-time, single-user
program, they become crucial in the situation we are considering.
- The application is (1) very large (thousands of procedures, and tens of thousands of lines of code),
(2) designed by a large number of programmers (3) over an extended period of time, where (4) the
intended users are neither the application specialists nor the programmers themselves, but expect to be
able to understand how to use the software.
- Your comments will not only help your users and their software specialists, but also your fellow
programmers --- including yourself six months from now!
- Some of the need for comments can be obviated by choice of informative names. As a trivial
example, using hours_worked and hourly_rate makes a much more comprehensible
program than using h and r.
- This is particularly helpful for names of compound types --- especially record types, for
functions/procedures, for named constants, and, in those languages that
support them, for exceptions and for classes/objects.
- Like many other rules, this can be taken to silly extremes. Almost everyone understands the purpose
of a loop index, and you can be perfectly content to use i, j, k as iteration indices. Likewise, while
using temp (or something similar) for a swap variable is informative, you don’t have to define
temp_student, temp_class, and so on, except to avoid type conflicts.
- In a very large module, particularly one with a lot of global data, it may be helpful to have as
comments tables of global variables and visible procedures, indicating their type, purpose, and other
relevant information. This can easily be made a part of the .h file.
- Large applications often require external manuals --- text and/or on-line documents in addition to
comments in the code. This allows a user or specialist to contemplate the application “off-line”. Help
facilities are often also added to very large applications, especially for novice or occasional users.
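The effect of informative names and a short header comment can be seen by comparing two versions of the same calculation. This is an illustrative sketch: the function names pay and gross_pay, and the choice of double for both quantities, are assumptions made for the example.

```c
/* Hard to read: the caller has to guess what h and r stand for. */
double pay(double h, double r)
{
    return h * r;
}

/*
 * gross_pay: wages owed for one pay period.
 *   hours_worked  hours worked in the period (expected to be >= 0)
 *   hourly_rate   pay per hour, in dollars
 * Returns hours_worked * hourly_rate; no overtime rule is applied.
 */
double gross_pay(double hours_worked, double hourly_rate)
{
    return hours_worked * hourly_rate;
}
```

Both functions compute the same value; only the second can be used correctly by someone who has seen nothing but its header.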
Update documentation and version control:
- In multi-programmer projects, it is customary --- and almost essential --- for each programmer who
makes changes in a project to add comments to the front of the code, documenting:
- the date and the programmer’s name
- the changes made and the reasons for the changes
- If an application is under intensive development, more than one programmer may want to make
changes to the application at the same time. Most large projects use a “source-control system” or
“version-control system” such as SCCS. Such a system allows the most recent version of the program to be read by
anybody in the group, but to be “checked out for changes” by at most one programmer at a time. It also
prevents other programmers from unknowingly using a version which is not up-to-date.
- Version-control systems also allow different rights to the application (read, execute, modify, create,
delete, list/print) to be granted to different groups of programmers, allowing one group to develop the application
without interference, but other groups to stay current with their progress.
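A change log at the front of a source file might be laid out as follows. This is one possible format, not a standard; the dates and programmer names are deliberately left as placeholders, and is_leap_year is a hypothetical function chosen because its history gives the log something to record.

```c
/*
 * Change log (newest entry first). Dates and names are placeholders.
 *
 * YYYY-MM-DD  <programmer B>
 *   Changed: is_leap_year now applies the century rules.
 *   Reason:  1900 was being reported as a leap year.
 *
 * YYYY-MM-DD  <programmer A>
 *   Changed: initial version (every fourth year treated as a leap year).
 *   Reason:  needed by the date-validation code.
 */
#include <stdbool.h>

/* Gregorian rule: divisible by 4, except century years not divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```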
Reuse:
- Library files are explicitly designed to be reused by multiple applications. But it may also be useful
for an individual programmer or group to develop its own “library files”, to, for example:
- Implement common data types such as stacks, queues, linked lists.
- Provide common operations such as sorts or searches.
- Provide standard features of an application area which can be used by multiple programs or modules.
- This provides another significant incentive for modularity and checked interfaces. A whole
application can rarely be reused, but pieces of it might be recurrent. By isolating these pieces in separate
modules, we can avoid recopying or re-inventing the same code. Fully defined interfaces give a means of
checking whether a module can be reused in a given situation; one will of course have to make sure that
the semantics are also correct. If it is always type-correct to use a module with a given type interface in
place of another with the same interface, the language is said to be plug compatible.
- However, we can rarely reuse a function with exactly the same parameters. Using named constants instead of hard-coding in values (e.g., a Pascal constant
declaration, or a C-preprocessor #define) allows a module to be reused with different parameters,
simply by changing the constant definitions.
- In languages other than C and Pascal, this approach can be extended to allow use of different types
(so that, for example, we could write a single stack package, independent of the type of the stack
arguments), or the use of different functions (such as a sort routine which can take either <= or >=
as an argument). C supports a limited form of the latter through function pointers, as in the standard library’s qsort.
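Reuse through named constants can be sketched with a small fixed-size queue. The module below is hypothetical (QUEUE_CAPACITY, int_queue, and the queue_ functions are names invented for the example); the point is that the size appears in exactly one place, so another program can reuse the code with a different capacity by changing only the constant definition.

```c
#include <stdbool.h>

/* The only place the size appears. Another program can reuse this
   module unchanged by defining QUEUE_CAPACITY before this point. */
#ifndef QUEUE_CAPACITY
#define QUEUE_CAPACITY 8
#endif

struct int_queue {
    int items[QUEUE_CAPACITY];
    int head;    /* index of the oldest element */
    int count;   /* number of elements currently stored */
};

void queue_init(struct int_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Adds value at the tail; returns false if the queue is full. */
bool queue_put(struct int_queue *q, int value)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    return true;
}

/* Removes the oldest value into *out; returns false if the queue is empty. */
bool queue_get(struct int_queue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```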
Testing and debugging:
- Compiling and producing output for typical inputs is no guarantee that the code will work correctly
in all circumstances, or continue to work well as things change.
- Testing is a process using a suite of inputs designed to reveal flaws. Among the techniques used in
testing are:
- Data-flow testing: try to achieve some combination of the following: make sure that (1) all functions
are called at least once; (2) all branches of all conditionals (and all return sites) get taken at least
once; (3) all assignments get executed at least once, and all assigned values are used at least once; and (4)
all uses are evaluated at least once.
- Fence-post testing: where a range of values is being used (array bounds, loop indices, etc.) see what
happens when the values (array entries, etc.) at the ends of the interval, and the values (etc.) just beyond
those, are used.
- Exception and robustness testing: sometimes code is included which is never expected to be invoked,
such as (for safety) handling cases which should never arise. These exception handlers can sometimes be
tested by using illegal inputs; other times, they can only be tested inside a debugger, in which values in
the middle of a program can be changed by the user to invoke the exception handler.
- Once testing finds erroneous behavior, the error has to be corrected, typically using a debugger, but
sometimes also various compiler analyzers. It is important to document the bug and the fix.
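Fence-post testing can be sketched as follows, assuming a hypothetical bounds-checked accessor checked_get and a small test routine written for this example. The tests probe both ends of the valid index range and one step beyond each end, which also forces both outcomes of the range check to be taken at least once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Copies a[index] into *out and returns true only when index lies in [0, n). */
bool checked_get(const int *a, size_t n, long index, int *out)
{
    if (index < 0 || (unsigned long)index >= n)
        return false;
    *out = a[index];
    return true;
}

/* Fence-post suite: the two ends of the valid range, and one step
   beyond each of them. Returns the number of failed checks. */
int run_fencepost_tests(void)
{
    const int data[4] = {10, 20, 30, 40};
    int value = 0;
    int failures = 0;

    if (checked_get(data, 4, -1, &value))                failures++;  /* just below the range */
    if (!checked_get(data, 4, 0, &value) || value != 10) failures++;  /* first valid index */
    if (!checked_get(data, 4, 3, &value) || value != 40) failures++;  /* last valid index */
    if (checked_get(data, 4, 4, &value))                 failures++;  /* just past the range */

    return failures;
}
```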
Maintenance:
- Even if an application behaves correctly at first, changes can occur that require continued updating
of the application:
- A port to a different machine, operating system, or environment can reveal undocumented
dependences on system features.
- Changes in user requirements can require additional code to be added. Adding the code can then
reveal unexpected behavior in other modules.
- Changes in the distribution, size, or volume of inputs can cause previously unused branches in the
code to be taken, or reveal dependences on limited system resources.
- Changes to library modules (especially in the absence of good documentation for both the library and
the modules used by the application) can result in almost untraceable errors in the application.
- Documentation of the original specifications, and of the development process and decisions made in
development, is extremely helpful in fixing maintenance bugs.
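One way to keep dependences on system features from going undocumented is to state them in the code itself, where the compiler can check them when the application is ported. The following is a minimal sketch assuming a C11 compiler (for _Static_assert); read_be32 is a hypothetical helper, written so that its result does not depend on the byte order of the machine it runs on.

```c
#include <limits.h>
#include <stdint.h>

/* Platform assumptions made by this module, checked at compile time
   so that a port to an unusual machine fails loudly, not silently. */
_Static_assert(CHAR_BIT == 8, "byte layout below assumes 8-bit bytes");

/* Builds a 32-bit value from four bytes stored most-significant first.
   Using shifts (not a pointer cast) avoids relying on host byte order. */
uint32_t read_be32(const unsigned char *bytes)
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```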