- Distributed Systems: remove single point of failure by using multiple components
- Loose Coupling: keeps failure from spreading via ripple effects or brittleness
- Ack/Nak: verify results and, on failure, repeat request (e.g. protocols)
- Fail Over: verify results and, on failure, repeat request but to different component
- Do Over: set aside failed requests or out-of-bound results and retry them later
- Auto Restart: on failure of a component, it should auto-reset/restart and continue
- Leases: dead-man switch on any allocated resource of a component (including the attention of its partners, à la protocol timeout)
- Event Driven: handle non-deterministic order of returned results from components
- Fault Tolerance: ignore failed requests or out-of-bound results and continue rather than generating errors/exceptions
- Backtracking: processing that expects to hit dead-ends and so backtracks to try other approaches (e.g. Prolog logic rules, parsers that employ backtracking algorithms)
- Redundant Components: issue parallel requests to multiple components and take a majority-rules vote on the result (but must know if results are deterministic or not; i.e. if more than one result can be valid then different components may return different, but still valid, results)
- Evolutionary Programming: issue parallel requests to multiple components and take the result from the fittest component, i.e. the one chosen by the quality of its result rather than by the result itself
- Auctions: issue parallel requests to multiple components that compete on cost of service as a definition of most fit
- Neural Nets: issue parallel requests to multiple components and take a vote biased by each component's dynamically adjusted success rating (i.e. each component votes its stock)
- Fuzzy Logic: result returned by component is biased by a probability/quality rating
- Transactions: state transitions are confined to successful atomic steps
- Exceptions: asynchronous notification of, and response to, problems, with the ability to either continue or abort the current operation
- Game Theory: multiple conflicting rule sets are projected via min-max trees to find a balanced result
- Blackboard Systems: multiple workers on a problem each process as much as they are able and share a common result state, i.e. individual workers are not expected to produce a complete result or even any result at all (e.g. JavaSpaces)
- Workflow Models: combination of state-transition models and PERT dependency models to keep track of progress at a global level given multiple parallel workers and to reset to some given state(s) if results are not converging
- Belt & Suspenders: multiple independent methods of verifying results or progress
- Mobile Agents: workers dynamically move to more appropriate environments e.g. load balancing, fail-over, seek more reliable communications, etc.
- Design by Contract: assertions to detect failure on the part of either the requestor or the worker e.g. Cleanroom techniques, Component Interfaces, Strong Types, policy driven security managers, etc.
- Pattern Matching: non-trivial matching of requestor interfaces with worker interfaces to allow more flexible and dynamic establishment of contracts e.g. KQML, JATLite
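
To make the Ack/Nak and Fail Over entries a little more concrete, here is a minimal Python sketch. The function names (ack_nak, fail_over), the is_valid callback, and the max_attempts parameter are all invented for illustration and are not taken from any particular library or protocol.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def ack_nak(request: Callable[[], T],
            is_valid: Callable[[T], bool],
            max_attempts: int = 3) -> T:
    """Ack/Nak: verify the result and, on failure, repeat the same request."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = request()
        except Exception as exc:  # a raised error counts as a "nak"
            last_error = exc
            continue
        if is_valid(result):      # a verified result counts as an "ack"
            return result
    raise RuntimeError(f"no valid result after {max_attempts} attempts") from last_error


def fail_over(components: Sequence[Callable[[], T]],
              is_valid: Callable[[T], bool]) -> T:
    """Fail Over: on failure, repeat the request against a different component."""
    for component in components:
        try:
            result = component()
        except Exception:
            continue              # this component failed, move on to the next
        if is_valid(result):
            return result
    raise RuntimeError("all components failed or returned invalid results")
```

The only difference between the two is where the repeated request goes: Ack/Nak retries the same component, while Fail Over walks a list of alternatives. A Do Over variant would queue the failed request for later instead of retrying immediately.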
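
The Redundant Components entry can be sketched in the same spirit. This is only an illustration under stated assumptions: majority_vote is a hypothetical helper, components are modeled as plain callables, results are assumed to be deterministic and hashable (the caveat noted in that entry), and a strict majority of all components is required.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Sequence


def majority_vote(components: Sequence[Callable[[object], Hashable]],
                  request: object) -> Hashable:
    """Send the same request to every component and return the majority result."""
    if not components:
        raise ValueError("at least one component is required")

    results = []
    # Issue the requests in parallel, one thread per component.
    with ThreadPoolExecutor(max_workers=len(components)) as pool:
        futures = [pool.submit(component, request) for component in components]
        for future in futures:
            try:
                results.append(future.result())
            except Exception:
                continue  # a failed component simply casts no vote

    if not results:
        raise RuntimeError("every component failed")

    winner, votes = Counter(results).most_common(1)[0]
    # Require a strict majority of all components, not just of the survivors.
    if votes * 2 <= len(components):
        raise RuntimeError("no majority among component results")
    return winner
```

For example, with three components of which one returns a wrong answer, the two agreeing components outvote it. Weighting each vote by a per-component success rating would move this toward the Neural Nets entry.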
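
The Leases entry also lends itself to a short sketch. The Lease class below is hypothetical: it models the dead-man switch as an expiry time that the holder must keep renewing, with an injectable clock (an assumption made here so the behavior can be checked without sleeping).

```python
import time
from typing import Callable


class Lease:
    """A time-limited claim on a resource that lapses unless renewed."""

    def __init__(self, duration: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._duration = duration
        self._clock = clock
        self._expires_at = clock() + duration

    def expired(self) -> bool:
        """True once the holder has gone a full duration without renewing."""
        return self._clock() >= self._expires_at

    def renew(self) -> None:
        """Extend the lease by one full duration; refuse if it already lapsed."""
        if self.expired():
            raise RuntimeError("lease already expired")
        self._expires_at = self._clock() + self._duration
```

The owner of the resource checks expired() and reclaims the resource when it returns True, so a crashed or hung holder releases its claim simply by failing to call renew().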
Wednesday, July 7, 1999
Catalog of Fault Mechanisms
In the previous blog entry, "What Could Fault Expectant Programming Mean?", I outlined a framework abstracted out of recurring themes in a series of error detection, recovery, and avoidance mechanisms that I had compiled. Here follows that list of mechanisms (in no particular order).