NSF Project Report

0956780 EAGER: Autonomic Computing Systems and Wireless Networking

Instructor

PI: Sukumar Ghosh, University of Iowa, sukumar-ghosh@uiowa.edu

Preamble This NSF supported (CNS-0956780) project addressed several algorithmic issues in the design of autonomic distributed systems, and explored its applicability in the self-management of large-scale wired and wireless networks. An autonomic system mimics the human body’s autonomic nervous system that regulates homeostatic functions without conscious intelligent control, and provides facilities for self-management in large-scale complex heterogeneous systems. In the domain of computing systems, the autonomic property can be translated into an assortment of self-management tasks, collectively known as self-* properties. The project focused on the design of self-stabilizing (a.k.a stabilizing) large-scale distributed systems and networks, that can spontaneous recover from any faulty configuration without any external intervention.

Participants

Sukumar Ghosh (Principal investigator)
Andrew D. Berns (Graduate Research Assistant)
Lincoln Lourens (Undergraduate student)

Andrew Berns received his PhD degree, and Lincoln Lourens received his BS in Computer Science.

Accomplishments and Outcomes

1. The various features of autonomic distributed systems were dissected and presented in [BG09]

2. Pipelining is a popular way of structuring large-scale distributed systems. The design of self-stabilizing pipelines was investigated. A major finding is th at linear pipelines are inherently stabilizing, therefore can spontaneously reco ver from all transient failures. Non-linear pipelines. however, require additional effort for stabilization. Specific algorithms for stabilizing non-linear pipelines were developed [BDG10].

3. The feasibility of self-stabilization was extended to large-scale overlay networks (wired and wireless), and algorithms for restoring the topology of a network from any weakly connected configuration were investigated. Many overlay networks that are currently used are not locally checkable , so the occurrence of failures may not even be detected without a global snapshot. What makes an overlay network locally checkable is indeed intriguing: a linear topology is locally checkable but a ring topology is not locally checkable. As a case study, SKIP graphs were considered, and a polylog-round algorithm for stabilizing SKIP+ graphs, a locally checkable extension of SKIP graphs, was developed. This version used the Transitive Closure Framework (TCF), which guarantees stabilization in optimal time, but has the limitation that in the worst case, the interim space requirement per node may be linear. The paper was published in SSS 2011 and it received the Best Paper Award [BGP11] in SSS 2011. The journal version of this result will appear in [BGP13]

4. Service discovery was formulated in terms of overlay networks. To overcome the limitations of the TCF algorithm, a new algorithm AVATAR was designed for the self-stabilization of overlay networks. It is based on Binary Search Trees, and is the first self-stabilizing overlay network algorithm that guarantees polylog complexity in running time as well as in space [BP13]. It introduces the novel idea of network scaffolding that will enable the technique to be extended to arbitrary overlay network topologies. In addition to self-stabilization, the algorithm addresses the fault-containment problem that singles out the common cases of single node or link failures, and expedites stabilization in such cases. Fault containment includes the add and delete operations. Attempts to add delay-tolerant networks to this framework and population protocols had limited success.

5. An optimal solution for designing stabilizing k-token rings with only (k+1) states per process was developed [BDG09].

6. The probabilistic technique for weak stabilization was demonstrated for the persistent bit protocol, as well for the leader election protocol. Subsequently, a method of containing the effect of minor failures in weakly stabilizing systems was developed [DGX09][DGX11]

7. A framework consisting of detectors, correctors, and predictors for designing self-immune systems, was proposed. Self-immunity was established as a dynamic mechanism for tolerating failures and perturbation on demand, and is likely to lead to a more cost-effective solution when perturbations exhibit certain specific properties. The cost-effectiveness may fall apart when the prediction of failures or perturbations turn out to be non-trivial. As a test case, a self-immune mobile ad-hoc network was simulated.

Products

[DGX09] Anurag Dasgupta, Sukumar Ghosh, Xin Xiao: Fault-Containment in Weakly-Stabilizing Systems. SSS 2009, 209-223

[BG09] Andrew Berns, Sukumar Ghosh: Dissecting Self-* Properties. IEEE SASO 2009: 10-19

[BDG09] Andrew Berns, Anurag Dasgupta, Sukumar Ghosh: Brief announcement: Optimal Self-stabilizing Multi-token ring: A Randomized Solution. ACM PODC 2009, pp. 302-303

[BGP10] Andrew Berns, Sukumar Ghosh, Sriram Pemmaraju: Brief announcement: A Framework for Building Self-stabilizing Overlay Networks. ACM PODC 2010, 398-399

[BDG10] Andrew Berns, Anurag Dasgupta, and Sukumar Ghosh. Stabilizing Pipelines for Streaming Applications. IPDPS 2010.

[DGX11] Anurag Dasgupta, Sukumar Ghosh, Xin Xiao, "Fault containment in weakly stabilizing systems", Theoretical Computer Science, p. 4297, vol. 412, (2011).

[BGP13] ndrew Berns, Sukumar Ghosh, Sriram Pemmaraju (6/1/13). Building Self-Stabilizing Overlay Networks with the Transitive Closure Framework. Theoretical Computer Science (to appear in Theoretical Computer Science) DOI 10.1016/j.tcs.2013.02.021.

[B13] Andrew Berns. Self-stabilizing Overlay Networks (PhD thesis). Department of Computer Science, University of Iowa, http://ir.uiowa.edu/etd/3431/

[BP13] Andrew Berns, Sriram Pemmaraju. AVATAR: A Time- and Space-Efficient Self-Stabilizing Overlay Network. To be submitted.