Distributed Systems

A distributed system uses multiple machines to achieve better performance, user experience and redundancy. It also brings added complexity to software development and maintenance. We have developed ways to make the most of distributed systems.

Affine Load Balancing

Websites are built on top of HTTP, a stateless protocol. Statelessness is the key to scalability: a web server does not need to remember the 99 visitors it has already served before responding to the 100th.

Like static websites, web applications also use HTTP. To create the illusion of a stateful application, web servers are usually made to remember additional information about each user, known as a "session". Because that session lives in the memory of one server, every later request from the same user has to be routed back to that server, an arrangement commonly called session affinity. In a cluster of 3 computers, for example, if the majority of the requests come from one user, only one of the servers is utilized, assuming servers in the cluster do not communicate with one another.
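The utilization problem can be illustrated with a minimal sketch. It assumes the load balancer picks a server by hashing the session identifier; the server names, the `pick_server` function and the traffic mix are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical server names


def pick_server(session_id: str) -> str:
    """Map a session to a fixed server so its in-memory state can be found again."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


def simulate(requests: list[str]) -> Counter:
    """Count how many requests each server receives."""
    load = Counter()
    for session_id in requests:
        load[pick_server(session_id)] += 1
    return load


# One heavy user sends 90 requests; ten light users send one request each.
traffic = ["heavy-user"] * 90 + [f"user-{i}" for i in range(10)]
load = simulate(traffic)
```

Whichever server the heavy user's session hashes to receives at least 90 of the 100 requests, while the other two stay mostly idle.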

To achieve better utilization, we ensure that application servers don't need to share anything, so any server can handle any request. "Share-nothing" is the key to high-performing parallel systems. The concept is simple, but most application frameworks don't fully honour this principle. As a result, we have developed our own frameworks and methodologies.
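One common way to keep servers from sharing runtime state is to let the client carry the session in a signed token. The sketch below illustrates that general technique and is not a description of our frameworks; `SECRET_KEY`, `encode_session`, `decode_session` and `handle_request` are hypothetical names, and the only thing the servers have in common is a secret distributed through configuration.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; set in each server's config


def encode_session(state: dict) -> str:
    """Serialize the session and append an HMAC so tampering can be detected."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_session(token: str) -> dict:
    """Verify the signature, then return the session state carried by the token."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session token failed verification")
    return json.loads(base64.urlsafe_b64decode(payload))


def handle_request(server_name: str, token: str) -> tuple[str, str]:
    """Any server can run this: state comes from the token, not from local memory."""
    state = decode_session(token)
    state["visits"] = state.get("visits", 0) + 1
    return f"{server_name} served visit {state['visits']}", encode_session(state)


# Two different servers handle consecutive requests from the same user.
token = encode_session({"user": "alice"})
reply1, token = handle_request("app-1", token)
reply2, token = handle_request("app-3", token)
```

Because the second server reads the visit count from the token, it continues where the first one left off without the two ever communicating.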

AsyncD - Asynchronous Processing & Partial Transactions

Sometimes a simple user request triggers a task that requires intensive computation, but the result is not needed immediately. For example, in a virtual server control panel, the user can request a backup simply by clicking a button, without having to wait until the entire hard disk is copied.

We created a model called AsyncD that gracefully handles transactions partially and in parallel. It optimizes redundant requests and divides workload among multiple workers.
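The general idea can be sketched as follows. This is not the AsyncD implementation: it is a minimal illustration, assuming that "redundant requests" means identical requests that arrive while an earlier one is still queued or running, and it does not model partial transactions. The `TaskQueue` class and its methods are hypothetical names.

```python
import queue
import threading


class TaskQueue:
    """Accept requests immediately, drop duplicates, and run tasks on worker threads."""

    def __init__(self, handler, workers: int = 3):
        self._handler = handler
        self._tasks = queue.Queue()
        self._pending = set()           # keys that are queued or currently running
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, key: str) -> bool:
        """Queue work for `key`; return False if the same work is already pending."""
        with self._lock:
            if key in self._pending:
                return False
            self._pending.add(key)
        self._tasks.put(key)
        return True

    def _run(self) -> None:
        """Worker loop: take the next key, process it, then mark it as finished."""
        while True:
            key = self._tasks.get()
            try:
                self._handler(key)
            finally:
                with self._lock:
                    self._pending.discard(key)
                self._tasks.task_done()

    def wait(self) -> None:
        """Block until every queued task has finished (mainly useful in tests)."""
        self._tasks.join()
```

In the control panel example, the request handler would call something like `submit("backup:vm-1")` and return to the user at once. A second click on the same button while the backup is still pending returns `False` instead of starting another copy of the disk.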

Distributed Dashboard Network

Another research project we're actively working on aims to overcome the limits of network bandwidth. In the context of real-time communication and surveillance, we have developed a prototype that streams data from various servers into a browser-based dashboard. The browser then discovers relay servers and forwards information to other dynamically spawned dashboard instances. The end result is a peer-to-peer, bi-directional communication network.
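The bandwidth benefit of relaying can be shown with a small simulation. It is a simplified sketch under stated assumptions, not the prototype itself: dashboards are arranged in a fixed tree, updates flow in one direction only, and relay discovery is not modelled. The `Dashboard` class, `build_relay_tree` and the `fanout` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dashboard:
    """A dashboard instance that records each update and relays it to its peers."""

    name: str
    peers: list["Dashboard"] = field(default_factory=list)
    received: list[str] = field(default_factory=list)
    sent: int = 0

    def receive(self, update: str) -> None:
        self.received.append(update)
        for peer in self.peers:         # forward to downstream dashboards
            self.sent += 1
            peer.receive(update)


def build_relay_tree(count: int, fanout: int) -> list[Dashboard]:
    """Arrange `count` dashboards so each one relays to at most `fanout` others."""
    nodes = [Dashboard(f"dash-{i}") for i in range(count)]
    for i, node in enumerate(nodes[1:], start=1):
        nodes[(i - 1) // fanout].peers.append(node)
    return nodes


dashboards = build_relay_tree(count=13, fanout=3)
dashboards[0].receive("cpu=42%")        # the origin server sends a single copy
```

All 13 dashboards receive the update, yet the origin sends it once and no dashboard forwards it more than 3 times, so no single link has to carry the traffic for every viewer.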