Distributed Systems

Common pattern in distributed systems research papers

  1. Define the problem
  2. Implement the solution
  3. Test the solution

Challenges in this field

  • Overhead
    • Cold start of containers. Solution: bypass the container and use a sandbox instead
    • Initialization of workers/executors
  • Better scheduling
    • Exploiting data locality
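
As a rough illustration of the data-locality idea above, here is a minimal sketch of a scheduler that sends each task to the worker already holding most of its input files, breaking ties by lowest load. The names `Worker`, `pick_worker`, and `schedule` are hypothetical and not taken from any particular system; real schedulers also weigh file sizes, transfer costs, and worker capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical worker model: a name, the input files it already
    # holds locally, and how many tasks it has been assigned so far.
    name: str
    cached: set = field(default_factory=set)
    load: int = 0


def pick_worker(workers, inputs):
    """Prefer the worker holding the most inputs; break ties by lowest load."""
    needed = set(inputs)
    return max(workers, key=lambda w: (len(w.cached & needed), -w.load))


def schedule(tasks, workers):
    """Greedily place each (task_id, inputs) pair and return task_id -> worker name."""
    placement = {}
    for task_id, inputs in tasks:
        worker = pick_worker(workers, inputs)
        worker.cached |= set(inputs)  # inputs are local once the task has run
        worker.load += 1
        placement[task_id] = worker.name
    return placement


workers = [Worker("w1", {"a.dat"}), Worker("w2", {"b.dat"})]
tasks = [
    ("t1", ["a.dat"]),
    ("t2", ["b.dat"]),
    ("t3", ["a.dat", "c.dat"]),
    ("t4", ["c.dat"]),
]
placement = schedule(tasks, workers)
print(placement)
```

In this toy run, `t4` follows `t3` to the same worker because `c.dat` was fetched there first; this is the kind of transfer a locality-aware scheduler tries to avoid repeating.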

Two types of papers

Each year, thousands of distributed systems papers are published. They fall roughly into two types:

Prototyping

Most papers are in this category. It's okay if nobody is using the system. The key questions are:

  • Is your problem really a problem?
  • Do your ideas actually work?

Production

Only a few papers are in this category. Some projects write papers only after they have gained a solid user base. These papers usually get accepted “automatically”, as their value has already been proven.

Note that these production systems are extremely challenging to maintain!

  • Parsl employed full-time developers to implement the system.
  • Apache Spark started as a prototyping project, and its paper was rejected at first. The team then employed 3 engineers to refactor the whole system and make it production-ready.
  • ND CCTools has senior software engineer Ben Tovar to help maintain the project.
  • TensorFlow had 300 software engineers work on it for 2 years.

Funding matters! No funding, no engineers.