Distributed Systems
Common pattern in distributed systems research papers
- Define the problem
- Implement the solution
- Test the solution
Challenges in this field
- Overhead, for example:
  - Cold start of containers (solution: bypass the container and use a sandbox instead)
  - Initialization of workers/executors
- Better scheduling
- Exploiting data locality
Two types of papers
Each year, thousands of distributed systems papers are published. They fall into two types:
Prototyping
Most papers fall into this category. It is fine if nobody uses the system. The key questions are:
- Is your problem really a problem?
- Do your ideas actually work?
Production
Only a few papers fall into this category. Some projects write a paper only after they have gained a solid user base. These papers are usually accepted almost “automatically”, since their value has already been proven.
Note that these production systems are extremely challenging to maintain!
- Parsl hired full-time developers to implement the system.
- Apache Spark started as a prototype, and its first paper was rejected. The team then hired 3 engineers to refactor the whole system into a production-ready one.
- ND CCTools has a senior software engineer, Ben Tovar, who helps maintain the project.
- TensorFlow had 300 software engineers working on it for 2 years.
Funding matters! No fund, no engineers.