Explains a few Spark buzzwords, External Shuffle Service (ESS) and Remote Shuffle Service (RSS), and surveys a few existing ESS and RSS solutions for cloud use cases.
The stability and resilience of a distributed system are always the hardest goals to achieve, but also the most important. In this post, we’ll see how to leverage some chaos monkey tools to discover hidden defects.
Do your Golang tests run fine locally but become flaky in the CI/CD pipeline? This post introduces a few possible causes and helps you improve the stability of your unit tests.
A tutorial on how to set up a local k8s cluster on macOS for development.
This post introduces the Kubernetes scheduler from a newbie’s perspective.
The adoption of CSI (Container Storage Interface) in Apache Hadoop YARN.
This post explains how to run the SLS (Scheduler Load Simulator) tool to measure YARN performance.
Some reading notes on how scheduling requests are handled in YARN on the master branch. For Chinese readers only.
This post covers some practices around opportunistic containers in YARN.
Recently we have been considering dockerizing our computation slots and integrating them
into a Kubernetes-like container management system. We are weighing two Docker network
models, Host and Bridge, and have had some debate over which one to adopt. This
post presents some performance test results to support our decision.
YARN is taking steps to improve cluster utilization. One interesting effort is the ability to oversell its resources to applications. In this post, I explore this feature and some related ones to present a comprehensive view of how this will be done.