Xudong Sun

xudongs3@illinois.edu
Google Scholar | DBLP | GitHub | LinkedIn

I'm Xudong Sun, a final (6th) year CS Ph.D. student at the University of Illinois Urbana-Champaign. My advisor is Prof. Tianyin Xu. I work on computer systems and my research primarily focuses on system correctness and reliability. I got my bachelor degree from Nanjing University in 2019.

I have interned at VMware Research (2020, 2022) and Microsoft Research (2021) working on the reliability of cloud systems.

I am on the job market! Here are my application materials:

while presenting at OSDI'24

Research

The goal of my research is to make computer systems reliable. Even critical computer systems have bugs, and the bugs come with serious consequences, e.g., outage of your favorite app, loss of your personal data. My research is about making sure such bugs never happen (again)!

I enjoy building systems, hacking systems, testing systems and writing mathematical proof for systems. My research is in the intersection of computer systems, software testing and formal methods, and I take a hybrid approach to systems reliability — I build (1) automated and systematic testing techniques to detect bugs in widely-used systems and develop a deep understanding of these systems, and (2) practical formal verification techniques to build provably correct, production-grade systems based on insights learned from testing. My research involves techniques including deductive verification, model checking, fault injection, and testing.

As a system researcher, I care about real-world systems that support everyone's daily life, and developers who struggle with bugs and failures of these systems. My research has improved the reliability of many critical systems, including Kubernetes, ZooKeeper and Hadoop, by finding bugs in them. We have presented at industry venues including KubeCon and SRECon on some of the emerging reliability challenges developers have to face, and how our research can help.

My Ph.D. research focused on the reliability of cloud systems. In particular, I have built Anvil [OSDI'24], a framework that allows developers to build practical cluster managers (e.g., Kubernetes) and formally verify high-level properties (e.g., liveness and safety) of their implementations. The verified cluster managers can be deployed in real-world clusters and manage real-world applications and services. Anvil is inspired by my previous work Sieve [OSDI'22] and Acto [SOSP'23] on testing cluster managers. I have been continuously developing and improving Anvil to make it an open platform for practitioners and researchers to develop cluster managers and verify various liveness and safety properties.

In addition to cluster managers, I have also worked on other layers and aspects of modern cloud stack, including consensus-based systems (e.g., ZooKeeper, etcd), cloud service (e.g., Azure Storage, AWS S3) integrations, and cloud system configurations.

I believe that systems research should be relevant, practical and beautiful. That's what I always aim for.

Recent news

Nov 2024: We publish an article about Sieve on USENIX ;login:.
Nov 2024: Give a talk on my past research and future work at the SOSP Doctoral Workshop 2024 (SysDW'24).
Nov 2024: Give a talk on Anvil at ByteDance.
Oct 2024: Give a talk on Anvil at the MadSystems Seminar at the University of Wisconsin-Madison.
Oct 2024: Give a talk on Anvil at Duke Systems Group's seminar.
Aug 2024: Give a talk on Anvil at Microsoft Research.
Aug 2024: Give a talk on Anvil at VMware Research.
Aug 2024: We publish an article about Acto on USENIX ;login:.
Aug 2024: IllinoisCS News writes a nice article about Anvil.
Jul 2024: Present Anvil at OSDI'24. Talk video is available.
Jul 2024: Anvil wins Jay Lepreau Best Paper Award at OSDI 2024! Congratulations to the team!
Jul 2024: We publish an article about Anvil on USENIX ;login:.

Publications

Multi-Grained Specifications for Distributed System Model Checking and Verification
EuroSys 2025
Lingzhi Ouyang, Xudong Sun, Ruize Tang, Yu Huang, Madhav Jivrajani, Xiaoxing Ma, and Tianyin Xu
PDF  |  BibTex  |  Code

Anvil: Verifying Liveness of Cluster Management Controllers
OSDI 2024
Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, and Tianyin Xu
PDF  |  BibTex  |  Talk  |  Code Star
Jay Lepreau Best Paper Award
Invited to publish at USENIX ;login: (article)
Covered by IllinoisCS News

SandTable: Scalable Distributed System Model Checking with Specification-Level State Exploration
EuroSys 2024
Ruize Tang, Xudong Sun, Yu Huang, Yuyang Wei, Lingzhi Ouyang, and Xiaoxing Ma
PDF  |  BibTex  |  Code Star

Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management
SOSP 2023
Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, Yuxuan Jiang, Chen Wang, Mandana Vaziri, Owolabi Legunsen, and Tianyin Xu
PDF  |  BibTex  |  Code Star
Selected for solution showcase at KubeCon + CloudNativeCon Europe 2024
Invited to publish at USENIX ;login: (article)
Covered by IllinoisCS News

Push-Button Reliability Testing for Cloud-Backed Applications with Rainmaker
NSDI 2023
Yinfang Chen, Xudong Sun, Suman Nath, Ze Yang, and Tianyin Xu
PDF  |  BibTex  |  Talk  |  Slides  |  Code Star
Featured by The Weekend Read

Automatic Reliability Testing for Cluster Management Controllers
OSDI 2022
Xudong Sun, Wenqing Luo, Jiawei Tyler Gu, Aishwarya Ganesan, Ramnatthan Alagappan, Michael Gasch, Lalith Suresh, and Tianyin Xu
PDF  |  BibTex  |  Talk  |  Slides  |  Code Star
Selected for presentation at KubeCon + CloudNativeCon North America 2021
Invited for presentation at KBE Insider (Episode 14)
Covered by VMware Office of CTO Blog (endorsement from Kit Colbert), Hacker News, Paper Review by Micah Lerner, IllinoisCS News
Invited to publish at USENIX ;login: (article)

Reasoning about modern datacenter infrastructures using partial histories
HotOS 2021
Xudong Sun, Lalith Suresh, Aishwarya Ganesan, Ramnatthan Alagappan, Michael Gasch, Lilia Tang, and Tianyin Xu
PDF  |  BibTex  |  Talk
Covered by "A CAP tradeoff in the wild" by Professor Lindsey Kuper

Testing Configuration Changes in Context to Prevent Production Failures
OSDI 2020
Xudong Sun*, Runxiang Cheng*, Jianyan Chen, Ran Ang, Owolabi Legunsen, and Tianyin Xu (*co-primary)
PDF  |  BibTex  |  Talk  |  Slides  |  Code Star

Service

Teaching

Mentoring

Honors & Awards

How to pronounce my name