---
layout: default
title: Publications
---
<section class="section-margin">
<div class="container">
<h2 id="publications">2025</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/trainverify-sosp25.pdf">TrainVerify: Equivalence-Based Verification for Distributed LLM Training</a><br>
<span class="authorlist"><i><a href="https://luyunchi.github.io" class="nodec">Yunchi Lu</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/yomia" class="nodec">Youshan Miao</a>, </i><i><a href="https://naizhengtan.github.io" class="nodec">Cheng Tan</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/yizhu1" class="nodec">Yi Zhu</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/zhxian" class="nodec">Xian Zhang</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/fanyang" class="nodec">Fan Yang</a><br></i></span>
<a target="_blank" href="https://sigops.org/s/conferences/sosp/2025/" class="conf"><b>SOSP 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/trainverify.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/trainverify_sosp25_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/verify-llm/TrainVerify">software</a> <a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="https://arxiv.org/abs/2506.15961">arXiv</a>
</li>
<li>
<a target="_blank" href="/paper/phoenix-sosp25.pdf">Optimistic Recovery for High-Availability Software via Partial Process State Preservation</a><br>
<span class="authorlist"><i><a href="https://osdi.dev" class="nodec">Yuzhuo Jing</a>, </i><i>Yuqi Mai, </i><i>Angting Cai, </i><i><a href="https://chenyi.world" class="nodec">Yi Chen</a>, </i><i><a href="https://hwanning.netlify.app" class="nodec">Wanning He</a>, </i><i>Xiaoyang Qian, </i><i><a href="https://web.eecs.umich.edu/~pmchen" class="nodec">Peter M. Chen</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://sigops.org/s/conferences/sosp/2025/" class="conf"><b>SOSP 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/phoenix.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/phoenix_sosp25_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/phoenix">software</a>
</li>
<li>
<a target="_blank" href="/paper/atropos-sosp25.pdf">Mitigating Application Resource Overload with Targeted Task Cancellation</a><br>
<span class="authorlist"><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i>Zeyin Zhang, </i><i>Yicheng Liu, </i><i>Yile Gu, </i><i>Shuangyu Lei, </i><i><a href="https://homes.cs.washington.edu/~baris" class="nodec">Baris Kasikci</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://sigops.org/s/conferences/sosp/2025/" class="conf"><b>SOSP 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/atropos.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/atropos_sosp25_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/Atropos">software</a>
</li>
<li>
<a target="_blank" href="/paper/traincheck-osdi25-preprint.pdf">Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks</a><br>
<span class="authorlist"><i><a href="https://essoz.github.io" class="nodec">Yuxuan Jiang</a>, </i><i>Ziming Zhou, </i><i>Boyu Xu, </i><i>Beijie Liu, </i><i>Runhui Xu, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi25" class="conf"><b>OSDI 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/traincheck-osdi25.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/traincheck_osdi25_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/TrainCheck">software</a> <a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="https://www.arxiv.org/abs/2506.14813">arXiv</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://cse.engin.umich.edu/stories/improving-ai-models-automated-tool-detects-silent-errors-in-deep-learning-training">CSE News</a>, <a target="_blank" href="https://news.engin.umich.edu/2025/07/improving-ai-models-automated-tool-detects-silent-errors-in-deep-learning-training">Michigan Engineering News</a>, <a target="_blank" href="https://techxplore.com/news/2025-07-ai-automated-tool-silent-errors.html">Tech Xplore</a> </div>
</li>
<li>
<a target="_blank" href="/paper/t2c-osdi25-preprint.pdf">Deriving Semantic Checkers from Tests to Detect Silent Failures in Production Distributed Systems</a><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~chlou/about" class="nodec">Chang Lou</a>, </i><i>Dimas Shidqi Parikesit, </i><i>Yujin Huang, </i><i>Zhewen Yang, </i><i>Senapati Diwangkara, </i><i><a href="https://osdi.dev" class="nodec">Yuzhuo Jing</a>, </i><i>Achmad Imam Kistijantoro, </i><i><a href="http://www.eecg.toronto.edu/~yuan" class="nodec">Ding Yuan</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/sumann" class="nodec">Suman Nath</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi25" class="conf"><b>OSDI 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/t2c-osdi25.bib">citation</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/T2C">software</a>
</li>
<li>
<a target="_blank" href="/paper/xinda-nsdi25-preprint.pdf">One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems</a><br>
<span class="authorlist"><i><a href="https://ruiming-lu.github.io" class="nodec">Ruiming Lu</a>, </i><i><a href="https://luyunchi.github.io" class="nodec">Yunchi Lu</a>, </i><i><a href="https://essoz.github.io" class="nodec">Yuxuan Jiang</a>, </i><i>Guangtao Xue, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/nsdi25" class="conf"><b>NSDI 2025</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/xinda-nsdi25.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/xinda_nsdi25_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/xinda">software</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://cse.engin.umich.edu/stories/a-new-tool-to-manage-slow-faults">CSE News</a>, <a target="_blank" href="https://techxplore.com/news/2025-05-tool-faults-real-adjustment.html">Tech Xplore</a> </div>
</li>
</ul>
<h2 id="publications">2024</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/anduril-sosp24.pdf">Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection</a><br>
<span class="authorlist"><i><a href="https://tonypan123.github.io" class="nodec">Jia Pan</a>*, </i><i><a href="https://www.cs.jhu.edu/~hwu80" class="nodec">Haoze Wu</a>*, </i><i><a href="https://www.microsoft.com/en-us/research/people/taleesat" class="nodec">Tanakorn Leesatapornwongsa</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/sumann" class="nodec">Suman Nath</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://sigops.org/s/conferences/sosp/2024" class="conf"><b>SOSP 2024</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/anduril-sosp24.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/anduril_sosp24_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/Anduril">software</a> *: equal contribution
</li>
<li>
<a target="_blank" href="/paper/legolas-nsdi24-preprint.pdf">Efficient Exposure of Partial Failure Bugs in Distributed Systems with Inferred Abstract States</a><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~hwu80" class="nodec">Haoze Wu</a>, </i><i><a href="https://tonypan123.github.io" class="nodec">Jia Pan</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/nsdi24" class="conf"><b>NSDI 2024</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/legolas-nsdi24.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/legolas_nsdi24_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/Legolas">software</a>
</li>
</ul>
<h2 id="publications">2023</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/cloudless-hotnets23.pdf">Simplifying Cloud Management with Cloudless Computing</a><br>
<span class="authorlist"><i><a href="https://yimingqiu.me" class="nodec">Yiming Qiu</a>, </i><i>Patrick Tser Jern Kon, </i><i>Jiarong Xing, </i><i>Yibo Huang, </i><i>Hongyi Liu, </i><i><a href="https://web.eecs.umich.edu/~xwangsd" class="nodec">Xinyu Wang</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.mosharaf.com" class="nodec">Mosharaf Chowdhury</a>, </i><i><a href="https://web.eecs.umich.edu/~chenang" class="nodec">Ang Chen</a><br></i></span>
<a target="_blank" href="https://conferences.sigcomm.org/hotnets/2023" class="conf"><b>HotNets 2023</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/cloudless-hotnets23.bib">citation</a>
</li>
<li>
<a target="_blank" href="/paper/pbox-sosp23.pdf">Pushing Performance Isolation Boundaries into Application with pBox</a><br>
<span class="authorlist"><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i><a href="https://gongqihuang.com" class="nodec">Gongqi Huang</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://sosp2023.mpi-sws.org" class="conf"><b>SOSP 2023</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/pbox-sosp23.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/pbox_sosp23_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/pBox">software</a>
</li>
<li>
<a target="_blank" href="/paper/vprof-eurosys23.pdf">Effective Performance Issue Diagnosis with Value-Assisted Cost Profiling</a><br>
<span class="authorlist"><i>Lingmei Weng, </i><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://www.cs.columbia.edu/~nieh" class="nodec">Jason Nieh</a>, </i><i><a href="http://www.cs.columbia.edu/~junfeng" class="nodec">Junfeng Yang</a><br></i></span>
<a target="_blank" href="https://2023.eurosys.org" class="conf"><b>EuroSys 2023</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/vprof-eurosys23.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/vprof_eurosys23_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/wenglingmei/vprofAE">software</a>
</li>
</ul>
<h2 id="publications">2022</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/orbit-osdi22.pdf">Operating System Support for Safe and Efficient Auxiliary Execution</a><br>
<span class="authorlist"><i><a href="https://osdi.dev" class="nodec">Yuzhuo Jing</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi22" class="conf"><b>OSDI 2022</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/orbit-osdi22.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/orbit_osdi22_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/orbit">software</a>
</li>
<li>
<a target="_blank" href="/paper/oathkeeper-osdi22.pdf">Demystifying and Checking Silent Semantic Violations in Large Distributed Systems</a><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~chlou/about" class="nodec">Chang Lou</a>, </i><i><a href="https://osdi.dev" class="nodec">Yuzhuo Jing</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi22" class="conf"><b>OSDI 2022</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/oathkeeper-osdi22.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/oathkeeper_osdi22_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/OathKeeper">software</a>
</li>
<li>
<a target="_blank" href="/paper/resin-osdi22.pdf">RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure</a><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~chlou/about" class="nodec">Chang Lou</a>, </i><i>Cong Chen, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Yingnong Dang, </i><i>Si Qin, </i><i>Xinsheng Yang, </i><i>Xukun Li, </i><i>Qingwei Lin, </i><i>Murali Chintalapati<br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi22" class="conf"><b>OSDI 2022</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/resin-osdi22.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/resin_osdi22_slides.pdf">slides</a>
</li>
</ul>
<h2 id="publications">2021</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/argus-atc21.pdf">Argus: Debugging Performance Issues in Modern Desktop Applications with Annotated Causal Tracing</a> <b style="color:green">[Best Paper Award]</b><br>
<span class="authorlist"><i>Lingmei Weng, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://www.cs.columbia.edu/~nieh" class="nodec">Jason Nieh</a>, </i><i><a href="http://www.cs.columbia.edu/~junfeng" class="nodec">Junfeng Yang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/atc21" class="conf"><b>ATC 2021</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/argus.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/argus_atc21_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/columbia/ArgusDebugger">software</a>
</li>
<li>
<a target="_blank" href="/paper/arthas-eurosys21.pdf">Understanding and Dealing with Hard Faults in Persistent Memory Systems</a><br>
<span class="authorlist"><i><a href="https://portugasian.github.io" class="nodec">Brian Choi</a>, </i><i>Randal Burns, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://2021.eurosys.org" class="conf"><b>EuroSys 2021</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/arthas-eurosys21.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/arthas_eurosys_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/Arthas">software</a> <a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/paper/arthas-tech-report.pdf">tech report</a>
</li>
</ul>
<h2 id="publications">2020</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/violet-osdi20.pdf">Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution</a><br>
<span class="authorlist"><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i><a href="https://gongqihuang.com" class="nodec">Gongqi Huang</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi20" class="conf"><b>OSDI 2020</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/violet-osdi20.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/violet_osdi20_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/violet">software</a> <a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/paper/violet-tech-report.pdf">tech report</a>
</li>
<li>
<a target="_blank" href="/paper/narya-osdi20.pdf">Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions</a><br>
<span class="authorlist"><i>Sebastien Levy, </i><i>Randolph Yao, </i><i>Youjiang Wu, </i><i>Yingnong Dang, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Zheng Mu, </i><i>Pu Zhao, </i><i>Tarun Ramani, </i><i>Naga Govindaraju, </i><i>Xukun Li, </i><i>Qingwei Lin, </i><i>Gil Lapid Shafriri, </i><i>Murali Chintalapati<br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi20" class="conf"><b>OSDI 2020</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/narya-osdi20.bib">citation</a> <a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/paper/narya-tech-report.pdf">tech report</a>
</li>
<li>
<a target="_blank" href="/paper/omegagen-nsdi20.pdf">Understanding, Detecting and Localizing Partial Failures in Large System Software</a> <b style="color:green">[Best Paper Award]</b><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~chlou/about" class="nodec">Chang Lou</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.cs.jhu.edu/~scott" class="nodec">Scott Smith</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/nsdi20" class="conf"><b>NSDI 2020</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/omegagen.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/omegagen_nsdi20_slides.pdf">slides</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://blog.acolyer.org/2020/03/16/omega-gen">The Morning Paper</a> </div>
</li>
<li>
<a target="_blank" href="/paper/gandalf-nsdi20.pdf">Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure</a><br>
<span class="authorlist"><i>Ze Li, </i><i>Qian Cheng, </i><i>Ken Hsieh, </i><i>Yingnong Dang, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Pankaj Singh, </i><i>Xinsheng Yang, </i><i>Qingwei Lin, </i><i>Youjiang Wu, </i><i>Sebastien Levy, </i><i>Murali Chintalapati<br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/nsdi20" class="conf"><b>NSDI 2020</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/gandalf.bib">citation</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://blog.acolyer.org/2020/02/28/microsoft-gandalf">The Morning Paper</a> </div>
</li>
<li>
<a target="_blank" href="/paper/sdig-aaai20-workshop.pdf">Scaling Performance Issue Detection and Diagnosis in Cloud Infrastructures</a><br>
<span class="authorlist"><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i>Ze Li, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Suhas Pinnamaneni, </i><i>Francis David, </i><i>Yingnong Dang, </i><i>Murali Chintalapati<br></i></span>
<a target="_blank" href="https://cloudintelligenceworkshop.org" class="conf"><b>AAAI-20 Workshop on Cloud Intelligence</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/sdig-aaai20.bib">citation</a>
</li>
</ul>
<h2 id="publications">2019</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/watchdog-hotos19.pdf">Comprehensive and Efficient Runtime Checking in System Software through Watchdogs</a><br>
<span class="authorlist"><i><a href="https://www.cs.jhu.edu/~chlou/about" class="nodec">Chang Lou</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.cs.jhu.edu/~scott" class="nodec">Scott Smith</a><br></i></span>
<a target="_blank" href="http://hotos19.sigops.org" class="conf"><b>HotOS 2019</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/watchdog.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/watchdog_hotos19_slides.pdf">slides</a>
</li>
<li>
<a target="_blank" href="/paper/ursa-eurosys19.pdf">URSA: Hybrid Block Storage for Cloud-Scale Virtual Disks</a><br>
<span class="authorlist"><i>Huiba Li, </i><i>Yiming Zhang, </i><i>Dongsheng Li, </i><i>Zhiming Zhang, </i><i>Shengyun Liu, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Zheng Qin, </i><i>Kai Chen, </i><i>Yongqiang Xiong<br></i></span>
<a target="_blank" href="https://www.eurosys2019.org" class="conf"><b>EuroSys 2019</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/ursa.bib">citation</a>
</li>
<li>
<a target="_blank" href="/paper/leaseos-asplos19.pdf">A Case for Lease-Based, Utilitarian Resource Management on Mobile Devices</a> <b style="color:green">[Best Paper Award]</b><br>
<span class="authorlist"><i><a href="https://yigonghu.github.io" class="nodec">Yigong Hu</a>, </i><i><a href="https://sylll.github.io" class="nodec">Suyi Liu</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://asplos-conference.org" class="conf"><b>ASPLOS 2019</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/leaseos.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/leaseos_asplos19_slides.pptx">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/OrderLab/leaseos_frameworks_base">software</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://blog.acolyer.org/2019/05/31/lease-os">The Morning Paper</a> </div>
</li>
<li>
<a target="_blank" href="/paper/aiops-icse19-briefing.pdf">AIOps: Real-World Challenges and Research Innovations</a><br>
<span class="authorlist"><i>Yingnong Dang, </i><i>Qingwei Lin, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a><br></i></span>
<a target="_blank" href="https://2019.icse-conferences.org/info/technical-briefings" class="conf"><b>ICSE 2019 Technical Briefings</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/aiops.bib">citation</a>
</li>
</ul>
<h2 id="publications">2018</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/panorama-osdi18.pdf">Capturing and Enhancing In Situ System Observability for Failure Detection</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/chguo" class="nodec">Chuanxiong Guo</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/lorch" class="nodec">Jacob R. Lorch</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/lidongz" class="nodec">Lidong Zhou</a>, </i><i>Yingnong Dang<br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi18" class="conf"><b>OSDI 2018</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/panorama.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/panorama_osdi18_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://github.com/ryanphuang/panorama">software</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://blog.acolyer.org/2018/10/15/capturing-and-enhancing-in-situ-system-observability-for-failure-detection">The Morning Paper</a>, <a target="_blank" href="https://blog.csdn.net/TiDB_PingCAP/article/details/84388408">CSDN</a> </div>
</li>
<li>
<a target="_blank" href="https://cs.unc.edu/~csturton/papers/zhang2018MICRO.pdf">End-to-End Automated Exploit Generation for Validating the Security of Processor Designs</a> <b style="color:green">[Best Paper Candidate]</b><br>
<span class="authorlist"><i><a href="https://cs.unc.edu/~rzhang" class="nodec">Rui Zhang</a>, </i><i><a href="http://cs.unc.edu/~cd" class="nodec">Calvin Deutschbein</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://cs.unc.edu/~csturton" class="nodec">Cynthia Sturton</a><br></i></span>
<a target="_blank" href="https://www.microarch.org/micro51" class="conf"><b>MICRO 2018</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/coppelia.bib">citation</a>
</li>
<li>
<a target="_blank" href="/paper/tersecades-atc18.pdf">TerseCades: Efficient Data Compression in Stream Processing</a><br>
<span class="authorlist"><i><a href="http://www.cs.toronto.edu/~pekhimenko" class="nodec">Gennady Pekhimenko</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/chguo" class="nodec">Chuanxiong Guo</a>, </i><i><a href="https://sites.google.com/site/myeongjae" class="nodec">Myeongjae Jeon</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/lidongz" class="nodec">Lidong Zhou</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/atc18" class="conf"><b>USENIX ATC 2018</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/tersecades.bib">citation</a>
</li>
</ul>
<h2 id="publications">2017</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/grayfailure-hotos17.pdf">Gray Failure: The Achilles’ Heel of Cloud-Scale Systems</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/chguo" class="nodec">Chuanxiong Guo</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/lidongz" class="nodec">Lidong Zhou</a>, </i><i><a href="https://www.microsoft.com/en-us/research/people/lorch" class="nodec">Jacob R. Lorch</a>, </i><i>Yingnong Dang, </i><i>Murali Chintalapati, </i><i>Randolph Yao<br></i></span>
<a target="_blank" href="https://www.sigops.org/hotos/hotos17" class="conf"><b>HotOS 2017</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/grayfailure.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/grayfailure_hotos17_slides.pdf">slides</a><br><div class="press"><b>Coverage:</b> <a target="_blank" href="https://blog.acolyer.org/2017/06/15/gray-failure-the-achilles-heel-of-cloud-scale-systems">The Morning Paper</a>, <a target="_blank" href="http://www.zdnet.com/article/how-clouds-fail">ZDNet</a>, <a target="_blank" href="http://storagemojo.com/2017/07/24/how-high-redundancy-can-hurt-availability">StorageMojo</a>, <a target="_blank" href="https://news.ycombinator.com/item?id=16253405">Hacker News</a> </div>
</li>
</ul>
<h2 id="publications">2016</h2>
<ul class="publications">
<li>
<a target="_blank" href="http://opera.ucsd.edu//paper/osdi16-pcheck.pdf">Early Detection of Configuration Errors to Reduce Failure Damage</a> <b style="color:green">[Best Paper Award]</b><br>
<span class="authorlist"><i><a href="http://cseweb.ucsd.edu/~tixu" class="nodec">Tianyin Xu</a>, </i><i><a href="http://cseweb.ucsd.edu/~x7jin" class="nodec">Xinxin Jin</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a>, </i><i><a href="http://people.cs.uchicago.edu/~shanlu" class="nodec">Shan Lu</a>, </i><i>Long Jin, </i><i>Shankar Pasupathy<br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi16" class="conf"><b>OSDI 2016</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/pcheck.bib">citation</a>
</li>
<li>
<a target="_blank" href="/paper/defdroid-mobisys16.pdf">DefDroid: Towards a More Defensive Mobile OS Against Disruptive App Behavior</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://cseweb.ucsd.edu/~tixu" class="nodec">Tianyin Xu</a>, </i><i><a href="http://cseweb.ucsd.edu/~x7jin" class="nodec">Xinxin Jin</a>, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a><br></i></span>
<a target="_blank" href="http://www.sigmobile.org/mobisys/2016" class="conf"><b>MobiSys 2016</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/defdroid.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/defdroid_mobisys16_slides.pdf">slides</a>
<a target="_blank" class="btn btn-outline-primary publinkitem" href="https://youtu.be/lguUoitv80U">video</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="https://defdroid.github.io">website</a>
</li>
<li>
<a target="_blank" href="/paper/nchecker-eurosys16.pdf">Saving Mobile App Developers from Network Disruptions</a><br>
<span class="authorlist"><i><a href="http://cseweb.ucsd.edu/~x7jin" class="nodec">Xinxin Jin</a>, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://cseweb.ucsd.edu/~tixu" class="nodec">Tianyin Xu</a>, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a><br></i></span>
<a target="_blank" href="http://eurosys16.doc.ic.ac.uk" class="conf"><b>EuroSys 2016</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/nchecker.bib">citation</a>
</li>
</ul>
<h2 id="publications">2015</h2>
<ul class="publications">
<li>
<a target="_blank" href="/paper/confvalley-eurosys15.pdf">ConfValley: A Systematic Configuration Validation Framework for Cloud Services</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Bill Bolosky, </i><i>Abhishek Singh, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a><br></i></span>
<a target="_blank" href="http://eurosys2015.labri.fr" class="conf"><b>EuroSys 2015</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/confvalley.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/confvalley_eurosys15_slides.pdf">slides</a>
</li>
<li>
<a target="_blank" href="/paper/perfanalysis_techreport.pdf">Experience in Building a Comparative Performance Analysis Engine for a Commercial System</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Craig Schechter, </i><i>Vincent Chen, </i><i>Steven Hill, </i><i>Dongcai Shen, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a>, </i><i><a href="http://cseweb.ucsd.edu/~saul" class="nodec">Lawrence K. Saul</a><br></i></span>
<i>UC San Diego Technical Report CS2015-1014</i><br> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/perfanalysis.bib">citation</a>
</li>
</ul>
<h2 id="publications">2014</h2>
<ul class="publications">
<li>
Why Does a Cloud-Scale Service Fail Despite Fault-Tolerance?<br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://cseweb.ucsd.edu/~x7jin" class="nodec">Xinxin Jin</a>, </i><i>Bill Bolosky, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/osdi14" class="conf"><b>OSDI 2014</b></a> <span style="color:green">*: Retracted for confidentiality reasons</span>
</li>
<li>
<a target="_blank" href="/paper/perfscope-icse14.pdf">Performance Regression Testing Target Prioritization via Performance Risk Analysis</a><br>
<span class="authorlist"><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Xiao Ma, </i><i>Dongcai Shen, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a><br></i></span>
<a target="_blank" href="http://2014.icse-conferences.org" class="conf"><b>ICSE 2014</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/perfscope.bib">citation</a>
<a target="_blank" role="button" class="btn btn-outline-primary publinkitem" href="/slides/perfscope_icse14_slides.pdf">slides</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="http://ryanphuang.github.io/perfscope">software</a>
</li>
</ul>
<h2 id="publications">2013</h2>
<ul class="publications">
<li>
<a target="_blank" href="http://cseweb.ucsd.edu/~tixu/papers/sosp13.pdf">Do Not Blame Users for Misconfigurations</a><br>
<span class="authorlist"><i><a href="http://cseweb.ucsd.edu/~tixu" class="nodec">Tianyin Xu</a>, </i><i>Jiaqi Zhang, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Jing Zheng, </i><i>Tianwei Sheng, </i><i><a href="http://www.eecg.toronto.edu/~yuan" class="nodec">Ding Yuan</a>, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a>, </i><i>Shankar Pasupathy<br></i></span>
<a target="_blank" href="http://sigops.org/sosp/sosp13" class="conf"><b>SOSP 2013</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/spex.bib">citation</a>
</li>
<li>
<a target="_blank" href="/paper/edoctor-nsdi13.pdf">eDoctor: Automatically Diagnosing Abnormal Battery Drain Issues on Smartphones</a><br>
<span class="authorlist"><i>Xiao Ma, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i><a href="http://cseweb.ucsd.edu/~x7jin" class="nodec">Xinxin Jin</a>, </i><i>Pei Wang, </i><i>Soyeon Park, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a>, </i><i><a href="http://cseweb.ucsd.edu/~saul" class="nodec">Lawrence K. Saul</a>, </i><i><a href="http://www.cs.ucsd.edu/~voelker" class="nodec">Geoffrey M. Voelker</a><br></i></span>
<a target="_blank" href="https://www.usenix.org/conference/nsdi13" class="conf"><b>NSDI 2013</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/edoctor.bib">citation</a>
</li>
<li>
<a target="_blank" href="http://people.cs.uchicago.edu/~ravenben/publications/pdf/latent-tweb13.pdf">Understanding Latent Interactions in Online Social Networks</a><br>
<span class="authorlist"><i>Jing Jiang, </i><i><a href="http://www.ccs.neu.edu/home/cbw" class="nodec">Christo Wilson</a>, </i><i>Xiao Wang, </i><i>Wenpeng Sha, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Yafei Dai, </i><i><a href="http://www.cs.ucsb.edu/~ravenben" class="nodec">Ben Y. Zhao</a><br></i></span>
<b>TWEB 7(4), Oct. 2013</b> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/latent-tweb13.bib">citation</a>
</li>
</ul>
<h2 id="publications">2012</h2>
<ul class="publications">
<li>
<a target="_blank" href="http://opera.ucsd.edu//paper/osdi12-errlog.pdf">Be Conservative: Enhancing Failure Diagnosis with Proactive Logging</a><br>
<span class="authorlist"><i><a href="http://www.eecg.toronto.edu/~yuan" class="nodec">Ding Yuan</a>, </i><i>Soyeon Park, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Yang Liu, </i><i>Michael M. Lee, </i><i>Xiaoming Tang, </i><i><a href="http://cseweb.ucsd.edu/~yyzhou" class="nodec">Yuanyuan Zhou</a>, </i><i><a href="http://cseweb.ucsd.edu/~savage" class="nodec">Stefan Savage</a><br></i></span>
<a target="_blank" href="http://www.usenix.org/events/osdi12" class="conf"><b>OSDI 2012</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/errlog.bib">citation</a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="http://opera.ucsd.edu/errlog.htm">dataset</a>
</li>
</ul>
<h2 id="publications">2010</h2>
<ul class="publications">
<li>
<a target="_blank" href="http://conferences.sigcomm.org/imc/2010/papers/p369.pdf">Understanding Latent Interactions in Online Social Networks</a><br>
<span class="authorlist"><i>Jing Jiang, </i><i><a href="http://www.ccs.neu.edu/home/cbw" class="nodec">Christo Wilson</a>, </i><i>Xiao Wang, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Wenpeng Sha, </i><i>Yafei Dai, </i><i><a href="http://www.cs.ucsb.edu/~ravenben" class="nodec">Ben Y. Zhao</a><br></i></span>
<a target="_blank" href="http://conferences.sigcomm.org/imc/2010" class="conf"><b>IMC 2010</b></a> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/osn.bib">citation</a>
</li>
<li>
<a target="_blank" href="http://link.springer.com/content/pdf/10.1007%2Fs11432-010-4087-5.pdf">A Multiple User Sharing Behaviors Based Approach for Fake File Detection in P2P Environments</a><br>
<span class="authorlist"><i>Jing Jiang, </i><i>Yongjun Li, </i><i>Qinyuan Feng, </i><i><a href="https://web.eecs.umich.edu/~ryanph" class="nodec">Peng Huang</a>, </i><i>Yafei Dai<br></i></span>
<b>SCIS 53(11), Nov. 2010</b> <a target="_blank" class="btn btn-outline-primary publinkitem" href="/paper/p2pfakefile.bib">citation</a>
</li>
</ul>
</div>
</section>