软工大牛Hakjoo Oh及其顶会论文解读

本文主要是介绍软工大牛Hakjoo Oh及其顶会论文解读，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

前言

本文介绍软件工程领域大牛Hakjoo Oh ，并且分析其今年顶会论文“MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C”

一、论文及作者信息

会议：FSE 2018: ACM SIGSOFT Symposium on the Foundations of Software Engineering
论文名称：MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C
第一作者：Junhee Lee
大学：Korea University
联系方式：junhee_lee@korea.ac.kr

所有作者：Junhee Lee*, Seongjoon Hong*, and Hakjoo Oh (*contributed equally)

在网上找到了Hakjoo Oh的主页（http://prl.korea.ac.kr/~pronto/home/）
，实在是太强了啊，这个人今年6篇顶会：FSE 2018 一篇，OOPSLA 2018 两篇，ASE 2018 一篇，IJCAI 2018 一篇, ICSE 2018 一篇。

确实是个大神。

作者Hakjoo Oh

有时间应该把这个实验室的文章系统的看一遍：
1）MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C
2）Precise and Scalable Points-to Analysis via Data-Driven Context Tunneling
3）Automatic Diagnosis and Correction of Logical Errors for Functional Programming Assignments
4）Template-Guided Concolic Testing via Online Learning
5）Synthesizing Pattern Programs from Examples
6）Automatically Generating Search Heuristics for Concolic Testing

二、论文内容

Given a buggy program, MemFix aims to repair the program by finding a set of deallocation statements that correctly deallocate all allocated memory objects without causing double-frees and use-after-frees.

We show that finding such a set of deallocation statements is essentially to solve an exact cover problem [13] that can be derived by a static analysis. The static analysis is a variant of type-state analysis [14], which tracks possible deallocation statements for each object. Finding an exact cover is a well-known NP-complete problem, for which we use an oﬀ-the-shelf SAT solver. Because MemFix is based on a sound static analysis, the MemFix-generated patches are guaranteed to correctly fx the target error without introducing new
errors that are absent in the original program.

[13] https://en.wikipedia.org/wiki/Exact_cover
[14] Stephen Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. 2006. Eﬀective Typestate Verifcation in the Presence of Aliasing. In Proceedings of the 2006 International Symposium on Software Testing and Analysis. 133–144.

这个工具比2015年ICSE文章leakfix还要强，强得多。这个好厉害。原文：

We show that these results represent a signifcant advance over the state-of-the-art by comparing MemFix with LeakFix [15], a recently developed tool for fxing memory leaks in C programs.

写作功底也很强，这句话，我写不出来。

[15] Qing Gao, Yingfei Xiong, Yaqing Mi, Lu Zhang, Weikun Yang, Zhaoping Zhou, Bing Xie, and Hong Mei. 2015. Safe Memory-leak Fixing for C Programs. In Proceedings of the 37th International Conference on Software Engineering (ICSE ’15). 459–470.

作者工作做得也太足了吧！妥妥的顶会哇。。。因为他把memfix直接和leakfix在三个benchmark set上做了比较，了不起。

作者的贡献：
1）开发了memfix
2）通过实验，将memfix和leakfix做了比较
3）将工具和benchmark开源，而且自己提取了100个开源程序中的real error。（55-664 行代码，这里我觉得比较奇怪，可以这样提取的吗，这算是人工错误还是现实错误呢？）

感觉工作还是很实的，落到实处了，想到以前我的工作，我觉得自己确实不够，还不够。

MemFix works with two key ideas: 1) we use a static analysis that collects patch candidates for each allocated object, and 2) we reduce the problem of fnding a correct patch into an exact cover problem over allocated objects.
1) Static Analysis for Collecting Patch Candidates. 主要使用了静态分析来收集候选补丁。这里使用的分析技术是：a variant of typestate analysis [14].
2) Finding Correct Patches by Solving Exact Cover Problem. We fnd the correct patches by reducing the problem into an exact cover problem as follows. The exact cover problem is NP-complete [13]. We solve the problem by encoding it as boolean satisfability and leveraging an oﬀ-the-shelf SAT solver [20].

We implemented a prototype of MemFix on top of Sparrow, which provides a general framework for performing abstract interpretation of C programs. We instantiated the framework with the abstract domain and semantics function described in Section 3. Our focus is on faithfully implementing the analysis and algorithm in Section 3, rather than on optimizing its performance.

整个工具是基于开源软件sparrow做的（主要实现的就是MEMFIX的算法）

在related work中，作者把近年来开发的各种自动修复工具分成两类：special-purpose and general-purpose techniques 。
1）Special-Purpose Program Repair Techniques. MemFix belongs to the family of techniques that focus on fxing specifc classes of errors. Existing techniques aim to automatically fx memory leak [15, 43, 52], buﬀer overﬂow [39], integer overflow [7], null pointer exceptions [12], error handling bugs [46], race errors [1], etc. To our knowledge, MemFix is the frst technique that is able to fx memory deallocation errors (i.e., memory-leak, double-free, and use-after-free) in a unifed fashion.

By contrast, MemFix is designed to fx memory deallocation errors in general with a novel algorithm that solves the exact cover problem induced by a static analysis.

2）General-Purpose Program Repair Techniques. The general test-based approaches (e.g., [21–23, 26, 28–31, 40, 49, 50]) to automatic program repair is not adequate for fxing memory deallocation errors. These approaches work with a set of testcases, some of which expose the error in the program, and aim to find a patched program that behaves correctly on all inputs in the test-suite. These techniques can be classifed into generate-andvalidate and semantics-based techniques. Generate-and-validate approaches [21, 26, 28, 49] repeatedly search for candidate patches within a pre-defned search space until a program that can be validated against the test-suite is found. Semantics-based approaches [22, 23, 29–31] use symbolic execution to derive contraints on the correct patch and synthesize the patch by using program synthesis or constraint solving.

三、特点

1）这种顶会的摘要好像都比较短，一般是只有一段，而且开门见山（有时候论文背景，即background，都不说的）

2）竟然看到了CVE，这个值得关注一下。

For example, vulnerabilities CVE-2017-10810, CVE-2017-6353, and CVE-2017-8824 recently found in Linux kernel are due to ML, DF, and UAF, respectively.

此外文中还提到了CWE，不仅局限在CVE。所以这两个都可以关注下。

3）感觉这篇文章比较容易引起评委的兴趣，因为这个introduction写的非常好。

首先，把要解决的问题说清楚了：分成三个类型的error
然后，通过列举现在很出名的开源社区代码：linux，php，git，openssl。对Memory Deallocation Errors的普遍性进行了一个统计，这个好酷啊（从所有commits里面统计，真的我确实想不到）。
然后，还举例了CVE，CWE，这让我不了解这个领域的人，都觉得这个问题很重要，很需要解决，作者确实高明。

4）如果要学习作者的写作风格，我还得看一篇。不然没法确定。（比如是不是每篇文章的introduction都要搞一个小调查小统计，来吸引评委）

5）自己定义的专业术语（工具名称）：MEMFIX，是用latex写的，一看就很酷啊。。。
我不能再用word了。

6）作者在OVERVIEW这个章节里面给了图，例子，解释说明。给出一个有double-free error的程序，然后给出人为修复和memfix修复，讲的确实很清楚，没有可以指摘之处。

我不如。

7）作者在memfix用了两个关键技术：static analysis来生成候选补丁，SAT solver来求解正确补丁。其中前者是现有typestate analysis [14] 技术的变体，后者是现成的SAT solver [20]。

这说明了什么？？？他山之石可以攻玉。我们不一定要发明技术，如果你的想法很好，idea很不错，直接把其他领域或者其他方向的技术拿过来用，解决问题，那就是很酷的。

8）对一些专业术语使用不一样的字体，感觉很讲究。

（英文）段落中不一样的字体

9）参考文献基本上都是ICSE，ISSTA，POPL，PLDI这些顶会，还有ICSME等等。感觉使用参考文献也是比较讲究的。
不过这篇文章用的文献似乎不够新，没有18年的，17年的也比较少。16年及以前的比较多。

10）使用表格比较多。也画的挺清楚的，值得学习。

11）还讲了自己的limitation，感觉算是很实诚的了。有必要的时候，确实可以深入分析那些修复失败的benchmark set，找一下原因，也就是自己的limitation。

四、生词收集

vita 履历

Concolic testing：和符号执行有关的一种测试。专业术语。

deallocation 英[di:ælə’keɪʃən] 美[di:ælə’keɪʃən] n. 存储单元分配;

entrust 英[ɪnˈtrʌst]
美[ɪnˈtrʌst]
vt. 委托，托付;

CWE: Common Weakness Enumeration (CWE) is a list of software weaknesses. https://cwe.mitre.org/ 然而这个网站我现在打不开。

unified 英[‘ju:nɪfaɪd]
美[‘ju:nɪfaɪd]
adj. 统一的; 一元化的; (unify 的过去式和过去分词) ; 统一标准的;
v. 统一; 使联合( unify的过去式和过去分词 ); 使相同; 使一致;

off-the-shelf
英美 [‘ɔfðə’ʃɛlf]
adj. 现成的；常备的；成品的
adv. 现成地；无需作重大修改地

五、好句摘录

We show that these results represent a signifcant advance over the state-of-the-art by comparing MemFix with LeakFix [15], a recently developed tool for fxing memory leaks in C programs.

All experiments were done on a linux machine with Intel Core i5-4590 and 8GB RAM. 以后做实验都可以加上这一句。