ICSE 2023 Dry Runs

28/4/2023

Speaker

Ba Jinsheng, Zhiyu Fan, and Liu Jiahao

"Testing Database Engines via Query Plan Guidance" by Ba Jinsheng

Abstract

Jinsheng Ba is a Ph.D. student in the NUS TEST lab. Before coming to NUS, he was a security engineer at Huawei. His research interests focus on software security. He has published papers on the top-tier conferences Usenix Security, ASE, and ICSE. Notably, he has received an ACM SIGSOFT Distinguished Paper Award at ASE '22.

Bio

Database systems are widely used to store and query data. Test oracles have been proposed to find logic bugs in such systems, that is, bugs that cause the database system to compute an incorrect result. To realize a fully automated testing approach, such test oracles are paired with a test case generation technique; a test case refers to a database state and a query on which the test oracle can be applied. In this work, we propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards ''interesting'' test cases. SQL and other query languages are declarative. Thus, to execute a query, the database system translates every operator in the source language to one of the potentially many so-called physical operators that can be executed; the tree of physical operators is referred to as the query plan. Our intuition is that by steering testing towards exploring a variety of unique query plans, we also explore more interesting behaviors---some of which are potentially incorrect. To this end, we propose a mutation technique that gradually applies promising mutations to the database state, causing the DBMS to create potentially unseen query plans for subsequent queries. We applied our method to three mature, widely-used, and extensively-tested database systems---SQLite, TiDB, and CockroachDB---and found 53 unique, previously unknown bugs. Our method exercises 4.85x-408.48x more unique query plans than a naive random generation method and 7.46x more than a code coverage guidance method. Since most database systems---including commercial ones---expose query plans to the user, we consider QPG a generally applicable, black-box approach and believe that the core idea could also be applied in other contexts (e.g., to measure the quality of a test suite).

"Automated Repair of Programs from Large Language Models" by Zhiyu Fan

Abstract

Large language models such as Codex, have shown the capability to produce code for many programming tasks. However, the success rate of existing models is low, especially for complex programming tasks. One of the reasons is that language models lack awareness of program semantics, resulting in incorrect programs, or even programs which do not compile. In this paper, we systematically study whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests. The goal is to study whether APR techniques can enhance reliability in the code produced by large language models. Our study revealed that: (1) automatically generated code shares common programming mistakes with human-crafted solutions, indicating APR techniques may have potential to fix auto-generated code; (2) given bug location information provided by a statistical fault localization approach, the newly released Codex edit mode, which supports editing code, is similar to or better than existing Java repair tools TBar and Recoder in fixing incorrect solutions. By analyzing the experimental results generated by these tools, we provide several suggestions: (1) enhancing APR tools to surpass limitations in patch space (e.g., introducing more flexible fault localization) is desirable; (2) as large language models can derive more fix patterns by training on more data, future APR tools could shift focus from adding more fix patterns to synthesis/semantics based approaches, (3) combination of language models with APR to curate patch ingredients, is worth studying.

Bio

Zhiyu Fan is a Ph.D. student in the NUS SecureSW Lab, advised by Prof Abhik Roychoudhury. His research interests include automated program repair, large language models, with a specific focus on intelligent tutoring for programming education.

"Learning Graph-based Code Representations for Source-level Functional Similarity Detection" by Liu Jiahao

Abstract

Detecting code functional similarity forms the basis of various software engineering tasks. However, the detection is challenging as functionally similar code fragments can be implemented differently, e.g., with irrelevant syntax. Recent studies incorporate program dependencies as semantics to identify syntactically different yet semantically similar programs, but they often focus only on local neighborhoods (e.g., one-hop dependencies), limiting the expressiveness of program semantics in modeling functionalities. In this paper, we present Tailor that explicitly exploits graph-structured code features (e.g., multi-hop neighbors) for functional similarity detection. Given source-level programs, Tailor first represents them into code property graphs (CPGs) — which combine abstract syntax trees, control flow graphs, and data flow graphs — to collectively reason about program syntax and semantics. Then, Tailor learns representations of CPGs by applying a CPG-based neural network (CPGNN) to iteratively propagate information on them. It improves over prior work on code representation learning through a new graph neural network (GNN) tailored to CPG structures instead of the off-the-shelf GNNs used previously. We systematically evaluate Tailor on C and Java programs using two public benchmarks. Experimental results show that Tailor outperforms the state-of-the-art approaches, achieving 99.8% and 99.9% F-scores in code clone detection and 98.3% accuracy in source code classification.

Bio

Liu Jiahao is currently a third-year PhD student advised by Prof Liang, and his research is primarily focused on Program Analysis (e.g., source code analysis, and Android malware detection), as well as Log Analysis.