Optimizing Duplicate Size Thresholds in IDEs

Konstantin Grotov, Sergey Titov, Alexandr Suhinin, Yaroslav Golubev, and Timofey Bryksin

May 2023. Published in the proceedings of MSR'23 (A).

Abstract. In this paper, we present an approach for transferring an optimal lower size threshold for clone detection from one language to another by analyzing their clone distributions. We showcase this method by transferring the threshold from regular Python scripts to Jupyter notebooks for use in two JetBrains IDEs, Datalore and DataSpell.
