This project was completed for a graduate course on open source engineering. The goal was to develop an efficient way to detect when a significant portion of open source code has been included in another project. My project partner and I investigated the use of dependency graphs and found that, while full graph matching was too computationally expensive to be practical, an extremely simplified form of matching worked well.
Clone detection is important to the open source community, where projects and their source code are shared free of charge. The health of these communities is threatened when open source software is included in another product in violation of its license agreement. We developed a clone detector that improves on previous work by Brown et al. We built a method dependency graph whose nodes are labelled with Brown et al.’s n-grams, which concisely represent each method’s bytecode. Using an approximate graph matching algorithm, our detector efficiently reports 100% similarity when comparing a Java Archive (JAR) file with a copy that has been modified without changing its meaning.
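To make the idea of an n-gram-labelled dependency graph more concrete, here is a minimal Java sketch. It is not our detector, nor Brown et al.’s exact n-gram scheme. It assumes each method’s bytecode has already been extracted from the JAR as a list of opcode mnemonics (extraction is not shown), uses an assumed n-gram length of 3, and substitutes a deliberately simple matching rule: a method in one graph matches a method in the other when its own n-gram label and the sorted labels of the methods it calls are identical. The class and method names (`CloneSketch`, `ngramLabel`, `signatures`, `similarity`) are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch: label each method with its opcode n-grams, then compare
 * two method dependency graphs by matching node-plus-callee signatures.
 * Opcodes are assumed to be pre-extracted mnemonic strings.
 */
public class CloneSketch {
    static final int N = 3; // assumed n-gram length

    // Sorted set of opcode n-grams for one method body, rendered as a string label.
    static String ngramLabel(List<String> opcodes) {
        SortedSet<String> grams = new TreeSet<>();
        for (int i = 0; i + N <= opcodes.size(); i++) {
            grams.add(String.join(" ", opcodes.subList(i, i + N)));
        }
        return grams.toString();
    }

    // code: method name -> opcodes; calls: method name -> names of called methods.
    // Returns a multiset (signature -> count). Method names never enter a signature,
    // so renaming methods does not change the result.
    static Map<String, Integer> signatures(Map<String, List<String>> code,
                                           Map<String, List<String>> calls) {
        Map<String, String> label = new HashMap<>();
        for (Map.Entry<String, List<String>> e : code.entrySet()) {
            label.put(e.getKey(), ngramLabel(e.getValue()));
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String m : code.keySet()) {
            List<String> calleeLabels = new ArrayList<>();
            for (String callee : calls.getOrDefault(m, List.of())) {
                calleeLabels.add(label.getOrDefault(callee, "external"));
            }
            Collections.sort(calleeLabels);
            counts.merge(label.get(m) + " -> " + calleeLabels, 1, Integer::sum);
        }
        return counts;
    }

    // Fraction of methods in A whose signature is matched by a distinct method in B.
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        int total = 0, matched = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            total += e.getValue();
            matched += Math.min(e.getValue(), b.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 1.0 : (double) matched / total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> codeA = Map.of(
            "parse", List.of("iload_0", "invokestatic", "ireturn"),
            "helper", List.of("iload_0", "iconst_1", "iadd", "ireturn"));
        Map<String, List<String>> callsA = Map.of("parse", List.of("helper"));

        // Same bytecode with every method renamed: the meaning is unchanged.
        Map<String, List<String>> codeB = Map.of(
            "a", List.of("iload_0", "invokestatic", "ireturn"),
            "b", List.of("iload_0", "iconst_1", "iadd", "ireturn"));
        Map<String, List<String>> callsB = Map.of("a", List.of("b"));

        double s = similarity(signatures(codeA, callsA), signatures(codeB, callsB));
        System.out.println(s); // prints 1.0
    }
}
```

Because signatures are built only from opcode n-grams and the call structure, a semantics-preserving change such as renaming methods leaves every signature intact, which is why the toy comparison reports full similarity. Comparing multisets of local signatures is far cheaper than general subgraph matching, at the cost of ignoring structure beyond each method’s immediate callees.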
The paper summarizing our work is available as a PDF.