In Java software, two important flexibility mechanisms are dynamic class loading and reflection. Unfortunately, the vast majority of static analyses for Java handle these features either unsoundly or overly conservatively. Our work develops techniques that allow static analyses to handle these dynamic features more precisely.
Since many of these dynamic features rely on string values to specify their run-time behavior, some static analyses have used string analysis to aid in resolution of such features. There are two main concerns with this practice: (1) often a string analysis is not powerful enough to accurately model the needed string values, and (2) the computing costs associated with a precise string analysis make it impractical to incorporate into many static analysis frameworks. We address the first concern by presenting a novel semi-static approach for resolving dynamic class loading by combining static string analysis with dynamically gathered information about the execution environment. The insight behind the approach is that dynamic class loading often depends on characteristics of the environment that are encoded in various environment variables. An experimental evaluation on the Java 1.4 standard libraries shows that a state-of-the-art string analysis resolves only 28% of non-trivial sites while our approach resolves 74% of such sites.
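To make the semi-static idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that a static string analysis has already summarized the class-name argument of a loading site as a sequence of literal pieces and environment lookups, and that a snapshot of the environment has been gathered separately. The class `SemiStaticResolutionSketch`, its helpers `lit` and `env`, and the property name `demo.impl.package` are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: all names here are hypothetical.
final class SemiStaticResolutionSketch {
    /** One piece of a statically derived class-name expression. */
    abstract static class Part {}

    /** A string constant found by the static analysis. */
    static final class Literal extends Part {
        final String text;
        Literal(String text) { this.text = text; }
    }

    /** A value that comes from the environment, e.g. System.getProperty(key). */
    static final class EnvLookup extends Part {
        final String key;
        EnvLookup(String key) { this.key = key; }
    }

    static Part lit(String text) { return new Literal(text); }
    static Part env(String key) { return new EnvLookup(key); }

    /**
     * Combines the static pieces with dynamically gathered environment values.
     * Returns empty if any environment value is missing from the snapshot,
     * in which case the site stays unresolved.
     */
    static Optional<String> resolve(List<Part> parts, Map<String, String> snapshot) {
        StringBuilder name = new StringBuilder();
        for (Part part : parts) {
            if (part instanceof Literal) {
                name.append(((Literal) part).text);
            } else {
                String value = snapshot.get(((EnvLookup) part).key);
                if (value == null) {
                    return Optional.empty();
                }
                name.append(value);
            }
        }
        return Optional.of(name.toString());
    }

    public static void main(String[] args) throws Exception {
        // Models a site such as:
        //   Class.forName(System.getProperty("demo.impl.package") + ".ArrayList")
        List<Part> site = Arrays.asList(env("demo.impl.package"), lit(".ArrayList"));

        // Environment snapshot; in practice this would be gathered from a real run.
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("demo.impl.package", "java.util");

        Optional<String> resolved = resolve(site, snapshot);
        if (resolved.isPresent()) {
            System.out.println(Class.forName(resolved.get()).getName());
        }
    }
}
```

A purely static analysis would have to treat the environment lookup as an unknown string. With the snapshot, the site in this sketch resolves to a single class name, which a client analysis could then treat as an ordinary allocation target.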
For string analysis to be useful for resolution of dynamic features, it must have practical cost in terms of running time and memory usage. We propose several techniques to improve the scalability of string analysis. Our approach parallelizes a significant portion of the analysis, allowing it to take advantage of modern multi-core architectures. We also propose several extensions that reduce the amount of irrelevant information processed by the analysis. We applied an implementation of our proposed enhancements to 25 benchmark applications. For all benchmarks, our implementation realized a speedup. For two benchmarks, the speedup was over 180 times.
With the cost of precise string analysis reduced, we incorporate it and our semi-static approach into a Class Hierarchy Analysis (CHA) call graph construction algorithm. A call graph is a critical component of many static analyses. We investigate how a hierarchy of assumptions allows for the incorporation of techniques to resolve instances of certain dynamic features. We implemented a distinct CHA call graph construction analysis for each level of the assumption hierarchy. These implementations were applied to 10 benchmark applications in an experimental evaluation of the effects of the assumptions and the corresponding resolution techniques. The results of this study indicate that by incorporating assumptions about casting operations and string values, it is possible to remain conservative and reduce the number of edges in the graph by 54% through the use of various resolution techniques.
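For readers unfamiliar with CHA, the following sketch shows the textbook idea in miniature: a virtual call on a receiver of declared type T may reach the implementation selected by dispatch for every subtype of T. It is a simplified illustration under stated assumptions (single inheritance, no interfaces, methods identified by name only) and not the call graph construction implemented in this work. The class `ChaSketch` and the `Shape`, `Circle`, and `Square` hierarchy are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: single inheritance, methods keyed by name.
final class ChaSketch {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();

    /** Registers a class, its superclass (null for a root), and the methods it declares. */
    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    /** True if cls is ancestor itself or a transitive subclass of it. */
    boolean isSubtype(String cls, String ancestor) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** Walks up from cls to the first class that declares the method; null if none does. */
    String dispatch(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            Set<String> methods = declared.get(c);
            if (methods != null && methods.contains(method)) {
                return c + "." + method;
            }
        }
        return null;
    }

    /** CHA targets of a virtual call: one dispatch result per subtype of the declared type. */
    Set<String> targets(String declaredType, String method) {
        Set<String> result = new TreeSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubtype(cls, declaredType)) {
                String target = dispatch(cls, method);
                if (target != null) {
                    result.add(target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChaSketch cha = new ChaSketch();
        cha.addClass("Shape", null, "area");
        cha.addClass("Circle", "Shape", "area");
        cha.addClass("Square", "Shape"); // inherits Shape.area

        System.out.println(cha.targets("Shape", "area"));  // [Circle.area, Shape.area]
        System.out.println(cha.targets("Circle", "area")); // [Circle.area]
    }
}
```

The second query hints at why additional assumptions can remove edges: if a resolution technique justifies a narrower receiver type than the declared one, for example when an object created through a dynamic feature is immediately cast to `Circle`, the set of possible targets shrinks accordingly.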
This work is a step toward making static analysis tools better equipped to handle the dynamic features of Java. These include tools that facilitate software development, testing, and understanding. Increasing the precision of these tools can decrease development costs and increase software reliability.