UAST

In the near future, a component for finding errors based on a unified abstract syntax tree (UAST for short) will be integrated into Svace.

Motivation

Svace has a component for static analysis based on an Abstract Syntax Tree (AST) for the C, C++, Java, Kotlin and Go programming languages.

Analysis structure can be generally represented like this:

  1. During the controlled build stage, the compiler for the corresponding language is run, generating the AST for the source program;
  2. The resulting AST is analyzed by detectors.
    • For Java and Kotlin, detectors are written in modified versions of their compilers.
    • For Go, this is done with Goa, which is our own development.
  3. Warnings are then written to files in svres format, which are imported into Svace during the main analysis stage.

This structure is quite fiddly:

To eliminate these inconveniences, we are adding the UAST component to the analyzer. It will analyze a unified abstract syntax tree shared by different programming languages.

UAST analysis internals

UAST consists of two stages:

Thus, all detectors live in the same code base, which makes it simpler to create new detectors and to maintain existing ones.

Using UAST can also make it easier to add new programming languages to the analyzer. For example, if Python, which is being added to Svace, were added the old way, implementing AST analysis would require adding yet another language to the scheme above.

The analysis operates with three basic entities:

Currently, UAST analysis is implemented for Java, Kotlin and Python. In the future, it will also be implemented for Go.

For Java and Kotlin, translation is implemented via compiler plugins for these languages. The translator therefore has access to all the information the compiler has, so no information is lost.

For Python, translation is implemented via the standard library module ast. Because the language is dynamically typed, type information is not computed and is not used during the analysis.
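To make the Python translation step more concrete, here is a minimal sketch of how a tree produced by the standard ast module could be converted into a language-neutral tree and then checked by a detector. The UNode class, the translate function and the find_self_comparisons detector are hypothetical names invented for this illustration; they are not Svace's actual data model or API. Note that the sketch records no type information, in line with the paragraph above.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class UNode:
    """Hypothetical language-neutral tree node (not Svace's real data model)."""
    kind: str
    line: int = 0
    text: str = ""
    children: list["UNode"] = field(default_factory=list)


def translate(node: ast.AST) -> UNode:
    """Recursively convert a Python AST node into a UNode; no types are recorded."""
    unode = UNode(
        kind=type(node).__name__,
        line=getattr(node, "lineno", 0),
        # Keep source text only for expressions, so detectors can compare them.
        text=ast.unparse(node) if isinstance(node, ast.expr) else "",
    )
    unode.children = [translate(child) for child in ast.iter_child_nodes(node)]
    return unode


def find_self_comparisons(root: UNode) -> list[int]:
    """Toy detector: report lines where both sides of a comparison are identical."""
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        if current.kind == "Compare":
            # Operator nodes carry no text, so only the operands remain here.
            operands = [c for c in current.children if c.text]
            if len(operands) == 2 and operands[0].text == operands[1].text:
                found.append(current.line)
        stack.extend(current.children)
    return sorted(found)


source = "def check(a, b):\n    if a == a:\n        return b\n    return a != b\n"
tree = translate(ast.parse(source))
warnings = find_self_comparisons(tree)
```

The detector only looks at node kinds and text, never at Python-specific classes, which is the property that would let the same check run on trees translated from other languages. The sketch needs Python 3.9 or newer because it relies on ast.unparse.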

Limitations

UAST analysis works only within the bounds of a single file. The reason is that the analysis has to be fast and lightweight.

Because of this, UAST analysis cannot find errors whose origin spans several files.
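The single-file limitation can be illustrated with a small sketch. The check below is a hypothetical example written directly against Python's ast module, not a real Svace detector. It reports calls whose number of positional arguments differs from a definition in the same file. It deliberately ignores default values, keyword arguments and variadic parameters.

```python
import ast


def arity_mismatches(source: str) -> list[tuple[int, str]]:
    """Report calls whose positional-argument count differs from a same-file definition."""
    tree = ast.parse(source)
    # Only functions defined in this file are visible to the check.
    arity = {
        node.name: len(node.args.args)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    issues = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            expected = arity.get(node.func.id)
            if expected is not None and len(node.args) != expected:
                issues.append((node.lineno, node.func.id))
    return sorted(issues)


# Definition and faulty call in the same file: the mismatch is visible.
one_file = "def area(w, h):\n    return w * h\n\nprint(area(3))\n"
# Same faulty call, but the definition lives in another (hypothetical) module.
caller_only = "from shapes import area\n\nprint(area(3))\n"

same_file_result = arity_mismatches(one_file)
cross_file_result = arity_mismatches(caller_only)
```

When the definition and the faulty call share a file, the call is reported. When the definition is imported from another module, the same call goes unnoticed, because the information needed to judge it is outside the analyzed file.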


Vitaly Afanasyev