Lines of Code and related line-oriented statistics with Perl

6 02 2011

A project I’m involved in right now is building a static code analyzer. The main goal is to produce a RoR front-end webapp where you can submit code and have it analyzed statically. Our main target languages are C/C++, but we will attach other tools to do some work for other languages. By static I mean without the need to run the program. One thing we will never be able to answer with static analysis is the behavior of the program, but we can answer some questions that dynamic analysis cannot, so there is no such thing as one being better or more complete than the other.

The RoR front-end is almost done, and a quick contribution I made was a Perl script that receives a folder as input and analyzes all the source code inside that folder (recursively). You can find the README.markdown under the same folder.
This analysis looks only at the quantity of lines of code.
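
To give an idea of what such a recursive count involves, here is a minimal sketch using the core File::Find module. It is not the script from the repository: the function name count_by_extension and the rule "the extension is whatever follows the last dot" are assumptions made for illustration. Only the key names nrFiles and nrLines are borrowed from the hash table shown later in this post.

```perl
use strict;
use warnings;
use File::Find;

# Walk $dir recursively and tally files and lines per file extension.
# Returns a hash reference: { ext => { nrFiles => N, nrLines => M }, ... }
# (count_by_extension is a hypothetical helper, not part of the real script.)
sub count_by_extension {
    my ($dir) = @_;
    my %stats;
    find(
        {
            no_chdir => 1,    # keep the full path in $File::Find::name usable
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                # Assumption: the extension is the text after the last dot
                # of the file name; files without a dot are skipped.
                my ($ext) = $path =~ m{\.([^./\\]+)$} or return;
                open my $fh, '<', $path or return;    # skip unreadable files
                my $lines = 0;
                while ( my $line = <$fh> ) { $lines++ }
                close $fh;
                $stats{$ext}{nrFiles}++;
                $stats{$ext}{nrLines} += $lines;
            },
        },
        $dir
    );
    return \%stats;
}

# Print a small report when a folder is given on the command line.
if (@ARGV) {
    my $stats = count_by_extension( $ARGV[0] );
    for my $ext ( sort keys %$stats ) {
        printf "%-10s %5d files %8d lines\n",
            $ext, $stats->{$ext}{nrFiles}, $stats->{$ext}{nrLines};
    }
}
```

The real script also counts comments and draws charts; this sketch stops at the raw numbers.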

How to use

Well, if you don’t want to read the README.markdown file and experiment with the script, I will show you the output and the commands you need to produce these images.

By default you only need to give the input folder and the output prefix name for the images, like so:

[ulissesaraujocosta@maclisses:trab1]-$ perl -open ../../../Static-Code-Analyzer/ -out work

This will produce the following 3 images:

By default the script always produces these 3 images: number of files per language, number of lines per language, and the ratio between them (the average number of lines per file, per language).

We can also see these values as percentages, relative to the overall project (folder).
Image number 3 will be the same, because a percentage of a ratio does not make sense.

[ulissesaraujocosta@maclisses:trab1]-$ perl -open ../../../Static-Code-Analyzer/ -out work_percent -percent
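
The arithmetic behind the percentages and the lines-per-file ratio can be sketched as follows. This is only an illustration: the function name add_percentages and the key ratio are made up here, while nrFiles, nrLines, percentageNrFiles and percentageNrLines are the key names from the hash table shown later in this post.

```perl
use strict;
use warnings;

# Given { ext => { nrFiles => N, nrLines => M } }, fill in the percentage
# fields relative to the whole folder, plus "ratio" (average lines per file).
# (add_percentages and the "ratio" key are hypothetical names.)
sub add_percentages {
    my ($stats) = @_;
    my ( $total_files, $total_lines ) = ( 0, 0 );
    for my $s ( values %$stats ) {
        $total_files += $s->{nrFiles};
        $total_lines += $s->{nrLines};
    }
    for my $s ( values %$stats ) {
        # Guard every division so an empty folder does not divide by zero.
        $s->{percentageNrFiles} = $total_files ? 100 * $s->{nrFiles} / $total_files : 0;
        $s->{percentageNrLines} = $total_lines ? 100 * $s->{nrLines} / $total_lines : 0;
        $s->{ratio}             = $s->{nrFiles} ? $s->{nrLines} / $s->{nrFiles}     : 0;
    }
    return $stats;
}

# Example with made-up numbers.
my $example = add_percentages(
    {
        cpp => { nrFiles => 3, nrLines => 900 },
        pl  => { nrFiles => 1, nrLines => 100 },
    }
);
for my $ext ( sort keys %$example ) {
    printf "%s: %.1f%% of files, %.1f%% of lines, %.1f lines per file\n",
        $ext,
        $example->{$ext}{percentageNrFiles},
        $example->{$ext}{percentageNrLines},
        $example->{$ext}{ratio};
}
```

The ratio is already normalized per file, which is why it stays the same with or without -percent.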

And my favorite: produce a single image with an overall picture of the percentage use of each language in the project.

[ulissesaraujocosta@maclisses:trab1]-$ perl -open ../../../Static-Code-Analyzer/ -out work_All -all

With this script you can also generate pie charts; by default it uses bar charts. Please read the README.markdown for more information.
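
For readers who have never used GD::Graph (the charting module behind this script), here is a rough sketch of how a bar or pie chart can be drawn with it. The helper names chart_data and write_chart, the image size and the title are assumptions for illustration, not taken from the script. GD::Graph comes from CPAN, so it is loaded only when a chart is actually requested.

```perl
use strict;
use warnings;

# Turn { ext => { field => value } } into the layout GD::Graph expects:
# [ [labels...], [values...] ]. (chart_data is a hypothetical helper.)
sub chart_data {
    my ( $stats, $field ) = @_;
    my @exts = sort keys %$stats;
    return [ \@exts, [ map { $stats->{$_}{$field} } @exts ] ];
}

# Render the data as a bar chart (default) or a pie chart into a PNG file.
# (write_chart is a hypothetical helper; requires GD::Graph to be installed.)
sub write_chart {
    my ( $data, $file, $kind ) = @_;
    my $class = ( $kind // 'bars' ) eq 'pie' ? 'GD::Graph::pie' : 'GD::Graph::bars';
    eval "require $class; 1" or die "cannot load $class: $@";
    my $graph = $class->new( 600, 400 );    # width and height in pixels
    $graph->set( title => 'Lines per language' ) or die $graph->error;
    my $image = $graph->plot($data) or die $graph->error;
    open my $out, '>', $file or die "cannot write $file: $!";
    binmode $out;
    print {$out} $image->png;
    close $out;
    return $file;
}

# Usage: perl sketch.pl out.png [pie]
if (@ARGV) {
    my $data = chart_data(
        { cpp => { nrLines => 900 }, pl => { nrLines => 100 } },    # made-up numbers
        'nrLines'
    );
    write_chart( $data, $ARGV[0], $ARGV[1] );
}
```

Both chart classes share the same new / set / plot interface, so switching between bars and pie only changes the class name.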

How to improve and support more languages?

If you want to improve the script, feel free to fork it on GitHub, and maybe we can discuss the script further.
To support more languages you just have to add one more entry to the hash table, written like this:

extension => {"nrFiles" => 0, "nrLines" => 0, "comments" => function_to_catch_comments, "nrComments" => 0, "percentageNrFiles" => 0, "percentageNrLines" => 0, "percentageNrComments" => 0}
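
To make the template concrete, here is a minimal sketch of an entry for a language with # line comments (Python is used as the example), together with a small loop that drives the "comments" callback. The hash name %languages and the helper count_lines_for are hypothetical, and the sketch assumes the callback is called once per line, which may not be how the real script feeds it.

```perl
use strict;
use warnings;

# Hypothetical name for the table; each key is a file extension.
my %languages = (
    "py" => {
        "nrFiles" => 0, "nrLines" => 0,
        # A line counts as a comment when it starts with optional
        # whitespace followed by '#'. Trailing comments are not counted.
        "comments"   => sub { return $_[0] =~ m/^[ \t]*#/; },
        "nrComments" => 0,
        "percentageNrFiles" => 0, "percentageNrLines" => 0, "percentageNrComments" => 0,
    },
);

# Feed lines to an entry's callback and update its counters.
# (count_lines_for is a hypothetical helper, assuming per-line calls.)
sub count_lines_for {
    my ( $entry, @lines ) = @_;
    for my $line (@lines) {
        $entry->{nrLines}++;
        $entry->{nrComments}++ if $entry->{comments}->($line);
    }
    return $entry;
}

my $py = count_lines_for( $languages{py}, "# setup\n", "x = 1\n", "    # indented note\n" );
printf "%d lines, %d comment lines\n", $py->{nrLines}, $py->{nrComments};
```

Keeping the comment detector as a code reference inside each entry means a new language needs no changes elsewhere, only a new key and a regex.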

This is an example of the entry for C++:

"cpp"  => {"nrFiles" => 0, "nrLines" => 0, "comments" => sub { return shift =~ m/(\*(.|\n|\r)*?\*)|(^[ \t\n]*\/\/.*)/; }, "nrComments" => 0,
           "percentageNrFiles" => 0, "percentageNrLines" => 0, "percentageNrComments" => 0}

This is a simple presentation I gave about the module I used: GD::Graph