
[discussion] Improvements in existing analyzers and Additions #18

Open
inishchith opened this issue Mar 30, 2019 · 27 comments

Comments

@inishchith
Contributor

This thread is for discussion related to

  • Improvements in existing analyzer
  • Addition of new analyzers under corresponding backends
@valeriocos
Member

Thank you @inishchith for opening this issue. We could start by commenting on possible improvements and additions to trigger discussion and evaluation. Maybe later we could create other issues to focus on specific tasks. Find below some ideas; anyone is free to share their own (cc @jgbarah ).

Too much Python
The CoQua, CoDep and CoVuln backends rely on analyzers for Python. It could be useful to include analyzers that target other (popular) languages.

How to deal with multi-language repos
Some repos rely on more than one programming language (e.g. frontend and backend languages), so Graal could execute different analyzers when processing a repo. A similar approach has already been implemented for CoCom, which relies on cloc and lizard (triggered based on the extensions of the files processed by Graal). Another option could be to execute Graal several times, once per programming language; this would allow the user to focus on a specific language.
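The extension-based dispatch mentioned above could be sketched roughly as follows. This is a minimal illustration, not Graal's actual implementation: the `EXTENSION_MAP` contents and the analyzer names are assumptions.

```python
import os

# Hypothetical mapping from file extension to analyzer name; the real
# CoCom backend decides between cloc and lizard based on extensions,
# but this table is purely illustrative.
EXTENSION_MAP = {
    ".py": "lizard",
    ".c": "lizard",
    ".js": "lizard",
    ".md": "cloc",
    ".txt": "cloc",
}

def pick_analyzer(file_path, default="cloc"):
    """Return the analyzer name for a file based on its extension."""
    _, ext = os.path.splitext(file_path)
    return EXTENSION_MAP.get(ext.lower(), default)
```

Extending support to another language would then be a matter of adding its extensions to the map.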

Analysis of configuration files
Configuration files are pretty popular nowadays, but currently Graal doesn't take them into account. For instance, some tools already exist to validate Dockerfiles, while other tools inspect the contents of Docker containers to extract useful information about dependencies and vulnerabilities (cc @neglectos).

@jgbarah
Contributor

jgbarah commented Apr 1, 2019

Thanks for the opportunity. I share @valeriocos's concerns and ideas. One additional improvement could be to run some tool or heuristic on every file to infer its programming language. This would allow us to skip binary files from analysis, and also to target specific tools to specific files based on language, for example.

One option for this would be to use linguist, which is the tool GitHub uses for this matter.

@valeriocos
Member

@jgbarah what do you think about integrating linguist as a backend in Graal?

@jgbarah
Contributor

jgbarah commented Apr 1, 2019

Not sure if as a backend, but it could be an idea.

The fact is that right now I'm not aware of a way to tell Graal something like "if it is C, run this set of tools; if it is Python, run this other set; and if it is source code in any other language, run this one". linguist could help with this. Maybe by running linguist first (as a backend), and based on its output, deciding on the other tools to run...
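The "run linguist first, then pick tools per language" idea could look something like the sketch below. The `TOOLCHAINS` table is an assumption for illustration, and the parsing targets the percentage lines linguist prints (as shown later in this thread), not any official machine-readable format.

```python
# Hypothetical per-language toolchains; not part of Graal.
TOOLCHAINS = {
    "C": ["cppcheck", "lizard"],
    "Python": ["pylint", "bandit"],
}
DEFAULT_TOOLCHAIN = ["cloc"]  # fallback for any other language

def plan_toolchains(linguist_output):
    """Map each language reported by linguist to a set of tools to run.

    Expects lines of the form '91.14%  JavaScript'.
    """
    plan = {}
    for line in linguist_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith("%"):
            language = " ".join(parts[1:])
            plan[language] = TOOLCHAINS.get(language, DEFAULT_TOOLCHAIN)
    return plan
```

A driver could then iterate over the returned plan and invoke each tool on the files of the matching language.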

@inishchith
Contributor Author

One option for this would be to use linguist, which is the tool GitHub uses for this matter.

@valeriocos Adding linguist as a backend could really be leveraged as the project progresses. I would be interested in working on adding the backend support once I have some clarity on the idea and its functioning.

The fact is that right now I'm not aware of a way to tell Graal something like "if it is C, run this set of tools; if it is Python, run this other set; and if it is source code in any other language, run this one". linguist could help with this. Maybe by running linguist first (as a backend), and based on its output, deciding on the other tools to run...

@jgbarah That sounds interesting. I'm thinking of something related to adding flags to restrict analysis to specific languages. I'll add a clearer idea here after thinking this through.

@inishchith
Contributor Author

inishchith commented Apr 5, 2019

@valeriocos I'd like some clarity on adding linguist as a backend. Let me know what you think :)
Thanks

Edit: sorry, I closed the issue by mistake. I have reopened it.

@inishchith inishchith reopened this Apr 5, 2019
@valeriocos
Member

Sure @inishchith , I'll have a look at it and get back to you in the next few days. Thanks

@inishchith
Contributor Author

@valeriocos After understanding @jgbarah's suggestion, I'm thinking we could integrate linguist as a backend (analyzer for CoCom) in order to infer the programming languages used in a repository. It could also be useful in the future for implementing metrics based on the percentage of each programming language in a multi-language repository.

Let me know what you think :)
I'd be interested in working through a solution.
Thanks

@valeriocos
Member

Thank you @inishchith , I like the idea! Let's see how to proceed :)

Why would you like to add linguist as an analyzer for CoCom instead of creating a new backend (e.g., CoLang)?

AFAIU linguist returns the percentage of each programming language used in a repo, taking as input the path of the repo (or a snapshot at a given commit), which seems incompatible with the logic used in CoCom, which analyzes every file in a commit.

The new backend could rely on two analyzers, linguist and cloc, but in this case the latter would be executed on the full repo (instead of on a single file as done for the CoCom backend).

What do you think ?
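A rough sketch of how such a repo-level backend might invoke both analyzers follows. The command names and flags are assumptions to be checked against each tool's documentation; error handling is kept deliberately simple.

```python
import subprocess

def analyze_repo(repo_path):
    """Run repo-level analyzers and collect their raw textual output.

    Unlike CoCom's per-file analysis, both tools here receive the
    repository path itself. Returns None for a tool that is missing
    or fails.
    """
    commands = {
        "linguist": ["github-linguist", repo_path],
        "cloc": ["cloc", "--quiet", repo_path],
    }
    results = {}
    for name, cmd in commands.items():
        try:
            results[name] = subprocess.check_output(cmd, text=True)
        except (OSError, subprocess.CalledProcessError):
            results[name] = None  # tool not installed or exited non-zero
    return results
```

The raw output would then be parsed into the item structure the backend emits.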

@inishchith
Contributor Author

inishchith commented Apr 10, 2019

@valeriocos Thanks for the response.

Why would you like to add linguist as an analyzer for CoCom instead of creating a new backend (e.g., CoLang)?

Sorry, I missed something there. I was thinking that inferring the programming language might only be useful for code complexity analysis, whereas it can be extended to and be useful for other backends as well; also, the idea of adding an analyzer to the CoCom backend doesn't fit well, as you said later.

AFAIU linguist returns the percentage of programming languages used in a repo, taking as input the path of the repo (or the snapshot at a given commit), which seems to be incompatible with the logic used in CoCom, which analyzes every file in a commit.

Exactly! In that case, we should have a new backend (CoLang, as you suggested).

The new backend could rely on two analyzers, linguist and cloc, but in this case the latter would be executed on the full repo (instead of on a single file as done for the CoCom backend).

Yes. This adds more clarity to the idea!

@valeriocos @jgbarah Please let me know if I can start working on this. We can have a discussion on the structure of the output produced and the tests to be added when incorporating the new idea in the corresponding PR.

@valeriocos
Member

valeriocos commented Apr 10, 2019

Thank you for your prompt reply @inishchith !
+1 from my side, let's wait for @jgbarah 's feedback

Just an idea that popped up right now: maybe the work to be done for this new backend could be shared with other people interested in this proposal. Since you already have some experience in writing an analyzer, you could focus on writing the backend, while the analyzers could be done by others. The development might be slower, but it can be a good experience for those involved.

@apoorvaanand1998

Hi everyone, sorry for not being active in the discussions. College commitments are taking all of my time right now. As mentioned in the proposal, I'll be free from the 16th and will work hard on thinking through CoLang as a backend.

@valeriocos
Member

valeriocos commented Apr 10, 2019

Thank you @apoorvaanand1998 for your interest. If you want, you can also explore how to integrate:

  • Sonarqube data
  • Other dependency tools (e.g., SonarGraph)
  • Support for COBOL analysis tool <--- which would be really good to have :)

What do you think ?

List of tools:

  • https://github.com/mre/awesome-static-analysis

@inishchith
Contributor Author

inishchith commented Apr 10, 2019

@valeriocos Sorry for the delayed response.

Just an idea that popped up right now: maybe the work to be done for this new backend could be shared with other people interested in this proposal. Since you already have some experience in writing an analyzer, you could focus on writing the backend, while the analyzers could be done by others. The development might be slower, but it can be a good experience for those involved.

I was thinking of adding the CoLang backend along with the linguist analyzer initially, and then opening up a corresponding issue with a proper description of the tasks remaining. Some of the splits being:

  • Integrating cloc analyzer with CoLang Backend
  • Adding appropriate unit tests
  • Adding documentation

Let me know what you think.
I'd be comfortable making changes and going ahead with your suggestions.
Thanks :)

@valeriocos
Member

It sounds perfect @inishchith , feel free to start when you want, thanks.

@inishchith
Contributor Author

@valeriocos Thanks for the speedy response.

@apoorvaanand1998 Thanks for your interest in the discussion. Feel free to add your ideas here. We'll have some issues open in the next few days :)

@inishchith
Contributor Author

inishchith commented Apr 11, 2019

@valeriocos I need some suggestions here regarding the result to be produced.

The output produced by linguist for a given repository (for instance, the kibiter repository) would be:

91.14%  JavaScript
5.26%   HTML
3.40%   CSS
0.09%   Shell
0.06%   Dockerfile
0.04%   CartoCSS
0.02%   Batchfile

JavaScript:
Gruntfile.js
packages/eslint-config-kibana/.eslintrc.js
packages/eslint-config-kibana/jest.js
packages/eslint-plugin-kibana-custom/index.js
scripts/backport.js
........
.......

I'm thinking of the following structure of the result for every snapshot at a given commit (breakdown included in case the details flag is set):

{
    "languages": {
        "JavaScript": 91.14,
        "HTML": 5.26,
        "CSS": 3.40,
        ...
    },
    "breakdown": {
        "JavaScript": ["Gruntfile.js", "packages/eslint-config-kibana/.eslintrc.js", "packages/eslint-config-kibana/jest.js", ...],
        "HTML": ...,
        ...
    }
}

What do you think?
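Turning linguist's text output into that structure could be sketched like this. The parser assumes the two-section layout shown above (percentage lines, then per-language file listings ending in a colon); it is an illustration, not the eventual Graal code.

```python
def parse_linguist(output, details=False):
    """Parse linguist's textual output into the proposed result dict.

    `details=True` additionally collects the per-language file
    breakdown, mirroring the proposed details flag.
    """
    result = {"languages": {}}
    breakdown = {}
    current = None
    for line in output.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line[0].isdigit() and line.split()[0].endswith("%"):
            # e.g. "91.14%  JavaScript"
            pct, lang = line.split(None, 1)
            result["languages"][lang] = float(pct.rstrip("%"))
        elif line.endswith(":"):
            # e.g. "JavaScript:" starts a file listing
            current = line[:-1]
            breakdown[current] = []
        elif current is not None:
            breakdown[current].append(line.strip())
    if details:
        result["breakdown"] = breakdown
    return result
```

The default call would return only the `languages` section, keeping the item small.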

@valeriocos
Member

I'm not sure about the breakdown section; for large repositories this could be a really long list. We could start with the simplest solution, no breakdown section, and add it in the future (maybe a breakdown at folder level).
What do you think @inishchith ?

@inishchith
Contributor Author

@valeriocos Yes, for large repositories it'd be a long list and would clutter the result produced.
The idea of a breakdown at folder level sounds good to me; it would require an explicit entry point from the user. I'll mark the breakdown task as a TODO.
Thanks for the suggestion. I'll open a PR soon :)
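The folder-level breakdown discussed above could be sketched as a simple aggregation step: instead of listing every file per language, count files per top-level directory. This is purely illustrative and not tied to any existing Graal structure.

```python
from collections import defaultdict

def folder_breakdown(files_by_language):
    """Group each language's file list into per-top-level-folder counts.

    Input shape matches the proposed "breakdown" section, e.g.
    {"JavaScript": ["Gruntfile.js", "scripts/backport.js", ...]}.
    Files at the repo root are grouped under ".".
    """
    grouped = {}
    for language, files in files_by_language.items():
        counts = defaultdict(int)
        for path in files:
            top = path.split("/", 1)[0] if "/" in path else "."
            counts[top] += 1
        grouped[language] = dict(counts)
    return grouped
```

This keeps the result bounded by the number of top-level folders rather than the number of files.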

@valeriocos
Member

great! thanks @inishchith

@apoorvaanand1998

Thank you @apoorvaanand1998 for your interest. If you want, you can also explore how to integrate:

* [Sonarqube](https://www.sonarqube.org/) data

* Other dependency tools (e.g., [SonarGraph](https://github.com/sonargraph))

* Support for COBOL analysis tool <--- which would be really good to have :)

What do you think ?

List of tools:

* https://github.com/mre/awesome-static-analysis

Hi @valeriocos, sorry for the late response. I've been looking into COBOL analyzers, and I cannot find anything that is open source; everything is a "product". The only thing I could find was this, but I couldn't find any documentation on it. I feel like this is a dead end.

SonarQube has an analyzer for COBOL called SonarCOBOL, but it is only available in the enterprise edition. The link you provided for the open source SonarGraph components also requires SonarGraph, which is itself a commercial tool.

There are the SonarQube community edition and SonarGraph Explorer, which are free and open source. Should I explore these? I don't know enough about them to know whether they can easily be integrated.

While doing my research, I found Yasca, which has a "COBOL analyzer". Yasca is a deprecated open source project. It had this analyzer, which if I understand correctly only does one thing: it counts the number of GETMAINs and FREEMAINs and checks whether they're equal. I don't know enough about COBOL to understand what these are, but IMO I don't think that produces enough data.

How do I proceed from here?

@apoorvaanand1998

@valeriocos Ping. I'm really stuck. Can you point me in the right direction?

@valeriocos
Member

Sorry @apoorvaanand1998 for the late reply. What do you think about improving the support for Java projects in Graal?

A dependency analyzer for Java projects using Maven or Gradle could be a nice addition. Another option is to look on the Internet for open source tools tailored to Java (e.g., https://devua.co/2017/07/19/java-code-quality-tools/?i=1) and select one to be included in Graal.

@apoorvaanand1998

I shall check these out @valeriocos, thank you. I was also thinking of translating the Yasca analyzer for COBOL to Python and sending a PR. At least this way we can get started with a COBOL analyzer. Does this sound like a good idea?

@valeriocos
Member

You're welcome @apoorvaanand1998 .

I was also thinking of translating the Yasca analyzer for COBOL to Python and sending a PR

It sounds like too much work. Probably the idea of providing support for COBOL wasn't a good one; as you said, there is almost nothing out there to be plugged into Graal. Maybe it is better to focus on other languages that are more popular and have more available analyzers. What do you think?

@apoorvaanand1998

I agree @valeriocos, I shall get started on my research and when I have a clear idea, I'll open an issue for more specific discussion. Is that okay?

@valeriocos
Member

that's perfect @apoorvaanand1998 , thank you :)
