
[discussion] Improvements in existing analyzers and Additions #18

Open
inishchith opened this issue Mar 30, 2019 · 27 comments

Comments

@inishchith
Contributor

This thread is for discussion related to

  • Improvements in existing analyzer
  • Addition of new analyzers under corresponding backends
@valeriocos
Member

Thank you @inishchith for opening this issue. We could start by commenting on possible improvements and additions to trigger discussion and evaluation. Maybe later we could create other issues to focus on specific tasks. Find below some ideas; anyone is free to share their own (cc @jgbarah ).

Too much Python
The CoQua, CoDep and CoVuln backends rely on analyzers for Python. It could be useful to include analyzers that target other (popular) languages.

How to deal with multi-language repos
Some repos rely on more than one programming language (e.g. frontend and backend languages), so Graal could execute different analyzers when processing a repo. A similar approach has already been implemented for CoCom, which relies on cloc and lizard (triggered based on the extensions of the files processed by Graal). Another option could be to execute Graal several times, once per programming language; this would allow the user to focus on a specific language.
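The extension-based dispatch mentioned above could be sketched roughly as follows. This is a minimal illustration, not Graal's actual implementation: the `EXTENSION_MAP` contents and the analyzer names are assumptions.

```python
import os

# Hypothetical mapping from file extension to analyzer name; the real
# CoCom backend decides between cloc and lizard based on extensions,
# but this table is purely illustrative.
EXTENSION_MAP = {
    ".py": "lizard",
    ".c": "lizard",
    ".js": "lizard",
    ".md": "cloc",
    ".txt": "cloc",
}

def pick_analyzer(file_path, default="cloc"):
    """Return the analyzer name for a file based on its extension."""
    _, ext = os.path.splitext(file_path)
    return EXTENSION_MAP.get(ext.lower(), default)
```

Extending support to another language would then be a matter of adding its extensions to the map.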

Analysis of configuration files
Configuration files are pretty popular nowadays, but currently Graal doesn't take them into account. For instance, some tools already exist to validate Dockerfiles, while other tools inspect the contents of Docker containers to extract useful information about dependencies and vulnerabilities (cc @neglectos).

@jgbarah
Contributor

jgbarah commented Apr 1, 2019

Thanks for the opportunity. I share @valeriocos's concerns and ideas. One additional improvement could be to run some tool or heuristic on every file to infer its programming language. This would allow us to skip binary files from analysis, and also to target specific tools to specific files based on language, for example.

One option for this would be to use linguist, which is the tool GitHub uses for this matter.

@valeriocos
Member

@jgbarah what do you think about integrating linguist as a backend in Graal?

@jgbarah
Contributor

jgbarah commented Apr 1, 2019

Not sure if as a backend, but it could be an idea.

The fact is that right now I'm not aware of a way to tell Graal something like "if it is C, run this set of tools; if it is Python, run this other set; and if it is source code in any other language, run this one". linguist could help with this. Maybe by running linguist first (as a backend), and based on its output, deciding on the other tools to run...
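The "run linguist first, then pick tools per language" idea could look something like the sketch below. The `TOOLCHAINS` table is an assumption for illustration, and the parsing targets the percentage lines linguist prints (as shown later in this thread), not any official machine-readable format.

```python
# Hypothetical per-language toolchains; not part of Graal.
TOOLCHAINS = {
    "C": ["cppcheck", "lizard"],
    "Python": ["pylint", "bandit"],
}
DEFAULT_TOOLCHAIN = ["cloc"]  # fallback for any other language

def plan_toolchains(linguist_output):
    """Map each language reported by linguist to a set of tools to run.

    Expects lines of the form '91.14%  JavaScript'.
    """
    plan = {}
    for line in linguist_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith("%"):
            language = " ".join(parts[1:])
            plan[language] = TOOLCHAINS.get(language, DEFAULT_TOOLCHAIN)
    return plan
```

A driver could then iterate over the returned plan and invoke each tool on the files of the matching language.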

@inishchith
Contributor Author

One option for this would be to use linguist, which is the tool GitHub uses for this matter.

@valeriocos Adding linguist as a backend could really be leveraged as the project progresses. I would be interested in working on adding the backend support once I have some clarity on the idea and its functioning.

The fact is that right now I'm not aware of a way to tell Graal something like "if it is C, run this set of tools; if it is Python, run this other set; and if it is source code in any other language, run this one". linguist could help with this. Maybe by running linguist first (as a backend), and based on its output, deciding on the other tools to run...

@jgbarah That sounds interesting. I'm thinking of something related to adding flags to restrict analysis to specific languages. I'll add a clearer idea here after thinking this through.

@inishchith
Contributor Author

inishchith commented Apr 5, 2019

@valeriocos I'd like some clarity on adding linguist as a backend. Let me know what you think :)
Thanks

Edit: sorry, I closed the issue by mistake. I have reopened it.

@inishchith inishchith reopened this Apr 5, 2019
@valeriocos
Member

Sure @inishchith , I'll have a look at it and get back to you in the next few days. Thanks

@inishchith
Contributor Author

@valeriocos After understanding @jgbarah's suggestion, I'm thinking we could integrate linguist as a backend (analyzer for CoCom) in order to infer the programming languages used in a repository. It could also be useful in the future for implementing metrics based on the percentage of each programming language in a multi-language repository.

Let me know what you think :)
I'd be interested in working through a solution.
Thanks

@valeriocos
Member

Thank you @inishchith , I like the idea! Let's see how to proceed :)

Why would you like to add linguist as an analyzer for CoCom instead of creating a new backend (e.g., CoLang)?

AFAIU linguist returns the percentage of each programming language used in a repo, taking as input the path of the repo (or a snapshot at a given commit), which seems incompatible with the logic used in CoCom, which analyzes every file in a commit.

The new backend could rely on two analyzers, linguist and cloc, but in this case the latter would be executed on the full repo (instead of on a single file as done for the CoCom backend).

What do you think ?
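A rough sketch of how such a repo-level backend might invoke both analyzers follows. The command names and flags are assumptions to be checked against each tool's documentation; error handling is kept deliberately simple.

```python
import subprocess

def analyze_repo(repo_path):
    """Run repo-level analyzers and collect their raw textual output.

    Unlike CoCom's per-file analysis, both tools here receive the
    repository path itself. Returns None for a tool that is missing
    or fails.
    """
    commands = {
        "linguist": ["github-linguist", repo_path],
        "cloc": ["cloc", "--quiet", repo_path],
    }
    results = {}
    for name, cmd in commands.items():
        try:
            results[name] = subprocess.check_output(cmd, text=True)
        except (OSError, subprocess.CalledProcessError):
            results[name] = None  # tool not installed or exited non-zero
    return results
```

The raw output would then be parsed into the item structure the backend emits.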

@inishchith
Contributor Author

inishchith commented Apr 10, 2019

@valeriocos Thanks for the response.

Why would you like to add linguist as an analyzer for CoCom instead of creating a new backend (e.g., CoLang)?

Sorry, I missed something there. I was thinking that inferring the programming language might only be useful for code complexity analysis, whereas it can be extended to and be useful for other backends as well; also, the idea of adding an analyzer to the CoCom backend doesn't fit well, as you said later.

AFAIU linguist returns the percentage of programming languages used in a repo, taking as input the path of the repo (or the snapshot at a given commit), which seems to be incompatible with the logic used in CoCom, which analyzes every file in a commit.

Exactly! In that case, we should have a new backend (CoLang, as you suggested).

The new backend could rely on two analyzers, linguist and cloc, but in this case the latter would be executed on the full repo (instead of on a single file as done for the CoCom backend).

Yes. This adds more clarity to the idea!

@valeriocos @jgbarah Please let me know if I can start working on this. We can have a discussion on the structure of the output produced and the tests to be added when incorporating the new idea in the corresponding PR.

@valeriocos
Member

valeriocos commented Apr 10, 2019

Thank you for your prompt reply @inishchith !
+1 from my side, let's wait for @jgbarah 's feedback

Just an idea that popped up right now: maybe the work to be done for this new backend could be shared with other people interested in this proposal. Since you already have some experience in writing an analyzer, you could focus on writing the backend, while the analyzers could be done by others. The development might be slower, but it can be a good experience for those involved.

@apoorvaanand1998

Hi everyone, sorry for not being active in the discussions. College commitments are taking all of my time right now. As mentioned in the proposal, I'll be free from the 16th and will work hard on thinking through CoLang as a backend.

@valeriocos
Member

valeriocos commented Apr 10, 2019

Thank you @apoorvaanand1998 for your interest. If you want, you can also explore how to integrate:

  • Sonarqube data
  • Other dependency tools (e.g., SonarGraph)
  • Support for COBOL analysis tool <--- which would be really good to have :)

What do you think ?

List of tools:

  • https://github.com/mre/awesome-static-analysis

@inishchith
Contributor Author

inishchith commented Apr 10, 2019

@valeriocos Sorry for the delayed response.

Just an idea that popped up right now: maybe the work to be done for this new backend could be shared with other people interested in this proposal. Since you already have some experience in writing an analyzer, you could focus on writing the backend, while the analyzers could be done by others. The development might be slower, but it can be a good experience for those involved.

I was thinking of adding the CoLang backend along with the linguist analyzer initially, and then opening up a corresponding issue with a proper description of the tasks remaining. Some of the splits being:

  • Integrating cloc analyzer with CoLang Backend
  • Adding appropriate unit tests
  • Adding documentation

Let me know what you think.
I'd be comfortable making changes and going ahead with your suggestions.
Thanks :)

@valeriocos
Member

It sounds perfect @inishchith , feel free to start when you want, thanks.

@inishchith
Contributor Author

@valeriocos Thanks for the speedy response.

@apoorvaanand1998 Thanks for your interest in the discussion. Feel free to add your ideas here. We'll have some issues open in the next few days :)

@inishchith
Contributor Author

inishchith commented Apr 11, 2019

@valeriocos I need some suggestions here regarding the result to be produced.

The output produced by linguist for a given repository (for instance, the kibiter repository) would be:

91.14%  JavaScript
5.26%   HTML
3.40%   CSS
0.09%   Shell
0.06%   Dockerfile
0.04%   CartoCSS
0.02%   Batchfile

JavaScript:
Gruntfile.js
packages/eslint-config-kibana/.eslintrc.js
packages/eslint-config-kibana/jest.js
packages/eslint-plugin-kibana-custom/index.js
scripts/backport.js
........
.......

I'm thinking of the following structure of the result for every snapshot at a given commit (breakdown included in case the details flag is set):

{
    "languages": {
        "JavaScript": 91.14,
        "HTML": 5.26,
        "CSS": 3.40,
        ...
    },
    "breakdown": {
        "JavaScript": ["Gruntfile.js", "packages/eslint-config-kibana/.eslintrc.js", "packages/eslint-config-kibana/jest.js", ...],
        "HTML": ...,
        ...
    }
}

What do you think?
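Turning linguist's text output into that structure could be sketched like this. The parser assumes the two-section layout shown above (percentage lines, then per-language file listings ending in a colon); it is an illustration, not the eventual Graal code.

```python
def parse_linguist(output, details=False):
    """Parse linguist's textual output into the proposed result dict.

    `details=True` additionally collects the per-language file
    breakdown, mirroring the proposed details flag.
    """
    result = {"languages": {}}
    breakdown = {}
    current = None
    for line in output.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line[0].isdigit() and line.split()[0].endswith("%"):
            # e.g. "91.14%  JavaScript"
            pct, lang = line.split(None, 1)
            result["languages"][lang] = float(pct.rstrip("%"))
        elif line.endswith(":"):
            # e.g. "JavaScript:" starts a file listing
            current = line[:-1]
            breakdown[current] = []
        elif current is not None:
            breakdown[current].append(line.strip())
    if details:
        result["breakdown"] = breakdown
    return result
```

The default call would return only the `languages` section, keeping the item small.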

@valeriocos
Member

I'm not sure about the breakdown section; for large repositories this could be a really long list. We could start with the simplest solution, no breakdown section, and add it in the future (maybe a breakdown at folder level).
What do you think @inishchith ?

@inishchith
Contributor Author

@valeriocos Yes, for large repositories it'd be a long list and would clutter the result produced.
The idea of a breakdown at folder level sounds good to me; it would require an explicit entry point from the user. I'll mark the breakdown task as a TODO.
Thanks for the suggestion. I'll open a PR soon :)
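The folder-level breakdown discussed above could be sketched as a simple aggregation step: instead of listing every file per language, count files per top-level directory. This is purely illustrative and not tied to any existing Graal structure.

```python
from collections import defaultdict

def folder_breakdown(files_by_language):
    """Group each language's file list into per-top-level-folder counts.

    Input shape matches the proposed "breakdown" section, e.g.
    {"JavaScript": ["Gruntfile.js", "scripts/backport.js", ...]}.
    Files at the repo root are grouped under ".".
    """
    grouped = {}
    for language, files in files_by_language.items():
        counts = defaultdict(int)
        for path in files:
            top = path.split("/", 1)[0] if "/" in path else "."
            counts[top] += 1
        grouped[language] = dict(counts)
    return grouped
```

This keeps the result bounded by the number of top-level folders rather than the number of files.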

@valeriocos
Member

great! thanks @inishchith

@apoorvaanand1998

Thank you @apoorvaanand1998 for your interest. If you want, you can also explore how to integrate:

* [Sonarqube](https://www.sonarqube.org/) data

* Other dependency tools (e.g., [SonarGraph](https://github.com/sonargraph))

* Support for COBOL analysis tool <--- which would be really good to have :)

What do you think ?

List of tools:

* https://github.com/mre/awesome-static-analysis

Hi @valeriocos, sorry for the late response. I've been looking into COBOL analyzers, and I cannot find anything that is open source; everything is a "product". The only thing I could find was this, but I couldn't find any documentation on it. I feel like this is a dead end.

SonarQube has an analyzer for COBOL called SonarCOBOL, but it is only available in the enterprise edition. The link you provided for the open source SonarGraph components also requires SonarGraph, which is itself a commercial tool.

There are the SonarQube community edition and SonarGraph Explorer, which are free and open source. Should I explore these? I don't know enough about them to know whether they can easily be integrated.

While doing my research, I found Yasca, which has a "COBOL analyzer". Yasca is a deprecated open source project. It had this analyzer, which if I understand correctly only does one thing: it counts the number of GETMAINs and FREEMAINs and checks whether they're equal. I don't know enough about COBOL to understand what these are, but IMO I don't think that produces enough data.

How do I proceed from here?

@apoorvaanand1998

@valeriocos Ping. I'm really stuck. Can you point me in the right direction?

@valeriocos
Member

Sorry @apoorvaanand1998 for the late reply. What do you think about improving the support for Java projects in Graal?

A dependency analyzer for Java projects using Maven or Gradle could be a nice addition. Another option is to look on the Internet for open source tools tailored to Java (e.g., https://devua.co/2017/07/19/java-code-quality-tools/?i=1) and select one to be included in Graal.

@apoorvaanand1998

I shall check these out @valeriocos, thank you. I was also thinking of translating the Yasca analyzer for COBOL to Python and sending a PR. At least this way we can get started with a COBOL analyzer. Does this sound like a good idea?

@valeriocos
Member

You're welcome @apoorvaanand1998 .

I was also thinking of translating the Yasca analyzer for COBOL to Python and sending a PR

It sounds like too much work. Probably the idea of providing support for COBOL wasn't a good one; as you said, there is almost nothing out there to be plugged into Graal. Maybe it is better to focus on other languages that are more popular and have more available analyzers. What do you think?

@apoorvaanand1998

I agree @valeriocos, I shall get started on my research and when I have a clear idea, I'll open an issue for more specific discussion. Is that okay?

@valeriocos
Member

that's perfect @apoorvaanand1998 , thank you :)
