Official data sources for the Quality Attributes project

quality-attributes/datasets

Databases

Official data sources for the Quality Attributes project, used to train, test, and validate whether Non-Functional Requirements related to Quality Attributes can be found in GitHub issue reports.

Training Set

On December 14th, 2019, the PROMISE dataset was retrieved from http://ctp.di.fct.unl.pt/RE2017/pages/submission/data_papers/, where it was included as part of a data challenge.

Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/SERepository

The requirement labels in this dataset (covering 15 different projects) are distributed as follows:

| Class | Quantity | Percentage |
|---|---|---|
| Functional (F) | 255 | 40.80% |
| Availability (A) | 21 | 3.36% |
| Fault Tolerance (FT) | 10 | 1.60% |
| Legal (L) | 13 | 2.08% |
| Look & Feel (LF) | 38 | 6.08% |
| Maintainability (MN) | 17 | 2.72% |
| Operational (O) | 62 | 9.92% |
| Performance (PE) | 54 | 8.64% |
| Portability (PO) | 1 | 0.16% |
| Scalability (SC) | 21 | 3.36% |
| Security (SE) | 66 | 10.56% |
| Usability (US) | 67 | 10.72% |
| Total | 625 | 100% |

For this study, only a subset of the dataset was considered, restricted to the classes that correspond to quality attribute categories and reduced further because of class imbalance:

| Class | Quantity | Percentage |
|---|---|---|
| Availability (A) | 21 | 8.20% |
| Fault Tolerance (FT) | 10 | 3.91% |
| Maintainability (MN) | 17 | 6.64% |
| Performance (PE) | 54 | 21.09% |
| Scalability (SC) | 21 | 8.20% |
| Security (SE) | 66 | 25.78% |
| Usability (US) | 67 | 26.17% |
| Total | 256 | 100% |
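The subset percentages can be recomputed from the counts in the full table. The following is a minimal sketch, not part of the original project; the names `PROMISE_COUNTS`, `SELECTED`, and `class_distribution` are hypothetical and chosen only for illustration.

```python
# Label counts as listed in the full PROMISE table above.
PROMISE_COUNTS = {
    "F": 255, "A": 21, "FT": 10, "L": 13, "LF": 38, "MN": 17,
    "O": 62, "PE": 54, "PO": 1, "SC": 21, "SE": 66, "US": 67,
}

# Classes kept for the study, as listed in the subset table.
SELECTED = ["A", "FT", "MN", "PE", "SC", "SE", "US"]


def class_distribution(counts, classes):
    """Return {label: (count, percentage of the selected total)}."""
    total = sum(counts[label] for label in classes)
    return {
        label: (counts[label], round(100 * counts[label] / total, 2))
        for label in classes
    }


subset = class_distribution(PROMISE_COUNTS, SELECTED)
for label, (count, pct) in subset.items():
    print(f"{label:<3} {count:>4} {pct:6.2f}%")
```

Because each percentage is rounded independently to two decimals, the rounded values may not add up to exactly 100%.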

Test Set

Based upon the book:

Miller, Roxanne E., 2009, The Quest for Software Requirements, MavenMark Books, Milwaukee, WI

Forty different non-functional requirements associated with quality attributes were collected from the following categories (matching those included in the training set):

  • Access Security
  • Availability
  • Usability
  • Maintainability
  • Scalability

Validation Set

According to the State of the Octoverse 2019 report, the open source projects with the most contributors on GitHub were as follows:

| Place | Repository | Contributors |
|---|---|---|
| 01 | microsoft/vscode | 19.1k |
| 02 | MicrosoftDocs/azure-docs | 14k |
| 03 | flutter/flutter | 13k |
| 04 | firstcontributions/first-contributions | 11.6k |
| 05 | tensorflow/tensorflow | 9.9k |
| 06 | facebook/react-native | 9.1k |
| 07 | kubernetes/kubernetes | 6.9k |
| 08 | DefinitelyTyped/DefinitelyTyped | 6.9k |
| 09 | ansible/ansible | 6.8k |
| 10 | home-assistant/home-assistant | 6.3k |

The selected repositories describe different software systems; documentation repositories and projects with the same scope (i.e. flutter and react-native) were excluded. Data was collected using quality-attributes/issue-collector for the following repositories:

  1. microsoft/vscode
  2. flutter/flutter
  3. tensorflow/tensorflow
  4. kubernetes/kubernetes
  5. ansible/ansible

Note: Only the latest 100 issues (as of February 20, 2020) for each repository were collected, due to limitations of GitHub's API v4.
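For illustration, a request for the latest 100 issues of a repository through GitHub's GraphQL API (v4) might look like the sketch below. This is not the code of quality-attributes/issue-collector, whose implementation is not shown here. The helper names (`build_request`, `fetch_latest_issues`), the selected issue fields, and the choice to interpret "latest" as "most recently created" are assumptions. The page size of 100 reflects the maximum the API accepts for `first` in a single connection request.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Assumption: "latest" is interpreted as newest by creation date.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    issues(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { number title body createdAt }
    }
  }
}
"""

# Repositories listed in the validation set above.
REPOSITORIES = [
    "microsoft/vscode",
    "flutter/flutter",
    "tensorflow/tensorflow",
    "kubernetes/kubernetes",
    "ansible/ansible",
]


def build_request(full_name, token, count=100):
    """Build (but do not send) the POST request for one repository."""
    owner, name = full_name.split("/")
    payload = {
        "query": ISSUES_QUERY,
        "variables": {"owner": owner, "name": name, "count": count},
    }
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_latest_issues(full_name, token, count=100):
    """Send the request and return the list of issue nodes."""
    with urllib.request.urlopen(build_request(full_name, token, count)) as response:
        result = json.load(response)
    return result["data"]["repository"]["issues"]["nodes"]
```

Calling `fetch_latest_issues(repo, token)` for each entry in `REPOSITORIES` requires a valid GitHub personal access token. Collecting more than 100 issues per repository would need cursor-based pagination, which this sketch leaves out.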