[#944] Add originality threshold flag #2122

Merged
Changes from 37 commits
38 commits
075d34d
[#2027] Fix date range bug (#2034)
jq1836 Sep 24, 2023
0c80ee6
[#2039] Update cypress minimum requirement to 12.15.0 (#2041)
chan-j-d Sep 30, 2023
46409aa
[#1936] Migrate c-segment.vue to typescript (#2035)
jq1836 Sep 30, 2023
43828fd
[#1936] Migrate load-font-awesome-icons.js to typescript (#2040)
jq1836 Sep 30, 2023
d4e2272
[#2045] Fix cypress zoom feature test (#2047)
jq1836 Oct 4, 2023
a8c3f00
[#1936] Migrate random-color-gen.js to typescript (#2043)
jq1836 Oct 4, 2023
93e850f
[#1936] Migrate c-segment-collection.vue to typescript (#2036)
jq1836 Oct 4, 2023
0d1cd99
[#1936] Migrate c-resizer.vue to typescript (#2038)
jq1836 Oct 4, 2023
6292688
Bump zod from 3.20.6 to 3.22.3 in /frontend (#2048)
dependabot[bot] Oct 4, 2023
7bc056a
Bump @cypress/request and cypress in /frontend/cypress (#2042)
dependabot[bot] Oct 4, 2023
0c4045d
[#1936] Migrate c-ramp.vue to typescript (#2037)
jq1836 Oct 11, 2023
174ecc5
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Oct 20, 2023
e200e5e
Give partial credit if annotated author is not the same as the blame
SkyBlaise99 Oct 20, 2023
00cf40d
[#2054] Fix zoom view bug (#2055)
jq1836 Oct 28, 2023
4dae85d
[#1936] Migrate repo-sorter.js to typescript (#2052)
jq1836 Oct 28, 2023
7450425
[#1936] Migrate safari_date.js to typescript (#2053)
jq1836 Oct 28, 2023
056fa5f
Remove frontend JS lint (#2063)
jq1836 Oct 28, 2023
bdeb15a
use full and partial credit color
SkyBlaise99 Oct 29, 2023
54596ed
[#1929] Add dynamic positioning support for tooltips (#2056)
pratham31012002 Nov 7, 2023
a187d9c
Add test cases for annotated author overriding last author's credit
SkyBlaise99 Nov 7, 2023
58b7002
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Nov 7, 2023
b296b83
revert merge from master
SkyBlaise99 Nov 7, 2023
4ce6545
revert merge from master 58b70025
SkyBlaise99 Nov 7, 2023
f29dc16
[#1928] Fix tooltip zIndex such that it doesn't occlude next file tit…
pratham31012002 Nov 8, 2023
e42c14e
[#1726] Update GitHub-specific references in codebase and docs (#2050)
chan-j-d Nov 8, 2023
4bd05a7
Trigger workflow
SkyBlaise99 Nov 8, 2023
950c912
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Nov 8, 2023
a46d423
Revert "Merge branch 'master' into 944-analyze-authorship"
SkyBlaise99 Nov 8, 2023
bba556d
fix frontend test failing
SkyBlaise99 Nov 8, 2023
4d7d3aa
Merge branch '944-analyze-authorship' into 944-analyze-authorship
SkyBlaise99 Nov 12, 2023
1b25572
Merge branch '944-swap-color' into 944-analyze-authorship
SkyBlaise99 Nov 12, 2023
9e93961
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Nov 12, 2023
896c55a
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Jan 8, 2024
39c8058
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Jan 31, 2024
a621357
add originality threshold flag
SkyBlaise99 Feb 18, 2024
25ef8ca
pass originality threshold param down
SkyBlaise99 Feb 18, 2024
6e33b3d
update analyzeAuthorship to use input originalityThreshold
SkyBlaise99 Feb 18, 2024
ffbea17
add test cases for OriginalityThresholdArgumentType
SkyBlaise99 Mar 3, 2024
2 changes: 1 addition & 1 deletion src/main/java/reposense/RepoSense.java
@@ -80,7 +80,7 @@ public static void main(String[] args) {
cliArguments.isSinceDateProvided(), cliArguments.isUntilDateProvided(),
cliArguments.getNumCloningThreads(), cliArguments.getNumAnalysisThreads(),
TimeUtil::getElapsedTime, cliArguments.getZoneId(), cliArguments.isFreshClonePerformed(),
cliArguments.isAuthorshipAnalyzed());
cliArguments.isAuthorshipAnalyzed(), cliArguments.getOriginalityThreshold());

FileUtil.zipFoldersAndFiles(reportFoldersAndFiles, cliArguments.getOutputFilePath().toAbsolutePath(),
".json");
9 changes: 6 additions & 3 deletions src/main/java/reposense/authorship/AuthorshipReporter.java
@@ -30,9 +30,11 @@ public class AuthorshipReporter {

/**
* Generates and returns the authorship summary for each repo in {@code config}.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true, based on
* {@code originalityThreshold}.
*/
public AuthorshipSummary generateAuthorshipSummary(RepoConfiguration config, boolean shouldAnalyzeAuthorship) {
public AuthorshipSummary generateAuthorshipSummary(RepoConfiguration config, boolean shouldAnalyzeAuthorship,
double originalityThreshold) {
List<FileInfo> textFileInfos = fileInfoExtractor.extractTextFileInfos(config);

int numFiles = textFileInfos.size();
@@ -45,7 +47,8 @@ public AuthorshipSummary generateAuthorshipSummary(RepoConfiguration config, boo
}

List<FileResult> fileResults = textFileInfos.stream()
.map(fileInfo -> fileInfoAnalyzer.analyzeTextFile(config, fileInfo, shouldAnalyzeAuthorship))
.map(fileInfo -> fileInfoAnalyzer.analyzeTextFile(config, fileInfo, shouldAnalyzeAuthorship,
originalityThreshold))
.filter(Objects::nonNull)
.collect(Collectors.toList());

19 changes: 12 additions & 7 deletions src/main/java/reposense/authorship/FileInfoAnalyzer.java
@@ -1,5 +1,7 @@
package reposense.authorship;

import static reposense.parser.ArgsParser.DEFAULT_ORIGINALITY_THRESHOLD;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
@@ -45,11 +47,13 @@ public class FileInfoAnalyzer {
/**
* Analyzes the lines of the file, given in the {@code fileInfo}, that has changed in the time period provided
* by {@code config}.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true, based on
* {@code originalityThreshold}.
* Returns null if the file is missing from the local system, or none of the
* {@link Author} specified in {@code config} contributed to the file in {@code fileInfo}.
*/
public FileResult analyzeTextFile(RepoConfiguration config, FileInfo fileInfo, boolean shouldAnalyzeAuthorship) {
public FileResult analyzeTextFile(RepoConfiguration config, FileInfo fileInfo, boolean shouldAnalyzeAuthorship,
double originalityThreshold) {
String relativePath = fileInfo.getPath();

if (Files.notExists(Paths.get(config.getRepoRoot(), relativePath))) {
@@ -61,7 +65,7 @@ public FileResult analyzeTextFile(RepoConfiguration config, FileInfo fileInfo, b
return null;
}

aggregateBlameAuthorModifiedAndDateInfo(config, fileInfo, shouldAnalyzeAuthorship);
aggregateBlameAuthorModifiedAndDateInfo(config, fileInfo, shouldAnalyzeAuthorship, originalityThreshold);
fileInfo.setFileType(config.getFileType(fileInfo.getPath()));

AnnotatorAnalyzer.aggregateAnnotationAuthorInfo(fileInfo, config.getAuthorConfig(), shouldAnalyzeAuthorship);
@@ -83,7 +87,7 @@ public FileResult analyzeTextFile(RepoConfiguration config, FileInfo fileInfo, b
* {@link Author} specified in {@code config} contributed to the file in {@code fileInfo}.
*/
public FileResult analyzeTextFile(RepoConfiguration config, FileInfo fileInfo) {
return analyzeTextFile(config, fileInfo, false);
return analyzeTextFile(config, fileInfo, false, DEFAULT_ORIGINALITY_THRESHOLD);
}

/**
@@ -153,10 +157,11 @@ private FileResult generateBinaryFileResult(RepoConfiguration config, FileInfo f
* The {@code config} is used to obtain the root directory for running git blame as well as other parameters used
* in determining which author to assign to each line and whether to set the last modified date for a
* {@code lineInfo}.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true.
* Further analyzes the authorship of each line in the commit if {@code shouldAnalyzeAuthorship} is true, based on
* {@code originalityThreshold}.
*/
private void aggregateBlameAuthorModifiedAndDateInfo(RepoConfiguration config, FileInfo fileInfo,
boolean shouldAnalyzeAuthorship) {
boolean shouldAnalyzeAuthorship, double originalityThreshold) {
String blameResults;

if (!config.isFindingPreviousAuthorsPerformed()) {
@@ -199,7 +204,7 @@ private void aggregateBlameAuthorModifiedAndDateInfo(RepoConfiguration config, F
if (shouldAnalyzeAuthorship && !author.equals(Author.UNKNOWN_AUTHOR)) {
String lineContent = fileInfo.getLine(lineCount / 5 + 1).getContent();
boolean isFullCredit = AuthorshipAnalyzer.analyzeAuthorship(config, fileInfo.getPath(), lineContent,
commitHash, author);
commitHash, author, originalityThreshold);
fileInfo.setIsFullCredit(lineCount / 5, isFullCredit);
}
}
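The two-argument `analyzeTextFile` overload in this file keeps existing callers compiling by delegating with authorship analysis switched off and the default threshold. The following is a minimal sketch of that delegation pattern only. `OverloadSketch` and `describe` are hypothetical stand-ins, since the real analyzer needs a cloned repository to run.

```java
final class OverloadSketch {
    // Mirrors the value of ArgsParser.DEFAULT_ORIGINALITY_THRESHOLD in this diff.
    static final double DEFAULT_ORIGINALITY_THRESHOLD = 0.51;

    // Full form: the caller states whether to analyze and which threshold to use.
    static String describe(String path, boolean shouldAnalyzeAuthorship, double originalityThreshold) {
        return path + " analyze=" + shouldAnalyzeAuthorship + " threshold=" + originalityThreshold;
    }

    // Convenience form: delegates with the same defaults the diff passes
    // (false, DEFAULT_ORIGINALITY_THRESHOLD), so older call sites need no change.
    static String describe(String path) {
        return describe(path, false, DEFAULT_ORIGINALITY_THRESHOLD);
    }
}
```

Because analysis is off in the convenience form, the threshold it forwards is never consulted; it is passed only to satisfy the wider signature.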
Original file line number Diff line number Diff line change
@@ -22,9 +22,6 @@
*/
public class AuthorshipAnalyzer {
private static final Logger logger = LogsManager.getLogger(AuthorshipAnalyzer.class);

private static final double ORIGINALITY_THRESHOLD = 0.51;

private static final String DIFF_FILE_CHUNK_SEPARATOR = "\ndiff --git a/.*\n";
private static final Pattern FILE_CHANGED_PATTERN =
Pattern.compile("\n(-){3} a?/(?<preImageFilePath>.*)\n(\\+){3} b?/(?<postImageFilePath>.*)\n");
@@ -45,11 +42,11 @@ public class AuthorshipAnalyzer {
private static final String DELETED_LINE_SYMBOL = "-";

/**
* Analyzes the authorship of {@code lineContent} in {@code filePath}.
* Analyzes the authorship of {@code lineContent} in {@code filePath} based on {@code originalityThreshold}.
* Returns {@code true} if {@code currentAuthor} should be assigned full credit, {@code false} otherwise.
*/
public static boolean analyzeAuthorship(RepoConfiguration config, String filePath, String lineContent,
String commitHash, Author currentAuthor) {
String commitHash, Author currentAuthor, double originalityThreshold) {
// Empty lines are ignored and given full credit
if (lineContent.isEmpty()) {
return true;
@@ -58,7 +55,7 @@ public static boolean analyzeAuthorship(RepoConfiguration config, String filePat
CandidateLine deletedLine = getDeletedLineWithLowestOriginality(config, filePath, lineContent, commitHash);

// Give full credit if there are no deleted lines found or deleted line is more than originality threshold
if (deletedLine == null || deletedLine.getOriginalityScore() > ORIGINALITY_THRESHOLD) {
if (deletedLine == null || deletedLine.getOriginalityScore() > originalityThreshold) {
return true;
}

@@ -80,7 +77,7 @@ public static boolean analyzeAuthorship(RepoConfiguration config, String filePat

// Check the previous version as currentAuthor is the same as author of the previous version
return analyzeAuthorship(config, deletedLine.getFilePath(), deletedLine.getLineContent(),
deletedLineInfo.getCommitHash(), deletedLineInfo.getAuthor());
deletedLineInfo.getCommitHash(), deletedLineInfo.getAuthor(), originalityThreshold);
}

/**
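The effect of the now-configurable threshold can be seen by isolating the first check in `analyzeAuthorship`. The sketch below models only that check, under stated assumptions: the lowest-originality deleted line is reduced to a nullable `Double` score, and `ThresholdSketch` and `shortCircuitsToFullCredit` are hypothetical names. The later author comparison and the recursive call are deliberately left out.

```java
final class ThresholdSketch {
    // Mirrors the value of ArgsParser.DEFAULT_ORIGINALITY_THRESHOLD in this diff.
    static final double DEFAULT_ORIGINALITY_THRESHOLD = 0.51;

    /**
     * Returns true when the first check alone would grant full credit:
     * no deleted line was found (modelled as null), or its originality
     * score is strictly greater than the threshold.
     */
    static boolean shortCircuitsToFullCredit(Double lowestOriginalityScore, double originalityThreshold) {
        return lowestOriginalityScore == null || lowestOriginalityScore > originalityThreshold;
    }

    public static void main(String[] args) {
        double score = 0.60; // hypothetical originality score, for illustration only

        System.out.println(shortCircuitsToFullCredit(score, DEFAULT_ORIGINALITY_THRESHOLD)); // true
        System.out.println(shortCircuitsToFullCredit(score, 0.75));                          // false
        System.out.println(shortCircuitsToFullCredit(0.51, 0.51));                           // false: comparison is strict
        System.out.println(shortCircuitsToFullCredit(null, 1.0));                            // true: no deleted line
    }
}
```

Under these assumptions, raising the threshold makes this early full-credit exit harder to reach, so more lines fall through to the author comparison that follows in the real method.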
20 changes: 19 additions & 1 deletion src/main/java/reposense/model/CliArguments.java
@@ -36,6 +36,7 @@ public class CliArguments {
private final ZoneId zoneId;
private final boolean isFindingPreviousAuthorsPerformed;
private final boolean isAuthorshipAnalyzed;
private final double originalityThreshold;

private boolean isTestMode = ArgsParser.DEFAULT_IS_TEST_MODE;
private boolean isFreshClonePerformed = ArgsParser.DEFAULT_SHOULD_FRESH_CLONE;
@@ -81,6 +82,7 @@ private CliArguments(Builder builder) {
this.reportConfigFilePath = builder.reportConfigFilePath;
this.reportConfiguration = builder.reportConfiguration;
this.isAuthorshipAnalyzed = builder.isAuthorshipAnalyzed;
this.originalityThreshold = builder.originalityThreshold;
}

public ZoneId getZoneId() {
@@ -195,6 +197,10 @@ public boolean isAuthorshipAnalyzed() {
return isAuthorshipAnalyzed;
}

public double getOriginalityThreshold() {
return originalityThreshold;
}

@Override
public boolean equals(Object other) {
// short circuit if same object
@@ -233,7 +239,8 @@ public boolean equals(Object other) {
&& Objects.equals(this.authorConfigFilePath, otherCliArguments.authorConfigFilePath)
&& Objects.equals(this.groupConfigFilePath, otherCliArguments.groupConfigFilePath)
&& Objects.equals(this.reportConfigFilePath, otherCliArguments.reportConfigFilePath)
&& this.isAuthorshipAnalyzed == otherCliArguments.isAuthorshipAnalyzed;
&& this.isAuthorshipAnalyzed == otherCliArguments.isAuthorshipAnalyzed
&& Objects.equals(this.originalityThreshold, otherCliArguments.originalityThreshold);
}

/**
@@ -268,6 +275,7 @@ public static final class Builder {
private Path reportConfigFilePath;
private ReportConfiguration reportConfiguration;
private boolean isAuthorshipAnalyzed;
private double originalityThreshold;

public Builder() {
}
@@ -520,6 +528,16 @@ public Builder isAuthorshipAnalyzed(boolean isAuthorshipAnalyzed) {
return this;
}

/**
* Adds the {@code originalityThreshold} to CliArguments.
*
* @param originalityThreshold the originality threshold.
*/
public Builder originalityThreshold(double originalityThreshold) {
this.originalityThreshold = originalityThreshold;
return this;
}

/**
* Builds CliArguments.
*
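One detail worth noting in the builder: judging only from the hunks shown here, `Builder.originalityThreshold` is declared without an initializer, so a `CliArguments` built without calling `originalityThreshold(...)` would carry Java's default of `0.0`, not `0.51`. The sketch below reproduces just that shape with a hypothetical class `BuilderDefaultSketch`; it is not the project's class.

```java
final class BuilderDefaultSketch {
    // Mirrors the value of ArgsParser.DEFAULT_ORIGINALITY_THRESHOLD in this diff.
    static final double DEFAULT_ORIGINALITY_THRESHOLD = 0.51;

    private final double originalityThreshold;

    private BuilderDefaultSketch(Builder builder) {
        this.originalityThreshold = builder.originalityThreshold;
    }

    double getOriginalityThreshold() {
        return originalityThreshold;
    }

    static final class Builder {
        // No initializer, as in the diff: an unassigned double field starts at 0.0.
        private double originalityThreshold;

        Builder originalityThreshold(double originalityThreshold) {
            this.originalityThreshold = originalityThreshold;
            return this;
        }

        BuilderDefaultSketch build() {
            return new BuilderDefaultSketch(this);
        }
    }

    public static void main(String[] args) {
        // Setter skipped: the field keeps Java's default value.
        System.out.println(new Builder().build().getOriginalityThreshold()); // 0.0

        // Setter called, as ArgsParser.parse does with the parsed value.
        System.out.println(new Builder()
                .originalityThreshold(DEFAULT_ORIGINALITY_THRESHOLD)
                .build()
                .getOriginalityThreshold()); // 0.51
    }
}
```

In this PR the parser always calls the setter, and argparse4j supplies `0.51` when the flag is absent, so the `0.0` case would only matter for code that constructs `CliArguments` through the builder directly, such as tests.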
13 changes: 12 additions & 1 deletion src/main/java/reposense/parser/ArgsParser.java
@@ -40,6 +40,7 @@ public class ArgsParser {
public static final int DEFAULT_NUM_ANALYSIS_THREADS = Runtime.getRuntime().availableProcessors();
public static final boolean DEFAULT_IS_TEST_MODE = false;
public static final boolean DEFAULT_SHOULD_FRESH_CLONE = false;
public static final double DEFAULT_ORIGINALITY_THRESHOLD = 0.51;

public static final String[] HELP_FLAGS = new String[] {"--help", "-h"};
public static final String[] CONFIG_FLAGS = new String[] {"--config", "-c"};
@@ -63,6 +64,7 @@ public class ArgsParser {
public static final String[] TEST_MODE_FLAG = new String[] {"--test-mode"};
public static final String[] FRESH_CLONING_FLAG = new String[] {"--fresh-cloning"};
public static final String[] ANALYZE_AUTHORSHIP_FLAGS = new String[] {"--analyze-authorship", "-A"};
public static final String[] ORIGINALITY_THRESHOLD_FLAGS = new String[] {"--originality-threshold", "-ot"};

private static final Logger logger = LogsManager.getLogger(ArgsParser.class);

@@ -201,6 +203,13 @@ private static ArgumentParser getArgumentParser() {
.action(Arguments.storeTrue())
.help("A flag to perform analysis of code authorship.");

parser.addArgument(ORIGINALITY_THRESHOLD_FLAGS)
.dest(ORIGINALITY_THRESHOLD_FLAGS[0])
.metavar("(0.0 ~ 1.0)")
.type(new OriginalityThresholdArgumentType())
.setDefault(DEFAULT_ORIGINALITY_THRESHOLD)
.help("The originality threshold for analysis of code authorship.");

// Mutex flags - these will always be the last parameters in help message.
mutexParser.addArgument(CONFIG_FLAGS)
.dest(CONFIG_FLAGS[0])
@@ -280,6 +289,7 @@ public static CliArguments parse(String[] args) throws HelpScreenException, Pars
boolean shouldFindPreviousAuthors = results.get(FIND_PREVIOUS_AUTHORS_FLAGS[0]);
boolean isTestMode = results.get(TEST_MODE_FLAG[0]);
boolean isAuthorshipAnalyzed = results.get(ANALYZE_AUTHORSHIP_FLAGS[0]);
double originalityThreshold = results.get(ORIGINALITY_THRESHOLD_FLAGS[0]);
int numCloningThreads = results.get(CLONING_THREADS_FLAG[0]);
int numAnalysisThreads = results.get(ANALYSIS_THREADS_FLAG[0]);

@@ -299,7 +309,8 @@ public static CliArguments parse(String[] args) throws HelpScreenException, Pars
.numCloningThreads(numCloningThreads)
.numAnalysisThreads(numAnalysisThreads)
.isTestMode(isTestMode)
.isAuthorshipAnalyzed(isAuthorshipAnalyzed);
.isAuthorshipAnalyzed(isAuthorshipAnalyzed)
.originalityThreshold(originalityThreshold);

LogsManager.setLogFolderLocation(outputFolderPath);

25 changes: 25 additions & 0 deletions src/main/java/reposense/parser/OriginalityThresholdArgumentType.java
@@ -0,0 +1,25 @@
package reposense.parser;

import net.sourceforge.argparse4j.inf.Argument;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.ArgumentParserException;
import net.sourceforge.argparse4j.inf.ArgumentType;

/**
* Verifies and parses a string-formatted double, between 0.0 and 1.0, to a {@link Double} object.
*/
public class OriginalityThresholdArgumentType implements ArgumentType<Double> {
private static final String PARSE_EXCEPTION_MESSAGE_THRESHOLD_OUT_OF_BOUND =
"Invalid threshold. It must be a number between 0.0 and 1.0.";

@Override
public Double convert(ArgumentParser parser, Argument arg, String value) throws ArgumentParserException {
double threshold = Double.parseDouble(value);

if (Double.compare(threshold, 0.0) < 0 || Double.compare(threshold, 1.0) > 0) {
throw new ArgumentParserException(PARSE_EXCEPTION_MESSAGE_THRESHOLD_OUT_OF_BOUND, parser);
}

return threshold;
}
}
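`Double.parseDouble` throws `NumberFormatException` for non-numeric text such as `abc`, and `convert` as written does not catch it; how argparse4j surfaces that exception is not shown in this diff. The sketch below is one possible way to fold both failure modes into a single error, not the PR's code. To stay dependency-free it uses a hypothetical class `ThresholdParseSketch` and throws `IllegalArgumentException` as a stand-in for `ArgumentParserException`.

```java
final class ThresholdParseSketch {
    static final String INVALID_THRESHOLD_MESSAGE =
            "Invalid threshold. It must be a number between 0.0 and 1.0.";

    /** Parses {@code value} and checks it lies within [0.0, 1.0]. */
    static double parseThreshold(String value) {
        double threshold;
        try {
            threshold = Double.parseDouble(value);
        } catch (NumberFormatException e) {
            // Non-numeric input is reported with the same message as an out-of-range value.
            throw new IllegalArgumentException(INVALID_THRESHOLD_MESSAGE, e);
        }

        // Same bounds check as the diff. Double.compare orders -0.0 below 0.0
        // and NaN above every other value, so both are rejected here.
        if (Double.compare(threshold, 0.0) < 0 || Double.compare(threshold, 1.0) > 0) {
            throw new IllegalArgumentException(INVALID_THRESHOLD_MESSAGE);
        }

        return threshold;
    }
}
```

A side effect of using `Double.compare` rather than `<` and `>` is that the inputs `-0.0` and `NaN` fail the range check, whereas plain relational operators would let both through.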