[#944] Improve performance #2108

Merged
63 commits
93c0f75
use AY2223S2 tp repo
SkyBlaise99 Sep 12, 2023
613faeb
override equals for candidate line
SkyBlaise99 Sep 12, 2023
f45f81f
prevent frontend from installing everytime
SkyBlaise99 Sep 12, 2023
7a04297
Merge branch '944-test-base' into 944-test-cache
SkyBlaise99 Sep 12, 2023
57018b2
prevent build frontend instead
SkyBlaise99 Sep 12, 2023
48d2c8f
Merge branch '944-test-base' into 944-test-cache
SkyBlaise99 Sep 12, 2023
075d34d
[#2027] Fix date range bug (#2034)
jq1836 Sep 24, 2023
0c80ee6
[#2039] Update cypress minimum requirement to 12.15.0 (#2041)
chan-j-d Sep 30, 2023
46409aa
[#1936] Migrate c-segment.vue to typescript (#2035)
jq1836 Sep 30, 2023
43828fd
[#1936] Migrate load-font-awesome-icons.js to typescript (#2040)
jq1836 Sep 30, 2023
595b6fd
cache git diff results
SkyBlaise99 Oct 3, 2023
65acc53
compute more then cache
SkyBlaise99 Oct 3, 2023
d4e2272
[#2045] Fix cypress zoom feature test (#2047)
jq1836 Oct 4, 2023
a8c3f00
[#1936] Migrate random-color-gen.js to typescript (#2043)
jq1836 Oct 4, 2023
93e850f
[#1936] Migrate c-segment-collection.vue to typescript (#2036)
jq1836 Oct 4, 2023
0d1cd99
[#1936] Migrate c-resizer.vue to typescript (#2038)
jq1836 Oct 4, 2023
6292688
Bump zod from 3.20.6 to 3.22.3 in /frontend (#2048)
dependabot[bot] Oct 4, 2023
7bc056a
Bump @cypress/request and cypress in /frontend/cypress (#2042)
dependabot[bot] Oct 4, 2023
7736d6f
fix bug in analysis
SkyBlaise99 Oct 8, 2023
6767123
rename key
SkyBlaise99 Oct 11, 2023
857201f
add cache for git log
SkyBlaise99 Oct 11, 2023
0c4045d
[#1936] Migrate c-ramp.vue to typescript (#2037)
jq1836 Oct 11, 2023
174ecc5
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Oct 20, 2023
e200e5e
Give partial credit if annotated author is not the same as the blame
SkyBlaise99 Oct 20, 2023
65d8e99
Merge branch '944-analyze-authorship' into 944-test-cache-git-diff-v2…
SkyBlaise99 Oct 23, 2023
00cf40d
[#2054] Fix zoom view bug (#2055)
jq1836 Oct 28, 2023
4dae85d
[#1936] Migrate repo-sorter.js to typescript (#2052)
jq1836 Oct 28, 2023
7450425
[#1936] Migrate safari_date.js to typescript (#2053)
jq1836 Oct 28, 2023
056fa5f
Remove frontend JS lint (#2063)
jq1836 Oct 28, 2023
bdeb15a
use full and partial credit color
SkyBlaise99 Oct 29, 2023
1b51351
add SimilarityThresholdArgumentType
SkyBlaise99 Oct 29, 2023
2dc14b5
add SIMILARITY_THRESHOLD_FLAGS to ArgsParser
SkyBlaise99 Oct 29, 2023
396df49
pass similarity score down the chain
SkyBlaise99 Oct 29, 2023
e3ee0ed
fix test case
SkyBlaise99 Oct 29, 2023
c079709
Merge branch '944-similarity-threshold-flag' into 944-test-cache-git-…
SkyBlaise99 Oct 29, 2023
54596ed
[#1929] Add dynamic positioning support for tooltips (#2056)
pratham31012002 Nov 7, 2023
a187d9c
Add test cases for annotated author overriding last author's credit
SkyBlaise99 Nov 7, 2023
58b7002
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Nov 7, 2023
b296b83
revert merge from master
SkyBlaise99 Nov 7, 2023
4ce6545
revert merge from master 58b70025
SkyBlaise99 Nov 7, 2023
f29dc16
[#1928] Fix tooltip zIndex such that it doesn't occlude next file tit…
pratham31012002 Nov 8, 2023
e42c14e
[#1726] Update GitHub-specific references in codebase and docs (#2050)
chan-j-d Nov 8, 2023
4bd05a7
Trigger workflow
SkyBlaise99 Nov 8, 2023
950c912
Merge branch 'master' into 944-analyze-authorship
SkyBlaise99 Nov 8, 2023
a46d423
Revert "Merge branch 'master' into 944-analyze-authorship"
SkyBlaise99 Nov 8, 2023
bba556d
fix frontend test failing
SkyBlaise99 Nov 8, 2023
4d7d3aa
Merge branch '944-analyze-authorship' into 944-analyze-authorship
SkyBlaise99 Nov 12, 2023
1b25572
Merge branch '944-swap-color' into 944-analyze-authorship
SkyBlaise99 Nov 12, 2023
9e93961
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Nov 12, 2023
896c55a
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Jan 8, 2024
39c8058
Merge branch 'reposense:944-analyze-authorship' into 944-analyze-auth…
SkyBlaise99 Jan 31, 2024
d217cbd
Add cache for git log and git diff
SkyBlaise99 Jan 8, 2024
3f933e4
reduce space complexity down to O(min(s, t))
SkyBlaise99 Feb 4, 2024
bbb2dc9
reduce time complexity
SkyBlaise99 Feb 5, 2024
ef2d67d
add early termination
SkyBlaise99 Feb 5, 2024
fb95942
early termination if limit is reached
SkyBlaise99 Feb 5, 2024
f1fb667
early termination
SkyBlaise99 Feb 5, 2024
8f682a8
add cache for git log and git diff
SkyBlaise99 Feb 6, 2024
5e7e1e6
reduce space complexity to O(min(s, t))
SkyBlaise99 Feb 6, 2024
11f16e2
add several early termination
SkyBlaise99 Feb 6, 2024
ba457b3
Merge branch '944-cache' into 944-improve-performance
SkyBlaise99 Feb 6, 2024
551f5d5
Merge branch '944-lev-dist' into 944-improve-performance
SkyBlaise99 Feb 6, 2024
9e9f961
fix checkstyle and update comments
SkyBlaise99 Feb 6, 2024
87 changes: 69 additions & 18 deletions src/main/java/reposense/authorship/analyzer/AuthorshipAnalyzer.java
@@ -2,6 +2,8 @@

import java.nio.file.Paths;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@@ -44,6 +46,10 @@
private static final String ADDED_LINE_SYMBOL = "+";
private static final String DELETED_LINE_SYMBOL = "-";

private static final ConcurrentHashMap<String, String[]> GIT_LOG_CACHE = new ConcurrentHashMap<>();
private static final ConcurrentHashMap<String, ArrayList<ArrayList<String>>> GIT_DIFF_CACHE =
new ConcurrentHashMap<>();

/**
* Analyzes the authorship of {@code lineContent} in {@code filePath}.
* Returns {@code true} if {@code currentAuthor} should be assigned full credit, {@code false} otherwise.
@@ -88,32 +94,73 @@
*/
private static CandidateLine getDeletedLineWithLowestOriginality(RepoConfiguration config, String filePath,
String lineContent, String commitHash) {
String gitLogResults = GitLog.getParentCommits(config.getRepoRoot(), commitHash);
String[] parentCommits = gitLogResults.split(" ");

CandidateLine lowestOriginalityLine = null;

String gitLogCacheKey = config.getRepoRoot() + commitHash;
String[] parentCommits;
if (GIT_LOG_CACHE.containsKey(gitLogCacheKey)) {
parentCommits = GIT_LOG_CACHE.get(gitLogCacheKey);
} else {
String gitLogResults = GitLog.getParentCommits(config.getRepoRoot(), commitHash);
parentCommits = gitLogResults.split(" ");
GIT_LOG_CACHE.put(gitLogCacheKey, parentCommits);
}

for (String parentCommit : parentCommits) {
// Generate diff between commit and parent commit
String gitDiffResult = GitDiff.diffCommits(config.getRepoRoot(), parentCommit, commitHash);
String[] fileDiffResultList = gitDiffResult.split(DIFF_FILE_CHUNK_SEPARATOR);
String gitDiffCacheKey = config.getRepoRoot() + parentCommit + commitHash;
ArrayList<String> fileDiffResultList;
ArrayList<String> preImageFilePathList;
ArrayList<String> postImageFilePathList;

if (GIT_DIFF_CACHE.containsKey(gitDiffCacheKey)) {
ArrayList<ArrayList<String>> cacheResults = GIT_DIFF_CACHE.get(gitDiffCacheKey);
fileDiffResultList = cacheResults.get(0);
preImageFilePathList = cacheResults.get(1);
postImageFilePathList = cacheResults.get(2);
} else {
fileDiffResultList = new ArrayList<>();
preImageFilePathList = new ArrayList<>();
postImageFilePathList = new ArrayList<>();

// Generate diff between commit and parent commit
String gitDiffResult = GitDiff.diffCommits(config.getRepoRoot(), parentCommit, commitHash);
String[] fileDiffResults = gitDiffResult.split(DIFF_FILE_CHUNK_SEPARATOR);

for (String fileDiffResult : fileDiffResults) {
Matcher filePathMatcher = FILE_CHANGED_PATTERN.matcher(fileDiffResult);
if (!filePathMatcher.find()) {
continue;

}

for (String fileDiffResult : fileDiffResultList) {
Matcher filePathMatcher = FILE_CHANGED_PATTERN.matcher(fileDiffResult);
if (!filePathMatcher.find()) {
continue;
// If file was added in the commit
String preImageFilePath = filePathMatcher.group(PRE_IMAGE_FILE_PATH_GROUP_NAME);
if (preImageFilePath.equals(FILE_ADDED_SYMBOL)) {
continue;
}

String postImageFilePath = filePathMatcher.group(POST_IMAGE_FILE_PATH_GROUP_NAME);

fileDiffResultList.add(fileDiffResult);
preImageFilePathList.add(preImageFilePath);
postImageFilePathList.add(postImageFilePath);
}

String preImageFilePath = filePathMatcher.group(PRE_IMAGE_FILE_PATH_GROUP_NAME);
String postImageFilePath = filePathMatcher.group(POST_IMAGE_FILE_PATH_GROUP_NAME);
ArrayList<ArrayList<String>> cacheResults = new ArrayList<>();
cacheResults.add(fileDiffResultList);
cacheResults.add(preImageFilePathList);
cacheResults.add(postImageFilePathList);

GIT_DIFF_CACHE.put(gitDiffCacheKey, cacheResults);
}

// If file was added in the commit or file name does not match
if (preImageFilePath.equals(FILE_ADDED_SYMBOL) || !postImageFilePath.equals(filePath)) {
for (int i = 0; i < fileDiffResultList.size(); i++) {
// If file name does not match
if (!postImageFilePathList.get(i).equals(filePath)) {
continue;
}

CandidateLine candidateLine = getDeletedLineWithLowestOriginalityInDiff(
fileDiffResult, lineContent, parentCommit, preImageFilePath);
fileDiffResultList.get(i), lineContent, parentCommit, preImageFilePathList.get(i));
if (candidateLine == null) {
continue;
}
@@ -155,7 +202,11 @@

if (lineChanged.startsWith(DELETED_LINE_SYMBOL)) {
String deletedLineContent = lineChanged.substring(DELETED_LINE_SYMBOL.length());
double originalityScore = computeOriginalityScore(lineContent, deletedLineContent);
double lowestOriginalityScore = lowestOriginalityLine == null
? Integer.MAX_VALUE
: lowestOriginalityLine.getOriginalityScore();
double originalityScore = computeOriginalityScore(lineContent, deletedLineContent,
lowestOriginalityScore);

if (lowestOriginalityLine == null
|| originalityScore < lowestOriginalityLine.getOriginalityScore()) {
@@ -195,8 +246,8 @@
/**
* Calculates the originality score of {@code s} with {@code baseString}.
*/
private static double computeOriginalityScore(String s, String baseString) {
double levenshteinDistance = StringsUtil.getLevenshteinDistance(s, baseString);
private static double computeOriginalityScore(String s, String baseString, double limit) {
double levenshteinDistance = StringsUtil.getLevenshteinDistance(s, baseString, limit * baseString.length());
return levenshteinDistance / baseString.length();
}
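
Review note on the two cache lookups in this file, offered as a sketch rather than as what this PR does: `containsKey` followed by `get` and `put` is three separate map operations, so two threads that miss at the same time may both run the git command. `ConcurrentHashMap.computeIfAbsent` does the lookup and the insert atomically. In the sketch below, the class `ParentCommitCache`, the `gitLog` function (standing in for `GitLog.getParentCommits`), and the NUL separator in the key are all hypothetical and not part of RepoSense.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for the parent-commit cache; not RepoSense code. */
final class ParentCommitCache {
    private final ConcurrentHashMap<String, String[]> cache = new ConcurrentHashMap<>();
    // Hypothetical loader standing in for the real git log call.
    private final Function<String, String> gitLog;

    ParentCommitCache(Function<String, String> gitLog) {
        this.gitLog = gitLog;
    }

    String[] parentsOf(String repoRoot, String commitHash) {
        // Assumption: a separator keeps "ab" + "c" and "a" + "bc" from sharing a key.
        String key = repoRoot + '\0' + commitHash;
        // The loader runs at most once per key, even under concurrent calls.
        return cache.computeIfAbsent(key, k -> gitLog.apply(commitHash).split(" "));
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        ParentCommitCache cache = new ParentCommitCache(hash -> {
            calls.incrementAndGet();
            return "p1 p2";
        });
        cache.parentsOf("/repo", "abc");
        cache.parentsOf("/repo", "abc");
        System.out.println("loader calls: " + calls.get());
    }
}
```

One trade-off to weigh: `computeIfAbsent` holds a lock on the affected map bin while the loader runs, so a slow git call can delay other writers that hash to the same bin.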

67 changes: 52 additions & 15 deletions src/main/java/reposense/util/StringsUtil.java
@@ -94,35 +94,72 @@ public static boolean isNumeric(String string) {

/**
* Calculates the Levenshtein Distance between two strings using Dynamic Programming.
* Insertion, deletion, and substitution are all of cost 1.
* This version reduces the space complexity to O(min(|s|, |t|)).
* <p></p>
* The DP stops early once the distance is known to reach {@code limit}. For example, if the final distance
* would be 7 and the limit is 3, the algorithm stops as soon as every value in the current row is at least 3.
* This is safe because the method is only used to find the string with the lowest Levenshtein distance.
* <p></p>
* Returns {@code Integer.MAX_VALUE} if the limit is reached, else returns the computed Levenshtein distance.
*/
public static int getLevenshteinDistance(String s, String t) {
// dp[i][j] stores the distance between s.substring(0, i) and t.substring(0, j) -> distance(s[:i], t[:j])
int[][] dp = new int[s.length() + 1][t.length() + 1];
public static int getLevenshteinDistance(String s, String t, double limit) {
// Early termination if either string is empty, lev dist is just the length of the other string.
if (s.isEmpty()) {
return t.length();
}

if (t.isEmpty()) {
return s.length();
}

// Distance between a string and an empty string is the length of the string
for (int i = 0; i <= s.length(); i++) {
dp[i][0] = i;
// The final lev dist is at least k where k = difference in length = number of insert/delete.
if (Math.abs(s.length() - t.length()) >= limit) {
return Integer.MAX_VALUE;
}

if (s.length() < t.length()) {
// Swap s and t to ensure s is always the longer string
String temp = s;
s = t;
t = temp;
}

int[] dp = new int[t.length() + 1];
for (int i = 0; i <= t.length(); i++) {
dp[0][i] = i;
dp[i] = i;
}

for (int i = 1; i <= s.length(); i++) {
// Store the value of the previous row's column
int prev = dp[0];
dp[0] = i;

// If for this row, all the values are at least k, then the final lev dist computed will also be at least k.
// hasLower will check for values smaller than the limit, and terminate early if limit is reached.
boolean hasLower = false;

for (int j = 1; j <= t.length(); j++) {
// If s[i-1] and t[j-1] are equal, distance(s[:i], t[:j]) equals to distance(s[:i-1], t[:j-1])
int temp = dp[j];

if (s.charAt(i - 1) == t.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1];
dp[j] = prev;
} else {
// distance(s[:i], t[:j]) is the minimum of:
// 1) distance(s[:i-1], t[:j]) + 1 -> add s[i]
// 2) distance(s[:i], t[:j-1]) + 1 -> add t[j]
// 3) distance(s[:i-1], t[:j-1]) + 1 -> substitute s[i] with t[j]
dp[i][j] = Math.min(dp[i - 1][j], Math.min(dp[i][j - 1], dp[i - 1][j - 1])) + 1;
dp[j] = Math.min(prev, Math.min(dp[j - 1], dp[j])) + 1;
}

prev = temp;

if (dp[j] < limit) {
hasLower = true;
}
}

if (!hasLower) {
return Integer.MAX_VALUE;
}
}

return dp[s.length()][t.length()];
return dp[t.length()];
}
}
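
Because the hunk above interleaves removed and added lines, here is a standalone sketch of the single-row, limit-bounded distance it describes, together with a search that passes the best score so far as the limit. It follows the logic visible in this diff, but the class `BoundedLevenshtein`, the method names `distance` and `indexOfLowestScore`, and the choice to skip empty candidates are assumptions made for illustration, not RepoSense code.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names are hypothetical, not RepoSense code. */
final class BoundedLevenshtein {

    /** Returns the distance, or Integer.MAX_VALUE once it is known to be >= limit. */
    static int distance(String s, String t, double limit) {
        if (s.isEmpty()) {
            return t.length();
        }
        if (t.isEmpty()) {
            return s.length();
        }
        // The length difference is a lower bound on the distance.
        if (Math.abs(s.length() - t.length()) >= limit) {
            return Integer.MAX_VALUE;
        }
        // Keep t as the shorter string so the row has min(|s|, |t|) + 1 cells.
        if (s.length() < t.length()) {
            String temp = s;
            s = t;
            t = temp;
        }
        int[] row = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            row[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            int diagonal = row[0]; // value of the previous row, previous column
            row[0] = i;
            boolean hasLower = false;
            for (int j = 1; j <= t.length(); j++) {
                int above = row[j]; // previous row, same column, before overwrite
                if (s.charAt(i - 1) == t.charAt(j - 1)) {
                    row[j] = diagonal;
                } else {
                    row[j] = Math.min(diagonal, Math.min(row[j - 1], above)) + 1;
                }
                diagonal = above;
                if (row[j] < limit) {
                    hasLower = true;
                }
            }
            // Every cell in this row is >= limit, so the final distance is too.
            if (!hasLower) {
                return Integer.MAX_VALUE;
            }
        }
        return row[t.length()];
    }

    /** Returns the index of the candidate with the lowest distance / length score, or -1. */
    static int indexOfLowestScore(String target, List<String> candidates) {
        int bestIndex = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            String base = candidates.get(i);
            if (base.isEmpty()) {
                continue; // assumption: skip empty candidates to avoid dividing by zero
            }
            // Convert the best normalized score back into a raw distance limit.
            double score = distance(target, base, bestScore * base.length()) / (double) base.length();
            if (score < bestScore) {
                bestScore = score;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        System.out.println(distance("potato", "tomatoes", Integer.MAX_VALUE));
        System.out.println(indexOfLowestScore("int x = 1;",
                Arrays.asList("completely different line", "int x = 2;", "int y = 1;")));
    }
}
```

Note that the comparison against `limit` is strict: a distance equal to the limit is treated as reached, which matches the `exceedLimit` test where a limit of `4.000` yields `Integer.MAX_VALUE` for a true distance of 4.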
30 changes: 23 additions & 7 deletions src/test/java/reposense/util/StringsUtilTest.java
@@ -91,36 +91,52 @@ public void addQuotesForFilePath_specialBashCharacters_success() {

@Test
public void getLevenshteinDistance_success() {
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("potato", "tomatoes"));
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("potato", "tomatoes", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_insertion_success() {
Assertions.assertEquals(2, StringsUtil.getLevenshteinDistance("abcd", "abcdef"));
Assertions.assertEquals(2, StringsUtil.getLevenshteinDistance("abcd", "abcdef", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_deletion_success() {
Assertions.assertEquals(3, StringsUtil.getLevenshteinDistance("abcde", "ab"));
Assertions.assertEquals(3, StringsUtil.getLevenshteinDistance("abcde", "ab", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_substitution_success() {
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg"));
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_identicalStrings_success() {
Assertions.assertEquals(0, StringsUtil.getLevenshteinDistance("abcdefg", "abcdefg"));
Assertions.assertEquals(0, StringsUtil.getLevenshteinDistance("abcdefg", "abcdefg", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_emptyStrings_success() {
Assertions.assertEquals(0, StringsUtil.getLevenshteinDistance("", ""));
Assertions.assertEquals(0, StringsUtil.getLevenshteinDistance("", "", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_emptyString_success() {
Assertions.assertEquals(6, StringsUtil.getLevenshteinDistance("abcdef", ""));
Assertions.assertEquals(6, StringsUtil.getLevenshteinDistance("abcdef", "", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_belowLimit_success() {
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", 4.0001));
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", 123.456));
Assertions.assertEquals(4, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", Integer.MAX_VALUE));
}

@Test
public void getLevenshteinDistance_exceedLimit_success() {
Assertions.assertEquals(Integer.MAX_VALUE, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", 4.000));
Assertions.assertEquals(Integer.MAX_VALUE, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", 3.99999));
Assertions.assertEquals(Integer.MAX_VALUE, StringsUtil.getLevenshteinDistance("xxxxefg", "abcdefg", 0.89014));

Assertions.assertEquals(Integer.MAX_VALUE, StringsUtil.getLevenshteinDistance("a", "1234567", 2.0));
}
}