Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#2091] Minor Enhancements to Existing Regex Code #2115

Merged
merged 6 commits into from
Mar 26, 2024

Conversation

asdfghjkxd
Copy link
Contributor

[#2091] Suggestions on improvement for memory performance regarding Regex matching

Proposed commit message

From the previous issue, it was discovered that some areas of the code
used regex matching and splitting during an iteration. It is not
recommended to do so due to the overhead incurred with the creation of
`Matcher` objects during the match or split process.

Let's move to change the code such that we avoid such patterns within
the codebase.

Other information

This was something that was missed out during the previous PR that aimed to resolve issue #2091.

@asdfghjkxd asdfghjkxd requested a review from a team February 16, 2024 12:39
Copy link
Contributor

@sopa301 sopa301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! It seems that many other places in the code use matchers in split function calls as well. Would it be worth creating a matcher in StringsUtil for all these other split calls?

@sopa301 sopa301 requested a review from a team February 16, 2024 16:14
@asdfghjkxd
Copy link
Contributor Author

@sopa301 It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!

@gok99
Copy link
Contributor

gok99 commented Feb 19, 2024

It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!

Do you intend to do this in this PR? Also, in case you've done the profile, I'm curious about the extent of performance hit that split incurs.

@asdfghjkxd
Copy link
Contributor Author

@gok99 I will work on the aforementioned comment soon and update you with the results from the profiler!

@sopa301 sopa301 changed the title Minor Enhancements to Existing Regex Code [#2091] Minor Enhancements to Existing Regex Code Feb 19, 2024
@asdfghjkxd
Copy link
Contributor Author

asdfghjkxd commented Feb 21, 2024

@gok99 After some preliminary testing and profiling, it seems that the refactoring does not have much of an impact on the overall performance of the code. The increases or decreases may likely be the result of randomness when executing the program. Here are the findings:

Time

  • Before refactor: 6079ms
  • After refactor: 5820ms

Space

  • Before refactor: 857.902 MB
  • After refactor: 862.132 MB

These findings were made before we consolidated Regex patterns into one single class to avoid duplication, hence results may vary later on.

@asdfghjkxd
Copy link
Contributor Author

asdfghjkxd commented Feb 23, 2024

@sopa301 and @gok99, I have implemented the suggestion to consolidate all commonly used Regex patterns into the StringsUtil class. Please take a look and let me know where I can improve the code!

So far, I'm not too sure if reusing Pattern objects in this manner as a bid to save some memory usage would warrant the increase in coupling between all other classes and StringsUtil; do let me know what you guys think!

@sopa301
Copy link
Contributor

sopa301 commented Feb 23, 2024

Thanks @asdfghjkxd for consolidating the patterns! Do you have any data on the effect of this refactoring on the performance? I agree that if the performance increase is negligible/insignificant, we'll probably be better off leaving those patterns alone.

I'm under the impression that these matchers would be repeatedly created especially with the frequency that these functions are called. Perhaps the compiler may have automatically optimised it for us.

@gok99
Copy link
Contributor

gok99 commented Mar 18, 2024

I definitely think this is worth doing if just for the improved readability and handy regex utils (are there other non-trivial but handy regexes elsewhere that could go here?). I don't think coupling is as much of concern since these are just temporary static helper functions.

Will leave @reposense/active-developers to comment / approve first.

Copy link
Contributor

@sopa301 sopa301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! As discussed, there is no significant performance improvements, but it makes the code easier to modify with the compilation of patterns. It will take some effort getting used to this way of doing things, but I think it's not significantly difficult.

Copy link
Contributor

@gok99 gok99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @asdfghjkxd feel free to add other common patterns to utils before this gets merged, if you spot them

@asdfghjkxd
Copy link
Contributor Author

@gok99 Sure thing, will look through the codebase to find other places with repeated regex patterns!

@gok99 gok99 requested a review from MarcusTXK March 18, 2024 06:27
Copy link
Member

@MarcusTXK MarcusTXK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think consolidating these patterns into a single util is great as we can avoid having to visit multiple files to hunt down changes. I think that any additional patterns found can be done in a separate PR.

@MarcusTXK MarcusTXK merged commit 3140c31 into reposense:master Mar 26, 2024
10 checks passed
Copy link
Contributor

The following links are for previewing this pull request:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants