Providing Byte-Offsets for Every Match #46
fabianovasi changed the title from "Incomplete Match Information for Repeated Patterns" to "Providing Byte-Offsets for Every Match" on Oct 24, 2023
Feature Request
Description:
Hello,
I've noticed that when matches overlap, byte-offsets are only provided for the beginning of the matched part. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte-offsets reported with the -o -b flags. I suggest adding a new option, or modifying the existing options, so that users can obtain a byte-offset for every match, even when matches overlap.
Providing all byte-offsets for overlapping matches directly would streamline workflows that require a byte-offset for every match.
Steps to Reproduce:
Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Use the regular expression to search for matches in a.txt:
hg -e "\p{N}{2}" -b -o a.txt
Result:
The number of matches obtained with the --count-matches flag is 3. It would be nice to also be able to obtain three byte-offsets (0, 1, and 4 in this example).
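To make the expected offsets concrete, here is a minimal Python sketch of the requested behaviour. It is not how hg or Hyperscan work internally. The function names are hypothetical, and `[0-9]` stands in for `\p{N}` because Python's built-in `re` module does not support `\p{...}` classes. A zero-width lookahead lets the scan report a match at every starting byte, and an ordinary non-overlapping scan is shown for comparison.

```python
import re


def overlapping_offsets(data: bytes, pattern: bytes) -> list[int]:
    """Return the start byte-offset of every match, including overlapping ones."""
    # A zero-width lookahead consumes no input, so the scan moves forward
    # one byte at a time and can report matches that overlap each other.
    lookahead = re.compile(b"(?=" + pattern + b")")
    return [m.start() for m in lookahead.finditer(data)]


def non_overlapping_offsets(data: bytes, pattern: bytes) -> list[int]:
    """Return start byte-offsets of a plain left-to-right, non-overlapping scan."""
    return [m.start() for m in re.finditer(pattern, data)]


data = b"012a34"          # contents of a.txt from the example
pattern = rb"[0-9]{2}"    # assumption: ASCII stand-in for \p{N}{2}

print(overlapping_offsets(data, pattern))      # [0, 1, 4]
print(non_overlapping_offsets(data, pattern))  # [0, 4]
```

The lookahead approach yields at most one offset per starting position. For a fixed-width pattern like this one, that gives exactly one offset per match, but for variable-width patterns the count could differ from a count based on match end positions.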
Thank you for considering this feature request. I appreciate your work on enabling regex pattern searches with Hyperscan.
Note: I edited this issue after realizing that the matching mechanism works with a sliding window.