Providing Byte-Offsets for Every Match #46

fabianovasi opened this issue Oct 10, 2023 · 0 comments

fabianovasi commented Oct 10, 2023

Feature Request
Description:

Hello,
I've noticed that when matches overlap, a byte-offset is provided only for the beginning of the overlapping matched region. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte-offsets reported with the -o -b flags. I suggest adding a new option, or modifying an existing one, so that users can obtain a byte-offset for every match, even when matches overlap.
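The discrepancy can be illustrated with a minimal Python sketch. It does not use hg or Hyperscan: it runs Python's `re` module on bytes, with `\d` standing in for `\p{N}` (equivalent only for ASCII digits), and the function names are hypothetical. The first function mimics leftmost, non-overlapping matching, which appears to be what -o -b reports; the second wraps the pattern in a zero-width lookahead so that every start offset is reported:

```python
import re

def non_overlapping_offsets(pattern: bytes, data: bytes) -> list[int]:
    # Leftmost, non-overlapping matches: after a match, scanning resumes
    # at the end of that match, so overlapping candidates are skipped.
    return [m.start() for m in re.finditer(pattern, data)]

def overlapping_offsets(pattern: bytes, data: bytes) -> list[int]:
    # Wrapping the pattern in a zero-width lookahead makes every match
    # empty, so the scanner advances one byte at a time and reports
    # every start position at which the pattern matches.
    return [m.start() for m in re.finditer(b"(?=" + pattern + b")", data)]

data = b"012a34"
print(non_overlapping_offsets(rb"\d{2}", data))  # [0, 4]
print(overlapping_offsets(rb"\d{2}", data))      # [0, 1, 4]
```

Because the input is bytes, `m.start()` is already a byte-offset, which is what -b reports.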

Providing byte-offsets for all overlapping matches directly would streamline workflows that require a byte-offset for every match.

Steps to Reproduce:

Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Use the regular expression to search for matches in a.txt:

hg -e "\p{N}{2}" -b -o a.txt

Result:

The number of matches reported with the --count-matches flag is 3. It would be nice to also be able to obtain the three corresponding byte-offsets (0, 1, and 4 in this example).
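Assuming a sliding-window matching model, the expected offsets can be computed by testing every window of two characters. The following Python sketch is hypothetical and independent of hg: it approximates `\p{N}` by checking whether `unicodedata.category` returns a category starting with "N" (Nd, Nl, No), and it converts character indices to UTF-8 byte-offsets so that non-ASCII input is handled:

```python
import unicodedata

def is_number(ch: str) -> bool:
    # True if ch is in a Unicode Number category (Nd, Nl, No), i.e. \p{N}.
    return unicodedata.category(ch).startswith("N")

def window_byte_offsets(text: str, width: int) -> list[int]:
    # Try every start position, so overlapping matches are all reported.
    offsets = []
    for i in range(len(text) - width + 1):
        if all(is_number(ch) for ch in text[i:i + width]):
            # Convert the character index to a UTF-8 byte-offset.
            offsets.append(len(text[:i].encode("utf-8")))
    return offsets

print(window_byte_offsets("012a34", 2))  # [0, 1, 4]
```

The number of offsets returned (3) matches the --count-matches result.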

Thank you for considering this feature request. I appreciate your work for enabling regex pattern searches with Hyperscan.

Note: I edited this issue after realizing that the matching mechanism works with a sliding window.

fabianovasi changed the title from "Incomplete Match Information for Repeated Patterns" to "Providing Byte-Offsets for Every Match" on Oct 24, 2023.