
[FEATURE] Implement flock(2) advisory locks on linux #4

Open
nil0x42 opened this issue Apr 6, 2021 · 3 comments

@nil0x42

nil0x42 commented Apr 6, 2021

Hi! After posting this tweet, @sn0int mentioned the anewer tool, which is an awesome, modern replacement for anew.

My feature request is about supporting advisory locks (flock(2)) on Linux. When two instances of anew/anewer try to write to the same file, both may end up using an outdated hashmap, which allows duplicate results to be added to the file.

Implementing flock on the opened file would prevent this.
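To illustrate the idea, here is a hypothetical shell sketch using the util-linux flock(1) wrapper, which exposes the same flock(2) syscall anewer would call internally (file and variable names here are made up): each writer takes an exclusive lock before appending, so concurrent writers can never interleave mid-batch.

```shell
#!/bin/sh
# Sketch only: two concurrent writers serialized with flock(1),
# which wraps the flock(2) syscall. Names are invented for illustration.
out=results.txt
: > "$out"
for i in 1 2; do
  (
    flock -x 9                      # block until we hold the exclusive lock
    echo "writer-$i first"  >> "$out"
    echo "writer-$i second" >> "$out"
  ) 9>>"$out.lock" &                # fd 9 refers to the shared lock file
done
wait
cat "$out"
```

Whichever writer wins the race, its two lines always appear back to back; without the lock they could interleave.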

I understand that it may sound like overkill, and I don't know how easy it is to implement in Rust.

@kpcyrd
Contributor

kpcyrd commented Apr 6, 2021

I think an important question is: do you want to run e.g. 5 programs that all write into the file like this:

all of a
everything new from b
everything new from c
everything new from d
everything new from e

In this case you might end up with only one program executing at a time, because for all other programs the stdio buffer is full and anewer is waiting for its turn on the lock. After it gets the lock it reads the whole database before it starts processing its stdin, at which point the program it's reading from may resume. Even though in the worst case you end up with non-parallelized, sequential execution speed, this might be the most efficient way to use anewer as-is if you have multiple writers.

If we assume anewer is always strictly append-only, we could use an rwlock and lock per line: if an anewer instance receives a new line, it attempts to acquire a write lock on the file (which also prevents all other processes from acquiring a read lock).

If another instance starts, it would first attempt to acquire a read lock; once that works, it would check the current file length, release the read lock, and read the file up to that position. Afterwards we would use tokio::select! to read from either stdin or the file. If we get data from the file first, we would attempt to acquire a read lock (before the data we already got is used!), then again check the current end of the file, release the read lock, and catch up on the data that has been written in the meantime.
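That shared/exclusive dance can be sketched in shell with flock(1) standing in for the syscall (all names here are invented for illustration): writers take an exclusive lock per appended line, while readers take a shared lock only long enough to snapshot the current file length, then read up to that offset without holding any lock.

```shell
#!/bin/sh
# Illustrative sketch of the rwlock protocol described above.
db=db.txt
: > "$db"

append_line() {   # writer: exclusive lock while appending one line
  ( flock -x 9; echo "$1" >> "$db" ) 9>>"$db.lock"
}

snapshot_len() {  # reader: shared lock just to snapshot the length
  ( flock -s 9; wc -c < "$db" | tr -d ' ' ) 9>>"$db.lock"
}

append_line "example.com"
len1=$(snapshot_len)      # safe to read bytes [0, len1) without a lock
append_line "www.example.com"
len2=$(snapshot_len)      # later, catch up on bytes [len1, len2)
echo "$len1 $len2"
```

With these inputs the script prints `12 28` (the file length after each append); the key point is that the lock is held only for the append or the length check, never while reading the bulk of the file.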

In the worst case all writers are getting unique data and only one is writing at a time while all others are either waiting for their turn to write or catching up; in the best case they all discover data they already know about, so they never (or very rarely) need to acquire a write lock.

The file could look like:

a
a
b
e
d
c
a
b
[...]

Since the tweet is specifically about subdomains, sn0int might work for you as a database. (This is likely more efficient than the first solution, but definitely way less efficient than the second, anewer-native rwlock solution. There's also no nice way to process the output in a machine-readable way like with anewer, unless you can fit your use case into the notification system.)

ws=$(pwgen -1sA)
echo "[*] Starting subdomain scanners..."
./prog1 scan-subdomains | sn0int -w "$ws" add --stdin subdomains &
./prog2 scan-subdomains | sn0int -w "$ws" add --stdin subdomains &
./prog3 scan-subdomains | sn0int -w "$ws" add --stdin subdomains &
wait
echo "[+] All scanners finished!"
sn0int -w "$ws" select --json subdomains | jq -r .value
sn0int workspace --delete -f "$ws"

Output would look like this:

% sn0int -w 5dxhuwp9 add --stdin subdomain 
example.com
[*] Adding subdomain "example.com"
www.example.com
[*] Adding subdomain "www.example.com"
www.example.com
www.example.com
foo.example.com
[*] Adding subdomain "foo.example.com"
^D
%

@ysf
Owner

ysf commented Apr 10, 2021

Thank you nil0x42, I'm very happy that you like anewer so far. I followed you on twitter instantly xD

Of course, implementing locks is possible. I'm currently unsure whether this helps with what you're trying to solve, or whether it would help your workflow in general. It seems that you're reinventing a job queue, where you want to feed new findings to other tools to fill the job queue again, etc. In the past I used some GNU parallel job-queue magic to mimic this. I don't know if that's suitable for you.

If you want a way to feed input to anewer from multiple sources, another option could be to read from a named pipe/FIFO that your tools write to. This could be more consistent and easier to manage. If this is worth trying, I can come up with some parameters that will create those pipes for you.

That said, I'll gladly implement "flocking" (or accept a PR) if that's what helps you, but I'm not sure whether it's truly the best way to solve your problem or just "a" way to solve it. Shoot me a line and I'll go from there.

@nil0x42
Author

nil0x42 commented Apr 10, 2021

Hi! Thank you @kpcyrd for your answer featuring sn0int usage (a very good tool).
Thank you also @ysf for the answer.

So far, the flock use case was more related to the fact that some of my automation is messy, and it can sometimes happen that two processes try to anewer into the same output file (though it's not expected to happen frequently). So flock is more a way to prevent duplicates in the rare cases where this happens.

But you're right, I should really consider using FIFOs in scenarios where it's more frequent.
And I didn't know about this feature of GNU parallel:


[screenshot: GNU parallel job-queue feature]

which definitely has interesting use cases.

Your awesome Rust tools make me want so badly to get started writing in Rust!
