Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rate limiting limits requests to the rate at which MB can return a full response, not 1 second #204

Open
zzzeek opened this issue Aug 21, 2016 · 4 comments

Comments

@zzzeek
Copy link

zzzeek commented Aug 21, 2016

Per musicbrainz, the rate limit is one request per second:

https://musicbrainz.org/doc/XML_Web_Service/Rate_Limiting "The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second. "

However, it appears that the _rate_limit decorator is wrapping the _mb_request() method in its entirety, and is including the time spent waiting for musicbrainz to return a response and for that response to be fully delivered.

The scenario is this: suppose musicbrainz is taking 5 seconds to return a response. I want to make ten requests as fast as possible in separate threads. That means, in ten seconds, I should have made ten requests (one request per second), and at the tenth second, I would have gotten back results from the first five requests. This isn't possible now because _rate_limit puts a mutex around the entire call. it should only be mutexing the part at which the request is sent over the wire to musicbrainz, and the waiting for the response part of it would be outside of the mutex.

@zzzeek
Copy link
Author

zzzeek commented Aug 21, 2016

a way to fix this would be to not use a mutex around the method at all. just store the last timestamp a request was made. new request comes in, wait in a mutexted loop for the timestamp to grow by at least one second, then invoke the _mb_request() outside of the mutex entirely.

@zzzeek
Copy link
Author

zzzeek commented Aug 21, 2016

admittedly im not understanding their document fully - they talk about "50 req sec for the python/musicbrainz user agent", is that...for all python/musicbrainz requests from all sources IPs (eg. shared w/ the world)? in which case, OK, I'd be better off not even bothering to use any threads if the service is really that low of a scale. feel free to clarify / close if this is the case.

@mineo
Copy link
Collaborator

mineo commented Aug 21, 2016

I think I can at least clarify the 50 requests/sec limit for python-musicbrainz. This is for the older python-musicbrainz2 library, which was used for the older musicbrainz web service prior to the launch of the new one in 2011. Until version 0.7.4 (I think) this didn't support customizing the user agent in any way. Since the web service that library uses is deprecated and a version with a customizable user agent has been released, I'm not sure, but it's been a few years, there's the global limit of 50 requests/sec for the old version.

@zzzeek
Copy link
Author

zzzeek commented Aug 21, 2016

anyway I'm locally doing something like this:

import threading
import time
import musicbrainzngs as mb

# unwrap mb's rate limiting
mb.musicbrainz._mb_request = mb.musicbrainz._mb_request.fun


last_mb_call = None
last_mb_mutex = threading.Lock()

def _mb_fn(self, fn, *arg, **kw):
    global last_mb_call

    last_mb_mutex.acquire()
    try:
        now = time.time()
        if last_mb_call is None:
            last_mb_call = now
        else:
            if now - last_mb_call < 1:
                time.sleep(1 - (now - last_mb_call))
                now = time.time()
            last_mb_call = now
    finally:
        last_mb_mutex.release()

    return fn(*arg, **kw)

# ... later ...

results = _mb_fn(mb.search_recordings, **kwargs)

and this is working great. That is, put the mutex around just the part where you're checking the time. there seem to be a lot more options in _rate_limit() but also right now it's not really possible to set "no" rate limit due to the thread serializing the requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants