Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close function doesn't actually close everything #191

Open
agitelzon opened this issue Jan 5, 2023 · 9 comments
Open

Close function doesn't actually close everything #191

agitelzon opened this issue Jan 5, 2023 · 9 comments

Comments

@agitelzon
Copy link

agitelzon commented Jan 5, 2023

Is this a support request?
Support Ticket #31907

Describe the bug
We are running a python flask app on a set of linux servers and in an AWS lambda. In our app, we create a ldclient instance and try to close() it when we are done. The ldclient.get().close() function call didn't close all threads. The left over threads pile up and over time hit the system thread limit, because the application process is still running and is called on a new task. We start getting an error of RuntimeError: can't start new thread

In short, the bug is that the close() function didn't close all threads. If set_config get's called a 2nd time, as documented, a new ldclient instance is created, the old one never completely goes away.

To reproduce

import atexit
import ldclient
import threading

def __set_launchdarkly_configuration(app):
    ld_key = config.envars.get("LAUNCHDARKLY_SDK_KEY")
    if ld_key:
        ld_config = ldclient.config.Config(ld_key)
    else:
        ld_config = ldclient.config.Config("NOT_CONFIGURED", offline=True)
        logger.warning(
            "LAUNCHDARKLY_SDK_KEY is not set, feature flags will always return defaults"
        )
    ldclient.set_config(ld_config)
    atexit.register(lambda: ldclient.get().close())

    @atexit.register
    def teardown_launchdarkly_configuration(exception=None):
        logger.info(f"teardown launchdarkly_configuration, exception?: {exception}")
        logger.info(f"Number of active threads: {threading.active_count()}")
 
# Call __set_launchdarkly_configuration function, ldclient.get(), and client.variation

Expected behavior
Ultimately, I expected the function ldclient.get().close() to close all threads and release all ldclient objects. I would expect my next ldclient.get().is_initialized() to be false because I closed ldclient so it should be in a state where I have to call set_config

I expected ldclient.set_config(ld_config) to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know that ldclient.get() enforces the singleton pattern, then I would expect `ldclient.set_config() to as well.

Maybe there should be something like a get_config function to allow me to verify that the settings in code are the same as the active config and then I could re-use the existing ldclient instead of creating a new one.

Logs

builtins.RuntimeError: can't start new thread
Traceback (most recent call last):
<removed>
File /function/ldclient/__init__.py, line 42, in set_config
	{code_line}
File /function/ldclient/client.py, line 97, in __init__
	{code_line}
File /function/ldclient/client.py, line 120, in _set_event_processor
	{code_line}
File /function/ldclient/event_processor.py, line 396, in __init__
	{code_line}
File /function/ldclient/repeating_timer.py, line 17, in start
	{code_line}
File /usr/local/lib/python3.7/threading.py, line 852, in start
	{code_line}
RuntimeError: can't start new thread

SDK version
We updated to version 7.5.1 and still saw this error. We tried v8.0.0 as well, same issue.

Language version, developer tools
Python 3.7

OS/platform
Docker container running python:3.7-slim and Ubuntu 20.04

Additional context

@eli-darkly
Copy link
Contributor

eli-darkly commented Jan 5, 2023

It looks to me like this is probably a very long-standing problem with the implementation of close(). I think the thread that is not being stopped is the one that reads the flag data SSE stream. Since the server has not closed the connection, this thread is blocked waiting for more data. If I'm right, then using the SDK in polling mode (set stream=False in the configuration), so that the request does not stay open, would mean you would not see any lingering threads after calling close(). That wouldn't be a long-term solution, just a way to test the theory.

The obvious solution would be to force a close of the connection from the client side, but I've been having trouble finding a workable way to do this— urllib3's HTTPResponse.close() appears to simply hang if you try to call it from thread A (in this case the application thread that's trying to close the LDClient) while thread B (the SSE reader) is doing a blocking read. I'm still investigating what the right way is to do this; urllib3 documentation and Python documentation are not very clear on this subject since it's a somewhat unusual use case.

@agitelzon
Copy link
Author

We tried stream=False and it seemed to not help.

@eli-darkly
Copy link
Contributor

@agitelzon Hmm— that's unexpected, so maybe there is something else going on. We're continuing to investigate.

@eli-darkly
Copy link
Contributor

Can you tell me exactly how many excess threads you are seeing?

@eli-darkly
Copy link
Contributor

Also— I missed one detail in your original post that I should have commented on. You said:

I expected ldclient.set_config(ld_config) to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know that ldclient.get() enforces the singleton pattern, then I would expect ldclient.set_config() to as well.

That is not the documented behavior of set_config. Please see the documentation:

Sets the configuration for the shared SDK client instance.
If this is called prior to ldclient.get(), it stores the configuration that will be used when the client is initialized. If it is called after the client has already been initialized, the client will be re-initialized with the new configuration (this will result in the next call to ldclient.get() returning a new client instance).

In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.

@agitelzon
Copy link
Author

agitelzon commented Jan 7, 2023

Can you tell me exactly how many excess threads you are seeing?

Every time a new job would run in the code, we'd see 11 new threads created, 9 threads shutdown, and 2 threads would stick around forever. The next time a job would run, again 11 new, 9 closed, 2 left over.

This would keep happening until the application ran out of threads in the system.

In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.

For now, our workaround was this block of code at the beginning of the __set_launchdarkly_configuration function.

    try:
        if ldclient.get().is_initialized():
            logger.info("LAUNCHDARKLY already initialized")
            return
    except Exception as e:
        logger.info("LAUNCHDARKLY not initialized yet. Setting config...")
        pass

@eli-darkly
Copy link
Contributor

Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.

@eli-darkly
Copy link
Contributor

Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.

@agitelzon
Copy link
Author

Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.

Please take a look at my post again. I went back through our logs and got an accurate count and updated my post.

Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.

We went through a few iterations with testing the block of code and putting in print statements. I just grabbed a simplified version to post here. At first, we had the ldclient.get().close() statement in the function block with print statements before and after, so we could be sure on the number of threads. Once we had a good idea that the culprit was ldclient, we removed most of the print statements to clean up the code.

I could update the sample code with more statements if that would be helpful.

LaunchDarklyReleaseBot pushed a commit that referenced this issue Jan 30, 2023
skip tests that use a self-signed TLS cert in Python 3.7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants