Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seastar must listen on ALL shards, otherwise requests can be lost #2183

Open
robinchrist opened this issue Apr 14, 2024 · 7 comments
Open

Seastar must listen on ALL shards, otherwise requests can be lost #2183

robinchrist opened this issue Apr 14, 2024 · 7 comments

Comments

@robinchrist
Copy link

robinchrist commented Apr 14, 2024

TL;DR:
Currently Seastar must listen on ALL shards, otherwise requests can be lost.

This is not documented anywhere as far as I'm aware - So either this limitation should be removed or it should be documented.

Original Issue:

Take the following code:

#include <boost/iterator/counting_iterator.hpp>

#include <seastar/core/app-template.hh>
#include <seastar/core/do_with.hh>
#include <seastar/core/future-util.hh>
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>
#include <seastar/core/reactor.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sleep.hh>
#include <seastar/net/api.hh>
#include <seastar/util/log.hh>

namespace ss = seastar;

using namespace std::chrono_literals;

const char* canned_response = "Seastar is the future!\n";

ss::future<> service_loop()
{
    ss::listen_options lo;
    lo.reuse_address = true;
    return ss::do_with(ss::listen(ss::make_ipv4_address({1234}), lo), [](auto& listener) {
        return ss::keep_doing([&listener]() {
            return listener.accept().then([](ss::accept_result res) {
                auto s   = std::move(res.connection);
                auto out = s.output();
                return ss::do_with(std::move(s), std::move(out), [](auto& s, auto& out) {
                    return out.write(canned_response).then([&out] {
                        return out.close();
                    });
                });
            });
        });
    });
}

ss::future<> f()
{
    return ss::parallel_for_each(boost::irange<unsigned>(1, ss::smp::count), [](unsigned c) {
        return ss::smp::submit_to(c, service_loop);
    });
}

int main(int argc, char** argv)
{
    ss::app_template app;
    try {
        app.run(argc, argv, f);
    } catch (...) {
        std::cerr << "Couldn't start application: " << std::current_exception() << "\n";
        return 1;
    }
    return 0;
}

(taken from https://github.com/Rjerk/seastar-tutorial/blob/master/network-write.cc)

It will not work.

ninja && ./my_app -c 4 -m 128G
[2/2] Linking CXX executable my_app
INFO  2024-04-14 03:45:19,002 seastar - Reactor backend: io_uring
INFO  2024-04-14 03:45:19,003 seastar - Perf-based stall detector creation failed (EACCESS), try setting /proc/sys/kernel/perf_event_paranoid to 1 or less to enable kernel backtraces: falling back to posix timer.

and in another window:

netcat -v localhost 1234
netcat: connect to localhost (127.0.0.1) port 1234 (tcp) failed: Connection refused

Only once you change to boost::irange<unsigned>(0... instead of boost::irange<unsigned>(1... it will work.

6b7b16a

@nyh
Copy link
Contributor

nyh commented Apr 14, 2024

You example code doesn't prove that connections are accepted only on core 0 - rather it proves that you must listen also on core 0 for the setup to work.

Can you please try the full loop (with 0) and add printouts which core gets each request, to verify that all of them - not just core 0 - gets requests?

@nyh
Copy link
Contributor

nyh commented Apr 14, 2024

By the way, I'm not happy that the current situation is that you're not allowed to only listen on a subset of the cores. It's not because anyone might really want to do that (on a sharded application, you would definitely want to listen on all cores), but it makes the user experience frustrating: If you forget to listen on every one the cores, you get mysterious results with only some of the requests (if any) being handled, instead of getting a clear error message or everything just working just with fewer cores.

@robinchrist
Copy link
Author

You example code doesn't prove that connections are accepted only on core 0 - rather it proves that you must listen also on core 0 for the setup to work.

Can you please try the full loop (with 0) and add printouts which core gets each request, to verify that all of them - not just core 0 - gets requests?

Yes, that works, as expected:

INFO  2024-04-14 11:55:06,626 [shard 0:main] Kataklysmos - Accepted connection from 127.0.0.1:40354
INFO  2024-04-14 11:55:06,627 [shard 1:main] Kataklysmos - Accepted connection from 127.0.0.1:40356
INFO  2024-04-14 11:55:06,628 [shard 2:main] Kataklysmos - Accepted connection from 127.0.0.1:40364
INFO  2024-04-14 11:55:06,628 [shard 3:main] Kataklysmos - Accepted connection from 127.0.0.1:40378
INFO  2024-04-14 11:55:06,629 [shard 0:main] Kataklysmos - Accepted connection from 127.0.0.1:40390

By the way, I'm not happy that the current situation is that you're not allowed to only listen on a subset of the cores. It's not because anyone might really want to do that (on a sharded application, you would definitely want to listen on all cores),

My plan was to do different things on different cores - e.g. most of the cores would only listen, but some would be tasked with some background tasks etc... Is that discouraged?

But please, in any case: Please document this limitation with a fat red banner so that nobody can miss it.
And please describe exactly in which circumstances it applies, for which type of socket, etc...

but it makes the user experience frustrating: If you forget to listen on every one the cores, you get mysterious results with only some of the requests (if any) being handled, instead of getting a clear error message or everything just working just with fewer cores.

I must say that my onboarding experience has indeed been quite frustrating. I filed 3 issues including this, and almost a fourth due to the unexpected auto_handle_sigint_sigterm (you create a simple hello world server and can't kill it with CTRL+C...)

@nyh
Copy link
Contributor

nyh commented Apr 14, 2024

Yes, that works, as expected:

Good. So this issue should be closed, or at least its title and description need to be rewritten (the "connection only accepted in shard id = 0" isn't true, or doesn't accurately describe the problem).

My plan was to do different things on different cores - e.g. most of the cores would only listen, but some would be tasked with some background tasks etc... Is that discouraged?

Yes - although in some cases it makes sense to do some rare operations only one one CPU - e.g., to make them easy to serialize - usually you'd want the bulk of the operations, like processing requests, to happen in parallel on all cores. If you decide that some cores only listen, other cores only do background tasks, etc., you'll need to work hard to ensure a balanced load (and not leave some shard idle while other nodes are working) and you will have more communication between shards.

That being said, as I also admitted above, I agree that the user experience of this really sucks. Ideally if the Seastar application only listens on core 2 and 7, then it should work - and all requests would be handled on core 2 and 7. Or, if this is NOT supported, we should abort the application with an error. Or, as you said, at least:

But please, in any case: Please document this limitation with a fat red banner so that nobody can miss it. And please describe exactly in which circumstances it applies, for which type of socket, etc...

I agree. The situation with the Seastar tutorial (doc/tutorial.md) is not good. I started writing it a few years ago, and when funding for that project stopped. In some areas we have good doxygen comments, but they are not detailed enough.

If you can send a patch improving tutorial.md and/or doxygen comments in the area, it would be very welcome.

I must say that my onboarding experience has indeed been quite frustrating. I filed 3 issues including this, and almost a fourth due to the unexpected auto_handle_sigint_sigterm (you create a simple hello world server and can't kill it with CTRL+C...)

I agree with every word. I opened an issue about control-C seven years ago - #261 :-(

@nyh
Copy link
Contributor

nyh commented Apr 14, 2024

By the way: A plea to managers of commercial projects over Seastar such as ScyllaDB (CC @avikivity) and RedPanda: Please consider assigning people to improve Seastar documentation, and while at it also to fix bugs that hurt usability and onboarding. While no paying customers care about these things, they are important for the productivity of new developers that you hire for these commercial projects. Wouldn't it be great to have a "Seastar book" that every new employee could read and learn Seastar? That was my original intention when I started tutorial.md.

@robinchrist robinchrist changed the title POSIX connections only accepted in shard id = 0? Seastar must listen on ALL shards, otherwise requests can be lost Apr 14, 2024
@robinchrist
Copy link
Author

Good. So this issue should be closed, or at least its title and description need to be rewritten (the "connection only accepted in shard id = 0" isn't true, or doesn't accurately describe the problem).

Done.

Yes - although in some cases it makes sense to do some rare operations only one one CPU - e.g., to make them easy to serialize - usually you'd want the bulk of the operations, like processing requests, to happen in parallel on all cores. If you decide that some cores only listen, other cores only do background tasks, etc., you'll need to work hard to ensure a balanced load (and not leave some shard idle while other nodes are working) and you will have more communication between shards.

Well yes - That as a drawback I was willing to take. One of the background tasks should be scheduled at more or less price times (60fps "output") and not be interrupted by other things. I know the scheduler takes care of this, but I'd prefer to have more explicit control.

That being said, as I also admitted above, I agree that the user experience of this really sucks. Ideally if the Seastar application only listens on core 2 and 7, then it should work - and all requests would be handled on core 2 and 7. Or, if this is NOT supported, we should abort the application with an error. Or, as you said, at least:

Yep, I agree 100%

I agree. The situation with the Seastar tutorial (doc/tutorial.md) is not good. I started writing it a few years ago, and when funding for that project stopped. In some areas we have good doxygen comments, but they are not detailed enough.

If you can send a patch improving tutorial.md and/or doxygen comments in the area, it would be very welcome.

Well the point is I'm just getting started with Seastar (I'm working on a project called Kataklysmos which is supposed to become the fastest Pixelflut implementation in the world), so any doc PRs would just be more or less educated guesses. Not sure whether this is a good idea.

. I started writing it a few years ago, and when funding for that project stopped.

Oh, I didn't know funding for Seastar was stopped

@nyh
Copy link
Contributor

nyh commented Apr 14, 2024

Oh, I didn't know funding for Seastar was stopped

Just to clarify: Seastar development is still almost entirely done by developers paid by commercial companies (like my own employer, ScyllaDB). This hasn't stopped. But there is no one who is specifically funding, or assigning, documenting Seastar. So the result is partial documentation of varying breadth and depth. When Seastar development started, it was partially funded by an EU project (the Horizon 2020 project, https://mikelangelo-project.eu/), and one of the requirements of that funding was documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants