Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Socket Connection Timeout to CallbackClient and PythonClient #526

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Leniox
Copy link

@Leniox Leniox commented Jul 13, 2023

Solution 1 as part of #525

@Leniox Leniox changed the title Add socket connect timeout to CallbackClient and PythonClient Add Socket Connection Timeout to CallbackClient and PythonClient Jul 13, 2023
Copy link

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to s/createSocketTimeout/connectTimeout/ and use the existing GatewayServer.DEFAULT_CONNECT_TIMEOUT to provide API backward compatibility. There are a few places that already have connectTimeout but aren't wiring it through to socket connect.

@@ -90,6 +90,7 @@ public class CallbackClient implements Py4JPythonClient {
protected final boolean enableMemoryManagement;

protected final int readTimeout;
protected final int createSocketTimeout;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be connectTimeout?

Suggested change
protected final int createSocketTimeout;
protected final int connectTimeout;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this renaming

@@ -150,7 +152,7 @@ public CallbackClient(int port, InetAddress address, long minConnectionTime, Tim
public CallbackClient(int port, InetAddress address, long minConnectionTime, TimeUnit minConnectionTimeUnit,
SocketFactory socketFactory, boolean enableMemoryManagement) {
this(port, address, minConnectionTime, minConnectionTimeUnit, socketFactory, enableMemoryManagement,
GatewayServer.DEFAULT_READ_TIMEOUT);
GatewayServer.DEFAULT_READ_TIMEOUT, GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GatewayServer.DEFAULT_READ_TIMEOUT, GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT);
GatewayServer.DEFAULT_READ_TIMEOUT, GatewayServer.DEFAULT_CONNECT_TIMEOUT);

@@ -176,9 +178,9 @@ public CallbackClient(int port, InetAddress address, long minConnectionTime, Tim
* Python program is closed.
*/
public CallbackClient(int port, InetAddress address, long minConnectionTime, TimeUnit minConnectionTimeUnit,
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout) {
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout, int createSocketTimeout) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout, int createSocketTimeout) {
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout, int connectTimeout) {

this(port, address, null, minConnectionTime, minConnectionTimeUnit, socketFactory, enableMemoryManagement,
readTimeout);
readTimeout, createSocketTimeout);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
readTimeout, createSocketTimeout);
readTimeout, connectTimeout);

Comment on lines 180 to +181
public CallbackClient(int port, InetAddress address, long minConnectionTime, TimeUnit minConnectionTimeUnit,
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout) {
SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout, int createSocketTimeout) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid an API break, probably want to keep a constructor of the existing signature and use default socket connect timeout.

public CallbackClient(int port, InetAddress address, long minConnectionTime, TimeUnit minConnectionTimeUnit,
			SocketFactory socketFactory, boolean enableMemoryManagement, int readTimeout) {
    this(port, address, null, minConnectionTime, minConnectionTimeUnit, socketFactory, enableMemoryManagement,
				readTimeout, GatewayServer.DEFAULT_CONNECT_TIMEOUT);
}

@@ -48,9 +48,10 @@ public class InstrPythonClient extends PythonClient {

public InstrPythonClient(Gateway gateway, List<Class<? extends Command>> customCommands, int pythonPort,
InetAddress pythonAddress, long minConnectionTime, TimeUnit minConnectionTimeUnit,
SocketFactory socketFactory, Py4JJavaServer javaServer, boolean enableMemoryManagement, int readTimeout) {
SocketFactory socketFactory, Py4JJavaServer javaServer, boolean enableMemoryManagement, int readTimeout,
int createSocketTimeout) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int createSocketTimeout) {
int connectTimeout) {

and 🔭 constructor for API back compat

super(gateway, customCommands, pythonPort, pythonAddress, minConnectionTime, minConnectionTimeUnit,
socketFactory, javaServer, enableMemoryManagement, readTimeout);
socketFactory, javaServer, enableMemoryManagement, readTimeout, createSocketTimeout);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
socketFactory, javaServer, enableMemoryManagement, readTimeout, createSocketTimeout);
socketFactory, javaServer, enableMemoryManagement, readTimeout, connectTimeout);

@@ -42,7 +42,8 @@ public void startServer2() {
InstrClientServer server2 = new InstrClientServer(GatewayServer.DEFAULT_PORT + 5,
GatewayServer.defaultAddress(), GatewayServer.DEFAULT_PYTHON_PORT + 5, GatewayServer.defaultAddress(),
GatewayServer.DEFAULT_CONNECT_TIMEOUT, GatewayServer.DEFAULT_READ_TIMEOUT,
ServerSocketFactory.getDefault(), SocketFactory.getDefault(), this, false, true);
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT, ServerSocketFactory.getDefault(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already have GatewayServer.DEFAULT_CONNECT_TIMEOUT

Suggested change
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT, ServerSocketFactory.getDefault(),
ServerSocketFactory.getDefault(),

Comment on lines +64 to +65
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT, ServerSocketFactory.getDefault(),
SocketFactory.getDefault(), this, false, true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already have GatewayServer.DEFAULT_CONNECT_TIMEOUT

Suggested change
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT, ServerSocketFactory.getDefault(),
SocketFactory.getDefault(), this, false, true);
ServerSocketFactory.getDefault(), SocketFactory.getDefault(), this, false, true);

@@ -98,7 +99,8 @@ public static void main(String[] args) {
GatewayServer.turnLoggingOff();
CallbackClient callbackClient = new CallbackClient(GatewayServer.DEFAULT_PYTHON_PORT,
GatewayServer.defaultIPv6Address(), CallbackClient.DEFAULT_MIN_CONNECTION_TIME,
CallbackClient.DEFAULT_MIN_CONNECTION_TIME_UNIT, SocketFactory.getDefault(), false, 250);
CallbackClient.DEFAULT_MIN_CONNECTION_TIME_UNIT, SocketFactory.getDefault(), false, 250,
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GatewayServer.DEFAULT_CREATE_SOCKET_CONNECTION_TIMEOUT);
GatewayServer.DEFAULT_CONNECT_TIMEOUT);

@Leniox
Copy link
Author

Leniox commented Jul 14, 2023

Hey @HyukjinKwon! Thank you for taking the time to look at this.

Context

The reason I made a separate configuration came from my testing. Right now, if I set the connectTimeout to 10000 (10s), I consistently get errors (see attached video)
https://github.com/py4j/py4j/assets/15280977/5d3570ee-69db-46e5-93c6-ac839ff6e2a7

This is confusing, as it appears I've connected to the gateway, sent commands, and during the execution of the command, I get this error. If I use the connectTimeout then it may fix my problem on the initial Socket Connection, but looks like it'll present other problems.

Potential Problem:

On this line in the PythonClient, we set the SoTimeout as the readTimeout

However, on this line in GatewayServer, we use the connectTimeout. It's not clear to me why this is throwing exceptions when I set this value to > 0. Perhaps this is because we do not establish a keepalive (See below)

Problem:

Given the context, I'd be OK to use the connect timeout. But, I have concern that it would present other problems that I encountered above. In order to provide the right solution, I want to ask the following questions to help debug:

Questions:

  1. Should we be using the readTimeout instead of the connectTimeout in the GatewayServer? It feels strange to me that I've made a connection, but I'm still getting these errors during my testing. But, I could be missing something
  2. Is the problem I'm getting related to the fact that we do not establish a KeepAlive? In which case, we'd want to do

The combination of (1) and (2) would look like below, when we create sockets inside both our PythonClient and GatewayServer

        Socket socket = socketFactory.createSocket();
        socket.connect(new InetSocketAddress(address, port), **connectTimeout**);
        socket.setSoTimeout(**readTimeout**);
+       socket.setKeepAlive(true);
+       if (socket.getLocalAddress().isLoopbackAddress()) {
+           socket.setTcpNoDelay(true);
+       }

Any clarification would greatly help me figure out the right solution!

@HyukjinKwon
Copy link
Member

FYI, I am trying to make the tests stable first (e.g, reducing matrix first #519). I think we should make the CI green first ..

@Leniox
Copy link
Author

Leniox commented Aug 8, 2023

I have not had time to back to this, but once I get the time to revisit, I will make the contribution

@HyukjinKwon
Copy link
Member

Would be great if anyone finds some time to fix up the CI first. I will try around late this month too.

@HyukjinKwon
Copy link
Member

Guys, sorry for the delay. Now the tests are fixed per #519. Would you mind rebasing this one please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants