Support TLS Grpc communication between clusters. #11549

Merged
KomachiSion merged 50 commits into alibaba:develop on May 15, 2024

stone-98
Contributor

@stone-98 stone-98 commented Dec 24, 2023

What is the purpose of the change

For #11456

Currently, Nacos supports TLS only for communication between the SDK and the server. This PR adds TLS support for gRPC communication between server instances in a cluster.

Brief changelog

Overview of Previous Logic

In the original system, only gRPC TLS communication from the SDK to the server was supported, with TLS configurations divided into client (RpcClientTlsConfig) and server (RpcServerTlsConfig) parts. The client had three dependencies on TLS configurations, while the server relied on a protocol negotiator to build TLS configurations.

Dependency Diagram:

[Image: image-20240416104125966]

Adjustments Made

  1. TLS configuration moved to a factory pattern: Introduced the RpcTlsConfigFactory interface and its implementation classes, which create TLS configurations according to the specific requirements of clients and servers.
  2. Protocol negotiator singleton refactored: Extracted the singleton logic of the protocol negotiator into AbstractProtocolNegotiatorBuilderSingleton, with SdkProtocolNegotiatorBuilderSingleton and ClusterProtocolNegotiatorBuilderSingleton built on it for SDK and cluster protocol negotiation, respectively. This adds support for TLS gRPC interaction within a cluster.
  3. SSL context refresher adjusted: RpcServerSslContextRefresherHolder now also supports refreshing the cluster SSL context.
  4. Configuration items added: Introduced a series of configuration items specifically for cluster TLS communication.
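
The factory arrangement in item 1 can be pictured with a minimal sketch. The names below (`TlsSettings`, `TlsSettingsFactory`, `PrefixedTlsSettingsFactory`, and the `example.sdk.tls` prefix) are hypothetical and are not the classes added by this PR; only the `nacos.remote.peer.rpc.tls` prefix and the property suffixes come from the verification steps in this description. The idea is that one factory abstraction, parameterized here by a property prefix, can produce separate TLS settings for the SDK-facing and cluster-facing sides.

```java
import java.util.Properties;

/** Small demo; placed first so the file can be launched directly with `java`. */
final class TlsFactoryDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("nacos.remote.peer.rpc.tls.enableTls", "true");
        props.setProperty("nacos.remote.peer.rpc.tls.certChainFile", "ca/member1-cert.pem");

        TlsSettingsFactory clusterFactory = new PrefixedTlsSettingsFactory("nacos.remote.peer.rpc.tls");
        // "example.sdk.tls" is a made-up prefix used only to show a second factory.
        TlsSettingsFactory sdkFactory = new PrefixedTlsSettingsFactory("example.sdk.tls");

        System.out.println(clusterFactory.create(props).enableTls); // prints true
        System.out.println(sdkFactory.create(props).enableTls);     // prints false
    }
}

/** Hypothetical, simplified holder for TLS settings (not the actual Nacos class). */
final class TlsSettings {
    final boolean enableTls;
    final String certChainFile;
    final String certPrivateKey;
    final String trustCollectionCertFile;

    TlsSettings(boolean enableTls, String certChainFile,
                String certPrivateKey, String trustCollectionCertFile) {
        this.enableTls = enableTls;
        this.certChainFile = certChainFile;
        this.certPrivateKey = certPrivateKey;
        this.trustCollectionCertFile = trustCollectionCertFile;
    }
}

/** Hypothetical factory abstraction: each side supplies its own implementation. */
interface TlsSettingsFactory {
    TlsSettings create(Properties properties);
}

/** Reads "<prefix>.enableTls", "<prefix>.certChainFile", and so on. */
final class PrefixedTlsSettingsFactory implements TlsSettingsFactory {
    private final String prefix;

    PrefixedTlsSettingsFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public TlsSettings create(Properties properties) {
        return new TlsSettings(
                Boolean.parseBoolean(properties.getProperty(prefix + ".enableTls", "false")),
                properties.getProperty(prefix + ".certChainFile"),
                properties.getProperty(prefix + ".certPrivateKey"),
                properties.getProperty(prefix + ".trustCollectionCertFile"));
    }
}
```

Keeping the prefix as the only difference between the two factories is a simplification for this sketch; the real implementations may differ in more than the property namespace.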

Dependency Diagram:

[Image: Updated Dependency Diagram]

Verifying this change

Demonstrating a three-node cluster

Configuring cluster.conf

The content of cluster.conf is as follows:

192.168.6.26:8848
192.168.6.26:8850
192.168.6.26:8852

Start Nacos on ports 8848, 8850, and 8852 respectively.
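
As a rough illustration of the file format, the sketch below parses cluster.conf-style content into host and port pairs. `ClusterConfParser` and `Member` are hypothetical helpers written for this example, not Nacos code, and skipping blank lines and `#` comment lines is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for this example only; not part of Nacos. */
final class ClusterConfParser {

    /** One cluster member address. */
    static final class Member {
        final String host;
        final int port;

        Member(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses one "host:port" entry per line; blank and '#' lines are skipped (assumption). */
    static List<Member> parse(String content) {
        List<Member> members = new ArrayList<>();
        for (String rawLine : content.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + line);
            }
            String host = line.substring(0, colon);
            int port = Integer.parseInt(line.substring(colon + 1));
            members.add(new Member(host, port));
        }
        return members;
    }

    public static void main(String[] args) {
        String conf = "192.168.6.26:8848\n192.168.6.26:8850\n192.168.6.26:8852\n";
        // prints [192.168.6.26:8848, 192.168.6.26:8850, 192.168.6.26:8852]
        System.out.println(parse(conf));
    }
}
```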

Generating certificates & signing

#!/bin/bash

# Generating a self-signed root certificate (CA)
openssl req -x509 -newkey rsa:4096 -days 365 -nodes \
    -keyout ca-key.pem -out ca-cert.pem \
    -subj "/C=CN/ST=Hunan/L=Changsha/O=nacos/OU=nacos/CN=nacos.io"

# Generating certificate requests and private keys for each member
for member in member1 member2 member3; do
    openssl req -newkey rsa:4096 -nodes \
        -keyout "${member}-key.pem" -out "${member}-req.pem" \
        -subj "/C=CN/ST=Hunan/L=Changsha/O=nacos/OU=nacos/CN=nacos.io/emailAddress=${member}@gmail.com"
done

# Signing each member's certificate request with the root certificate
for member in member1 member2 member3; do
    openssl x509 -req -in "${member}-req.pem" -days 60 \
        -CA ca-cert.pem -CAkey ca-key.pem -CAcreateserial \
        -out "${member}-cert.pem"
done

The generated files are as follows (the client-* and server-* files in the listing are not produced by the script excerpt above):

[root@yg-itom-test2 ca1]# ll
Total 76
-rw-r--r--. 1 root root 2090 Apr 12 17:24 ca-cert.pem       # Self-signed root certificate file.
-rw-r--r--. 1 root root   17 Apr 12 17:24 ca-cert.srl       # Root certificate serial number file, used to manage certificate serial numbers.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 ca-key.pem        # Private key file for the self-signed root certificate.
-rw-r--r--. 1 root root 1976 Apr 12 17:24 client-cert.pem   # Certificate file for the SDK client.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 client-key.pem    # Private key file for the SDK client.
-rw-r--r--. 1 root root 1736 Apr 12 17:24 client-req.pem    # Certificate request file for the SDK client.
-rwxrwxrwx. 1 root root 1541 Apr 12 15:16 createCa.sh       # Bash script for creating certificates.
-rw-r--r--. 1 root root 1976 Apr 12 17:24 member1-cert.pem  # Certificate file for member1.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 member1-key.pem   # Private key file for member1.
-rw-r--r--. 1 root root 1740 Apr 12 17:24 member1-req.pem   # Certificate request file for member1.
-rw-r--r--. 1 root root 1976 Apr 12 17:24 member2-cert.pem  # Certificate file for member2.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 member2-key.pem   # Private key file for member2.
-rw-r--r--. 1 root root 1740 Apr 12 17:24 member2-req.pem   # Certificate request file for member2.
-rw-r--r--. 1 root root 1976 Apr 12 17:24 member3-cert.pem  # Certificate file for member3.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 member3-key.pem   # Private key file for member3.
-rw-r--r--. 1 root root 1740 Apr 12 17:24 member3-req.pem   # Certificate request file for member3.
-rw-r--r--. 1 root root 1976 Apr 12 17:24 server-cert.pem   # Certificate file for the server.
-rw-r--r--. 1 root root 3272 Apr 12 17:24 server-key.pem    # Private key file for the server.
-rw-r--r--. 1 root root 1736 Apr 12 17:24 server-req.pem    # Certificate request file for the server.

Starting three Nacos instances

Start each Nacos instance with the following JVM parameters:

# Nacos instance 1
-Dnacos.inetutils.ip-address=192.168.6.26
-Dnacos.core.member.lookup.type=file
-DembeddedStorage=true
-Dnacos.home=C:/Users/admin/nacos/cluster1
-Dserver.port=8848
-Dnacos.remote.peer.rpc.tls.enableTls=true
-Dnacos.remote.peer.rpc.tls.compatibility=false
-Dnacos.remote.peer.rpc.tls.certChainFile=ca/member1-cert.pem
-Dnacos.remote.peer.rpc.tls.certPrivateKey=ca/member1-key.pem
-Dnacos.remote.peer.rpc.tls.trustCollectionCertFile=ca/ca-cert.pem

# Nacos instance 2
-Dnacos.inetutils.ip-address=192.168.6.26
-Dnacos.core.member.lookup.type=file
-DembeddedStorage=true
-Dnacos.home=C:/Users/admin/nacos/cluster2
-Dserver.port=8850
-Dnacos.remote.peer.rpc.tls.enableTls=true
-Dnacos.remote.peer.rpc.tls.compatibility=false
-Dnacos.remote.peer.rpc.tls.certChainFile=ca/member2-cert.pem
-Dnacos.remote.peer.rpc.tls.certPrivateKey=ca/member2-key.pem
-Dnacos.remote.peer.rpc.tls.trustCollectionCertFile=ca/ca-cert.pem

# Nacos instance 3
-Dnacos.inetutils.ip-address=192.168.6.26
-Dnacos.core.member.lookup.type=file
-DembeddedStorage=true
-Dnacos.home=C:/Users/admin/nacos/cluster3
-Dserver.port=8852
-Dnacos.remote.peer.rpc.tls.enableTls=true
-Dnacos.remote.peer.rpc.tls.compatibility=false
-Dnacos.remote.peer.rpc.tls.certChainFile=ca/member3-cert.pem
-Dnacos.remote.peer.rpc.tls.certPrivateKey=ca/member3-key.pem
-Dnacos.remote.peer.rpc.tls.trustCollectionCertFile=ca/ca-cert.pem
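
The three parameter sets differ only in the Nacos home directory, the server port, and the member certificate and key. A small sketch makes that pattern explicit. `PeerTlsArgs` is a hypothetical helper for illustration only; the flag names and values are copied from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the JVM flags listed above for member N. */
final class PeerTlsArgs {
    private static final String TLS_PREFIX = "-Dnacos.remote.peer.rpc.tls.";

    static List<String> forMember(int index, int port) {
        List<String> args = new ArrayList<>();
        args.add("-Dnacos.inetutils.ip-address=192.168.6.26");
        args.add("-Dnacos.core.member.lookup.type=file");
        args.add("-DembeddedStorage=true");
        // Per-member values: home directory, port, certificate, and private key.
        args.add("-Dnacos.home=C:/Users/admin/nacos/cluster" + index);
        args.add("-Dserver.port=" + port);
        args.add(TLS_PREFIX + "enableTls=true");
        args.add(TLS_PREFIX + "compatibility=false");
        args.add(TLS_PREFIX + "certChainFile=ca/member" + index + "-cert.pem");
        args.add(TLS_PREFIX + "certPrivateKey=ca/member" + index + "-key.pem");
        // Every member trusts the same root certificate.
        args.add(TLS_PREFIX + "trustCollectionCertFile=ca/ca-cert.pem");
        return args;
    }

    public static void main(String[] args) {
        int[] ports = {8848, 8850, 8852};
        for (int i = 0; i < ports.length; i++) {
            System.out.println("# Nacos instance " + (i + 1));
            System.out.println(String.join(" ", forMember(i + 1, ports[i])));
        }
    }
}
```

Because every member points at the same ca-cert.pem as its trust collection, each node can validate the certificates presented by the other two.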

Result

The cluster is running successfully:

[Image: image-20240412154623595]

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a GitHub issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a GitHub issue. Your pull request should address just this issue, without pulling in other changes: one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the necessary unit tests to verify your logic; prefer mocking when cross-module dependencies exist. If a new feature or significant change is committed, please remember to add an integration test in the test module.
  • Run mvn -B clean package apache-rat:check findbugs:findbugs -Dmaven.test.skip=true to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit tests pass. Run mvn clean test-compile failsafe:integration-test to make sure integration tests pass.

@stone-98
Contributor Author

@KomachiSion Please take a preliminary look to see if the implemented approach is correct. If it is, I will proceed to refine the code and enhance unit tests.

@codecov-commenter
codecov-commenter commented Dec 24, 2023

Codecov Report

Attention: Patch coverage is 71.42857%, with 74 lines in your changes missing coverage. Please review.

Project coverage is 67.83%. Comparing base (15fbf92) to head (072f350).

@@              Coverage Diff              @@
##             develop   #11549      +/-   ##
=============================================
- Coverage      67.85%   67.83%   -0.02%     
- Complexity      8915     8918       +3     
=============================================
  Files           1237     1241       +4     
  Lines          40466    40518      +52     
  Branches        4291     4285       -6     
=============================================
+ Hits           27459    27487      +28     
- Misses         11038    11062      +24     
  Partials        1969     1969              
Files Coverage Δ
...alibaba/nacos/client/config/impl/ClientWorker.java 76.40% <100.00%> (-0.17%) ⬇️
...ient/naming/remote/gprc/NamingGrpcClientProxy.java 98.32% <100.00%> (ø)
...a/nacos/common/remote/client/RpcClientFactory.java 95.83% <ø> (ø)
...nacos/common/remote/client/RpcClientTlsConfig.java 100.00% <ø> (ø)
...s/common/remote/client/grpc/GrpcClusterClient.java 100.00% <ø> (ø)
...nacos/common/remote/client/grpc/GrpcSdkClient.java 100.00% <ø> (ø)
...ba/nacos/common/remote/tls/RpcServerTlsConfig.java 0.00% <ø> (ø)
...cos/core/cluster/remote/ClusterRpcClientProxy.java 73.61% <100.00%> (ø)
...baba/nacos/core/remote/grpc/GrpcClusterServer.java 71.42% <100.00%> (-2.26%) ⬇️
.../alibaba/nacos/core/remote/grpc/GrpcSdkServer.java 72.72% <100.00%> (+12.72%) ⬆️
... and 15 more

... and 1 file with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@DSSLP
DSSLP commented Jan 3, 2024

May I ask if it can also support TLS RPC communication between raft cluster?

@stone-98
Contributor Author

stone-98 commented Jan 3, 2024

> May I ask if it can also support TLS RPC communication between raft cluster?

This PR only adds support for gRPC TLS communication between cluster nodes; it does not yet support TLS for Raft communication.

@KomachiSion
Collaborator

> May I ask if it can also support TLS RPC communication between raft cluster?

That should be submitted to the JRaft community.

@@ -34,11 +34,11 @@
* @version $Id: RpcClientFactory.java, v 0.1 2020年07月14日 3:41 PM liuzunfei Exp $
*/
public class RpcClientFactory {

Collaborator

Indentation problem.
Please reformat using the Nacos code style.

Contributor Author

Done.

@stone-98
Contributor Author

stone-98 commented Feb 2, 2024

@KomachiSion @shiyiyue1102 Please review.

@KomachiSion KomachiSion added this to the 2.4.0 milestone Feb 27, 2024
*
* @author githubcheng2978.
* @author stone-98
Collaborator

Don't change the existing author; you can add another @author line instead.

Contributor Author

My mistake 🤣, I'll change it back.

@@ -14,53 +14,60 @@
* limitations under the License.
*/

- package com.alibaba.nacos.core.remote.tls;
+ package com.alibaba.nacos.common.remote.tls;
Collaborator

The server-side class doesn't really need to move; it can stay in core.

Collaborator

If it goes into common, the client never uses this class, so it only increases the size of the client.

Contributor Author

But the client relies on RpcTlsFactory to create the TlsClientConfig, so RpcTlsFactory has to live in common, and RpcTlsFactory depends directly on this class. If this class were in core, it could not be referenced.

Collaborator

Then RpcTlsFactory should be split further. Also, I can't find where the RpcTlsFactory class is...

Contributor Author

I meant RpcTlsConfigFactory. Do you mean it should be split into a client factory and a cluster factory, placed in common and core respectively?

Contributor Author

Done.

/**
* reload ssl context.
*/
public void reloadProtocolNegotiator() {
Collaborator

Why was this removed?

Contributor Author

My understanding is that it lived in GrpcSdkServer because previously only SDK server TLS was supported. Now that cluster server TLS has been added, I moved it down into BaseRpcServer.

Collaborator

> My understanding is that it lived in GrpcSdkServer because previously only SDK server TLS was supported. Now that cluster server TLS has been added, I moved it down into BaseRpcServer.

I don't see this method in BaseRpcServer.

Contributor Author

[Image: screenshot]
I misspoke; it is BaseGrpcServer.

@stone-98 stone-98 changed the title Support TLS RPC communication between clusters. Support TLS Grpc communication between clusters. Apr 3, 2024
@stone-98
Contributor Author

I split RpcTlsConfigFactory into client and server factories, and changed the cluster parameters to use the "nacos.remote.peer.rpc.tls" prefix.

@KomachiSion KomachiSion merged commit 5169f06 into alibaba:develop May 15, 2024
7 checks passed
Labels
kind/feature type/feature
5 participants