error_code.md

File metadata and controls

88 lines (64 loc) · 9.51 KB

中文版

brpc uses brpc::Controller to set and get the parameters of a single RPC. Controller::ErrorCode() and Controller::ErrorText() return the error code and the error description of the RPC respectively. Both are only meaningful after the RPC completes; calling them earlier yields undefined results. ErrorText() is defined by google::protobuf::RpcController, the base class of Controller, while ErrorCode() is defined by brpc::Controller. Controller also provides Failed() to tell whether the RPC failed. The three methods are related as follows:

  • When Failed() is true, ErrorCode() must be non-zero and ErrorText() must be non-empty.
  • When Failed() is false, ErrorCode() is 0 and ErrorText() is undefined (it is currently empty in brpc, but do not rely on this).

Mark RPC as failed

Both the client and the server in brpc have a Controller, and calling SetFailed() on it modifies ErrorCode and ErrorText. When Controller::SetFailed() is called multiple times, the ErrorCode of the last call is kept, while the ErrorTexts are concatenated instead of keeping only the last one. The framework also enriches ErrorText with extra prefixes: the number of retries at client-side and the address of the server at server-side.
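
The accumulation rule can be illustrated with a small stand-in class. This is a minimal sketch, not brpc's implementation: `MiniController` is a hypothetical type, the single-space separator between texts and the handling of a zero code are assumptions of the sketch, and the prefixes added by the framework are omitted.

```cpp
#include <string>

// Hypothetical stand-in for illustration only; NOT brpc::Controller.
class MiniController {
public:
    // Keeps the most recent error code and appends the reason to the
    // texts recorded by earlier calls.
    void SetFailed(int error_code, const std::string& reason) {
        // Sketch's choice: never store 0, so Failed() stays consistent.
        error_code_ = (error_code == 0) ? -1 : error_code;
        if (!error_text_.empty()) {
            error_text_ += ' ';  // separator is an assumption of this sketch
        }
        error_text_ += reason;
    }

    bool Failed() const { return error_code_ != 0; }
    int ErrorCode() const { return error_code_; }
    const std::string& ErrorText() const { return error_text_; }

private:
    int error_code_ = 0;       // 0 means "not failed"
    std::string error_text_;   // accumulated descriptions
};
```

After two calls such as `SetFailed(1003, "bad request")` followed by `SetFailed(2001, "internal failure")`, the sketch reports code 2001 and a text that contains both reasons.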

At client-side, Controller::SetFailed() is usually called by the framework, for example when sending fails or the response is incomplete. Users may also set an error at client-side in some situations, for example when an additional check before sending the request fails.

At server-side, Controller::SetFailed() is usually called by the user in the service callback. Generally, when an error occurs, the user calls SetFailed(), releases all resources, and returns from the callback. The framework fills the error code and message into the response according to the communication protocol. When the response is received, the error it carries is set into the client-side Controller, so that users can fetch it after the RPC ends. Note that by default the server does not log the errors sent to clients, because frequent logging may significantly hurt server performance due to heavy disk IO. A client that keeps producing errors could slow down the entire server and affect all other clients, which could even be used as an attack against the server. If you do want to see the error messages on the server, turn on the gflag -log_error_text (modifiable at run-time); the server then logs the ErrorText of the Controller of each failed RPC.

Error Code in brpc

All error codes in brpc are defined in errno.proto. Those beginning with SYS_ are defined by the Linux system and are exactly the same as the ones in /usr/include/errno.h; they are placed in the .proto so that they can be shared across languages. The remaining error codes are defined by brpc.

berror(error_code) returns the description of an error code, and berror() returns the description of the current system errno. Note that ErrorText() != berror(ErrorCode()), since ErrorText() contains more specific information. brpc includes berror by default, so you can use it in your project directly.

The following table shows common error codes and their descriptions:

| Error Code | Value | Retry | Description | Logging message |
| --- | --- | --- | --- | --- |
| EAGAIN | 11 | Yes | Too many requests at the same time. Rarely happens, as it is a soft limit. | Resource temporarily unavailable |
| ENODATA | 61 | | 1. The server list returned by Naming Service is empty. 2. When Naming Service changes with all instances modified, Naming Service updates LB by first removing all and then adding all, so the LB instance list may become empty for a short period of time. | Fail to select server from xxx |
| ETIMEDOUT | 110 | Yes | Connection timeout. | Connection timed out |
| EHOSTDOWN | 112 | Yes | Possible reasons: A. The list returned by Naming Service is not empty, but LB cannot select an available server and returns EHOSTDOWN. Specific possible reasons: a. The server is exiting (returned ELOGOFF). b. The server was blocked because of a previous failure. The blocking logic is: 1. For the single connection type, the only connection socket is blocked by SetFailed, and many occurrences of SetFailed in the code can trigger this block. 2. For the pooled/short connection types, the socket is blocked only when the error number satisfies does_error_affect_main_socket (ECONNREFUSED, ENETUNREACH, EHOSTUNREACH or EINVAL). 3. After blocking, a CheckHealth thread performs health checks by simply trying to connect. The check interval is controlled by health_check_interval_s of SocketOptions, and the Socket is unblocked once it connects successfully. B. The Channel is initialized with the SingleServer method (without LB), and the only connection is LOGOFF or blocked (same as above). | "Fail to select server from …" "Not connected to … yet" |
| ENOSERVICE | 1001 | No | Can't locate the service. Rarely happens; usually ENOMETHOD is reported instead. | |
| ENOMETHOD | 1002 | No | Can't locate the method. | Misc forms, common ones are "Fail to find method=…" |
| EREQUEST | 1003 | No | Fail to serialize the request. May be set on either client-side or server-side. | Misc forms: "Missing required fields in request: …" "Fail to parse request message, …" "Bad request" |
| EAUTH | 1004 | No | Authentication failed. | "Authentication failed" |
| ETOOMANYFAILS | 1005 | No | Too many sub-channel failures inside a ParallelChannel. | "%d/%d channels failed, fail_limit=%d" |
| EBACKUPREQUEST | 1007 | Yes | Set when backup requests are triggered. Not returned by ErrorCode() directly; viewable from spans in /rpcz. | "reached backup timeout=%dms" |
| ERPCTIMEDOUT | 1008 | No | RPC timeout. | "reached timeout=%dms" |
| EFAILEDSOCKET | 1009 | Yes | The connection is broken during the RPC. | "The socket was SetFailed" |
| EHTTP | 1010 | No | HTTP responses with a non-2xx status code are treated as failures and set with this code. No retry by default; changeable by customizing RetryPolicy. | Bad http call |
| EOVERCROWDED | 1011 | Yes | Too many messages to buffer at the sender side. Usually caused by lots of concurrent asynchronous requests. Modifiable by -socket_max_unwritten_bytes, 64MB by default. | The server is overcrowded |
| EINTERNAL | 2001 | No | The default error code for Controller::SetFailed when none is specified. | Internal Server Error |
| ERESPONSE | 2002 | No | Fail to serialize the response. May be set on either client-side or server-side. | Misc forms: "Missing required fields in response: …" "Fail to parse response message, " "Bad response" |
| ELOGOFF | 2003 | Yes | The server has been stopped. | "Server is going to quit" |
| ELIMIT | 2004 | Yes | The number of requests being processed concurrently exceeds ServerOptions.max_concurrency. | "Reached server's limit=%d on concurrent requests" |
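
As a reading aid, the Retry column above can be expressed as a small lookup. This is only a sketch: the numeric values are copied from the table, while the `k`-prefixed constant names and the function `IsMarkedRetryable` are hypothetical and not part of brpc. It encodes what the table says, not the behavior of brpc's actual retry policy.

```cpp
// Numeric values copied from the table above. The k-prefixed names are
// hypothetical; they avoid clashing with the errno macros of the system.
constexpr int kEAgain = 11;
constexpr int kETimedOut = 110;
constexpr int kEHostDown = 112;
constexpr int kEBackupRequest = 1007;
constexpr int kEFailedSocket = 1009;
constexpr int kEOvercrowded = 1011;
constexpr int kELogoff = 2003;
constexpr int kELimit = 2004;

// Returns true when the table marks the code with "Yes" in the Retry column.
// Codes that are marked "No", have no Retry value, or are not listed at all
// yield false.
inline bool IsMarkedRetryable(int error_code) {
    switch (error_code) {
        case kEAgain:
        case kETimedOut:
        case kEHostDown:
        case kEBackupRequest:
        case kEFailedSocket:
        case kEOvercrowded:
        case kELogoff:
        case kELimit:
            return true;
        default:
            return false;
    }
}
```

For example, `IsMarkedRetryable(1009)` is true (EFAILEDSOCKET) while `IsMarkedRetryable(1008)` is false (ERPCTIMEDOUT).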

User-defined Error Code

In C/C++, error codes can be defined as macros, constants or enums:

#define ESTOP -114                // C/C++
static const int EMYERROR = 30;   // C/C++
const int EMYERROR2 = -31;        // C++ only

If you need to get the description of an error code through berror, register it in the global scope of a .c/.cpp file with BAIDU_REGISTER_ERRNO(error_code, description), for example:

BAIDU_REGISTER_ERRNO(ESTOP, "the thread is stopping")
BAIDU_REGISTER_ERRNO(EMYERROR, "my error")

Note that strerror and strerror_r do not recognize error codes defined by BAIDU_REGISTER_ERRNO. Neither does the %m used in printf. You must use %s paired with berror:

errno = ESTOP;
printf("Describe errno: %m\n");                              // [Wrong] Describe errno: Unknown error -114
printf("Describe errno: %s\n", strerror_r(errno, NULL, 0));  // [Wrong] Describe errno: Unknown error -114
printf("Describe errno: %s\n", berror());                    // [Correct] Describe errno: the thread is stopping
printf("Describe errno: %s\n", berror(errno));               // [Correct] Describe errno: the thread is stopping

When an error code is registered more than once, a linking error is generated if it is defined in C++:

redefinition of `class BaiduErrnoHelper<30>'

Or the program aborts before it starts:

Fail to define EMYERROR(30) which is already defined as `Read-only file system', abort
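
The idea behind these checks can be sketched with a process-wide table that refuses a second description for the same code. This is an illustrative sketch under stated assumptions, not brpc's implementation: `ErrnoRegistry`, `Register` and `Describe` are hypothetical names, the system descriptions are passed in explicitly instead of being read from the platform, and a rejected registration returns false rather than aborting.

```cpp
#include <map>
#include <string>

// Hypothetical registry for illustration only; NOT how brpc implements
// BAIDU_REGISTER_ERRNO or berror.
class ErrnoRegistry {
public:
    // `system_codes` stands in for the descriptions the platform already
    // provides (an assumption that keeps the sketch portable).
    explicit ErrnoRegistry(const std::map<int, std::string>& system_codes)
        : descriptions_(system_codes) {}

    // Returns false if `code` already has a description, whether it came
    // from the system or from an earlier registration.
    bool Register(int code, const std::string& description) {
        // insert() leaves the map untouched when the key already exists.
        return descriptions_.insert({code, description}).second;
    }

    // Returns the registered description, or a generic text if unknown.
    std::string Describe(int code) const {
        auto it = descriptions_.find(code);
        if (it != descriptions_.end()) {
            return it->second;
        }
        return "Unknown error " + std::to_string(code);
    }

private:
    std::map<int, std::string> descriptions_;
};
```

With a registry seeded with code 30 as "Read-only file system", registering 30 again as "my error" is rejected, which mirrors the situation reported in the abort message above, while a fresh code such as -114 is accepted once and rejected the second time.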

You have to make sure that different modules share the same understanding of the same ErrorCode. Otherwise, the interaction between two modules that interpret an error code differently may be undefined. To prevent this, follow these guidelines:

  • Generally, prefer system error codes, which have fixed values and meanings.
  • Share the code that defines errors between modules to prevent inconsistencies after modifications.
  • Use BAIDU_REGISTER_ERRNO to describe new error codes, which ensures that the same error code is defined only once inside a process.