
GB28181: When camera restart, can not connect to SRS. #3944

Open
daveyang05 opened this issue Feb 2, 2024 · 32 comments · May be fixed by #3947
Labels
GB28181 For GB28181. TransByAI Translated by AI/GPT.

Comments

@daveyang05

daveyang05 commented Feb 2, 2024

We integrate Hikvision cameras using the GB28181 protocol. After a camera restarts, it takes more than two hours for the video stream to recover. Until then, the session in SRS stays in the 'established' state. About two hours later, the camera sends a remote 'reset', after which SRS disconnects the media stream and normal operation resumes.

@winlinvip winlinvip changed the title "Using the GB28181 protocol to connect Hikvision cameras: after a camera restarts, the video stream takes more than 2 hours to recover; until then the SRS session stays in the established state; after about two hours the camera sends a remote reset, SRS disconnects the media stream, and only then does it return to normal" to "Integrating Hikvision cameras using the GB28181 protocol, after the camera restarts, it takes more than two hours for the video stream to recover. Before recovery, the session status in SRS remains in the 'established' state. Approximately two hours later, the camera sends a remote 'reset' command, after which SRS disconnects the media stream, and then normal operation resumes." Feb 2, 2024
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Feb 2, 2024
@yushimeng

Does the 'reset' command mean a TCP RST packet on the SIP connection?

@daveyang05
Author

daveyang05 commented Feb 2, 2024

Could you please clarify whether the GB28181 module can actively detect that a media stream has been disconnected and then move the camera session back to its initial state (or make another appropriate state change)?

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 2, 2024

2024-01-30 10:23:44.903][ERROR][1][47l067d9][104] SIP: Receive err code=1007(SocketRead)(Socket read data failed) : parse message : parse message : grow buffer : read bytes : read
thread [1][47l067d9]: do_cycle() [./src/app/srs_app_gb28181.cpp:1077][errno=104]
thread [1][47l067d9]: parse_message() [./src/protocol/srs_protocol_http_conn.cpp:103][errno=104]
thread [1][47l067d9]: parse_message_imp() [./src/protocol/srs_protocol_http_conn.cpp:153][errno=104]
thread [1][47l067d9]: grow() [./src/protocol/srs_protocol_stream.cpp:162][errno=104]
thread [1][47l067d9]: read() [./src/protocol/srs_protocol_st.cpp:566][errno=104](Connection reset by peer)

Hello! The text above is a log printout from SRS. According to the log, SRS disconnects the media stream connection after the read fails with "Connection reset by peer" (errno=104). The status then transitions from established to init, at which point SRS can accept registration messages from the camera again. That is our preliminary analysis.

Our initial suspicion is that SRS's GB28181 support does not detect the media stream failure.
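For reference, "Connection reset by peer" is the message for errno 104 (ECONNRESET on Linux): it means the peer answered with a TCP RST, not that it sent an application-level command. The minimal POSIX sketch below reproduces this on loopback. One socket plays the camera and aborts the connection by closing with SO_LINGER set to a zero timeout; the other, playing the SRS side, then sees recv() fail with ECONNRESET. Error checks are omitted for brevity, and observe_reset is a hypothetical helper, not SRS code.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns the errno seen by the reader after the peer resets the connection,
// or 0 if recv() did not fail. Error handling is omitted in this sketch.
int observe_reset() {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    bind(lfd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(lfd, 1);
    socklen_t len = sizeof(addr);
    getsockname(lfd, reinterpret_cast<sockaddr*>(&addr), &len);

    int cfd = socket(AF_INET, SOCK_STREAM, 0);  // plays the SRS side
    connect(cfd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    int sfd = accept(lfd, nullptr, nullptr);    // plays the camera side

    // SO_LINGER with a zero timeout makes close() abort the connection with RST.
    linger lg{};
    lg.l_onoff = 1;
    lg.l_linger = 0;
    setsockopt(sfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(sfd);

    // The reader now fails with ECONNRESET ("Connection reset by peer").
    char buf[64];
    ssize_t n = recv(cfd, buf, sizeof(buf), 0);
    int err = (n < 0) ? errno : 0;
    close(cfd);
    close(lfd);
    return err;
}
```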

TRANS_BY_GPT4

@yushimeng

It looks like when the SIP receive thread exits, it does not notify the SIP connection thread.
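In SRS the SIP connection and its receiver run as State Threads coroutines; the sketch below only models the idea with std::thread and a condition variable, and all names in it (ConnState, receiver_loop, conn_loop, run_demo) are hypothetical. The receiver records its error and signals before it exits, and the connection loop, which otherwise wakes up periodically, stops as soon as it sees that signal, so the session can be torn down instead of staying in the established state.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

// Shared state between a receiver thread and the connection loop that owns it.
struct ConnState {
    std::mutex mu;
    std::condition_variable cv;
    bool receiver_done = false;
    std::string receiver_error;

    // Called by the receiver right before it exits, whatever the reason.
    void notify_receiver_exit(const std::string& err) {
        {
            std::lock_guard<std::mutex> lock(mu);
            receiver_done = true;
            receiver_error = err;
        }
        cv.notify_all();
    }
};

// Receiver: here it fails immediately, as it would after a TCP RST.
void receiver_loop(ConnState& st) {
    // ... read and parse SIP messages until a read error occurs ...
    st.notify_receiver_exit("read: Connection reset by peer");
}

// Connection loop: wakes every tick for periodic work, but returns as soon as
// the receiver has exited, handing back its error so the session can be cleaned up.
std::string conn_loop(ConnState& st, std::chrono::milliseconds tick) {
    std::unique_lock<std::mutex> lock(st.mu);
    while (!st.receiver_done) {
        st.cv.wait_for(lock, tick);
    }
    return st.receiver_error;
}

std::string run_demo() {
    ConnState st;
    std::thread receiver(receiver_loop, std::ref(st));
    std::string err = conn_loop(st, std::chrono::milliseconds(50));
    receiver.join();
    return err;
}
```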

@winlinvip winlinvip changed the title Integrating Hikvision cameras using the GB28181 protocol, after the camera restarts, it takes more than two hours for the video stream to recover. Before recovery, the session status in SRS remains in the 'established' state. Approximately two hours later, the camera sends a remote 'reset' command, after which SRS disconnects the media stream, and then normal operation resumes. GB28181: When camera restart, can not connect to SRS. Feb 2, 2024
@winlinvip winlinvip self-assigned this Feb 2, 2024
@winlinvip winlinvip added the GB28181 For GB28181. label Feb 2, 2024
@winlinvip
Member

winlinvip commented Feb 2, 2024

@yushimeng This analysis looks reasonable. If the receive thread exits abnormally, the current logic may indeed not handle it correctly.

TRANS_BY_GPT4

@yushimeng

@daveyang05 Try this pull request, #3947,
and give me feedback if it doesn't work.

@winlinvip
Member

@yushimeng Nice work!

@daveyang05
Author

daveyang05 commented Feb 4, 2024

Hello! The developer responsible for our GB28181 integration is on leave for family matters. They will start verifying the fix as soon as they return after the New Year, and we will report the results to you promptly. Also, could you please confirm that the change to test is #3947?

TRANS_BY_GPT4

@yushimeng

yushimeng commented Feb 5, 2024

When the media connection is disconnected, the session is destroyed immediately, but when the SIP connection is disconnected, it is not. With the current approach, when a new SIP connection arrives, SIP/session state recovery and authentication have to be handled specially. My idea is to also destroy the session directly when the SIP connection is disconnected.
Although my current submission handles session recovery when SIP reconnects immediately, I may remove this recovery logic in the future.
Additionally, I have added SrsResourceManager::erase to avoid binding a session before the session resource is destroyed. I am not sure whether this disrupts the original design of the lazy sweep and the resource manager.
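To make the lazy-sweep concern concrete, here is a toy model under the assumption that a lazily removed resource stays findable by ID until the next sweep. ToyManager and Session are hypothetical and are not SRS's SrsResourceManager. With remove(), a REGISTER arriving before the sweep could still find, and re-bind, the dying session; erase() drops the ID from the lookup index immediately and defers only the disposal.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Session {
    std::string device_id;
};

class ToyManager {
public:
    void add(const std::shared_ptr<Session>& s) { index_[s->device_id] = s; }

    std::shared_ptr<Session> find_by_id(const std::string& id) const {
        auto it = index_.find(id);
        return it == index_.end() ? nullptr : it->second;
    }

    // Lazy remove: queue for disposal; the ID stays in the index until sweep().
    void remove(const std::string& id) {
        if (auto s = find_by_id(id)) zombies_.push_back(s);
    }

    // Eager erase: drop the ID from the index now, dispose of the object later.
    void erase(const std::string& id) {
        auto it = index_.find(id);
        if (it == index_.end()) return;
        zombies_.push_back(it->second);
        index_.erase(it);
    }

    // Dispose of all zombies, unindexing any that are still indexed.
    // A newer session registered under the same ID is left untouched.
    void sweep() {
        for (auto& z : zombies_) {
            auto it = index_.find(z->device_id);
            if (it != index_.end() && it->second == z) index_.erase(it);
        }
        zombies_.clear();
    }

private:
    std::map<std::string, std::shared_ptr<Session>> index_;
    std::vector<std::shared_ptr<Session>> zombies_;
};
```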

@daveyang05
Author

daveyang05 commented Feb 5, 2024

Hello! Thanks for the quick fix. Is there a known problem in SRS resource management, and does the original design have test cases for it? Running those as regression tests would show how this change affects the original design.

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 18, 2024

@yushimeng, I would like to ask: have your changes not yet been merged into the SRS development branch?

TRANS_BY_GPT4

@yushimeng

yushimeng commented Feb 18, 2024

The previous submission was based on my misunderstanding of the code. Could you provide the logs and configuration so that I can pinpoint the issue more accurately?

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 18, 2024

Okay, attached are the logs from that time: Camera Restart Recovery Time Exceeding 2 Hours Log.zip

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 18, 2024

Okay, the attachment contains the logs from that time.

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 18, 2024

Attached are the camera's IP settings and its GB28181 access configuration (over TCP).
(Attachment: GB28181 camera configuration)

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Feb 21, 2024

@yushimeng, may I ask how the fix for this issue is progressing?

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 5, 2024

For SIP REGISTER messages from a terminal, the CSeq field can be used to tell an initial registration from a subsequent periodic registration, because the terminal increments it with each report. For an initial registration, the previous session data should be discarded and the process restarted; for a periodic registration, the current session should be kept. To tell the two apart, SRS should record the CSeq of the last SIP message reported, which normally increases monotonically. If it decreases (and the previous value did not reach or approach 0xffffffff, i.e. it did not wrap around), the message can be inferred to be an initial registration.
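A minimal sketch of this rule, assuming the previous CSeq is stored per device. classify_register and kWrapMargin are hypothetical; the comment only says "reach or approach 0xffffffff", so the margin value here is an arbitrary choice.

```cpp
#include <cstdint>

enum class RegisterKind { Initial, Periodic };

// Hypothetical margin: a drop from within this distance of the 32-bit maximum
// is treated as a wraparound rather than a restart.
constexpr uint32_t kWrapMargin = 1000;

RegisterKind classify_register(bool has_previous, uint32_t last_cseq, uint32_t cseq) {
    if (!has_previous) return RegisterKind::Initial;  // nothing stored yet
    if (cseq >= last_cseq) return RegisterKind::Periodic;
    // CSeq went down: treat it as a wraparound only if the last value was near the maximum.
    bool near_max = last_cseq >= UINT32_MAX - kWrapMargin;
    return near_max ? RegisterKind::Periodic : RegisterKind::Initial;
}
```

Note that RFC 3261 requires CSeq values to be less than 2^31, so a conforming UA should never approach 0xffffffff; the wraparound branch only matters for devices that ignore that limit.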

TRANS_BY_GPT4

@winlinvip
Member

@daveyang05 Nice work! You're welcome to file a patch to fix this issue. :)

@yushimeng

RFC 3261, Section 10.2 Constructing the REGISTER Request:

   Call-ID: All registrations from a UAC SHOULD use the same Call-ID
        header field value for registrations sent to a particular
        registrar.

        If the same client were to use different Call-ID values, a
        registrar could not detect whether a delayed REGISTER request
        might have arrived out of order.

   CSeq: The CSeq value guarantees proper ordering of REGISTER
        requests.  A UA MUST increment the CSeq value by one for each
        REGISTER request with the same Call-ID.
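Under the quoted rules, CSeq values are only comparable between REGISTER requests that share a Call-ID: with the same Call-ID, a CSeq that is not higher than the stored one points to a delayed or repeated request rather than a new registration. The sketch below encodes that three-way check. check_order and RegisterOrder are hypothetical names, and the sketch does not settle whether a particular camera picks a new Call-ID after rebooting.

```cpp
#include <cstdint>
#include <string>

enum class RegisterOrder { NewCallId, InOrder, Stale };

// Compare an incoming REGISTER with the stored Call-ID and CSeq.
RegisterOrder check_order(const std::string& stored_call_id, uint32_t stored_cseq,
                          const std::string& call_id, uint32_t cseq) {
    // Different Call-ID: CSeq values are not comparable across Call-IDs.
    if (call_id != stored_call_id) return RegisterOrder::NewCallId;
    // Same Call-ID: only a strictly higher CSeq is a newer request.
    return cseq > stored_cseq ? RegisterOrder::InOrder : RegisterOrder::Stale;
}
```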

@daveyang05
Author

daveyang05 commented Mar 7, 2024

SIP registration messages are typically sent at intervals of minutes. Observations from Hikvision cameras show that they send a registration message at least every 10 minutes, so out-of-order delivery within a few minutes is unlikely. To determine whether a message from a camera is an initial registration or a subsequent one, check whether the CSeq number has gone backwards and whether the Call-ID matches the previous registration message. Within one session, the initial registration message and the subsequent messages should share the same Call-ID, as required by the SIP protocol and confirmed by packet captures from Hikvision cameras.
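To apply this check, SRS needs the Call-ID and the CSeq number from each REGISTER. The sketch below is a hypothetical standalone parser, not SRS's SrsSipMessage: it reads header lines until the blank line that ends the headers, matches header names exactly (real SIP header names are case-insensitive and have compact forms such as "i:" for Call-ID, which this sketch ignores), and accepts CSeq only when its method is REGISTER.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct RegisterIds {
    std::string call_id;
    uint32_t cseq = 0;
    bool ok = false;  // true only if both Call-ID and a REGISTER CSeq were found
};

RegisterIds parse_register_ids(const std::string& raw) {
    RegisterIds out;
    bool got_call_id = false, got_cseq = false;
    std::istringstream in(raw);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // handle CRLF
        if (line.empty()) break;  // blank line ends the headers
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string name = line.substr(0, colon);
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(' '));  // trim leading spaces
        if (name == "Call-ID") {
            out.call_id = value;
            got_call_id = true;
        } else if (name == "CSeq") {
            // Value looks like "20 REGISTER": sequence number, then method.
            std::istringstream cs(value);
            unsigned long n = 0;
            std::string method;
            if (cs >> n >> method && method == "REGISTER") {
                out.cseq = static_cast<uint32_t>(n);
                got_cseq = true;
            }
        }
    }
    out.ok = got_call_id && got_cseq;
    return out;
}
```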

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 7, 2024

Engineer Yu (@yushimeng), you can extend your original changes with logic that compares each incoming SIP registration message with the registration information stored for the session. If the Call-ID has changed or the CSeq number is lower than before, clear the existing session and start a new SIP session; otherwise, keep the existing session.
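The proposed decision can be sketched as follows; LastRegister, should_reset_session and on_register are hypothetical names. One design choice worth noting: the sketch keeps the record in a store keyed by device ID rather than on a connection object, so it is still available when a restarted camera registers over a new TCP connection.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Last REGISTER accepted for a device.
struct LastRegister {
    std::string call_id;
    uint32_t cseq = 0;
};

// Rule proposed in this thread: reset if the Call-ID changed or the CSeq went down.
bool should_reset_session(const LastRegister& last, const std::string& call_id, uint32_t cseq) {
    return call_id != last.call_id || cseq < last.cseq;
}

// Returns true if a new session should be created (first REGISTER or reset),
// and records this REGISTER as the latest one for the device.
bool on_register(std::map<std::string, LastRegister>& store, const std::string& device,
                 const std::string& call_id, uint32_t cseq) {
    auto it = store.find(device);
    bool reset = it == store.end() || should_reset_session(it->second, call_id, cseq);
    store[device] = LastRegister{call_id, cseq};
    return reset;
}
```

For example, the first REGISTER from a device, a REGISTER whose CSeq went down, and one with a new Call-ID all return true; a REGISTER with the same Call-ID and a higher CSeq returns false.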

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 20, 2024

Mr. Yang and Engineer Yu, after modifying the GB28181 code, we tested and verified that Hikvision cameras now recover the video stream quickly after a restart. The changes were made on the version 5.0 branch.
srs_app_gb28181.zip

TRANS_BY_GPT4

@codeex

codeex commented Mar 24, 2024

@daveyang05 Could you publish a Docker image with this patch? I can't find a release package that includes the fix.

@daveyang05
Author

daveyang05 commented Mar 25, 2024

Engineer Yu, we are currently developing and validating on version 5.0. The attached Docker image was built from the modified 5.0 branch.

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 25, 2024

As mentioned above.

TRANS_BY_GPT4

@codeex

codeex commented Mar 25, 2024

@daveyang05 I can't find the v5.0 branch to compile. Where can I get it, either as a Docker image or as source code?

@daveyang05
Author

daveyang05 commented Mar 26, 2024

Attachment: "srs_app_gb28181 camera restart 2 hours recovery code modification.zip", the modified srs_app_gb28181 code for the two-hour recovery issue after a camera restart.

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 26, 2024

Code

TRANS_BY_GPT4

@codeex

codeex commented Mar 27, 2024

I downloaded the version 5.0 release branch, replaced the GB28181 file with the modified one, and recompiled to build the image. After redeploying, the changes do not seem to take effect. Restarting the Hikvision camera did not resolve the issue: it still fails to reconnect the video stream, although the camera status shows it is online. The cause of the problem is unclear.
srs5-disconnect.log

TRANS_BY_GPT4

@daveyang05
Author

daveyang05 commented Mar 27, 2024

srs_error_t SrsLazyGbSipTcpConn::bind_session(SrsSipMessage* msg, SrsLazyObjectWrapper<SrsLazyGbSession>** psession)
{
    srs_error_t err = srs_success;

    string device = msg->device_id();
    if (device.empty()) return err;

    // Only create session for REGISTER request.
    if (msg->type_ != HTTP_REQUEST || msg->method_ != HTTP_REGISTER) return err;

    // The lazy-sweep wrapper for this resource.
    SrsLazyObjectWrapper<SrsLazyGbSipTcpConn>* wrapper = wrapper_root_;
    srs_assert(wrapper); // It MUST never be NULL, because this method is in the cycle of coroutine of receiver.

    // Find exists session for register, might be created by another object and still alive.
    SrsLazyObjectWrapper<SrsLazyGbSession>* session = dynamic_cast<SrsLazyObjectWrapper<SrsLazyGbSession>*>(_srs_gb_manager->find_by_id(device));

    // If a session is found by device ID and the current message is a registration message
    if (session && msg->is_register()) {
        // If the cseq number decreased or the call id changed
        if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
            // Remove resource from GB manager
            _srs_gb_manager->remove(session);

            // Set session to NULL
            session = NULL;
        }
    }

    if (!session) {
        // Create new GB session.
        session = new SrsLazyObjectWrapper<SrsLazyGbSession>();

        if ((err = session->resource()->initialize(conf_)) != srs_success) {
            srs_freep(session);
            return srs_error_wrap(err, "initialize");
        }
    // ... (the rest of bind_session is not shown)

Please verify whether the bind_session function in the code you downloaded contains the following:
    if (session && msg->is_register()) {
        // If the cseq number decreased or the call id changed
        if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
            // Remove resource from GB manager
            _srs_gb_manager->remove(session);

            // Set session to NULL
            session = NULL;
        }
    }

TRANS_BY_GPT4

@codeex

codeex commented Mar 28, 2024

@daveyang05 Yes, I edited this file; its content is below.

    if (session && msg->is_register()) {
        srs_trace("SIP: receive register message %s", device.c_str());
        // If the cseq number decreased or the call id changed
        if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
            // Remove resource from GB manager
            srs_trace("SIP: remove session");
            _srs_gb_manager->remove(session);

            // Set session to NULL
            session = NULL;
        }
    }

But I haven't found these log lines in the output.

@winlinvip
Member

winlinvip commented Mar 28, 2024

First, thanks to @daveyang05, @codeex, and @yushimeng for describing the issue and its background, which is very important for future bug fixing.

Please do not discuss code in the issue; instead, file a pull request and discuss it there.

If we discuss code changes in issues, incorrect and temporary code changes end up here and confuse other developers.

So I am locking this issue as too heated; please file a pull request and discuss there.

@ossrs ossrs locked as too heated and limited conversation to collaborators Mar 28, 2024
Projects
None yet

4 participants