Audio consumer changeProducer API implementation (PoC) #768

Draft
wants to merge 4 commits into
base: v3

vpalmisano
Contributor

@vpalmisano vpalmisano commented Feb 14, 2022

This PR implements a proof-of-concept allowing audio consumers to dynamically switch the producer instance, without any client renegotiations:

consumer.changeProducer(producerId1) // consumer will receive data from producerId1
...
consumer.changeProducer(producerId2) // consumer now will receive data from producerId2

The motivation for adding this feature is to support large-room scenarios with hundreds (or thousands) of audio producers. In this case it is difficult, on the client side, to handle such a large number of consumers, given that browsers limit the number of <audio> elements (and, in general, the number of peer connection transceivers) that can be created inside the same page. With this API, the client can create one single audio consumer, and on the server side we can use the active speaker data to automatically feed that single audio consumer with the active producer.
Comments are welcome!
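
A minimal sketch of the server-side idea described above, assuming the `changeProducer()` method proposed in this PR. The `SwitchableAudioConsumer` interface and the `ActiveSpeakerFollower` class are hypothetical names used only for illustration; how the application learns about the dominant speaker (for example from an active speaker observer) is left to the caller.

```typescript
// Hypothetical shape: changeProducer() is the method proposed in this PR,
// not part of any released mediasoup API.
interface SwitchableAudioConsumer {
  readonly producerId: string;
  changeProducer(producerId: string): Promise<void>;
}

// Keeps a single audio consumer pointed at the current dominant speaker.
class ActiveSpeakerFollower {
  private consumer: SwitchableAudioConsumer;
  private ownProducerId: string | undefined;
  private currentProducerId: string;
  private pending: Promise<void> = Promise.resolve();

  constructor(consumer: SwitchableAudioConsumer, ownProducerId?: string) {
    this.consumer = consumer;
    // Producer owned by the consuming peer itself; never fed back to it.
    this.ownProducerId = ownProducerId;
    this.currentProducerId = consumer.producerId;
  }

  // Call this from the application's dominant speaker notification.
  onDominantSpeaker(producerId: string): Promise<void> {
    // Serialize switches so rapid speaker changes are applied in order.
    // A failed switch does not block the ones queued after it.
    this.pending = this.pending
      .catch(() => undefined)
      .then(async () => {
        if (producerId === this.ownProducerId) return;
        if (producerId === this.currentProducerId) return;
        await this.consumer.changeProducer(producerId);
        this.currentProducerId = producerId;
      });
    return this.pending;
  }
}
```

Tracking the current producer locally avoids assuming that the consumer object updates its own `producerId` after a switch.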

The new API can be tested using this medianode-demo branch by enabling the singleAudioConsumerMode query parameter (this PR is not intended to be merged; its only purpose is to demonstrate the new feature).

Current TODO list:

  • Check if the introduced API works.
  • Restrict changeProducer availability to audio-only consumers.
  • Implement a feature that allows creating audio Consumers without a connected producer; with the new API, producers can then be connected later.

@jmillan
Member

jmillan commented Feb 17, 2022

@vpalmisano, what about the rest of Consumer types? Simulcast, SVC ?

Since mediasoup is a library, we should try not to make changes solely for a specific use case.

@nazar-pc
Collaborator

Both frontend and backend assume producerId to always be constant. While I see how this would be convenient, I feel like this might be achieved in a more elegant way with something like pipe transport internally instead of exposing this on every consumer.

Something like producer.changeSourceProducer(producerId2).

@vpalmisano
Contributor Author

vpalmisano commented Feb 17, 2022

what about the rest of Consumer types? Simulcast, SVC ?

In those cases the implementation should also check if that switch is possible (e.g. checking codec compatibility before switching producer).

feel like this might be achieved in more elegant way with something like pipe transport internally instead of exposing this on every consumer.

So are you proposing to implement the switch logic at producer side, connecting the webrtc consumer to this piped producer?
Keep in mind that the switch operation must happen very quickly, otherwise we will miss too much audio content from the next speaker.

@nazar-pc
Collaborator

what about the rest of Consumer types? Simulcast, SVC ?

Simulcast audio? 🙃

So are you proposing to implement the switch logic at producer side, connecting the webrtc consumer to this piped producer?

I'm not saying that it should use pipe transport, but I think it would be nice if the mechanism was similar. This way consumers will continue to have a fixed producer ID and the producer will contain all of the logic.

@jmillan
Member

jmillan commented Feb 17, 2022

Simulcast audio? 🙃

Of course not talking about audio only Consumers :-)

@nazar-pc
Collaborator

Well, this PR is just for audio. Video will be more difficult and will only work for the same codec.

@jmillan
Member

jmillan commented Feb 17, 2022

Well, this PR is just for audio.

I know, I know this PR is just for audio...

@ibc ibc requested a review from jmillan February 17, 2022 13:32
@ibc
Member

ibc commented Feb 17, 2022

Thanks for this effort, @vpalmisano. Similar to what Jose said, this feature (in its current state) is not suitable for mediasoup but for a super specific use case in which all participants send audio with exactly the same RTP parameters. This PR doesn't consider cases in which Alice and Bob may be producing OPUS with different ptime or any other different codec parameter. In mediasoup, the Consumer knows the exact RTP parameters of the sending side so its remote SDP includes them and the decoder knows exactly what to expect. However, in this PR it may happen that Carol is receiving Alice's OPUS Producer and suddenly it receives Bob's OPUS Producer without Carol being notified about changes in those RTP parameters and without the required SDP renegotiation in its client side.

We cannot make mediasoup implement these kinds of super specific use cases that make too many assumptions. Anyway I understand this PR as a PoC to be considered for future updates.

@micaelgallego

Maybe I'm saying something stupid, but what if we check whether the producers are "compatible" with regard to RTP parameters and throw an error if not?

Then, the application could somehow force all participants in the same session to have the same parameters. I don't know, maybe it is not possible to force all the parameters.
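
A minimal sketch of such a check, assuming a deliberately strict notion of compatibility (same codec, clock rate, channel count and codec-specific parameters such as ptime or stereo). The `AudioCodec` type is a simplified local stand-in rather than the mediasoup type, and `describeCodecMismatch` and `assertCompatible` are hypothetical helper names.

```typescript
// Simplified, local description of an audio codec (illustration only).
interface AudioCodec {
  mimeType: string;
  clockRate: number;
  channels?: number;
  parameters?: Record<string, string | number>;
}

// Returns the name of the first differing field, or undefined if the two
// codecs match. Strict on purpose: a parameter present on one side and
// missing on the other counts as a mismatch.
function describeCodecMismatch(a: AudioCodec, b: AudioCodec): string | undefined {
  if (a.mimeType.toLowerCase() !== b.mimeType.toLowerCase()) return "mimeType";
  if (a.clockRate !== b.clockRate) return "clockRate";
  if ((a.channels ?? 1) !== (b.channels ?? 1)) return "channels";

  const pa = a.parameters ?? {};
  const pb = b.parameters ?? {};
  const keys = Object.keys({ ...pa, ...pb }).sort();
  for (const key of keys) {
    if (String(pa[key]) !== String(pb[key])) return "parameters." + key;
  }
  return undefined;
}

// Throws instead of silently switching to an incompatible producer.
function assertCompatible(current: AudioCodec, next: AudioCodec): void {
  const mismatch = describeCodecMismatch(current, next);
  if (mismatch !== undefined) {
    throw new Error("incompatible RTP parameters: " + mismatch);
  }
}
```

A real implementation could probably be more lenient, since some parameters do not affect the receiver; that is an open design question in this thread, not something the sketch settles.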

I don't think it is a very specific use case. It is a common way to implement sessions with a large number of participants. But I understand you don't want it included in the mediasoup codebase as it is not generic (audio only).

I suppose the current way to send a participant the audio of the last N speakers is with SDP renegotiation, but the time elapsed between a participant starting to speak and their audio finally arriving at the other participants' browsers may be too long, with potential audio loss.

Following @nazar-pc's comment, maybe this dynamic routing capability could be implemented in another process that receives the audio streams of all participants and selects only one of them; the selection could also be changed dynamically. The generated audio stream can be sent to a mediasoup router and from there to the user by means of WebRTC. It is not ideal because that interprocess communication increases latency and CPU usage, but it solves the issue of renegotiation delay.

@jmillan
Member

jmillan commented Mar 11, 2022

Considering that it is assumed (and asserted) that a Consumer will always consume from a Producer with the same RTP Parameters, I believe a more generic solution would be the following:

  • Have a new PluggableProducer class: transport.createPluggableProducer(producerOptions)
  • Consumers can consume from this Producer as if it was a normal one, by specifying its ID and so on.
  • A real Producer can be plugged into this one pluggableProducer.plug(producer)
  • The RTP traffic to PluggableProducer does not arrive from any socket but from the plugged Producer instead.
  • The logic is now in Router:
    • It contains a map of PluggableProducer to Producer and vice versa.
    • When an RTP packet comes from a Producer associated to a PluggableProducer, Router passes the packet to the corresponding PluggableProducer and later to its corresponding Consumers.
    • PluggableProducer::ReceiveRtpPacket is responsible for rewriting the SSRCs and the needed info in order to mask the original Producer and face its consumers with the encoding parameters that were negotiated.
    • On the other side, when a Consumer requests a key frame to the corresponding PluggableProducer, the Router would relay the request to its associated Producer.
time 1: ProducerA -> PluggableProducerX -> Consumers
time 2: ProducerB -> PluggableProducerX -> Consumers
time 3: ProducerC -> PluggableProducerX -> Consumers
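
The routing described in this proposal could be sketched roughly as follows. This is an in-memory illustration only, not mediasoup code: `RtpPacket`, `PluggableProducerSketch` and `RouterSketch` are hypothetical names, the packet is reduced to a few header fields, and it assumes a real Producer is plugged into at most one PluggableProducer at a time.

```typescript
// Reduced RTP packet: only the header fields relevant to the sketch.
interface RtpPacket {
  ssrc: number;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
}

type PacketSink = (packet: RtpPacket) => void;

// Facade that consumers attach to. Its SSRC and payload type never change.
class PluggableProducerSketch {
  readonly id: string;
  readonly ssrc: number;
  readonly payloadType: number;
  private sinks: PacketSink[] = [];

  constructor(id: string, ssrc: number, payloadType: number) {
    this.id = id;
    this.ssrc = ssrc;
    this.payloadType = payloadType;
  }

  addConsumer(sink: PacketSink): void {
    this.sinks.push(sink);
  }

  // Masks the real producer by rewriting SSRC and payload type.
  receiveRtpPacket(packet: RtpPacket): void {
    const masked: RtpPacket = { ...packet, ssrc: this.ssrc, payloadType: this.payloadType };
    for (const sink of this.sinks) sink(masked);
  }
}

// Holds the two-way mapping between real producers and pluggable ones.
class RouterSketch {
  private pluggableBySource = new Map<string, PluggableProducerSketch>();
  private sourceByPluggable = new Map<string, string>();

  plug(pluggable: PluggableProducerSketch, producerId: string): void {
    const previous = this.sourceByPluggable.get(pluggable.id);
    if (previous !== undefined) this.pluggableBySource.delete(previous);
    this.pluggableBySource.set(producerId, pluggable);
    this.sourceByPluggable.set(pluggable.id, producerId);
  }

  // Packets from producers that are not plugged anywhere are ignored here.
  onProducerRtpPacket(producerId: string, packet: RtpPacket): void {
    const pluggable = this.pluggableBySource.get(producerId);
    if (pluggable !== undefined) pluggable.receiveRtpPacket(packet);
  }

  // A key frame request aimed at a pluggable producer is relayed to the
  // real producer currently plugged into it, if any.
  resolveKeyFrameTarget(pluggableId: string): string | undefined {
    return this.sourceByPluggable.get(pluggableId);
  }
}
```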

@nazar-pc
Collaborator

@jmillan I like the proposal. And we can ensure that pluggableProducer.plug(producer) is only called on the producer with compatible RTP parameters or fail otherwise.

@ibc
Member

ibc commented Mar 12, 2022

If we go that way we will need PluggableSimpleConsumer, PluggableSimulcastConsumer and so on. For v4 I think we should merge all XxxxConsumer classes (including PipeConsumer and DirectConsumer) into one and also merge PlainTransport and PipeTransport into one (both are already basically the same with a few differences, plus the latter uses PipeConsumers instead of normal ones). We should have a single Consumer class with capabilities to deal with simulcast, SVC, spatial and temporal layers, a "pipe" option (to behave as a PipeConsumer, which lets all simulcast streams go together to the endpoint), etc. We should make such a Consumer capable of dynamically switching codecs, RTP parameters, simulcast, SVC, etc. And we are done. Why? Because right now we have a SimpleConsumer class that cannot deal with temporal layers, and in the case of VP8 with a single stream and N temporal layers we need to use SimulcastConsumer, which makes no sense at all. We should have a single Consumer class capable of dealing with everything (spatial layers, temporal layers, etc.) and make the code behave differently (when needed) based on the codec. That's why we abstract the codec into the PayloadDescriptor class.

The proposal of yet another PluggableConsumer (and friends) goes against that direction.

@jmillan
Member

jmillan commented Mar 12, 2022

The proposal of yet another PluggableConsumer (and friends) goes against that direction.

I think you have misunderstood it. The proposal is about PluggableProducer, not PluggableConsumer. It's the opposite.

@jmillan
Member

jmillan commented Mar 12, 2022

There is zero change on Consumers within the proposal.

#768 (comment)

That said, it is not 100% defined, nor is that the intention.

@ibc
Member

ibc commented Mar 12, 2022

PluggableProducer::ReceiveRtpPacket is responsible for rewriting the SSRCs and the needed info in order to mask the original Producer and face its consumers with the encoding parameters that were negotiated.

Still I don't understand how it solves the consuming side in which RTP parameters may be different.

@jmillan
Member

jmillan commented Mar 12, 2022

Still I don't understand how it solves the consuming side in which RTP parameters may be different.

Just to clarify, let me summarise from the beginning.

There are N Producers in a Router. Let's imagine that each Producer represents a single endpoint. In a typical scenario each endpoint would like to consume all the Router's Producers but itself, meaning each endpoint would have N-1 Consumers. In a Router with 100 Producers, each endpoint would have 99 Consumers.

This feature aims to reduce the number of Producers that are consumed at a given time, so that the number of Consumers created for each endpoint is bounded by a known value. Imagine there are 100 Producers in a Router, but the application limits the number of consumable Producers at a given time to 5 (it chooses which Producers to consume based on application logic: LastN, etc.). In this scenario Consumers will not be created out of Producers but out of PluggableProducers instead. The job of the PluggableProducer is to be consumed as if it was a real one, but a PluggableProducer is just a facade; it is not a source of RTP by any means. Its source of RTP is a real Producer which is plugged into it at a given moment. Consumers are hence created out of the PluggableProducer, whose RTP parameters will not change during its whole lifetime.
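
As an illustration of the application logic mentioned above, here is a minimal LastN-style sketch that decides which real Producer goes into each of a fixed number of slots (one slot per PluggableProducer). The function name `assignSlots` and the idea of a ranked list of producer IDs are assumptions for the example; how producers are ranked (audio level, last speaking time, etc.) is up to the application.

```typescript
// current: producer ID plugged into each slot (undefined = empty slot).
// ranked:  producer IDs ordered from most to least relevant.
// Returns the new slot contents, keeping already plugged producers in
// their slot so that only slots whose producer dropped out are replugged.
function assignSlots(
  current: (string | undefined)[],
  ranked: string[],
): (string | undefined)[] {
  const wanted = ranked.slice(0, current.length);

  // Keep producers that are still wanted, clear the rest.
  const kept = current.map((id) =>
    id !== undefined && wanted.indexOf(id) !== -1 ? id : undefined,
  );

  // Wanted producers that are not plugged anywhere yet.
  const missing = wanted.filter((id) => kept.indexOf(id) === -1);

  // Fill the freed slots in order.
  return kept.map((id) => (id === undefined ? missing.shift() : id));
}
```

With 100 Producers and 5 slots, each endpoint holds 5 Consumers instead of 99, and a change in the ranking only touches the slots whose Producer fell out of the top 5.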

The PluggableProducer needs to present its Consumers the RTP with the encodings expected by them, it hence needs to replace the SSRCs and any other needed info of the real Producer with its own.

When it comes to unplugging one Producer from the PluggableProducer and plugging another one, the PluggableProducer's Consumers need to be paused and resumed respectively so that the sequence number, timestamps, referenceSpatialLayer, etc. are reset too and new ones are considered from scratch (we are already prepared for that).
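
To make the sequence number and timestamp handling concrete, here is a minimal sketch of one way to keep the outgoing stream continuous across a source switch. It is not how mediasoup implements its Consumer resync; `StreamContinuity` is a hypothetical name, packets are assumed to arrive in order, and the default timestamp gap of 960 ticks assumes 20 ms of audio at a 48 kHz clock.

```typescript
// Rewrites RTP sequence numbers (16 bit) and timestamps (32 bit) so the
// receiver sees one continuous stream even when the source changes.
class StreamContinuity {
  private seqOffset = 0;
  private tsOffset = 0;
  private last: { seq: number; ts: number } | undefined;
  private resync = true;

  // Call when a different real producer gets plugged in.
  markSourceChanged(): void {
    this.resync = true;
  }

  // tsGap: timestamp distance to leave between the last packet of the old
  // source and the first packet of the new one (assumed 960 by default).
  rewrite(seq: number, ts: number, tsGap = 960): { sequenceNumber: number; timestamp: number } {
    if (this.resync) {
      if (this.last !== undefined) {
        // Pick offsets so the first new packet follows the last sent one.
        this.seqOffset = (this.last.seq + 1 - seq) & 0xffff;
        this.tsOffset = (this.last.ts + tsGap - ts) >>> 0;
      }
      this.resync = false;
    }
    const sequenceNumber = (seq + this.seqOffset) & 0xffff;
    const timestamp = (ts + this.tsOffset) >>> 0;
    this.last = { seq: sequenceNumber, ts: timestamp };
    return { sequenceNumber, timestamp };
  }
}
```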

@ibc
Member

ibc commented Mar 12, 2022

For some reason I did read "PluggableConsumer" in your previous text, hence my confusion. It makes sense as you said. But, regarding RTP parameters, remember that the consuming endpoint/device receives the RTP parameters of the corresponding Producer and those are needed for the SDP negotiation for things such as enabling stereo, ptime and so on. If we replace the producer a consumer is consuming, we must signal its RTP parameters (ok, the mapped ones as we always do) to the consuming device so it runs another SDP O/A and honors the new stereo, ptime settings, etc.

@nazar-pc
Collaborator

If we replace the producer a consumer is consuming, we must signal its RTP parameters

Only if we allow changing real producer to one with different parameters. Initial design may not allow that and still be useful.

I think we should just close this and create an issue describing PluggableProducer mechanism then.

@jmillan
Member

jmillan commented Mar 12, 2022

If we replace the producer a consumer is consuming, we must signal its RTP parameters

This would be one approach, which is legit but has its drawbacks:

  • It requires a new API in mediasoup-client to update the RTP parameters on a Consumer. This is not an issue per se.
  • It requires an RTT every time an unplug/plug is made between Producers with incompatible RTP parameters.

Even if we went that way, the application could easily bypass it by doing the following, which IMHO is a more library-ish way of doing it:

Instead of renegotiating the new RTP parameters with an existing Consumer, create a new PluggableProducer and the corresponding Consumers with the new set of RTP parameters when needed. It's the application's responsibility to pause or destroy any existing Consumers that are not needed anymore, or to keep them in an object pool for later reuse at zero cost and zero setup time, instead of renegotiating the Consumers at the cost of one RTT each time a PluggableProducer is plugged with a Producer that does not match the current Producer's RTP parameters.

With this approach we have one RTT just the first time we create a new PluggableProducer and its Consumers, but for the rest of the times that Consumer can be reused instantaneously, without any delay.

IMHO there is no real benefit in renegotiating, only the extra RTT cost. And it can be bypassed by applications anyway, as explained above. The proposed solution fits, IMHO, the real purpose of the request, which is an instantaneous[*] transition between consuming two different Producers.

[*]: An RTT is needed only upon creation of the Consumers for the PluggableProducer.
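
The object pool idea could look roughly like the sketch below, kept at the application level. `PluggablePool`, `rtpSignature` and the `CodecKey` type are hypothetical names; the entry type `T` stands for whatever the application groups together (for example a PluggableProducer plus its Consumers), and the signature format is just one possible way to key the pool.

```typescript
// Simplified codec description used only to build the pool key.
interface CodecKey {
  mimeType: string;
  clockRate: number;
  channels?: number;
  parameters?: Record<string, string | number>;
}

// Stable string key: equal RTP settings always produce the same signature.
function rtpSignature(codec: CodecKey): string {
  const params = codec.parameters ?? {};
  const sorted = Object.keys(params)
    .sort()
    .map((key) => key + "=" + String(params[key]))
    .join(";");
  return [codec.mimeType.toLowerCase(), codec.clockRate, codec.channels ?? 1, sorted].join("|");
}

// Creates an entry the first time a signature is seen (this is where the
// one-time RTT would be paid) and reuses it afterwards.
class PluggablePool<T> {
  private entries = new Map<string, T>();
  private create: (signature: string) => T;

  constructor(create: (signature: string) => T) {
    this.create = create;
  }

  acquire(codec: CodecKey): { entry: T; reused: boolean } {
    const signature = rtpSignature(codec);
    const existing = this.entries.get(signature);
    if (existing !== undefined) return { entry: existing, reused: true };
    const entry = this.create(signature);
    this.entries.set(signature, entry);
    return { entry, reused: false };
  }
}
```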
