ObservationWriter.AddList loops over data and writes it to a tensor at line 189:
for (var index = 0; index < data.Count; index++)
{
    var val = data[index];
    // Each indexer write goes through the Sentis backend individually.
    ((TensorFloat)m_Proxy.data)[m_Batch, index + m_Offset + writeOffset] = val;
}
In Barracuda 3.0, these writes went to a cache (https://github.com/Unity-Technologies/barracuda-release/blob/release/3.0.1/Barracuda/Runtime/Core/Tensor.cs#L2300) before being uploaded to the tensor, which resulted in 1 job per sensor. In Sentis 1.2.0+, however, there is no cache; in BurstTensorData.cs, the following code is executed for every element access:
public T Get<T>(int index) where T : unmanaged
{
    // Blocks on all pending jobs before every single element access.
    CompleteAllPendingOperations();
    return m_Array.Get<T>(m_Offset + index);
}
This results in 1 job executed per float observation.

The same applies to DiscreteActionOutputApplier.Apply, which fetches from a tensor at line 94. Since Sentis 1.2.0+ has no cache, each of those reads goes through the same uncached path in BurstTensorData.cs, resulting in 1 job per action instead of 1 job total.

A cache should probably be implemented on the MLAgents inference side to reduce the number of job requests; a sketch of the idea follows.
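A minimal sketch of that batching, assuming Sentis 1.2's TensorFloat(TensorShape, float[]) constructor, MakeReadable(), and ToReadOnlyArray(). The TensorBatching class and its method names are hypothetical, not ML-Agents or Sentis API:

using Unity.Sentis;

// Hypothetical batching layer; names are illustrative only.
public static class TensorBatching
{
    // Write side: stage observation floats in a managed array, then upload
    // them to the backend in one operation (one job per tensor, matching
    // Barracuda's old upload-cache behavior).
    public static TensorFloat UploadOnce(float[] staged, int batchSize, int featureCount)
    {
        return new TensorFloat(new TensorShape(batchSize, featureCount), staged);
    }

    // Read side: sync and download the whole tensor once, then index into
    // the managed copy, instead of paying CompleteAllPendingOperations()
    // once per element.
    public static float[] DownloadOnce(TensorFloat tensor)
    {
        tensor.MakeReadable();
        return tensor.ToReadOnlyArray();
    }
}

Staged uploads on the write side and a single bulk download on the read side would restore the one-job-per-tensor behavior seen with Barracuda.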
To Reproduce
Run any model and profile it in the Unity Editor using Burst as a backend.
Inspect the GenerateTensors and ApplyTensors markers. Compare the results using MLAgents 2.0.1 (with Barracuda 3) vs MLAgents 3.0.0 (with Sentis).
Observe the increase in the number of job requests.
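To compare the two runs numerically rather than by eyeballing the Profiler window, a small probe can log the two markers each frame. A sketch, assuming the markers named above are emitted under the Scripts profiler category:

using Unity.Profiling;
using UnityEngine;

// Hypothetical helper for the repro steps above: records the per-frame time
// spent in the GenerateTensors and ApplyTensors profiler markers.
public class InferenceMarkerProbe : MonoBehaviour
{
    ProfilerRecorder m_Generate;
    ProfilerRecorder m_Apply;

    void OnEnable()
    {
        // Marker names as they appear in the Profiler window.
        m_Generate = ProfilerRecorder.StartNew(ProfilerCategory.Scripts, "GenerateTensors");
        m_Apply = ProfilerRecorder.StartNew(ProfilerCategory.Scripts, "ApplyTensors");
    }

    void Update()
    {
        // LastValue is in nanoseconds for timing markers.
        Debug.Log($"GenerateTensors: {m_Generate.LastValue / 1e6f} ms, " +
                  $"ApplyTensors: {m_Apply.LastValue / 1e6f} ms");
    }

    void OnDisable()
    {
        m_Generate.Dispose();
        m_Apply.Dispose();
    }
}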
Screenshots
With MLAgents 2.0.1 + Barracuda 3, GenerateTensors created 3 jobs, one for each of my observation tensors. This took 0.377 ms.
With MLAgents 2.0.1 + Barracuda 3, ApplyTensors took 0.02 ms.
With MLAgents 3.0.0 + Sentis 1.2.0, GenerateTensors created 5,000+ jobs and took 8.97 ms (24x longer).
With MLAgents 3.0.0 + Sentis 1.2.0, ApplyTensors created 250+ jobs and took 0.22 ms (11x longer).
Environment (please complete the following information):
Unity Version: Unity 2022.3.27f1
OS + version: Windows 11
ML-Agents version: 3.0.0