ObservationWriter.AddList loops over data and writes it to a tensor at line 189:
for (var index = 0; index < data.Count; index++)
{
    var val = data[index];
    // Each indexer write goes through the Sentis backend individually.
    ((TensorFloat)m_Proxy.data)[m_Batch, index + m_Offset + writeOffset] = val;
}
In Barracuda 3.0, these writes went to a cache (https://github.com/Unity-Technologies/barracuda-release/blob/release/3.0.1/Barracuda/Runtime/Core/Tensor.cs#L2300) before being uploaded to the tensor, which resulted in 1 job per sensor. In Sentis 1.2.0+, however, there is no cache; in BurstTensorData.cs, the following code is executed for every element access:
public T Get<T>(int index) where T : unmanaged
{
    // Blocks on all pending jobs before every single element access.
    CompleteAllPendingOperations();
    return m_Array.Get<T>(m_Offset + index);
}
This results in 1 job executed per float observation.

The same applies to DiscreteActionOutputApplier.Apply, which fetches from a tensor at line 94. Since Sentis 1.2.0+ has no cache, each of those reads goes through the same uncached path in BurstTensorData.cs, resulting in 1 job per action instead of 1 job total.

A cache should probably be implemented on the MLAgents inference side to reduce the number of job requests; a sketch of the idea follows.
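A minimal sketch of that batching, assuming Sentis 1.2's TensorFloat(TensorShape, float[]) constructor, MakeReadable(), and ToReadOnlyArray(). The TensorBatching class and its method names are hypothetical, not ML-Agents or Sentis API:

using Unity.Sentis;

// Hypothetical batching layer; names are illustrative only.
public static class TensorBatching
{
    // Write side: stage observation floats in a managed array, then upload
    // them to the backend in one operation (one job per tensor, matching
    // Barracuda's old upload-cache behavior).
    public static TensorFloat UploadOnce(float[] staged, int batchSize, int featureCount)
    {
        return new TensorFloat(new TensorShape(batchSize, featureCount), staged);
    }

    // Read side: sync and download the whole tensor once, then index into
    // the managed copy, instead of paying CompleteAllPendingOperations()
    // once per element.
    public static float[] DownloadOnce(TensorFloat tensor)
    {
        tensor.MakeReadable();
        return tensor.ToReadOnlyArray();
    }
}

Staged uploads on the write side and a single bulk download on the read side would restore the one-job-per-tensor behavior seen with Barracuda.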
To Reproduce
Run any model and profile it in the Unity Editor using Burst as a backend.
Inspect the GenerateTensors and ApplyTensors markers. Compare the results using MLAgents 2.0.1 (with Barracuda 3) vs MLAgents 3.0.0 (with Sentis).
Observe the increase in the number of job requests.
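To compare the two runs numerically rather than by eyeballing the Profiler window, a small probe can log the two markers each frame. A sketch, assuming the markers named above are emitted under the Scripts profiler category:

using Unity.Profiling;
using UnityEngine;

// Hypothetical helper for the repro steps above: records the per-frame time
// spent in the GenerateTensors and ApplyTensors profiler markers.
public class InferenceMarkerProbe : MonoBehaviour
{
    ProfilerRecorder m_Generate;
    ProfilerRecorder m_Apply;

    void OnEnable()
    {
        // Marker names as they appear in the Profiler window.
        m_Generate = ProfilerRecorder.StartNew(ProfilerCategory.Scripts, "GenerateTensors");
        m_Apply = ProfilerRecorder.StartNew(ProfilerCategory.Scripts, "ApplyTensors");
    }

    void Update()
    {
        // LastValue is in nanoseconds for timing markers.
        Debug.Log($"GenerateTensors: {m_Generate.LastValue / 1e6f} ms, " +
                  $"ApplyTensors: {m_Apply.LastValue / 1e6f} ms");
    }

    void OnDisable()
    {
        m_Generate.Dispose();
        m_Apply.Dispose();
    }
}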
Screenshots
With MLAgents 2.0.1 + Barracuda 3, GenerateTensors created 3 jobs, one for each of my observation tensors. This took 0.377 ms.
With MLAgents 2.0.1 + Barracuda 3, ApplyTensors took 0.02 ms.
With MLAgents 3.0.0 + Sentis 1.2.0, GenerateTensors created 5,000+ jobs and took 8.97 ms (24x longer).
With MLAgents 3.0.0 + Sentis 1.2.0, ApplyTensors created 250+ jobs and took 0.22 ms (11x longer).
Environment (please complete the following information):
Unity Version: Unity 2022.3.27f1
OS + version: Windows 11
ML-Agents version: 3.0.0