Poor results after enabling Metal acceleration on macOS; how can this be improved? #395
This issue is stale because it has been open for 30 days with no activity.
I modified the following parts of the code.
First, I set the type to Metal:
backend_config.precision = MNN::BackendConfig::Precision_High;
schedule_config.backendConfig = &backend_config;
schedule_config.type = MNN_FORWARD_METAL;
schedule_config.backupType = MNN_FORWARD_METAL;
Then I commented out the initialization code in void MNNRobustVideoMatting::initialize_context():
// resize session
mnn_interpreter->resizeSession(mnn_session);
// init 0.
// std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
// std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
// std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
// std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);
Finally, I updated void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors):
void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors)
{
auto device_r1o_ptr = output_tensors.at("r1o");
auto device_r2o_ptr = output_tensors.at("r2o");
auto device_r3o_ptr = output_tensors.at("r3o");
auto device_r4o_ptr = output_tensors.at("r4o");
MNN::Tensor * cpu1 = MNN::Tensor::createHostTensorFromDevice(device_r1o_ptr, true);
MNN::Tensor * cpu2 = MNN::Tensor::createHostTensorFromDevice(device_r2o_ptr, true);
MNN::Tensor * cpu3 = MNN::Tensor::createHostTensorFromDevice(device_r3o_ptr, true);
MNN::Tensor * cpu4 = MNN::Tensor::createHostTensorFromDevice(device_r4o_ptr, true);
device_r1o_ptr->copyFromHostTensor(cpu1);
device_r2o_ptr->copyFromHostTensor(cpu2);
device_r3o_ptr->copyFromHostTensor(cpu3);
device_r4o_ptr->copyFromHostTensor(cpu4);
//device_r1o_ptr->copyToHostTensor(r1i_tensor);
//device_r2o_ptr->copyToHostTensor(r2i_tensor);
//device_r3o_ptr->copyToHostTensor(r3i_tensor);
//device_r4o_ptr->copyToHostTensor(r4i_tensor);
context_is_update = true;
}
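One thing worth checking in the snippet above: each r*o output is copied to a host tensor and then written back into the same r*o tensor, so as posted the r*i inputs never receive the new recurrent state, and with the zero-fill commented out they are never initialized either. Whether this explains the poor result is not confirmed in this thread. The sketch below only illustrates the intended data flow (zero the r*i inputs once, then feed each r*o into the matching r*i after every frame). `FakeTensor`, `init_context`, and the two-argument `update_context` are hypothetical stand-ins for illustration and are not the MNN API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend tensor; NOT MNN::Tensor.
// It only models memory that is written and read through explicit copies.
struct FakeTensor {
    std::vector<float> data;
    explicit FakeTensor(std::size_t n, float fill = 0.f) : data(n, fill) {}

    // Models a host -> device upload.
    void copy_from_host(const std::vector<float>& host) {
        assert(host.size() == data.size());
        data = host;
    }

    // Models a device -> host download.
    std::vector<float> copy_to_host() const { return data; }
};

// Zero the recurrent inputs through a host staging buffer
// instead of writing through a raw host pointer.
void init_context(std::map<std::string, FakeTensor*>& inputs) {
    for (auto& kv : inputs) {
        std::vector<float> zeros(kv.second->data.size(), 0.f);
        kv.second->copy_from_host(zeros);
    }
}

// Feed each recurrent output (rNo) into the matching recurrent input (rNi).
void update_context(const std::map<std::string, FakeTensor*>& outputs,
                    std::map<std::string, FakeTensor*>& inputs) {
    for (int i = 1; i <= 4; ++i) {
        const std::string idx = std::to_string(i);
        const FakeTensor* src = outputs.at("r" + idx + "o");
        FakeTensor* dst = inputs.at("r" + idx + "i");
        dst->copy_from_host(src->copy_to_host());  // destination is rNi, not rNo
    }
}
```

In the real code the analogous step would presumably stage through a host tensor and upload into `r1i_tensor` ... `r4i_tensor`, releasing any temporary host tensors afterwards; that mapping is an assumption and has not been tested against MNN's Metal backend.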
The final result is shown below.