Poor results after enabling Metal acceleration on macOS; how can this be improved? #395

Open
xp19870106 opened this issue Oct 23, 2023 · 1 comment
@xp19870106
I modified the following parts of the code.
First, I set type to Metal:

backend_config.precision = MNN::BackendConfig::Precision_High;
schedule_config.backendConfig = &backend_config;
schedule_config.type = MNN_FORWARD_METAL;
schedule_config.backupType = MNN_FORWARD_METAL;

Then I commented out the initialization code in void MNNRobustVideoMatting::initialize_context():

// resize session
mnn_interpreter->resizeSession(mnn_session);
// init 0.
// std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
// std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
// std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
// std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);

Finally, I updated void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors):

void MNNRobustVideoMatting::update_context(const std::map<std::string, MNN::Tensor *> &output_tensors)
{
  auto device_r1o_ptr = output_tensors.at("r1o");
  auto device_r2o_ptr = output_tensors.at("r2o");
  auto device_r3o_ptr = output_tensors.at("r3o");
  auto device_r4o_ptr = output_tensors.at("r4o");
  MNN::Tensor * cpu1 = MNN::Tensor::createHostTensorFromDevice(device_r1o_ptr, true);
  MNN::Tensor * cpu2 = MNN::Tensor::createHostTensorFromDevice(device_r2o_ptr, true);
  MNN::Tensor * cpu3 = MNN::Tensor::createHostTensorFromDevice(device_r3o_ptr, true);
  MNN::Tensor * cpu4 = MNN::Tensor::createHostTensorFromDevice(device_r4o_ptr, true);

  device_r1o_ptr->copyFromHostTensor(cpu1);
  device_r2o_ptr->copyFromHostTensor(cpu2);
  device_r3o_ptr->copyFromHostTensor(cpu3);
  device_r4o_ptr->copyFromHostTensor(cpu4);

  //device_r1o_ptr->copyToHostTensor(r1i_tensor);
  //device_r2o_ptr->copyToHostTensor(r2i_tensor);
  //device_r3o_ptr->copyToHostTensor(r3i_tensor);
  //device_r4o_ptr->copyToHostTensor(r4i_tensor);

  context_is_update = true;
}
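
For context, the tensor names suggest that `update_context` is meant to carry the recurrent state forward, so that the `r1o`..`r4o` outputs of one frame become the `r1i`..`r4i` inputs of the next, starting from zeros. The version posted above copies `r1o`..`r4o` back into themselves and leaves the `r1i`..`r4i` copies commented out. That may be related to the degraded result, though nothing in this thread confirms it. The sketch below shows only the handoff idea with a toy model. It does not use MNN, and `ToyRecurrentModel`, `State`, and `run` are hypothetical names invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one recurrent state buffer (r1..r4 in the post).
using State = std::vector<float>;

// Toy model, NOT MNN: it only mimics "state in, state out" per frame.
struct ToyRecurrentModel {
    std::array<State, 4> inputs;   // play the role of r1i..r4i
    std::array<State, 4> outputs;  // play the role of r1o..r4o

    explicit ToyRecurrentModel(std::size_t n) {
        // The recurrent state starts at zero (the "init 0." step).
        for (auto &s : inputs) s.assign(n, 0.f);
        for (auto &s : outputs) s.assign(n, 0.f);
    }

    // One fake "inference": each output is the input state plus the frame value.
    void run(float frame) {
        for (std::size_t k = 0; k < inputs.size(); ++k)
            for (std::size_t i = 0; i < inputs[k].size(); ++i)
                outputs[k][i] = inputs[k][i] + frame;
    }

    // Carry the state forward: this frame's outputs become the next frame's inputs.
    void update_context() {
        for (std::size_t k = 0; k < inputs.size(); ++k) {
            assert(inputs[k].size() == outputs[k].size());
            inputs[k] = outputs[k];
        }
    }
};
```

In this toy model, feeding frames 1 and 2 with `update_context()` called in between gives an output of 3 for the second frame. Without the call, the second output is 2, because the state from the first frame is dropped.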

The final result is shown below:

[Screenshot: 截屏2023-10-23 16 56 36]

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Mar 24, 2024