Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong detection count threads in NUMA configuration in Windows #5524

Closed
GermanAizek opened this issue Feb 16, 2024 · 2 comments
Closed

Wrong detection count threads in NUMA configuration in Windows #5524

GermanAizek opened this issue Feb 16, 2024 · 2 comments

Comments

@GermanAizek
Copy link
Contributor

GermanAizek commented Feb 16, 2024

@ggerganov,
I solved similar issue in different projects on github, solution is simple to make wrapper function for geting threads, in Windows it is necessary to take into accumulate all logical processors in all NUMA nodes.

Problem lines in common, tests and examples llama.cpp:

unsigned int n_threads = std::thread::hardware_concurrency();
return n_threads > 0 ? (n_threads <= 4 ? n_threads : n_threads / 2) : 4;

llama.cpp/common/common.cpp

Lines 160 to 187 in 5f5808c

} else if (arg == "-t" || arg == "--threads") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.n_threads = std::stoi(argv[i]);
if (params.n_threads <= 0) {
params.n_threads = std::thread::hardware_concurrency();
}
} else if (arg == "-tb" || arg == "--threads-batch") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.n_threads_batch = std::stoi(argv[i]);
if (params.n_threads_batch <= 0) {
params.n_threads_batch = std::thread::hardware_concurrency();
}
} else if (arg == "-td" || arg == "--threads-draft") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.n_threads_draft = std::stoi(argv[i]);
if (params.n_threads_draft <= 0) {
params.n_threads_draft = std::thread::hardware_concurrency();
}
} else if (arg == "-tbd" || arg == "--threads-batch-draft") {

llama.cpp/common/common.cpp

Lines 187 to 196 in 5f5808c

} else if (arg == "-tbd" || arg == "--threads-batch-draft") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.n_threads_batch_draft = std::stoi(argv[i]);
if (params.n_threads_batch_draft <= 0) {
params.n_threads_batch_draft = std::thread::hardware_concurrency();
}
} else if (arg == "-p" || arg == "--prompt") {

llama.cpp/common/common.cpp

Lines 1089 to 1099 in 5f5808c

std::string get_system_info(const gpt_params & params) {
std::ostringstream os;
os << "system_info: n_threads = " << params.n_threads;
if (params.n_threads_batch != -1) {
os << " (n_threads_batch = " << params.n_threads_batch << ")";
}
os << " / " << std::thread::hardware_concurrency() << " | " << llama_print_system_info();
return os.str();
}

Solutions:

[❌] C variant detection is not done here: git-for-windows/git#4766

[✔️] C++11 Windows XP minimal (rewriten modern variant by @mrexodia): x64dbg/x64dbg@d2f6ba7

[✔️] Modern C++17 (variant by @GermanAizek and #llvm-project maintainers): GermanAizek/llvm-project@d1fa25f

  • If you need NUMA detection function and an addition to my variant function, then
    more optimized variant detection NUMA and return count threads on host, must added before GetLogicalProcessorInformationEx:
    if (!IsNUMA())
        return single_cpu_concurrency();

    if (GetLogicalProcessorInformationEx(RelationAll, nullptr, &length) != FALSE)
    {
        return single_cpu_concurrency();
    }
bool IsNUMA() noexcept
{
    ULONG HighestNodeNumber;
    return !(!GetNumaHighestNodeNumber(&HighestNodeNumber) || HighestNodeNumber == 0);
}
@GermanAizek
Copy link
Contributor Author

GermanAizek commented Mar 1, 2024

@ggerganov, can anyone confirm bug?

Here is another good example of bug fixing:
rizinorg/rizin#4167

Copy link
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant