Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched/tcb: use shared group for kthreads #12320

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

yf13
Copy link
Contributor

@yf13 yf13 commented May 10, 2024

Summary

Current task group data is embedded in the task TCB, this is also used for kernel threads, which seems can share the group as they all live in kernel space.

This patch uses tcb_s for kthreads with a shared group instance. Thus the overhead is reduced when more kthreads are in use.

Memory usage of group instance, i.e. sizeof(task_group_s), on rv-virt device:

config size
nsh 720
nsh64 1160
knsh32 248
knsh64 440

Kernel memory footprints collected from attached logs:

config kthreads # upstream patched saved
nsh 1 7044 7044 0
nsh64 1 10344 10344 0
knsh32 2 10348 9660 588
knsh64 2 13304 12168 1136

The savings are beyond sizeof(task_group_s) due to subordinate data of groups.

Impact

This may affect all configs.

Testing

  • QEMU 6.2 on Ubuntu 22.04 with configs nsh, nsh64, knsh32 and knsh64. See logs-12320.tar.gz for more details:
    • the embedded folder has logs from upstream version.
    • the kthreads folder has logs from patched version.
  • Github CI checks

sched/group/group_create.c Outdated Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
@yf13
Copy link
Contributor Author

yf13 commented May 10, 2024

@anchao, let's discuss here as these comments are visible in gh tool as well.

Can you teach your opinion about the MM_TLSF_MANAGER? It seems having O(1) alloc() and works in flat build mode. I added *.tlsf.log files for your review.

@anchao
Copy link
Contributor

anchao commented May 11, 2024

@anchao, let's discuss here as these comments are visible in gh tool as well.

Can you teach your opinion about the MM_TLSF_MANAGER? It seems having O(1) alloc() and works in flat build mode. I added *.tlsf.log files for your review.

Although TLSF has more advantages than Best fit in terms of time complexity O(1), it will cause serious memory fragmentation issues in aging and persistence tests.

I have added an attachment containing the test results of different allocators (Sheet from @XuNeo Thanks), You can generate a chart and observe the memory trend (This chart may not show the worst case scenario of TLSF)
20221110_mem_test.xlsx

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@anchao, thanks for sharing the details.

Here I am trying to keep task_tcb_s for userspace tasks but use tcb_s for kthreads as suggested by @xiaoxiang781216 and it looks working for nsh64, I still check other configs. Is this approach fine for you?

@anchao
Copy link
Contributor

anchao commented May 11, 2024

@anchao, thanks for sharing the details.

Here I am trying to keep task_tcb_s for userspace tasks but use tcb_s for kthreads as suggested by @xiaoxiang781216 and it looks working for nsh64, I still check other configs. Is this approach fine for you?

Sounds good, could you update this PR, I'd like to review the code directly

@yf13 yf13 changed the title sched/tcb: drop task_tcb_s.group sched/tcb: reduce kthread overhead May 11, 2024
@yf13 yf13 force-pushed the sched branch 3 times, most recently from e93ddd2 to f443ffc Compare May 11, 2024 05:23
@yf13 yf13 force-pushed the sched branch 5 times, most recently from 1461cbe to c9a6669 Compare May 11, 2024 10:14
@acassis
Copy link
Contributor

acassis commented May 11, 2024

Please fix this failure:

In function 'nxthread_create',
    inlined from 'kthread_create_with_stack' at task/task_create.c:254:10:
Error: task/task_create.c:97:18: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
   97 |   tcb->cmn.flags = ttype | TCB_FLAG_FREE_TCB;
      |                  ^
In file included from task/task_create.c:33:
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
Error: task/task_create.c:111:7: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
  111 |   pid = tcb->cmn.pid;
      |   ~~~~^~~~~~~~~~~~~~
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
In function 'nxthread_create',
    inlined from 'kthread_create_with_stack' at task/task_create.c:254:10,
    inlined from 'kthread_create' at task/task_create.c:285:10:
Error: task/task_create.c:97:18: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
   97 |   tcb->cmn.flags = ttype | TCB_FLAG_FREE_TCB;
      |                  ^
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
Error: task/task_create.c:111:7: error: array subscript 'struct task_tcb_s[0]' is partly outside array bounds of 'unsigned char[368]' [-Werror=array-bounds]
  111 |   pid = tcb->cmn.pid;
      |   ~~~~^~~~~~~~~~~~~~
task/task_create.c:88:9: note: object of size 368 allocated by 'zalloc'
   88 |   tcb = kmm_zalloc(ret);
      |         ^~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:61: task_create.o] Error 1
make[1]: Target 'libsched.a' not remade because of errors.

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@acassis I am wondering that is this GCC 12 issue, just pushed a version using $progma to see if it can be fixed. Please teach if there are better fixes.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from 229d4d6 to 69ba63d Compare May 11, 2024 12:20
@acassis
Copy link
Contributor

acassis commented May 11, 2024

@yf13 adding #pragma in the generic code is not a good idea, maybe you can try to fix the structure to avoid the issue, please see here: micropython/micropython#7064

He create a union to storage the right size

@yf13
Copy link
Contributor Author

yf13 commented May 11, 2024

@yf13 adding #pragma in the generic code is not a good idea, maybe you can try to fix the structure to avoid the issue, please see here: micropython/micropython#7064

@acassis thanks, will remove #pragma later after passing all checks.

Now case test_ltp_interfaces_lio_listio_2_1 still fails, need more investigation about it, hints are highly welcomed.

@yf13 yf13 changed the title sched/tcb: share group for kthreads sched/tcb: use shared group for kthreads May 22, 2024
@yf13 yf13 force-pushed the sched branch 3 times, most recently from f4ff342 to 75cafed Compare May 23, 2024 05:18
@yf13
Copy link
Contributor Author

yf13 commented May 23, 2024

@xiaoxiang781216 it looks that the LTP timeouts in sim/citest still happens, so I disabled them again to see if there are remaining issues. Then a few CI attempts failed due to connectivity issues encountered during CI. Then the usrsocktest issues come back again in my last run.

So looks like that I have to wait until CI is normal to trigger another check.

@xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 it looks that the LTP timeouts in sim/citest still happens, so I disabled them again to see if there are remaining issues. Then a few CI attempts failed due to connectivity issues encountered during CI. Then the usrsocktest issues come back again in my last run.

So looks like that I have to wait until CI is normal to trigger another check.

@Donny9 is looking this problem.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from a7e0136 to 9868fb6 Compare May 29, 2024 11:07
@yf13
Copy link
Contributor Author

yf13 commented May 30, 2024

@anchao, @xiaoxiang781216 looks like the CI system is okay now. Please give this patch a review when free and let me know if there is anything I can do.

include/nuttx/tls.h Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
sched/group/group_create.c Outdated Show resolved Hide resolved
@@ -222,5 +238,6 @@ void group_postinitialize(FAR struct task_tcb_s *tcb)
* task has exited.
*/

group->tg_pid = tcb->cmn.pid;
group->tg_pid = (group == &g_kthread_group) ? group->tg_pid :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change

Copy link
Contributor Author

@yf13 yf13 Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kthreads group's pid shall stay the same when new kthreads are created, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest updtae used if for this.

sched/init/nx_start.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
sched/task/task_start.c Outdated Show resolved Hide resolved
@@ -527,6 +528,7 @@ static int nxtask_setup_stackargs(FAR struct task_tcb_s *tcb,
FAR const char *name,
FAR char * const argv[])
{
const uint8_t ttype = tcb->cmn.flags & TCB_FLAG_TTYPE_MASK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove const

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not addressed yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in latest update

sched/group/group_create.c Outdated Show resolved Hide resolved
sched/task/task_create.c Outdated Show resolved Hide resolved
@yf13 yf13 force-pushed the sched branch 3 times, most recently from 84d791e to b3039b9 Compare June 5, 2024 00:35
@yf13
Copy link
Contributor Author

yf13 commented Jun 5, 2024

@xiaoxiang781216 looks like our CI got connectivity issue:

Configuration/Tool: ucans32k146/se05x,CONFIG_ARM_TOOLCHAIN_GNU_EABI
ccuurrll::  ((2288))  FFaaiilleedd  ttoo  cocnonnencet ctto  tgoi tghiutbh.ucbo.mc opmo rpto r4t4 3 4a4f3t earf t1e3r5 110365 2m4s4:  mCso:n nCeocntinoenc ttiimoend  toiumte

I forgot somewhere I saw other repositories using sub-module instead of download, not sure if it can reduce this kind of connectivity issues?

Looks like the Linux (sim-xxx) checks passed so our compilation warning fixes shall work now. I will trigger another check soon.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from 999578c to 8917fdb Compare June 5, 2024 07:51
@yf13 yf13 requested a review from xiaoxiang781216 June 5, 2024 23:36
Thread args have already been saved to stack after the TLS section by
nxtask_setup_stackargs. This is to retrieve it for use.

Signed-off-by: Yanfeng Liu <[email protected]>
@@ -527,6 +528,7 @@ static int nxtask_setup_stackargs(FAR struct task_tcb_s *tcb,
FAR const char *name,
FAR char * const argv[])
{
const uint8_t ttype = tcb->cmn.flags & TCB_FLAG_TTYPE_MASK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not addressed yet

sched/task/task_create.c Outdated Show resolved Hide resolved
@@ -87,33 +88,32 @@ void nxtask_start(void)
/* Execute the start hook if one has been registered */

#ifdef CONFIG_SCHED_STARTHOOK
if (ttcb->starthook != NULL)
if (ttype != TCB_FLAG_TTYPE_KERNEL && ttcb->starthook != NULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why check TCB_FLAG_TTYPE_KERNEL

Copy link
Contributor Author

@yf13 yf13 Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kthread doesn't have starthook as it is in task_tcb_s.

sched/group/group_setuptaskfiles.c Outdated Show resolved Hide resolved

info = task_get_info();
return info->ta_argv[0];
return (rc >= 0 && ret && ((uintptr_t)si.stack_base_ptr) > ret) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ned check stack_base_ptr

Copy link
Contributor Author

@yf13 yf13 Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack_args don't exist for pthreads yet, this check avoids returning wrong result when used from pthread. Previous logic was dangerous as it returns address in main thread's stack.

@yf13 yf13 force-pushed the sched branch 2 times, most recently from d246950 to 0cbccbb Compare June 9, 2024 01:37
yf13 added 2 commits June 9, 2024 09:45
Kthreads can share the group data so that to reduce overheads.
This implements shared kthread group via:

- use `tcb_s` instead of `task_tcb_s` for kthreads
- use `g_kthread_group` when creating kthreads
- use stackargs to start tasks and kthreads

see pull/12320 for test logs.

Signed-off-by: Yanfeng Liu <[email protected]>
- replaces `ta_argv` with `stackargs`
- drops `ta_argv` from `task_info_s`
- drops `g_idleargv` and checks idle accordingly

Signed-off-by: Yanfeng Liu <[email protected]>
@yf13
Copy link
Contributor Author

yf13 commented Jun 10, 2024

@xiaoxiang781216 the const in task_setup.c was removed in latest push, but I couldn't reply this comment
Screenshot_2024-06-14_08-04-11
directly from my browser. Please take a look when free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants