Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.8.4-3] Crash in S-57 chart cache handler. #3851

Open
thwd opened this issue Apr 27, 2024 · 8 comments
Open

[5.8.4-3] Crash in S-57 chart cache handler. #3851

thwd opened this issue Apr 27, 2024 · 8 comments
Labels

Comments

@thwd
Copy link

thwd commented Apr 27, 2024

Describe the bug

Panning and zooming around causes OpenCPN to crash randomly.
I've narrowed it down to S-57 chart handling.
I have about 8038 cells loaded (all S-57).

To Reproduce
Pan and zoom around for a bit.

Expected behavior
OpenCPN doesn't crash.

Screenshots
Freezing the at the last frame before the crash displays the "Preparing vector chart X" box.

Screenshot from 2024-04-27 10-46-58

Desktop (please complete the following information if applicable):

  • OS: Ubuntu 23.10 on x86_64
  • Version 5.8.4-3

Additional context

[📦 org.opencpn.OpenCPN ~]$ gdb /app/bin/opencpn
...
Reading symbols from /app/bin/opencpn...
Reading symbols from /usr/lib/debug//app/bin/opencpn.debug...
(gdb) set environment MALLOC_CHECK_ 2
(gdb) run
...
corrupted size vs. prev_size

Thread 1 "opencpn" received signal SIGABRT, Aborted.
0x00007ffff6091204 in __pthread_kill_implementation () from /usr/lib/x86_64-linux-gnu/libc.so.6

(gdb) bt
#0  0x00007ffff6091204 in __pthread_kill_implementation ()
    at /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff603f04e in raise () at /usr/lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff60287fc in abort () at /usr/lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff6084c9e in __libc_message () at /usr/lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff609b2ba in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff609bbd6 in unlink_chunk.constprop () at /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff609bd53 in malloc_consolidate () at /usr/lib/x86_64-linux-gnu/libc.so.6
#7  0x00007ffff609d3b0 in _int_free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#8  0x00007ffff609fafe in free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#9  0x0000555555eda5c1 in std::__new_allocator<connector_segment*>::deallocate(connector_segment**, unsigned long) (__n=<optimized out>, __p=<optimized out>, this=<optimized out>)
    at /usr/include/c++/12.2.0/bits/new_allocator.h:158
#10 std::allocator_traits<std::allocator<connector_segment*> >::deallocate(std::allocator<connector_segment*>&, connector_segment**, unsigned long)
    (__n=<optimized out>, __p=<optimized out>, __a=<optimized out>)
    at /usr/include/c++/12.2.0/bits/alloc_traits.h:496
#11 std::_Vector_base<connector_segment*, std::allocator<connector_segment*> >::_M_deallocate(connector_segment**, unsigned long)
    (this=0x555560053da0, __n=<optimized out>, __p=<optimized out>)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:387
#12 std::_Vector_base<connector_segment*, std::allocator<connector_segment*> >::_M_deallocate(connector_segment**, unsigned long)
    (__n=<optimized out>, __p=<optimized out>, this=0x555560053da0)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:383
#13 std::_Vector_base<connector_segment*, std::allocator<connector_segment*> >::~_Vector_base() (this=0x555560053da0, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:366
#14 std::vector<connector_segment*, std::allocator<connector_segment*> >::~vector()
    (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:733
#15 s57chart::~s57chart() (this=0x555560053620, __in_chrg=<optimized out>)
    at /run/build/opencpn/src/s57chart.cpp:345
#16 0x0000555555eda805 in s57chart::~s57chart()
    (this=0x555560053620, __in_chrg=<optimized out>)
    at /run/build/opencpn/src/s57chart.cpp:345
#17 0x00005555558d038c in ChartDB::DeleteCacheEntry(CacheEntry*, bool, wxString const&)
   
     (this=0x5555567ce130, pce=0x555560783380, bDelTexture=<optimized out>, msg=<optimized out>)
    at /run/build/opencpn/src/chartdb.cpp:261
#18 0x00005555558d1e2f in ChartDB::PurgeCacheUnusedCharts(double)
     (this=0x5555567ce130, factor=0.69999999999999996)
    at /run/build/opencpn/src/chartdb.cpp:379
#19 0x00005555559246c9 in ChartCanvas::SetViewPoint(double, double, double, double, double, int, bool, bool)
--Type <RET> for more, q to quit, c to continue without paging--
kew=<optimized out>, rotation=<optimized out>, projection=<optimized out>, b_adjust=<optimized out>, b_refresh=<optimized out>) at /run/build/opencpn/src/chcanv.cpp:5315
#20 0x000055555592ca77 in ChartCanvas::SetVPScale(double, bool) (refresh=false, scale=<optimized out>, this=0x555556b8b900) at /run/build/opencpn/src/chcanv.cpp:4976
#21 ChartCanvas::DoZoomCanvas(double, bool) (this=0x555556b8b900, factor=<optimized out>, can_zoom_to_cursor=<optimized out>) at /run/build/opencpn/src/chcanv.cpp:4584
#22 0x000055555592cc10 in ChartCanvas::ZoomCanvas(double, bool, bool)
    (this=this@entry=0x555556b8b900, factor=<optimized out>, can_zoom_to_cursor=can_zoom_to_cursor@entry=true, stoptimer=stoptimer@entry=false) at /run/build/opencpn/src/chcanv.cpp:4445
#23 0x000055555592d7ac in ChartCanvas::MouseEventProcessCanvas(wxMouseEvent&) (this=0x555556b8b900, event=...) at /run/build/opencpn/src/chcanv.cpp:9284
#24 0x00007ffff70425c2 in wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) (entry=..., handler=<optimized out>, event=...)
    at ./src/common/event.cpp:1431
#25 0x00007ffff7042a8d in wxEvtHandler::SearchDynamicEventTable(wxEvent&) (this=this@entry=0x555556b8f320, event=...) at ./src/common/event.cpp:1901
#26 0x00007ffff7042ded in wxEvtHandler::TryHereOnly(wxEvent&) (this=this@entry=0x555556b8f320, event=...) at ./src/common/event.cpp:1624
#27 0x00007ffff7042e9f in wxEvtHandler::TryBeforeAndHere(wxEvent&) (event=..., this=0x555556b8f320) at ./include/wx/event.h:4007
#28 wxEvtHandler::ProcessEventLocally(wxEvent&) (this=0x555556b8f320, event=...) at ./src/common/event.cpp:1561
#29 0x00007ffff7042fa2 in wxEvtHandler::ProcessEvent(wxEvent&) (this=0x555556b8f320, event=...) at ./src/common/event.cpp:1534
#30 0x00007ffff70447f3 in wxEvtHandler::SafelyProcessEvent(wxEvent&) (this=<optimized out>, event=...) at ./src/common/event.cpp:1650
#31 0x00007ffff77cd550 in wxWindowBase::HandleWindowEvent(wxEvent&) const (this=this@entry=0x555556b8f320, event=...) at ./src/common/wincmn.cpp:1553
#32 0x00007ffff75c53e0 in wxWindow::GTKProcessEvent(wxEvent&) const (event=..., this=0x555556b8f320) at ./src/gtk/window.cpp:1551
#33 scroll_event(GtkWidget*, GdkEventScroll*, wxWindow*) (widget=0x55555644a570, gdk_event=<optimized out>, win=0x555556b8f320) at ./src/gtk/window.cpp:2002
#34 0x00007ffff56a2661 in _gtk_marshal_BOOLEAN__BOXED () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
#35 0x00007ffff6670d92 in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#36 0x00007ffff669f768 in signal_emit_unlocked_R.isra.0 () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#37 0x00007ffff668ed7a in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#38 0x00007ffff668f5b3 in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#39 0x00007ffff599d164 in gtk_widget_event_internal.part.0.lto_priv () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
#40 0x00007ffff5821d20 in propagate_event.lto_priv () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
#41 0x00007ffff5822b49 in gtk_main_do_event () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
#42 0x00007ffff62663b3 in _gdk_event_emit () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0
#43 0x00007ffff62b646e in gdk_event_source_dispatch.lto_priv () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0
#44 0x00007ffff51b8d71 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#45 0x00007ffff520fa18 in g_main_context_iterate.constprop () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#46 0x00007ffff51b82af in g_main_loop_run () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#47 0x00007ffff581e4b5 in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
#48 0x00007ffff75a2f85 in wxGUIEventLoop::DoRun() (this=0x55555c41fe00) at ./src/gtk/evtloop.cpp:61
#49 0x00007ffff6eeeafa in wxEventLoopBase::Run() (this=0x55555c41fe00) at ./src/common/evtloopcmn.cpp:87
#50 0x00007ffff6eafa24 in wxAppConsoleBase::MainLoop() (this=0x55555628e670) at ./src/common/appbase.cpp:381
#51 0x00007ffff6f3e68c in wxEntry(int&, wchar_t**) (argc=<optimized out>, argv=<optimized out>) at ./src/common/init.cpp:503
#52 0x000055555584de6e in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /run/build/opencpn/src/ocpn_app.cpp:937

NOTE: Without MALLOC_CHECK_ 2, instead of "corrupted size vs. prev_size" it says "double free or corruption".

@thwd thwd added the bug label Apr 27, 2024
@thwd
Copy link
Author

thwd commented Apr 27, 2024

Workaround

Set NCacheLimit=99999 under [Settings] in your opencpn.conf file.

This will disable cache purging, hence increasing memory usage but avoiding the crash trigger.

(This will be a different setting for Windows users)

This workaround might not be practical on devices with limited memory like a Raspberry Pi.

@thwd thwd changed the title [5.8.4-3] Crash is S-57 chart handler. [5.8.4-3] Crash in S-57 chart cache handler. Apr 27, 2024
@nohal nohal added this to the OpenCPN 5.10.0 milestone Apr 27, 2024
@bdbcat
Copy link
Member

bdbcat commented Apr 28, 2024

Hmm...
At first look, this will take a fairly significant (read:risky) refactor of chart cache logic.

(Note to self)What is going on is like this, I think:

  1. Lots of panning around with a never-before-seen S57 chart set.
  2. The background SENC creation thread queue gets pretty full.
  3. There is no limit to the number of background jobs in the queue.
  4. Jobs are pulled from the queue to load the available processor cores.
  5. Meanwhile, successfully processed charts are added to the chart cache, as encountered.
  6. Eventually, the chart cache gets full.
  7. We try to remove the oldest entry from the cache, deleting the corresponding chart from core.
  8. Sooner or later, the logic will delete a chart that is currently being processed in a background thread.
  9. All sorts of mayhem results.

Probably too risky for O5.10, but will think on it.

@thwd
Copy link
Author

thwd commented Apr 28, 2024

Wow, thanks for the quick analysis!

Clarification: The charts are imported and loaded and show correctly. Unless I misunderstand how OpenCPN deals with this, they should already be SENC.

According to the backtrace, the crash happens in s57chart.cpp:345 (in tag 5.8.4) which is the end of its destructor.
The double-freed object is an element of std::vector<connector_segment*> so it must be an element of field m_pcs_vector.

This vector is only pushed to in s57chart::AssembleLineGeometry, which is called at the end of BuildRAZFromSENCFile. So it changes between the clear() in the destructor, and the end of the destructor.

It could be a concurrent s57chart::PostInit doing BuildRAZFromSENCFile.

Is this a fair assessment in your eyes?

EDIT: but then wouldn't we sometimes see a crash in PostInit if the vector is deallocated before it reaches AssembleLineGeometry? Concurrent bugs usually show idiopathical symptoms. This one is deterministically reproducible. Maybe there is simply two different items in pChartCache that point to the same pChart or something like that?

@bdbcat
Copy link
Member

bdbcat commented Apr 28, 2024

"So it changes between the clear() in the destructor, and the end of the destructor."
Yes, probably because the SENC thread is accessing the same members.

We can eliminate any possible SENC-creation thread question by executing Settings->Charts->Prepare All ENC Charts.
This method reads all S57 cells, converts to OCPN SENC format, and stores result in disk-persistent cache to be accessed at runtime. Basically a convenience shortcut to the normal logic.
This will take a little while with 8000 cells....

@thwd
Copy link
Author

thwd commented Apr 28, 2024

Indeed, I did "Prepare All ENC Charts" and did two tests:

  • Setting NCacheLimit=1 in my opencpn.conf, to force aggressive cache pruning.
  • Not setting NCacheLimit at all.

Both times; no crash.

@bdbcat
Copy link
Member

bdbcat commented May 2, 2024

OK, confirmation of potential logic error in SENC creation thread management.
Given the state of the Alpha test code base underway now, I propose to mark this bug as WONTFIX for OCPN 5.10.
The fault is rarely seen, and there is a workaround...
Thoughts?

@thwd
Copy link
Author

thwd commented May 4, 2024

SGTM. Is there a milestone for after 5.10 that it could be assigned to?

Edit: Is there a way to execute "Prepare All ENC Charts" via CLI, e.g. to do it as a cronjob?

@nohal nohal modified the milestones: OpenCPN 5.10.0, OpenCPN 5.12.0 May 4, 2024
@leamas
Copy link
Collaborator

leamas commented May 4, 2024

Is there a way to execute "Prepare All ENC Charts" via CLI, e.g. to do it as a cronjob?

We have so far not added this to opencpn-cmd, the cli variant. It is possible using a cli option when starting opencpn, but this requires a working display and is thus incompatible with cron

EDIT: ALso, we have yet not been able to separate S57 decoding from it's GUI dependencies. Long story short: adding this to opencpn-cmd is not trivial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants