Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-host issues related to 'not' or '=' (fantom, haxe, io, nasm, powershell, r, rexx and vala) #418

Open
kanaka opened this issue Jul 15, 2019 · 13 comments

Comments

@kanaka
Copy link
Owner

kanaka commented Jul 15, 2019

@asarhaddon had a modification to how self-hosting works (captured here:
809d74c) that triggered some issues in fantom, haxe, io, nasm, powershell, r, rexx and vala. Here is the Travis self-hosted test where the issue manifested:
https://travis-ci.org/kanaka/mal/builds/558757626

@asarhaddon
Copy link
Contributor

asarhaddon commented Jul 15, 2019

The MAL implementation has been changed in order to hide these issues. A proper fix has been found for:

  • fantom
  • haxe
  • io
  • nasm (unrelated with not and =)
  • powershell (34de124)
  • r
  • rexx (82bc78e)
  • vala (unrelated with not and =)

@kanaka
Copy link
Owner Author

kanaka commented Jul 15, 2019

Strangely, the vala issue appears to go away if you make this change:

diff --git a/mal/step4_if_fn_do.mal b/mal/step4_if_fn_do.mal
index 05297be..0580ec3 100644
--- a/mal/step4_if_fn_do.mal
+++ b/mal/step4_if_fn_do.mal
@@ -74,6 +74,7 @@
 
 ;; core.mal: defined directly using mal
 (map (fn* [data] (apply env-set repl-env data)) core_ns)
+(env-set repl-env '*ABC* [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])
 
 ;; core.mal: defined using the new language itself
 (rep "(def! not (fn* [a] (if a false true)))")

If you remove any number from the list then the assertion errors re-appear and things don't work correctly. It appears to me that the issue appears when the number of mal memory objects allocated is below a certain threshold. @sgtatham any insight you can give on why this is happening and how to fix it? Here is what happens if you run without the addition above:

$ make DOCKERIZE=1 MAL_IMPL=vala repl^mal^step4
docker run -it --rm -u 1000 -v /data/joelm/personal/programming/mal-other/:/mal -w /mal/mal kanaka/mal-test-mal make step4_if_fn_do.mal
make: 'step4_if_fn_do.mal' is up to date.
REPL implementation mal, step file: mal/step4_if_fn_do.mal
Running: docker run -e STEP=step4_if_fn_do -e MAL_IMPL=vala -it --rm -u 1000 -v /data/joelm/personal/programming/mal-other/:/mal -w /mal/vala kanaka/mal-test-vala ../mal/run 

(process:1): GLib-GObject-CRITICAL **: 20:34:54.160: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
mal-user> (list)
Uncaught exception: 'list' not found
mal-user> 

@kanaka
Copy link
Owner Author

kanaka commented Jul 15, 2019

I noted this in #400 but I'm adding it here since this ticket is tracking the issues now:

The nasm failure appears to be because the step9_try.mal file is exactly 4096 bytes. Adding or removing a character anywhere in the file fixes the issue (likewise, padding step1 to 4096 makes the problem happen there too).

@bendudson can you take a look at why nasm crashes when trying to do load-file of a file that is exactly 4096 bytes?

make repl^nasm
(load-file "../mal/stepA_mal.mal")
Mal [nasm-mal]
nil
mal-user> (+ 2 3)
5
user> (load-file "../mal/step1_read_print.mal") ;; padded to exactly 4096 bytes
Error: Run out of memory for Array objects. Increase heap_array_limit.

@asarhaddon
Copy link
Contributor

The vala issue is reproducible with

(load-file "../mal/env.mal")
(load-file "../mal/core.mal")
(println "  This should not be empty: " core_ns)

Printing core_ns at the end of core.mal works as expected.

@sgtatham
Copy link
Contributor

I haven't had time to debug the Vala failure properly yet, but I've at least identified the cause of the GLib assertion-failure message. That happens in Mal.Vector, and the cause is that one of the references stored in the vector object turns out to be unexpectedly null, which I think must be because the garbage collector has either deliberately freed it, or accidentally dropped its reference to it on the floor.

In an attempt to narrow the problem down, I did a git bisect with the test input files fixed, changing only the Vala implementation itself. That pointed the finger at commit 26ced15, which moved a handful of standard functions out of the Vala source code into files in the lib subdirectory. In the process it changed the definitions slightly, but that doesn't seem to be the cause, because when I edited the source files in lib to contain the defintiions exactly as I had them in stepA_mal.vala, I still got the failure. So it seems to vary with where the definitions are, not what they are. Hmmm.

I think my best guess would be that somewhere in the chain of calls starting at load-file there's a missing GC.Root, and its absence is causing something vital to get GCed that shouldn't be. But I doubt I'll find it by staring at the code and guessing: it's going to be a matter of identifying a specific thing that shouldn't be freed, and then finding out which GC call is freeing it anyway and where in the code that's called from...

@kanaka
Copy link
Owner Author

kanaka commented Jul 15, 2019

@sgtatham valgrind might be able to help you find it. It might at least confirm what type of thing is being leaked and maybe even where it is happening.

sgtatham added a commit to sgtatham/mal that referenced this issue Jul 16, 2019
The clause in eval_ast() which evaluates each element of an input
vector into an output vector was holding the intermediate results in
an ordinary GLib.List, and putting them all into a vector at the end
of the evaluation. But that meant that nothing was preventing all
those values from being garbage-collected half way through.

Now we make an output Mal.Vector at the start of the process, and
point a GC.Root at it to ensure it stays around until we've finished
putting items in it.

This fixes the vala part of kanaka#418, I think.
@sgtatham
Copy link
Contributor

Unfortunately, vala is too smart to trigger valgrind errors! If the code had actually used the value after free, valgrind would have helpfully told me where it was freed. But actually the Vala / GLib runtime notices just in time, and doesn't.

As it turned out, the important point was not where the value was freed, but where it was allocated, because that was where I should have added the missing GC.Root. #421 should fix it.

@kanaka
Copy link
Owner Author

kanaka commented Jul 16, 2019

@sgtatham Good work. Thanks.

@dubek
Copy link
Collaborator

dubek commented Jul 18, 2019

I'm looking at io.

@dubek
Copy link
Collaborator

dubek commented Jul 18, 2019

For me make DOCKERIZE=1 MAL_IMPL=io test^mal^step8 passes OK on master (03a2a69). Has something changed? Since the travis build ran?

@kanaka
Copy link
Owner Author

kanaka commented Jul 18, 2019

@dubek sorry, to be clear, the bugs are revealed by a prior modification to the mal implementation that never made it to master. A version was merged that worked around the problem. Here is the simple diff that should reproduce the issue:

--- a/mal/core.mal
+++ b/mal/core.mal
@@ -1,11 +1,11 @@
 (def! _map? (fn* [x]
   (if (map? x)
-    (not (contains? x :__MAL_MACRO__))
+    (not (= (keys x) '(:__MAL_MACRO__)))
     false)))
 
 (def! _macro? (fn* [x]
   (if (map? x)
-    (contains? x :__MAL_MACRO__)
+    (= (keys x) '(:__MAL_MACRO__))
     false)))
 
 (def! core_ns

Will trigger this behavior:

$ make DOCKERIZE=1 MAL_IMPL=io test^mal^step8
...
Testing trivial macros
TEST: '(defmacro! one (fn* () 1))' -> ['',] -> SUCCESS (result ignored)
TEST: '(one)' -> ['',1] -> FAIL (line 4):
    Expected : '\\(one\\)\r\n1'
    Got      : "(one)\r\nUncaught exception: MalMap does not respond to 'call'"
...

@dubek
Copy link
Collaborator

dubek commented Jul 20, 2019

The following extra step4 tests find a bug in the Io implementation that eventually causes this self-hosting bug:

diff --git a/tests/step4_if_fn_do.mal b/tests/step4_if_fn_do.mal
index 0237d227..2d37b57d 100644
--- a/tests/step4_if_fn_do.mal
+++ b/tests/step4_if_fn_do.mal
@@ -431,6 +431,8 @@ nil
 ;=>false
 (= :abc ":abc")
 ;=>false
+(= (list :abc) (list :abc))
+;=>true

 ;; Testing vector truthiness
 (if [] 7 8)
@@ -465,6 +467,8 @@ nil
 ;=>true
 (= [7 8] [7 8])
 ;=>true
+(= [:abc] [:abc])
+;=>true
 (= (list 1 2) [1 2])
 ;=>true
 (= (list 1) [])

Now I'll try to hunt it down.

@dubek dubek mentioned this issue Jul 21, 2019
@dubek
Copy link
Collaborator

dubek commented Jul 21, 2019

From my local tests #425 should solve the self-hosting issue in io .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants