Support for custom permissions for custom sections. #105

Draft
wants to merge 3 commits into
base: master

Conversation

kitlith
Contributor

@kitlith kitlith commented Dec 6, 2019

I'm not sure how any of this translates over to the mach file format, so I haven't touched it yet.

Fixes #104, though I haven't tested it yet.

A different approach may be needed, as this is kinda just mirroring the SectionBuilder for ELF.

Owner

@m4b m4b left a comment

This actually looks good; i think adding it to mach will be even easier. Can you post your binary dump here or in the tracking issue? I would be interested in testing to see if it works :)

match self {
DefinedDecl::Section(a) => a.is_executable(),
DefinedDecl::Function(_) => true,
DefinedDecl::Data(_) => false,
Owner

So I thought the PR would allow code like:

    obj.declare("foo", Decl::data().writeable().executable())?;

But this forces data to never be executable and functions to always be executable; we could default to sane values here, but ultimately let the user choose to set these differently?

Owner

E.g., I think some JIT engines will want WX flags on a data section? Or what if i want to write self-modifying code, etc., and have this in the data segments?

Contributor Author

We could make it a property of the Decl/DefinedDecl instead; I was just trying to make minimal changes. This may require either adding flags to every variant of Decl or converting DefinedDecl into a struct that contains an enum + flags or something?

Owner

ah, something like:

/// A declaration that is defined inside this artifact
pub enum DefinedDeclKind {
    /// A function defined in this artifact
    Function(FunctionDecl),
    /// A data object defined in this artifact
    Data(DataDecl),
    /// A section defined in this artifact
    Section(SectionDecl),
}

pub struct DefinedDecl {
    kind: DefinedDeclKind,
    writable: bool,
    executable: bool,
}

?

Owner

it's probably a good idea (if those fields are truly shared amongst each of the Decl variants), but i suspect the diff might be quite large if DefinedDecl became a struct with a kind?

Contributor Author

@kitlith kitlith Dec 8, 2019

the number of differences will be interesting, yes, but I'll put in the legwork if it's a good idea. EDIT: i.e. my first implementation is an attempt to do minor changes, but with feedback from people who know what they're doing i'll follow their lead.

Owner

Well it does worry me because:

  1. it's a breaking change
  2. I don't know who's using DefinedDecl; i think it's effectively internal, but it would have been nice if it was marked pub(crate)
  3. having a default value doesn't translate as well, because the default for a function decl is different from the default for a data decl, etc.

At this point i think your PR is fine and we should just move forward with it.

Contributor Author

@kitlith kitlith Dec 8, 2019

It is still possible to just add these flags to every variant, i.e. FunctionDecl, DataDecl, and SectionDecl, just like datatype_methods!(), and then the methods on DefinedDecl just delegate to the impl for the correct enum variant.

This wouldn't be a breaking change, right?

edit: might want to break it up into a macro invocation for alloc, write, and exec separately so that we can distinguish which decls need which flags.
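A minimal sketch of that idea, assuming one macro invocation per flag and that functions stay unconditionally executable. The struct layouts, the `flag_methods!` macro, and the field names are hypothetical stand-ins, not the crate's actual definitions:

```rust
/// Hypothetical stand-ins for the crate's decl types; field names are assumptions.
#[derive(Debug, Clone, Copy, Default)]
pub struct FunctionDecl { writable: bool }

#[derive(Debug, Clone, Copy, Default)]
pub struct DataDecl { writable: bool, executable: bool }

#[derive(Debug, Clone, Copy, Default)]
pub struct SectionDecl { loaded: bool, writable: bool, executable: bool }

/// Generates a builder-style setter and a getter for one boolean flag,
/// so each decl type only gets the flags it is invoked with.
macro_rules! flag_methods {
    ($ty:ty, $field:ident, $setter:ident, $getter:ident) => {
        impl $ty {
            pub fn $setter(mut self, value: bool) -> Self { self.$field = value; self }
            pub fn $getter(&self) -> bool { self.$field }
        }
    };
}

flag_methods!(FunctionDecl, writable, with_writable, is_writable);
flag_methods!(DataDecl, writable, with_writable, is_writable);
flag_methods!(DataDecl, executable, with_executable, is_executable);
flag_methods!(SectionDecl, loaded, with_loaded, is_loaded);
flag_methods!(SectionDecl, writable, with_writable, is_writable);
flag_methods!(SectionDecl, executable, with_executable, is_executable);

impl FunctionDecl {
    /// Assumption: functions are always executable and offer no setter for it.
    pub fn is_executable(&self) -> bool { true }
}

pub enum DefinedDecl {
    Function(FunctionDecl),
    Data(DataDecl),
    Section(SectionDecl),
}

impl DefinedDecl {
    /// Delegates to whichever variant is held.
    pub fn is_writable(&self) -> bool {
        match self {
            DefinedDecl::Function(f) => f.is_writable(),
            DefinedDecl::Data(d) => d.is_writable(),
            DefinedDecl::Section(s) => s.is_writable(),
        }
    }

    /// Delegates to whichever variant is held.
    pub fn is_executable(&self) -> bool {
        match self {
            DefinedDecl::Function(f) => f.is_executable(),
            DefinedDecl::Data(d) => d.is_executable(),
            DefinedDecl::Section(s) => s.is_executable(),
        }
    }
}
```

Because `DefinedDecl` stays an enum and only gains methods, existing pattern matches on it would keep compiling, which is why this route avoids the breaking change.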

Owner

Yes that wouldn't be a breaking change; i was just referring to changing the DefinedDecl to a struct; I think what you have now is basically done, modulo some of the comments I made, unless I'm missing something?

Contributor Author

so yes, i think this is (mostly?) done, unless you want me to add is_writable to FunctionDecl and/or is_executable to DataDecl.

we need to make sure not to forget making it work on mach, too.

src/artifact/decl.rs
@kitlith
Contributor Author

kitlith commented Dec 8, 2019

so, RE binary dump stuff: here's roughly the code I used along with the contents of this PR: https://gist.github.com/kitlith/a26a1021c268b34bef60d57cfc6cc19f#file-wasm_lib-rs-L32-L46 -- if you want to follow along I can make some effort to actually get this up in a repo.

Here's the raw binary and linked binary in a zip together: NewOrleans.zip

This is for the msp430 arch, so if you want to run objdump or smth you may need to grab a toolchain, i'll just run objdump -x on both of these real quick for you:

raw_dump:

raw_dump:     file format elf32-msp430
raw_dump
architecture: MSP430, flags 0x00000010:
HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 flash         00010000  00000000  00000000  00000034  2**4
                  CONTENTS, ALLOC, LOAD, CODE
  1 .note.GNU-stack 00000000  00000000  00000000  00000000  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
00000000 l    df *ABS*	00000000 New Orleans
00000000 l    d  flash	00000000 flash

linked:

linked:     file format elf32-msp430
linked
architecture: msp:14, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00004400

Program Header:
    LOAD off    0x00000060 vaddr 0x00000000 paddr 0x00000000 align 2**2
         filesz 0x00010000 memsz 0x00010000 flags rwx

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 flash         00010000  00000000  00000000  00000060  2**4
                  CONTENTS, ALLOC, LOAD, CODE
SYMBOL TABLE:
00000000 l    d  flash	00000000 flash
00000000 l    df *ABS*	00000000 New Orleans

This is a dump of New Orleans (which is the level immediately after the tutorial) from Microcorruption, sans symbols because the interface I was trying to use doesn't seem to be implemented yet.

@m4b
Owner

m4b commented Dec 8, 2019

> This is for the msp430 arch, so if you want to run objdump or smth you may need to grab a toolchain, i'll just run objdump -x on both of these real quick for you:

I just use bingrep :D

m4b@efrit ::  [ ~/projects/faerie ] bingrep raw_dump 
ELF REL MSP430-little-endian @ 0x0:

e_phoff: 0x0 e_shoff: 0x10098 e_flags: 0x0 e_ehsize: 52 e_phentsize: 32 e_phnum: 0 e_shentsize: 40 e_shnum: 5 e_shstrndx: 1

ProgramHeaders(0):
  

SectionHeaders(5):
  Idx   Name                      Type   Flags                    Offset     Addr   Size       Link         Entsize   Align  
  0                           SHT_NULL                            0x0        0x0    0x0                     0x0       0x0    
  1     .strtab             SHT_STRTAB                            0x10034    0x0    0x33                    0x0       0x1    
  2     .symtab             SHT_SYMTAB                            0x10068    0x0    0x30       .strtab(1)   0x10      0x8    
  3     flash             SHT_PROGBITS   WRITE ALLOC EXECINSTR    0x34       0x0    0x10000                 0x0       0x10   
  4     .note.GNU-stack   SHT_PROGBITS                            0x0        0x0    0x0                     0x0       0x1    

Syms(3):
               Addr   Bind       Type        Symbol        Size   Section    Other  
                 0    LOCAL      NOTYPE                    0x0               0x0    
                 0    LOCAL      FILE        New Orleans   0x0    ABS        0x0    
                 0    LOCAL      SECTION                   0x0    flash(3)   0x0    

Dyn Syms(0):
Dynamic Relas(0):

Dynamic Rel(0):

Plt Relocations(0):

Shdr Relocations(0):

Dynamic: None


Libraries(0):

Soname: None
Interpreter: None
is_64: false
is_lib: false
little_endian: true
entry: 0

@kitlith
Contributor Author

kitlith commented Dec 8, 2019

So, something I just kinda realized is that afaict this PR basically makes SectionDecl subsume all the other Decls, to the point where all the others are basically SectionDecls w/ different default flags and such set. FunctionDecl is a SectionDecl w/ the rx flags set and without the ability to add additional symbols. etc.

Might be something to keep in mind for future breaking changes?

@m4b
Owner

m4b commented Dec 8, 2019

> Might be something to keep in mind for future breaking changes?

sure! if you could add a breaking change, how would you design it?

@kitlith
Contributor Author

kitlith commented Dec 8, 2019

Breaking change ideas:

DefinedDecl kinda goes away, as it's been subsumed by SectionDecl, as do FunctionDecl and DataDecl (edit: or we could rename SectionDecl to DefinedDecl). The impls on Decl roughly become types of defaults for SectionDecl, for better backwards compat:

impl Decl {
    fn section(kind: SectionKind) -> SectionDecl { SectionDecl::new(kind) }
    fn function() -> SectionDecl { SectionDecl::new(SectionKind::Function).with_executable(true).with_loaded(true).with_writable(false) }
    fn data() -> SectionDecl { SectionDecl::new(SectionKind::Data).with_executable(false).with_loaded(true).with_writable(true) }
}

around this time i'd like to remind you i'm talking out of my ass, and my mind is going in this direction because of how they seem similar, but restricted. i.e. i don't get why the original design was the way it was. there are probably good reasons to keep these separate.

@bjorn3
Contributor

bjorn3 commented Dec 8, 2019

@kitlith For Mach-O all functions are placed in the same __TEXT.__text section.

@kitlith
Contributor Author

kitlith commented Dec 8, 2019

How can you specify a RWX section? EDIT: not quite so simple...

Mach-o files have the concept of segments, which, among other things, specify the maximum and initial memory protections of all of their sections. However, it has been defined that "an intermediate object file only contains one segment, ... containing all the sections destined for different segments in the file." So with that said, I guess I don't know how generating a rwx section/segment works for an intermediate object, only an executable. :/

Owner

@m4b m4b left a comment

I think this is looking pretty good to go; probably should add a small test verifying functions and/or data can be read-write?

@kitlith
Contributor Author

kitlith commented Dec 9, 2019

Okay, so I should add write flags to FunctionDecl is what I'm hearing? (and then write tests to prove that it's possible?) Or am I misunderstanding? And then there's the mach backend which currently silently ignores these new flags afaik. Are we ignoring that for now?

@m4b
Owner

m4b commented Dec 10, 2019

I thought you added write flags to function decl? Or just executable to data and sections ?

Anyway yea test would be good. And ideally add it to mach if you’re feeling adventurous or probably easier to do in a separate PR (if you want)

@kitlith
Contributor Author

kitlith commented Dec 10, 2019

We discussed it, but I never got a definitive answer so I held off on implementing. After all, adding stuff is a minor release, but removing stuff is a breaking change.

For clarification:

  • FunctionDecl
    • readonly, executable by default (will add write flag, will not add alloc flag. Will not add way to change whether it's executable.)
  • DataDecl
    • (edit, NOT) writeable by default (already has way to change that, will add executable flag, will not add alloc flag)
  • SectionDecl
    • not allocated by default (pr currently adds alloc, exec, and write flags, none of this needs changing.)
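For reference, here is a rough sketch of how those defaults could map onto ELF section header flags. The `SHF_*` values are the standard ELF ones; the `Perms` type and its constants are hypothetical and only mirror the list above (it assumes functions and data are always allocated, since neither gets an alloc flag):

```rust
/// ELF section header flag bits, as defined by the ELF specification.
pub const SHF_WRITE: u64 = 0x1;
pub const SHF_ALLOC: u64 = 0x2;
pub const SHF_EXECINSTR: u64 = 0x4;

/// Hypothetical permission triple; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Perms {
    pub alloc: bool,
    pub write: bool,
    pub exec: bool,
}

impl Perms {
    /// FunctionDecl default: allocated, read-only, executable.
    pub const FUNCTION_DEFAULT: Perms = Perms { alloc: true, write: false, exec: true };
    /// DataDecl default: allocated, not writable, not executable.
    pub const DATA_DEFAULT: Perms = Perms { alloc: true, write: false, exec: false };
    /// SectionDecl default: not allocated, no permissions set.
    pub const SECTION_DEFAULT: Perms = Perms { alloc: false, write: false, exec: false };

    /// Folds the three booleans into an ELF `sh_flags` value.
    pub fn sh_flags(self) -> u64 {
        let mut flags = 0;
        if self.alloc { flags |= SHF_ALLOC; }
        if self.write { flags |= SHF_WRITE; }
        if self.exec { flags |= SHF_EXECINSTR; }
        flags
    }
}
```

A section with all three set yields `0x7`, which matches the `WRITE ALLOC EXECINSTR` flags bingrep reported for the `flash` section earlier in the thread.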

I'll work on these changes tomorrow.

As for mach-o, this library seems to try and abstract away from the underlying format as much as it can. So I'm focusing on mach-o so much because if there's no way to do it for mach-o, this could be a leaky/bad abstraction. As such, I feel uncomfortable suggesting this be merged until either finding an API that works for both, or (possibly more likely at this rate) figuring out that it's not possible in mach-o outside of linked executables, and thus (currently) out of scope for fixing in the library, so we should just move forward with something.

@m4b
Owner

m4b commented Dec 10, 2019

Ok this sounds good to me, as long as we're careful about breaking changes (I don’t see any so I think we’re good).

Yes it’s a good idea to think about Mach too, thank you!

@kitlith
Contributor Author

kitlith commented Dec 10, 2019

I found a way to make this work for mach-o:

LC_LINKER_OPTION allows us to embed linker options inside our binary. combine this with the segprot linker option (-segprot segname max_prot init_prot) and we should be able to specify the permissions of custom segments.
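A rough sketch of what emitting such a load command could look like, assuming the `linker_option_command` layout from `<mach-o/loader.h>` (cmd, cmdsize, count, then NUL-terminated strings, zero-padded). The function names and the little-endian, caller-chosen alignment are assumptions for illustration, and whether the linker actually honors `-segprot` when it arrives this way is untested here:

```rust
/// Load command id for LC_LINKER_OPTION from <mach-o/loader.h>.
pub const LC_LINKER_OPTION: u32 = 0x2D;

/// Renders protection bits as the `r`/`w`/`x`/`-` characters that `-segprot` takes.
pub fn prot_string(read: bool, write: bool, exec: bool) -> String {
    [(read, 'r'), (write, 'w'), (exec, 'x')]
        .iter()
        .map(|&(on, c)| if on { c } else { '-' })
        .collect()
}

/// Serializes a little-endian linker_option_command holding `options`,
/// padded with zero bytes so that cmdsize is a multiple of `align`.
pub fn linker_option_command(options: &[&str], align: usize) -> Vec<u8> {
    // Concatenate the options as NUL-terminated strings.
    let mut strings = Vec::new();
    for opt in options {
        strings.extend_from_slice(opt.as_bytes());
        strings.push(0);
    }
    // 12 bytes of fixed header: cmd, cmdsize, count.
    let unpadded = 12 + strings.len();
    let cmdsize = (unpadded + align - 1) / align * align;

    let mut out = Vec::with_capacity(cmdsize);
    out.extend_from_slice(&LC_LINKER_OPTION.to_le_bytes());
    out.extend_from_slice(&(cmdsize as u32).to_le_bytes());
    out.extend_from_slice(&(options.len() as u32).to_le_bytes());
    out.extend_from_slice(&strings);
    out.resize(cmdsize, 0);
    out
}
```

For a hypothetical segment name, the options would be passed as separate strings, e.g. `["-segprot", "__FAERIE_RWX", &prot_string(true, true, true), &prot_string(true, true, true)]`, with `align` set to 8 for a 64-bit object.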

tbh this feels very meh, but I'll take it. I feel more comfortable about the API proposed by this PR now.

@m4b
Owner

m4b commented Dec 11, 2019

Mach has section flags, I would be surprised if they didn’t have read/write at least? The null segment is mapped this way, making it not readable or writable, so null pointers fault.

@m4b
Owner

m4b commented Dec 11, 2019

E.g., are these not sufficient?

https://github.com/m4b/goblin/blob/e5ff5518851a179601b06f2d61587981a3a7b407/src/mach/constants.rs#L194

Maybe that is only for use by dynamic linker though ... needs more investigating

@kitlith
Contributor Author

kitlith commented Dec 11, 2019

Those are, afaik, segment flags, not section flags. And since there is only one segment in intermediate object files, there's no way to specify the permissions for custom segments. This is the reason why I've been concerned about mach-o the whole time.

EDIT: yeah, rereading format docs says that the flags on sections hold the section type and section attributes, and none of those (officially? as far as we know?) specify protection flags. What does specify protection flags is maxprot and initprot on the segment commands.

@m4b
Owner

m4b commented Dec 17, 2019

Can we just emit another segment with different flags? E.g., right now we just emit __TEXT but maybe another one could be emitted? Will probably be a lot of work to do right though? There has to be a way to have code that can write to text/executable sections though, otherwise i'm not sure how JIT would work?

@philipc
Collaborator

philipc commented Dec 17, 2019

I doubt the linker will be happy with another segment. Does the macos linker even use the segment flags in the object file? I expected it to ignore them. Should be easy to test for someone with a macos system.

@philipc
Collaborator

philipc commented Dec 17, 2019

Looking at the lld source, it probably treats multiple segments the same as if it was only one segment, and the only segment field that I can see it uses is nsects.

@bjorn3
Contributor

bjorn3 commented Dec 17, 2019

JIT compilers work by mmapping an anonymous area of memory with either writable and executable permissions, or only writable permission and then changing it to executable-only later (W^X is safer). They don't generally manipulate existing code.

@kitlith
Contributor Author

kitlith commented Dec 17, 2019

technically, emitting a "__TEXT" segment instead of "" is already off spec (unless i've completely missed something and this doesn't generate intermediate object files, but executables) -- and while I haven't looked at linker code myself, i'd be inclined to believe philip's guess.

and, of course, there is still a way to define a segment w/ rwx or whatever perms you want. linker flags. if you need it, you know how to add flags to LDFLAGS. or, since you can embed flags in object files with LC_LINKER_OPTION, have the object file specify the section.

This is probably best done with some sort of switch to specify whether linker options should be embedded, as well as if a section should be placed in a custom segment. Blast if I know how to make a good API for that though, since we're sharing an API with elf files. It's probably easiest to define our own """standard""" segments with custom permissions ("__FAERIE_RWX" anybody?) that we can sort stuff into if they match the permissions, and include the flags for the segments being used. Somewhere in the middle would be generating segment names based on section/symbol names and that seems worse than both the other options.

@m4b at least you get why I was so reluctant about the api and kept bringing up mach-o now, right? I don't want to add features to your cross-format API that only work for one format.

@m4b
Owner

m4b commented Dec 17, 2019

> JIT compilers work by mmapping an anonymous area of memory with either writable and executable permissions, or only writable permission and then changing it to executable-only later (W^X is safer). They don't generally manipulate existing code.

Ah yea that makes sense! I find it hard to believe you can’t write self modifying code on mach though, but maybe.

> @m4b at least you get why I was so reluctant about the api and kept bringing up mach-o now, right? I don't want to add features to your cross-format API that only work for one format.

Sure !

So some steps moving forward: we can abandon this approach; the easiest hack, but very much a hack, is I believe for an elf person to manually alter the write/execute flags on sections.

Any other ideas ?

@kitlith
Contributor Author

kitlith commented Dec 17, 2019

> I find it hard to believe you can’t write self modifying code on mach though, but maybe.

This is representable in executable files, just not in object files, which is what this library targets. It actually makes sense -- there's no need to embed this information in the object file when what really matters is the final link, when the linker will have all the information on what segments go where from the linker script/arguments. So all we need is a way to embed information in the object file that the linker will pay attention to (LC_LINKER_OPTION) or tell the user to add stuff to a linker script.

Ways to proceed:

  • LC_LINKER_OPTION + -segprot segname max_prot init_prot to embed segment permissions into the object file. Should, in theory, work, but may need more than just the one option for the segment to actually work.
  • Have some way to specify segment names (probably a no-op for elf) for the above and/or for people writing their own linker scripts/calling out to a linker and adding options. For the latter, these options should no-op for mach-o (as they currently implicitly are).
  • other currently unknown mach/linker magic?
  • Just accept that there are going to be features that do not translate between elf and mach-o, so there should be executable-format-dependent APIs. (meh, but I would have originally expected completely separate libraries, so I don't consider it too bad)
  • Abandon this PR. (I don't actually need it because i'm manually writing out the program headers anyway, who cares about mismatches between section headers and program headers)

@kitlith
Contributor Author

kitlith commented Mar 20, 2020

@m4b have any comments on how to proceed here? (since you wished i had pinged sooner on the other PR)

@m4b
Owner

m4b commented Mar 22, 2020

I'm not sure where we landed to be honest. This is still marked a draft, but you seemed to have some reservations about merging? Do you still feel that way, or do you want to merge the changes as it stands here?

Could you give a high-level overview of the downsides or dangers (if any) of merging what you have?

@kitlith
Contributor Author

kitlith commented Mar 22, 2020

My initial reservation about merging this came from not being able to find a way to implement this feature for mach-o. I then found a way, but still had/have reservations because the method is kinda meh (embedding linker command-line options into the object file). Which led to the options i mentioned a while ago.

For the most part, I think this is okay to merge? (I don't remember the state of the code though.) I just want there to be a plan for how it'll eventually be implemented for mach-o. And if the plan is not to implement it for mach-o (because the potential solution is disgusting or causes issues or something) then I want you to consider whether it's okay to have an api that'll only ever work for one file type. That's it.

Development

Successfully merging this pull request may close these issues.

RWX sections (or: custom section permissions)
4 participants