Implement OPA compatible authorizer #488

Open
12 of 14 tasks
Tracked by #499 ...
soenkeliebau opened this issue Apr 15, 2024 · 0 comments · May be fixed by stackabletech/hbase-opa-authorizer#1
soenkeliebau commented Apr 15, 2024

As a user I want to be able to use OPA/Rego rules as the basis of authorization checks in HBase.

Background

HBase uses a coprocessor, AccessController, which, when configured, invokes a Hadoop group mapper to collect user/group information. This is used in conjunction with Zookeeper to filter data in HBase. The implementation steps would therefore be:

  • check the GroupMapper exists in the HDFS image (it does, by default)
  • configure AccessController in HBase
    • set hbase.security.authorization to true in the site config
  • check that the mapper is invoked
  • write rego rules that return user/group mappings for HBase to use

Problem

We do not actually use the StackableGroupMapper by default, as it has limitations:

Returns list of groups for a user. Internally Hadoop will pass the short name to this function,
but this prevents us from effectively separating users with the same names but with different
kerberos principals.

Instead, we have implemented public class StackableAccessControlEnforcer implements INodeAttributeProvider.AccessControlEnforcer which bypasses the group mappings entirely and uses rego rules exclusively.
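
The name collision described in the quoted comment can be illustrated with a small sketch. It assumes a simplified mapping that strips the host and realm from a Kerberos principal; the `shortName` helper and the example principals are hypothetical, and real clusters may apply custom `auth_to_local` rules instead.

```java
import java.util.List;

final class ShortNameCollision {
    // Simplified stand-in for a principal-to-short-name mapping:
    // keep only the first component, dropping "/host" and "@REALM".
    // This is an assumption for illustration, not Hadoop's actual rule engine.
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    public static void main(String[] args) {
        // Three distinct principals that all collapse to the same short name.
        List<String> principals = List.of(
            "alice@PROD.EXAMPLE.COM",
            "alice@TEST.EXAMPLE.COM",
            "alice/admin@PROD.EXAMPLE.COM");
        for (String principal : principals) {
            System.out.println(principal + " -> " + shortName(principal));
        }
    }
}
```

A group mapper that only ever receives the short name cannot tell these principals apart, which is why the rules need to see the full principal.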

Possible approaches

  • we use the GroupMapper
    • bad, as we still have possible name conflicts over principals and the mapper would be redundant/inconsistent as we don't use it for HDFS anyway (and will probably cause problems as hadoop group assignments may interfere with the OPA access control in non-obvious ways)
    • confirmed with a test: HBase passes the shortUserName to the mapper
    • there are ways around this using mapping rules but we have opted against this for HDFS authorization
  • write our own co-processor
    • less bad, though not all internal components can be overridden: e.g. AccessController has AccessChecker and PermissionStorage, which are heavily dependent on Zookeeper, and we don't want to reinvent the wheel
  • use our own AccessControlService to manipulate ACLs
    • this is protobuf generated, is thinly documented, and parts of the API are already deprecated

Recommended approach

  • implement our own coprocessor
    • we build from source
  • based on the AccessController coprocessor
  • implements AccessChecker and AuthManager but uses full username
  • calls the opa service which applies rego rules
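
As a rough sketch of the last step, the coprocessor could send the full principal to OPA's Data API (`POST /v1/data/<path>` with an `input` document). The field names (`user`, `action`, `table`), the policy path `hbase/allow` and the class name are assumptions made for illustration; the issue does not define them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class OpaRequestSketch {
    // Minimal JSON string escaping, sufficient for the illustrative values used here.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the request document. The field names are hypothetical.
    // Note that the full principal is sent, not the short user name.
    static String inputJson(String principal, String action, String table) {
        return "{\"input\":{"
            + "\"user\":" + quote(principal) + ","
            + "\"action\":" + quote(action) + ","
            + "\"table\":" + quote(table) + "}}";
    }

    // Builds a POST against OPA's Data API; policyPath (e.g. "hbase/allow") is an assumption.
    static HttpRequest buildRequest(URI opaBase, String policyPath, String body) {
        return HttpRequest.newBuilder(opaBase.resolve("/v1/data/" + policyPath))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

OPA answers such a query with a JSON document whose `result` field holds the value of the queried rule, so a rule that evaluates to a boolean maps directly onto an allow/deny decision in the hook.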

Implementation questions

  • the AccessChecker is invoked by not just the AccessController, but also MasterRpcServices, SnapshotManager, RSRpcServices and VisibilityController, meaning that we cannot just extend AccessController and use our own version of AccessChecker "internally". We could activate NoopAccessChecker (with hbase.security.authorization=false) so that other users of AccessChecker will allow everything, and then use our own setting (e.g. hbase.opa.authorization=true) to activate our own calls to the opa server. What are the side effects of this?
    • update: in 2.6.0 some of these calls have been replaced with coprocessor hooks, see the JIRA.


  • what are the effects on performance of bypassing Zookeeper caching? (i.e. will we have a caching layer in Opa?)
    • we should take care not to bypass this as it has been optimised for typical HBase throughputs. The cache keys will need to be based on fully-qualified user names, though.
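
To make the point about cache keys concrete, here is a minimal sketch of a decision cache keyed on the fully-qualified user name. It is not the Zookeeper-backed cache HBase uses; the class name, the time-to-live approach and the key layout (principal, action, table) are all assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

final class DecisionCache {
    private static final class Entry {
        final boolean allowed;
        final long expiresAtMillis;

        Entry(boolean allowed, long expiresAtMillis) {
            this.allowed = allowed;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<List<String>, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    // The clock is injected so that expiry can be tested deterministically.
    DecisionCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached decision, or asks opaLookup (a stand-in for the OPA call)
    // when there is no entry or the entry has expired.
    boolean isAllowed(String principal, String action, String table, BooleanSupplier opaLookup) {
        // The key contains the full principal, so users that share a short name stay separate.
        List<String> key = List.of(principal, action, table);
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            // Concurrent callers may both look up the same key; the last write wins.
            entry = new Entry(opaLookup.getAsBoolean(), now + ttlMillis);
            entries.put(key, entry);
        }
        return entry.allowed;
    }
}
```

A time-based expiry is only one option: it trades freshness of policy changes against the number of OPA calls, and the appropriate value would need to be measured against typical HBase request rates.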

Update following initial efforts

  • HBase writes ACL data in a dedicated table in HBase, but this is used mainly for persistence: the changes themselves are initially tracked in Zookeeper and are queried via an internal cache - Zookeeper passes these changes on to the ACL table via a watcher mechanism.
  • it would be possible to override just two methods to use our Opa rego rules: a) getActiveUser in AccessController (so that we can ensure that the cache keys use the full user name and not the short username) and b) getUserGroups in AccessChecker (which would then bypass the HDFS group mapper and get the information directly from Opa). However, patching the HBase code to make these methods overridable and cloning the two classes are both non-trivial, as the code is not easy to decouple and we would have to duplicate (and keep updated) significant amounts of HBase code.
  • the current recommendation would be to implement a "clean" co-processor that implements the required interfaces and use our own code exclusively
    • this would mean we are responsible for appropriate caching and performance considerations
    • but we would be independent of any HBase changes to components on which we would otherwise rely (e.g. ZkPermissionWatcher, AccessChecker etc.)
  • the goal is to start with a few methods (e.g. prePut, preGet, preDelete, createTable ...), get these working with opa, and then ship this as a "preview" feature in the next release

Tasks

  • Add an OpenPolicyAgentAccessController coprocessor (implementing the required interfaces, not extending AccessController)
  • Create a unit test with HBaseTestingUtility and verify that it loads and calls OpenPolicyAgentAccessController
  • Use this unit test to call e.g. prePut
  • Investigate and gather an overview of what-happens-where-and-why
    • how is the current user retrieved?
    • what events are fired (picked up by the AccessController hooks) and in what order?
  • Mock Opa calls
    • Add an initial mocked test callout to the opa server in OpenPolicyAgentAccessController
      • Done using WireMock library
    • Add further mocked test calls before writing the rego rules
  • Test adding the coprocessor to the HBase docker image and checking that it is loaded
  • Examine callouts and implement an appropriate caching strategy (could be external to HBase)
  • Add rego rules for an integration test
  • Extend simple rego rules to production-ready rego rules (as has been done for Trino and HDFS)
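
The mocked callout from the task list can be sketched without extra dependencies. The example below uses the JDK's built-in `HttpServer` as a stand-in for WireMock (which is what the task list names) and answers every query under `/v1/data/` with a fixed decision. The class name, the policy path in the usage note and the naive response parsing are assumptions for illustration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class StubOpaSketch {
    // Starts a local stub on a free port that answers every request under /v1/data/
    // with {"result": <allow>}, mimicking the shape of an OPA Data API response.
    static HttpServer startStub(boolean allow) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/v1/data/", exchange -> {
                exchange.getRequestBody().readAllBytes(); // drain the request body
                byte[] body = ("{\"result\":" + allow + "}").getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Posts the input document and treats anything other than HTTP 200 with a true result as a denial.
    static boolean queryAllow(URI endpoint, String jsonInput) {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonInput))
            .build();
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            // Naive parsing that is sufficient for the stub; real code should use a JSON library.
            return response.statusCode() == 200
                && response.body().replace(" ", "").contains("\"result\":true");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A test would start the stub with `startStub(true)`, point the client at `http://127.0.0.1:<port>/v1/data/hbase/allow` (the port comes from `getAddress().getPort()`), assert on the decision, and call `stop(0)` afterwards. Denying on any unexpected response is a deliberate fail-closed choice in this sketch.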
Projects
Status: Development: In Progress