
CherryPySpec

Sviatoslav Sydorenko edited this page Oct 6, 2017 · 2 revisions
                      The CherryPy HTTP framework

Abstract

CherryPy is a framework for developing and deploying HTTP applications.

CONTENTS

1 Introduction
    1 Purpose
    2 Requirements
    3 Terminology
    4 Overview
2 Core
    1 Applications
    2 Requests and Responses
        1 The Request Object
        2 The Response Object
        3 Serving the Request and Response
        4 Request Execution
        5 Cleanup
    3 Dispatchers
        1 Invocation
        2 request.handler
        3 request.config
    4 HTTP Servers
    5 WSGI
    6 Engines
3 Extensions
    1 Hooks
        1 Hook points
        2 Hook objects
    2 Tools
        1 Decorators
        2 Callables
        3 Handlers
    3 Toolboxes
    4 Configuration
        1 Scopes
        2 Namespaces
            1 Namespace handlers
        3 Page Handler Attributes
4 Footnotes and References

1 Introduction

CherryPy is a framework for developing and deploying HTTP applications.

1.1 Purpose

This specification defines the composition and interaction of CherryPy components.

1.2 Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. See http://www.ietf.org/rfc/rfc2119.txt.

An implementation is not compliant if it fails to satisfy one or more of the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant."

1.3 Terminology

Unless otherwise specified, all terminology used in this specification should be interpreted as that of "Hypertext Transfer Protocol -- HTTP/1.1" (RFC 2616) and "Uniform Resource Identifiers (URI): Generic Syntax and Semantics" (RFC 2396).

Additional terms:

handler (page handler) A callable which responds to a request, usually by returning an HTTP response body.

handler (namespace handler) A callable which parses and applies a configuration entry based on a hierarchy of entry names.

unexpected exception In the normal course of responding to requests, CherryPy raises known exceptions such as HTTPError, HTTPRedirect, and InternalRedirect in order to skip various parts of the request process. In addition, the exceptions SystemExit and KeyboardInterrupt are never handled by request objects, but are always passed outward to the caller. These are all "expected exceptions", and any other exception, therefore, is defined as an "unexpected exception".

1.4 Overview

CherryPy consists of not one, but four separate API layers.

The APPLICATION LAYER is the simplest. CherryPy applications are written as a tree of classes and methods, where each branch in the tree corresponds to a branch in the URL path. Each method is a 'page handler', which receives GET and POST params as keyword arguments, and returns or yields the (HTML) body of the response. The special method name 'index' is used for paths that end in a slash, and the special method name 'default' is used to handle multiple paths via a single handler. This layer also includes:

  • the 'exposed' attribute (and cherrypy.expose)
  • cherrypy.quickstart()
  • _cp_config attributes
  • cherrypy.tools (including cherrypy.session)
  • cherrypy.url()

The ENVIRONMENT LAYER is used by developers at all levels. It provides information about the current request and response, plus the application and server environment, via a (default) set of top-level objects:

  • cherrypy.request
  • cherrypy.response
  • cherrypy.engine
  • cherrypy.server
  • cherrypy.tree
  • cherrypy.config
  • cherrypy.thread_data
  • cherrypy.log
  • cherrypy.HTTPError, NotFound, and HTTPRedirect
  • cherrypy.lib

The EXTENSION LAYER allows advanced users to construct and share their own plugins. See Section 3.

Finally, there is the CORE LAYER, which uses the core APIs to construct the default components which are available at higher layers. You can think of the default components as the 'reference implementation' for CherryPy. Megaframeworks (and advanced users) may replace the default components with customized or extended components. The core APIs are discussed in Section 2.

2 Core

2.1 Applications

CherryPy uses an application object to implement a collection of URIs which maps to a collection of page handlers. This terminology is taken directly from Fielding: "...the server receives the identifier (which identifies the mapping) and applies it to its current mapping implementation (usually a combination of collection-specific deep tree traversal and/or hash tables) to find the currently responsible handler implementation and the handler implementation then selects the appropriate action+response based on the request content."

The exact implementation of that mapping is dependent on the dispatcher(s) (section 2.3) which the application employs internally; by default, the external application interface only exposes a "script name" (root URI) for the entire collection.

An application object MUST contain the following three attributes:

* script_name: a string, containing the "mount point" for this object.
    A mount point is that portion of the URI which is constant for all
    URIs that are serviced by this application; it does not include
    scheme, host, or proxy ("virtual host") portions of the URI.
    It MUST NOT end in a slash. If the script_name refers to the
    root of the URI, it MUST be an empty string (not "/").
* config: a nested dict, containing configuration entries which apply
    to this application, of the form: {section: {entry name: value}}.
    The 'section' keys MUST be strings. If they represent URI paths,
    they MUST begin with a slash, and MUST be relative to this object's
    script_name. If they do not begin with a slash, they SHOULD be
    treated as arbitrary section names, which applications MAY use as
    they see fit. The 'entry name' keys MUST be strings, and in the
    case of path sections, SHOULD be namespaced (section 3.4).
    The values may be arbitrary Python values.
* namespaces: a dict of configuration namespace names and handlers.
    See section 3.4.

Application objects also MUST possess a "merge" method, that takes a single "config" argument, which MUST be a dict, nested in the same manner as the application object's config. The "merge" method MUST combine the supplied config with the application object's existing config dict in such a way that the supplied config overrides (overwrites) entries in the existing config. The "merge" method MUST NOT remove any values in the existing config unless replacing them with a new value, or performing the removal via a namespace handler. The "merge" method MUST pass all entries in the supplied config to the proper namespace handler (if any). It MUST NOT pass any entries from the existing config to namespace handlers, since these entries will have already been handled when they were first merged. Callers SHOULD NOT attempt to add config entries to the application object via any means other than passing a new config dict to the "merge" method.
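The attribute and "merge" requirements above can be sketched in a few lines of Python. This is an illustration only, not the reference implementation: the class name SketchApplication is hypothetical, and splitting each entry name on its first dot to find the namespace is an assumption made for the example.

```python
class SketchApplication:
    """Hypothetical application object with the three required attributes."""

    def __init__(self, script_name=""):
        if script_name.endswith("/"):
            raise ValueError("script_name MUST NOT end in a slash; use '' for the root")
        self.script_name = script_name
        self.config = {}       # {section: {entry name: value}}
        self.namespaces = {}   # {namespace name: handler(key, value)}

    def merge(self, config):
        """Merge a nested config dict; supplied entries override existing ones."""
        for section, entries in config.items():
            # Existing entries survive unless the supplied config replaces them.
            self.config.setdefault(section, {}).update(entries)
            # Only the newly supplied entries are passed to namespace handlers.
            for name, value in entries.items():
                namespace, _, key = name.partition(".")
                handler = self.namespaces.get(namespace)
                if handler is not None:
                    handler(key, value)


app = SketchApplication("/blog")
seen = []
app.namespaces["tools"] = lambda key, value: seen.append((key, value))
app.merge({"/": {"tools.gzip.on": True}})
app.merge({"/": {"tools.gzip.on": False, "tools.proxy.on": True}})
```

After the second call, the "/" section holds both entries, with "tools.gzip.on" overridden, and the handler has seen only the entries supplied to each call.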

The specification of application objects excludes calling syntax by design; their implementation, however, MAY include additional methods which are used to associate them with an HTTP request, and even initiate the handling of each request. For example, the reference implementation extends the spec by adding a __call__ method which acts as a "WSGI application interface"; WSGI servers and middleware may then hand off request processing to such an application object by calling it.

In addition, application objects MAY possess other attributes and methods which consumers can use to differentiate them. For example, a consumer might wish to use different application objects based on the "Accept" HTTP request header, in which case a cooperating creator of application objects could give each object an additional "accept" attribute.

2.2 Requests and Responses

The CherryPy Request API involves the creation and handling of Request and Response objects, and also a caller. The caller is usually an HTTP server (section 2.4), although it may act through intermediaries such as a WSGI adapter (section 2.5) and/or an Engine (section 2.6). The rest of this section uses "HTTP server" to mean any combination of calling code, regardless of its architecture.

The API consists of five steps, described in sections 2.2.1 through 2.2.5:

2.2.1 The Request Object

An HTTP server obtains a request object by instantiating it directly. Each HTTP request MUST result in a separate request object.

The constructor arguments for the request object are:

local_host: an instance of http.Host corresponding to the server socket.
remote_host: an instance of http.Host corresponding to the client socket.
scheme: a string containing the protocol actually used for the HTTP
    conversation, lowercased. Usually, this will be either "http" or
    "https", but is open to extension. This should be provided by the
    server based on its own awareness of the conversation details;
    that is, it should not be obtained from any part of the request
    message itself.
server_protocol: a string containing the HTTP-Version for which the
    server is at least conditionally compliant. Servers which meet all
    of the MUSTs in RFC 2616 should set this to "HTTP/1.1"; all others
    should use "HTTP/1.0" (lower versions are not explicitly supported).

Once the HTTP server obtains the request object, it is free to modify it in any way it sees fit. Generally, this involves adding new server environment attributes such as 'login', 'multithread', 'app', 'prev' and so on. Some such additional attributes MAY be required by individual request implementations.

Request objects SHOULD use hooks (section 3.1) and tools (section 3.2) to implement extensions.

2.2.2 The Response Object

The HTTP server obtains a response object by instantiating it; there are no arguments. Each HTTP request MUST result in a separate response object.

Once the HTTP server obtains the response object, it is free to modify it in any way it sees fit. Some additional attributes MAY be required by individual response implementations.

2.2.3 Serving the Request and Response

Once the HTTP server has obtained a request and response object (and before executing the request object, section 2.2.4), it MUST register them both via:

cherrypy.serving.load(req, resp)

This makes the request and response objects available via cherrypy.request and cherrypy.response, respectively.

2.2.4 Request Execution

When ready, the HTTP server calls the 'run' method of the Request. It takes the following arguments; the first four SHOULD be obtained directly from the HTTP Request-Line.

* method: a string containing the HTTP request method token.
    Methods are case-sensitive.
* path: a string containing the Request-URI, minus any query string.
    This string MUST be "% HEX HEX" decoded.
* query_string: a string containing the query string from the URI.
    This string SHOULD NOT be "% HEX HEX" decoded.
* req_protocol: a string containing the HTTP-Version of the request
    message; for example, "HTTP/1.1".

* headers: a list of (name, value) tuples containing the request headers.
* rfile: a file-like object containing the HTTP request entity.
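As a rough sketch of how an HTTP server might derive the first four arguments, the following splits a Request-Line and decodes only the path. The function name split_request_line is hypothetical, and the sketch assumes the Request-URI is in abs_path form (it does not handle absolute URIs or "*").

```python
from urllib.parse import unquote


def split_request_line(request_line):
    """Return (method, path, query_string, req_protocol) from a Request-Line."""
    method, request_uri, req_protocol = request_line.split(" ", 2)
    raw_path, _, query_string = request_uri.partition("?")
    # The path is "% HEX HEX" decoded; the query string is left encoded.
    return method, unquote(raw_path), query_string, req_protocol


args = split_request_line("GET /a%20b/page?q=x%26y HTTP/1.1")
```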

The 'run' method handles the request in any way it sees fit. The only constraint is that it MUST return the cherrypy.response object, which MUST be the same object that the HTTP server created, and which MUST have the following three attributes upon return:

* status: a valid HTTP Status-Code and Reason-Phrase, e.g. "200 OK".
* header_list: a list of (name, value) tuples of the response headers.
* body: an iterable yielding strings.

The HTTP server SHOULD then use these response attributes to build the outbound stream. Due to the vagaries of socket communications, and to reduce the burden on server authors, the HTTP server MAY iterate over the entire response body, or it may not. CherryPy application authors should not assume that page handlers which are generators will run to completion.

2.2.5 Cleanup

Regardless of whether the HTTP server iterates over the entire response body or not, it MUST call the 'close' method of the request object once it has finished with the body. The 'close' method takes no args, and MUST be idempotent.

Once an HTTP server obtains a request object, it MUST call the 'close' method, even if exceptions occur during the remainder of the process. Once the 'close' method returns (or errors), the HTTP server SHOULD delete all references to the request and response objects.

In addition, the HTTP server MUST clear the serving object as follows:

cherrypy.serving.clear()
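The calling sequence of sections 2.2.1 through 2.2.5 can be sketched as follows. To keep the example runnable without CherryPy, it uses hypothetical stand-ins (StubServing, StubRequest, StubResponse) rather than the real objects; in particular, the StubRequest constructor does not take the arguments listed in section 2.2.1.

```python
import io


class StubServing:
    """Stand-in for cherrypy.serving, so the sketch runs without CherryPy."""

    def __init__(self):
        self.request = self.response = None

    def load(self, request, response):
        self.request, self.response = request, response

    def clear(self):
        self.request = self.response = None


class StubResponse:
    status = "200 OK"
    header_list = [("Content-Type", "text/plain")]
    body = ["hello"]


class StubRequest:
    closed = False

    def __init__(self, serving):
        self.serving = serving

    def run(self, method, path, query_string, req_protocol, headers, rfile):
        return self.serving.response  # the same object the server created

    def close(self):
        self.closed = True  # idempotent: a second call changes nothing


def serve_one(serving, request, response, write):
    """Serve a single HTTP request using the steps described above."""
    serving.load(request, response)                   # section 2.2.3
    try:
        resp = request.run("GET", "/", "", "HTTP/1.1", [], io.BytesIO(b""))
        write(resp.status, resp.header_list, resp.body)   # section 2.2.4
    finally:
        request.close()                               # section 2.2.5
        serving.clear()


serving = StubServing()
sent = []
req = StubRequest(serving)
serve_one(serving, req, StubResponse(), lambda *parts: sent.append(parts))
```

The try/finally block is what guarantees that 'close' and 'clear' run even if 'run' or the write step raises.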

2.3 Dispatchers

A 'dispatcher' is the function or callable object which looks up the 'page handler' callable and collects config for the current request based on the path_info, other request attributes, and the application architecture.

The default dispatcher discovers the page handler by matching path_info to a hierarchical arrangement of objects, starting at request.app.root. Other dispatchers MAY use other techniques to map the given URI (and other message parameters) to the proper handler.
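A much-simplified sketch of this kind of tree traversal is shown below. It is not the reference implementation's algorithm: the function name find_handler is hypothetical, and details such as config collection, request.is_index, and virtual paths are left out. It only illustrates walking path segments from a root object and falling back to 'index' and 'default' methods.

```python
def find_handler(root, path_info):
    """Return (handler, leftover_segments), or (None, []) if nothing matches."""
    segments = [s for s in path_info.strip("/").split("/") if s]
    node, consumed = root, 0
    # Remember the deepest 'default' method seen so far, as a fallback.
    fallback = (getattr(root, "default", None), segments)
    for i, name in enumerate(segments):
        child = getattr(node, name, None)
        if child is None:
            break
        node, consumed = child, i + 1
        if hasattr(node, "default"):
            fallback = (node.default, segments[consumed:])
    if consumed == len(segments):
        # Every segment matched: use the node itself or its 'index' method.
        if callable(node) and getattr(node, "exposed", False):
            return node, []
        index = getattr(node, "index", None)
        if getattr(index, "exposed", False):
            return index, []
    handler, rest = fallback
    if getattr(handler, "exposed", False):
        return handler, rest
    return None, []


class Blog:
    def index(self):
        return "blog index"
    index.exposed = True

    def default(self, *args):
        return "post " + "/".join(args)
    default.exposed = True


class Root:
    blog = Blog()

    def index(self):
        return "home"
    index.exposed = True


root = Root()
handler, rest = find_handler(root, "/blog/2017/10")
```

Here "/blog/2017/10" resolves to Blog.default with the unmatched segments left over, while "/" and "/blog/" resolve to the respective 'index' methods.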

2.3.1 Invocation

Request objects MUST look up and call a dispatcher as early as possible after the headers are read and parsed, and MUST pass a single 'path_info' argument to the dispatcher.

Dispatchers MUST be callable, and MUST take a single 'path_info' argument (a string). When called, they MUST set request.handler and request.config. In addition, if the handler is an "index" handler (designed to map to URIs which end in a slash ("/")), the dispatcher SHOULD set request.is_index to True.

2.3.2 request.handler

The value bound to request.handler MUST be a callable object that takes no arguments. Note that instances of the builtin exceptions HTTPError, NotFound, and HTTPRedirect may be set as handlers, if appropriate.

Because request.handler MUST take no arguments, it MAY be wrapped in an intermediary object which calls the "real" handler, allowing the "real" handler to be passed arguments which have been stored in the intermediary. For example, the LateParamPageHandler in the reference implementation wraps the "real" handler so that it can decide which arguments to pass to the handler (and can decide as late as possible). Such intermediaries SHOULD provide read/write access to the wrapped handler and SHOULD provide read/write access to the positional and keyword arguments which they will eventually pass to the wrapped handler.
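A minimal sketch of such an intermediary follows. The class name DeferredHandler and its attribute names are hypothetical, chosen for illustration; they are not claimed to match LateParamPageHandler.

```python
class DeferredHandler:
    """Hypothetical zero-argument wrapper around a "real" page handler."""

    def __init__(self, target, *args, **kwargs):
        self.target = target       # read/write access to the wrapped handler
        self.args = list(args)     # read/write positional arguments
        self.kwargs = kwargs       # read/write keyword arguments

    def __call__(self):
        # Arguments are looked up only now, so they may be changed until the call.
        return self.target(*self.args, **self.kwargs)


def greet(name, punctuation="."):
    return "Hello, " + name + punctuation


handler = DeferredHandler(greet, "world")
handler.kwargs["punctuation"] = "!"   # decided late, before the handler runs
result = handler()
```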

2.3.3 request.config

The value bound to request.config MUST be a new dict object (that is, not shared between requests) and MUST contain all entries found in cherrypy.config, and any entries found in cherrypy.request.app.config which apply to the current path_info or one of its hierarchical ancestors. Entries from app.config MUST override entries from cherrypy.config, and multiple entries in app.config MUST be collapsed into a single entry by retaining the value with the longest URI path.
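The override and "longest URI path" rules can be sketched as a walk from the root section down to the requested path. The function name build_request_config is hypothetical, and the sketch ignores _cp_config entries, which are dispatcher-specific.

```python
def build_request_config(global_config, app_config, path_info):
    """Return a fresh dict of the config entries in effect for path_info."""
    merged = dict(global_config)   # a new dict for every request
    # Visit "/", then each ancestor, then the path itself, so that the
    # entry with the longest URI path is applied last and wins.
    trail = ["/"]
    current = ""
    for segment in path_info.strip("/").split("/"):
        if segment:
            current += "/" + segment
            trail.append(current)
    for section in trail:
        merged.update(app_config.get(section, {}))
    return merged


global_config = {"tools.gzip.on": False, "log.screen": True}
app_config = {
    "/": {"tools.gzip.on": True},
    "/static": {"tools.staticdir.on": True},
    "/static/private": {"tools.staticdir.on": False},
}
conf = build_request_config(global_config, app_config, "/static/private/a.txt")
```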

The request.config dict SHOULD also contain _cp_config entries from handler methods and their containers (such as controller classes) and merge those values into request.config. However, since the very nature of different dispatchers is to enable different controller architectures, the decision of where to attach and collect _cp_config entries is dispatcher-specific. Also, dispatchers SHOULD allow app.config entries to override _cp_config entries; this allows deployers to more easily override developer defaults.

Dispatchers may be nested, and therefore a given dispatcher MAY call another and pass it a different 'path_info' argument (for example, the builtin VirtualHost dispatcher adds a prefix to the path_info value it receives before calling the next dispatcher). Some consumers may even wish to attach dispatchers as methods on their controller classes (which would then presumably set request.handler to a found method of that controller).

2.4 HTTP Servers

An "HTTP server" is a component "that accepts connections in order to service [HTTP] requests by sending back [HTTP] responses." "HTTP communication usually takes place over TCP/IP connections."

Server objects MUST possess the following attributes:

* protocol_version: a string containing the HTTP-Version for which
    the server is at least conditionally compliant.
* start: a method which starts the HTTP server. In order to make servers
    easier to write, this method MAY block until the server is stopped
    or interrupted.
* ready: a boolean state flag, which the server MUST set internally to
    signal whether or not it is ready to receive requests from clients.
* stop: a method which stops the HTTP server. This method MUST block
    until the server is truly stopped (all threads idle or shutdown
    and all sockets closed, including the listening socket).
* restart: a method which calls stop, then start.
* max_request_body_size:
* max_request_header_size:
* thread_pool:

Servers which communicate over TCP SHOULD possess these additional attributes:

* reverse_dns:
* socket_file:
* socket_host:
* socket_port:
* socket_queue_size:
* socket_timeout:

Servers which use SSL SHOULD possess these additional attributes:

* ssl_certificate:
* ssl_private_key:

2.5 WSGI

See PEP 333.

2.6 Engines

Engine objects MUST possess the following attributes:

* state: a state flag, one of:
    * STOPPED = 0
    * STARTING = None
    * STARTED = 1
* block: a method which MUST block until the 'state' is STOPPED or an
    exception is raised. This allows a main thread to wait while child
    threads respond to HTTP requests. If any exception is raised, the
    method SHOULD call its own 'stop' method. If KeyboardInterrupt or
    SystemExit is raised, the method MUST call server.stop.
* restart: a method which MUST call the 'stop' method, and then the
    'start' method.
* start: a method which takes a single optional 'blocking' argument.
    If True, the 'start' method MUST call the 'block' method.
    The 'start' method MAY temporarily set 'state' to STARTING,
    but MUST set it to STARTED before either returning or blocking.
* stop: a method which MUST set 'state' to STOPPED. Note that this
    will signal any thread which has called 'block' to stop blocking.
* wait: a method which MUST block until the 'state' is STARTED.
    This allows a main thread to wait until the engine has started
    without having to block after that point.
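The engine attributes above can be sketched with a simple polling loop. The class name SketchEngine, the optional 'server' attribute, and the 'interval' arguments are assumptions made for this example; a real engine would also start and stop servers and worker threads.

```python
import threading
import time

STOPPED, STARTING, STARTED = 0, None, 1


class SketchEngine:
    """Hypothetical engine exposing the attributes listed above."""

    def __init__(self, server=None):
        self.state = STOPPED
        self.server = server   # assumed: an object with a stop() method

    def start(self, blocking=False):
        self.state = STARTING
        # ... servers and worker threads would be started here ...
        self.state = STARTED
        if blocking:
            self.block()

    def stop(self):
        self.state = STOPPED   # releases any thread waiting in block()

    def restart(self):
        self.stop()
        self.start()

    def block(self, interval=0.01):
        try:
            while self.state != STOPPED:
                time.sleep(interval)
        except (KeyboardInterrupt, SystemExit):
            if self.server is not None:
                self.server.stop()
            self.stop()
            raise
        except Exception:
            self.stop()
            raise

    def wait(self, interval=0.01):
        while self.state != STARTED:
            time.sleep(interval)


engine = SketchEngine()
engine.start()
timer = threading.Timer(0.05, engine.stop)
timer.start()
engine.block()   # returns once the timer thread calls stop()
timer.join()
```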

3 Extensions

3.1 Hooks

Hooks are optional callables which are invoked at various points in the request-handling process. They MAY be declared (attached) by the core, by application developers, and by deployers.

3.1.1 Hook points

Each hook callable is bound to a "hook point", a named calling point inside the request-handling process. The exact list of available hook points is flexible, and SHOULD be specified by the request object (section 2.2.1). Request objects SHOULD implement the following hook points, and SHOULD call them according to the corresponding descriptions:

* on_start_resource: called after the headers are read and parsed,
    and a page handler is located.
* before_request_body: called just before the request entity body
    is read from the incoming stream.
* before_handler: called just before the page handler is called.
* before_finalize: called just before the response entity is checked
    for validity. For page handlers which buffer their output, this
    should be called after the entire response body has been buffered.
    For page handlers which stream their output, this should be called
    after the generator has been returned, but before it has been
    iterated over. This may be called more than once if errors occur.
* on_end_resource: called just before the "run" method of the request
    object returns.
* on_end_request: called after the entire response message has been
    written out to the client. This allows hook callables to run
    after unbuffered page handlers have terminated. In general,
    this should be run inside the request object's "close" method.

* before_error_response: called just before generating a response
    due to an unexpected exception.
* after_error_response: called just after generating a response
    due to an unexpected exception.

3.1.2 Hook objects

In order to facilitate the declaration, inspection, and invocation of hook callables, each one MUST be wrapped in a Hook object. Each Hook object MUST possess the following attributes:

* callback: The hook callable that this Hook object is wrapping,
    which will be called when the Hook is called.
* failsafe: If True, the callback MUST be guaranteed to run even if
    other callbacks from the same call point raise any exceptions
    (other than KeyboardInterrupt and SystemExit). Because errors
    may be silenced by failsafe hooks, unexpected exceptions which
    occur during the execution of a hook MUST be logged.
* priority: Defines the order of execution for a list of Hooks at
    the same hook point. Priority numbers SHOULD be limited to the
    closed interval [0, 100], but values outside this range are
    acceptable, as are fractional values.
* kwargs: A set of keyword arguments that will be passed to the
    callable on each call.
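The four attributes, together with one possible way of running a list of hooks at a single hook point, can be sketched as follows. The default priority of 50, the run_hooks function, and the choice to skip non-failsafe hooks after an error are assumptions made for illustration.

```python
import logging


class Hook:
    """Hypothetical Hook wrapper with the four required attributes."""

    def __init__(self, callback, failsafe=False, priority=50, **kwargs):
        self.callback = callback
        self.failsafe = failsafe
        self.priority = priority
        self.kwargs = kwargs

    def __call__(self):
        return self.callback(**self.kwargs)


def run_hooks(hooks):
    """Run hooks in priority order; failsafe hooks run even after an error."""
    error = None
    for hook in sorted(hooks, key=lambda h: h.priority):
        if error is not None and not hook.failsafe:
            continue   # an earlier hook failed: skip non-failsafe hooks
        try:
            hook()
        except Exception as exc:   # KeyboardInterrupt/SystemExit propagate
            logging.exception("hook %r failed", hook.callback)
            if error is None:
                error = exc
    if error is not None:
        raise error


calls = []


def boom():
    raise ValueError("boom")


hooks = [
    Hook(lambda: calls.append("cleanup"), failsafe=True, priority=90),
    Hook(lambda: calls.append("skipped"), priority=60),
    Hook(boom, priority=20),
    Hook(lambda label: calls.append(label), priority=10, label="first"),
]
try:
    run_hooks(hooks)
except ValueError:
    calls.append("error raised")
```

In this run the priority-60 hook is skipped because the priority-20 hook failed, but the failsafe hook still runs, and the failure is logged before being re-raised.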

3.2 Tools

The Tool interface allows pluggable extensions, both simple and complex, to be declared by a uniform API. It also allows request objects to run code between the page handler lookup (section 2.3.2) and the first hook (section 3.1). This is essential to provide dynamic hook declarations based on the configuration in effect for each request.

Tool objects MUST possess a single "_setup" method which takes no arguments. This method MUST be called after the request.handler has been obtained, and before the first hook point is reached. The reference implementation uses toolboxes (section 3.3), each with its own configuration namespace (section 3.4.2), to accomplish this. Tools SHOULD belong to a toolbox. The "_setup" method SHOULD attach hooks in order to invoke functionality at appropriate points in the request process.

3.2.1 Decorators

Tool objects SHOULD be callable, and this feature SHOULD be used as a decorator to declare that a given tool applies to a given handler. For example, given a Tool object called "tools.proxy", the following code snippet would enable the tool for the given handler:

@tools.proxy(base="https://www.mydomain.cz")
def whats_my_base(self):
    return cherrypy.request.base
whats_my_base.exposed = True

Note in particular that the Tool object must be called to be used in this fashion. This allows application developers to supply keyword arguments to the decorator that will then be used by the tool when its "_setup" method is called. That is, the following code is not expected to work (its behavior is undefined by this specification), since tools.proxy is used as a decorator itself, rather than the result of tools.proxy():

@tools.proxy
def whats_my_base(self):
    return cherrypy.request.base
whats_my_base.exposed = True

Note also that the reference implementation does not wrap the original function; instead, it asserts that the decorated handler function has a configuration attribute (section 3.4.3) which enables the tool. Tool implementations SHOULD do likewise.
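A decorator of this non-wrapping kind might look like the sketch below. The class name SketchTool is hypothetical, and the "tools.<name>.on" key layout is an assumption made for illustration; the point is only that the original function is returned unchanged, carrying a configuration attribute.

```python
class SketchTool:
    """Hypothetical tool whose decorator records config instead of wrapping."""

    def __init__(self, name, namespace="tools"):
        self.name = name
        self.namespace = namespace

    def _setup(self):
        """Required by the Tool interface; a real tool attaches hooks here."""

    def __call__(self, **kwargs):
        prefix = "%s.%s." % (self.namespace, self.name)

        def decorator(func):
            conf = getattr(func, "_cp_config", None)
            if conf is None:
                conf = func._cp_config = {}
            conf[prefix + "on"] = True           # enable the tool for this handler
            for key, value in kwargs.items():
                conf[prefix + key] = value       # keyword arguments become entries
            return func                          # the original function, unwrapped

        return decorator


proxy = SketchTool("proxy")


@proxy(base="https://www.mydomain.cz")
def whats_my_base():
    return "..."
```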

3.2.2 Callables

Tool objects SHOULD expose an attribute named "callable", which allows the functionality of the tool to be invoked anywhere, most likely from within a page handler. If the tool object does not have invokable functionality, or if it uses cooperating hooks that are not useful in isolation, it SHOULD NOT expose the "callable" attribute.

3.2.3 Handlers

Some tools are designed to circumvent the normal calling of a page handler; for example, a tool which finds static files and serves them as the response does not need to then call a separate handler. Such tools SHOULD expose a "handler" method, which allows the tool to be declared in place of a "normal" page handler method:

from cherrypy.tools import staticdir

class Root:
    nav = staticdir.handler(section="/nav", dir="nav", root=absDir)

The "handler" method, if provided, MUST return a callable which can be used as a request.handler callable. That callable SHOULD have its "exposed" attribute set to True before being returned from the "handler" method.

The reference implementation includes a HandlerTool class which implements these recommendations.

3.3 Toolboxes

A toolbox is a set of tools sharing a single namespace. CherryPy uses the "tools" namespace for the built-in tools. Distinct toolboxes should be unaware of each other.

3.4 Configuration

In CherryPy, "configuration" refers to the (declarative) values and attributes which affect the (imperative) behavior of a running program. Implementations MUST provide a means of declaring configuration values (indeed, they can hardly prevent normal code from being one); they MAY do so in formats other than Python code (such as INI-style config files).

3.4.1 Scopes

CherryPy configuration is separated in several ways, each set of boundaries mapping directly to some user need.

Configuration data MUST allow for two independent layers: that which applies to a single application and that which applies to ALL applications. The former is called "(per-)application" config, and the latter is called "global" (or "site-wide") config.

Application config is further separated by URI in a hierarchical fashion. That is, each configuration entry for a given URI MUST apply to that URI and all its child URIs (all URIs that begin with the given URI), unless explicitly counteracted by an opposing entry for a child URI.

In some cases, two different applications may share a common URI. For example, a WSGI dispatcher may choose one over another based on the contents of the "Accept" header. A more common example occurs when one application is "mounted" at "/" and another mounted at "/foo". When this occurs, the configuration of each application MUST be isolated to that application; that is, configuration entries from one application MUST NOT "leak" into another, even if they share the same URI-space.

3.4.2 Namespaces

CherryPy config entries, whether global- or application-scoped, SHOULD be "namespaced"; that is, they should use a hierarchical naming scheme for the keys. The reference implementation, for example, adopts the Python "dotted attribute" notation, so that e.g. "tools.sessions.name" refers to a "tools" container (object) with a "sessions" attribute, and a "name" subattribute. This allows the parsing and activation of configuration data to be controlled by smaller "handler" components (at the least, one for each top-level namespace), rather than by a monolithic parser.

3.4.2.1 Namespace handlers

Namespace handlers are objects which parse and activate configuration entries based on a hierarchy. In order to reduce confusion and allow for easy extension, CherryPy implementations SHOULD use sets of namespace handlers exclusively for this task.

Each handler in a set MUST be either a callable which takes a key and a value argument, or a Python 2.5-style context manager [1] whose __enter__ method returns such a callable. The "key" argument MUST be a string, and that key MAY include further hierarchical delimiters (which the callable will parse on its own). The value's type and range are variable for each entry.
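Both handler styles can be sketched with a small router. The function name apply_config and the use of the first dot as the namespace delimiter are assumptions made for this example; entries whose namespace has no handler are simply ignored here.

```python
from contextlib import contextmanager


def apply_config(entries, namespaces):
    """Route each "namespace.key" entry to the handler for its namespace."""
    grouped = {}
    for name, value in entries.items():
        namespace, _, key = name.partition(".")
        if namespace in namespaces:
            grouped.setdefault(namespace, []).append((key, value))
    for namespace, items in grouped.items():
        handler = namespaces[namespace]
        if hasattr(handler, "__enter__"):
            # Context-manager style: __enter__ returns the (key, value) callable.
            with handler as apply_entry:
                for key, value in items:
                    apply_entry(key, value)
        else:
            for key, value in items:
                handler(key, value)


applied = []


def server_namespace(key, value):
    applied.append(("server", key, value))


@contextmanager
def tools_namespace():
    applied.append("enter tools")
    yield lambda key, value: applied.append(("tools", key, value))
    applied.append("exit tools")


apply_config(
    {"server.socket_port": 8080, "tools.gzip.on": True, "other": 1},
    {"server": server_namespace, "tools": tools_namespace()},
)
```

The context-manager form lets a handler do setup before the first entry and teardown after the last one, while the remaining key ("gzip.on") is left for the callable to parse on its own.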

3.4.3 Page Handler Attributes

In addition to allowing application developers and deployers to associate configuration with specific URIs, the implementation SHOULD allow them to associate configuration entries with specific page handlers. Because the mapping of URIs to page handlers is not 1:1, this allows maximum developer flexibility.

4 Footnotes and References

[1] For a complete discussion of the use and requirements of context managers, see http://www.python.org/dev/peps/pep-0343/
