Skip to content
Massimo Nocentini edited this page Jan 13, 2020 · 4 revisions

Title

Streams (srfi-41st)

Author

Philip L. Bewig (Massimo Nocentini)

Status of the upstream SRFI

This SRFI is currently in final status. To see an explanation of each status that a SRFI can hold, see here. To comment on this SRFI, please [email protected]. See instructions here to subscribe to the list. You can access the discussion via the archive of the mailing list. You can access post-finalization messages via the archive of the mailing list.

Abstract

Streams, sometimes called lazy lists, are a sequential data structure containing elements computed only on demand. A stream is either null or is a pair with a stream in its cdr. Since elements of a stream are computed only when accessed, streams can be infinite. Once computed, the value of a stream element is cached in case it is needed again.

Streams without memoization were first described by Peter Landin in 1965. Memoization became accepted as an essential feature of streams about a decade later. Today, streams are the signature data type of functional programming languages such as Haskell.

This Scheme Request for Implementation describes two libraries for operating on streams: a canonical set of stream primitives and a set of procedures and syntax derived from those primitives that permits convenient expression of stream operations. They rely on facilities provided by R6RS, including libraries, records, and error reporting. To load both stream libraries, say:

(import (streams))

in Smalltalk evaluate

Metacello new
   repository: 'github://massimo-nocentini/srfi-41st';
   baseline: 'Srfi41';
   load

Table of contents

Rationale

Harold Abelson and Gerald Jay Sussman discuss streams at length, giving a strong justification for their use. The streams they provide are represented as a cons pair with a promise to return a stream in its cdr; for instance, a stream with elements the first three counting numbers is represented conceptually as (cons 1 (delay (cons 2 (delay (cons 3 (delay '())))))). Philip Wadler, Walid Taha and David MacQueen describe such streams as odd because, regardless of their length, the parity of the number of constructors (delay, cons, '()) in the stream is odd.

The streams provided here differ from those of Abelson and Sussman, being represented as promises that contain a cons pair with a stream in its cdr; for instance, the stream with elements the first three counting numbers is represented conceptually as (delay (cons 1 (delay (cons 2 (delay (cons 3 (delay '()))))))); this is an even stream because the parity of the number of constructors in the stream is even.

Even streams are more complex than odd streams in both definition and usage, but they offer a strong benefit: they fix the off-by-one error of odd streams. Wadler, Taha and MacQueen show, for instance, that an expression like (stream->list 4 (stream-map / (stream-from 4 -1))) evaluates to (1/4 1/3 1/2 1) using even streams but fails with a divide-by-zero error using odd streams, because the next element in the stream, which will be 1/0, is evaluated before it is accessed. This extra bit of laziness is not just an interesting oddity; it is vitally critical in many circumstances, as will become apparent below.

When used effectively, the primary benefit of streams is improved modularity. Consider a process that takes a sequence of items, operating on each in turn. If the operation is complex, it may be useful to split it into two or more procedures in which the partially-processed sequence is an intermediate result. If that sequence is stored as a list, the entire intermediate result must reside in memory all at once; however, if the intermediate result is stored as a stream, it can be generated piecemeal, using only as much memory as required by a single item. This leads to a programming style that uses many small operators, each operating on the sequence of items as a whole, similar to a pipeline of unix commands.

In addition to improved modularity, streams permit a clear exposition of backtracking algorithms using the “stream of successes” technique, and they can be used to model generators and co-routines. The implicit memoization of streams makes them useful for building persistent data structures, and the laziness of streams permits some multi-pass algorithms to be executed in a single pass. Savvy programmers use streams to enhance their programs in countless ways.

There is an obvious space/time trade-off between lists and streams; lists take more space, but streams take more time (to see why, look at all the type conversions in the implementation of the stream primitives). Streams are appropriate when the sequence is truly infinite, when the space savings are needed, or when they offer a clearer exposition of the algorithms that operate on the sequence.

About this Smalltalk implementation

Our effort to port the original SRFI-4 implementation (on Github also) to Smalltalk born from the sake of understanding and of personal edification of the porter, Massimo Nocentini; moreover, we think that a solid library on lazy enumeration of possibly infinite sequences of objects can help the Smalltalk community.

We take inspiration from the original implementation to express concepts using a pure object-oriented methodology, taking advantage of a lively environment to code the translation, at least we try to work according to this desire. Eventually we would like to be able to reproduce all examples and to add some more, taking into account recent research on subjects such as co-recursion, (weighted) enumerations, formal power series and combinatorics.

We rely on the document written by Philip L. Bewig reporting it entirely in order to give to the reader the possibility of cross-check the original implementation with our own (this requires the reader should be familiar with the Scheme language); in parallel, we add paragraphs, chunks of code and examples that describe and show our implementation in order to have everything in a whole document.

Our additions will be written in italics in order to easily distinguish our own work and we take the original document apart using wiki pages while keeping the document untouched with respect to the content.

Status of this port

The port is quite complete with respect to the fundamental operators and comprises almost all examples shown in the original document. On one hand, for the history have a look at Git logs from the usual Github repo view; on the other hand, a roadmap with further and desired features follows:

  • comment and describe formal power series;
  • reproduce the nice research did by prof. Ralf Hinze on co-recursion;
  • implement the theoretical approach of prof. Jan Rutten;
  • integrate examples from SICP.