Web RDB API as a Layered API? #16

Open
js-choi opened this issue Apr 27, 2018 · 4 comments

@js-choi

js-choi commented Apr 27, 2018

The new Layered API proposal is trying to enable standardization of web features that are high level, yet which are entirely implemented in terms of lower-level features. This allows browsers to replace polyfill functionality with their own native implementation whenever the latter is available. The first such formal Layered API is the new Async Local Storage proposal, which is layered on top of IndexedDB and other APIs available to general developers.

Work on the Web RDB specification is currently dormant, probably due to difficulty securing support within the Chromium team by its author, who is also a Google employee. But it may be possible to garner more support for Web RDB by reframing it as a Layered API. Web RDB would have to be entirely implementable by a polyfill using IndexedDB—just like Async Local Storage. But any browser with native support for Web RDB could replace it with an implementation targeting its own internal SQL engine.

Reframing Web RDB as a Layered API may make it more palatable to implementers. Are there plans to do such a thing?

@freshp86
Collaborator

Hi @js-choi. I was at the BlinkOn conference a few weeks ago, and as soon as I saw the presentation about Layered APIs, the first thing that came to mind was exactly the point you are making above. I also mentioned this to @arthurhsu.

Arthur can probably comment in more detail, but my first observation is that an RDB implementation based purely on existing Web APIs (a requirement for L-APIs) is not possible unless a few more primitives are exposed by the platform.

Note that RDB was inspired by our (Arthur's and my) work on https://github.com/google/lovefield, which leverages existing APIs and consequently has some limitations, as described here and here, that RDB aims to overcome.

@js-choi
Author

js-choi commented Apr 27, 2018

@freshp86:

> I was at the BlinkOn conference a few weeks ago, and as soon as I saw the presentation about Layered APIs, the first thing that came to mind was exactly the point you are making above. I also mentioned this to @arthurhsu.

That’s great. It’s encouraging that you thought of the same thing when you first heard of layered APIs.

> …my first observation is that an RDB implementation based purely on existing Web APIs (a requirement for L-APIs) is not possible unless a few more primitives are exposed by the platform…[Lovefield] leverages existing APIs and consequently has some limitations…that RDB aims to overcome.

Indeed. If RDB were reframed as a layered API, then that missing low-level functionality would first have to be standardized as an isolated proposal. The first step would be to write a strawman for such missing low-level functionality, perhaps with use cases based on Lovefield’s limitations. Use cases that do not involve Lovefield might also be included, because this missing low-level functionality might benefit users of IndexedDB in general.

Because of the smaller scope of such a low-level proposal, browsers may find it more palatable than they have for RDB in the past. And, if such low-level functionality is standardized, then a polyfillable RDB layered API—one that would take advantage of browser SQL engines—may in turn become more palatable to implementers.

@arthurhsu
Owner

arthurhsu commented Apr 28, 2018

Quite a bunch of stuff is missing in the lower-level APIs.

  1. Cross-session mutex/semaphore (I think this is currently being drafted)
  2. Low-level APIs allowing page swaps, including:
    a. Transactional, blocking file I/O, just like the POSIX API
    b. Memory measurement support: how much memory am I using for this JS object?
    c. Fast serialization/deserialization between blobs of memory and JS objects
  3. (Nice to have) A pthread-like threading model

Now the why part.

The fundamental tool for writing a fast and efficient database is effective management of the memory it uses. Databases are generally implemented using page stores, and the engine loads only the needed pages from disk for optimal performance (that's also why the complicated B-Tree is used).
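
To illustrate the page-store idea, here is a minimal sketch of a cache that keeps a bounded number of pages in memory and reads the rest on demand. It is not Lovefield's or RDB's design: `PageCache`, `loadPage`, and the `loads` counter are hypothetical names, and `loadPage` is assumed to be a blocking read of one page, which is exactly the kind of primitive the list above says is missing.

```javascript
// Hypothetical sketch of a bounded page cache with least-recently-used eviction.
// `loadPage` stands in for a blocking backend read of one page by number.
class PageCache {
  constructor(capacity, loadPage) {
    this.capacity = capacity; // maximum number of pages held in memory
    this.loadPage = loadPage; // (pageNo) => Uint8Array
    this.pages = new Map();   // Map keeps insertion order, used here as LRU order
    this.loads = 0;           // number of backend reads, kept for illustration
  }

  get(pageNo) {
    if (this.pages.has(pageNo)) {
      // Cache hit: move the page to the most-recently-used position.
      const cached = this.pages.get(pageNo);
      this.pages.delete(pageNo);
      this.pages.set(pageNo, cached);
      return cached;
    }
    // Cache miss: read from the backend, evicting the oldest page if full.
    const loaded = this.loadPage(pageNo);
    this.loads++;
    if (this.pages.size >= this.capacity) {
      const oldest = this.pages.keys().next().value;
      this.pages.delete(oldest);
    }
    this.pages.set(pageNo, loaded);
    return loaded;
  }
}
```

The sketch bounds memory by page count only because every page is assumed to have the same fixed size; bounding it for ordinary JS objects would need the memory measurement support listed above.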

In C this is doable: malloc() gets us raw bytes, free() returns them, and one can easily cast blobs of memory into C structs. For C++, the general idea is the same, and we have placement new and nicer casts to do so. In JavaScript, there is no reliable way to detect how much memory an object uses, not to mention quickly serialize/deserialize it. Without that, I can't implement an efficient page store / swap, and the same compromise I made for Lovefield will still stay: just load everything into memory.
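
As a rough illustration of the gap, the closest JavaScript gets today to casting raw bytes into a struct is manual packing through `ArrayBuffer` and `DataView`. The sketch below assumes a made-up 16-byte record layout (`id`, `flags`, `balance`); `RECORD_SIZE`, `writeRecord`, and `readRecord` are hypothetical names. Every field is copied in and out by hand, which is the per-object cost that a C cast avoids.

```javascript
// Hypothetical fixed record layout (little-endian), 16 bytes per record:
//   offset 0: id      (uint32)
//   offset 4: flags   (uint32)
//   offset 8: balance (float64)
const RECORD_SIZE = 16;

// Copies one JS object into the page at the given record slot.
function writeRecord(view, index, record) {
  const base = index * RECORD_SIZE;
  view.setUint32(base, record.id, true);
  view.setUint32(base + 4, record.flags, true);
  view.setFloat64(base + 8, record.balance, true);
}

// Builds a fresh JS object from the bytes at the given record slot.
function readRecord(view, index) {
  const base = index * RECORD_SIZE;
  return {
    id: view.getUint32(base, true),
    flags: view.getUint32(base + 4, true),
    balance: view.getFloat64(base + 8, true),
  };
}

// A 4 KiB page holds 4096 / 16 = 256 such records.
const page = new ArrayBuffer(4096);
const view = new DataView(page);
writeRecord(view, 3, { id: 42, flags: 1, balance: 12.5 });
const rec = readRecord(view, 3);
```

This only works for fixed-width fields known in advance; arbitrary JS objects with strings or nested values would still need a general serializer.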

To make things more challenging, I still need to piggyback on IndexedDB and hope that it will do 4K blob I/O efficiently. That's not the case. IndexedDB does too much as a file API and too little as a database, and therefore does no good for either. A pure LevelDB would be good, but POSIX-like file I/O would be even better.

Current workers (web workers, service workers) are not designed to provide the same functionality that I would expect from a long-running pthread. Well, one can argue that we still have great databases like dBase3 or Clipper that do not leverage threads. Sure. Let's put this as optional.

As a result, I'm not that optimistic about a layered API. It will be years before JS offers "casting" and memory management capabilities. Inside the browser we can certainly build a real page-store-based DB engine, but polyfilling that with lower-level JS APIs, as the layered API requirements demand? I'm not seeing a clear path for all the pieces.

@js-choi
Author

js-choi commented Apr 28, 2018

@arthurhsu: Thanks for writing this detailed gap analysis. It is indeed evident that there is no clear path forward, because so many low-level APIs are missing. But, as you point out, some of these features are already being worked on, and it may also be productive to bring attention to the features that nobody is working on yet. It would indeed take a lot of effort, however.

> 1. Cross-session mutex/semaphore (I think this is currently being drafted)

As you point out, mutexes and semaphores are indeed actively being worked on in the Web Locks API proposal by Joshua Bell. Because they are generally useful to many applications, they will hopefully arrive in the near future.
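
A minimal sketch of how a database layer might use such a lock, assuming an object shaped like the proposal's lock manager with a `request(name, callback)` method (`navigator.locks` in browsers that ship it). The helper `withWriteLock` and the lock name format `rdb:<name>:write` are made up for illustration.

```javascript
// Runs `fn` while holding an exclusive lock that is shared across the tabs
// and workers of one origin. `locks` is expected to behave like navigator.locks.
function withWriteLock(locks, dbName, fn) {
  // The lock is held until the promise returned by the callback settles.
  return locks.request(`rdb:${dbName}:write`, async () => fn());
}

// In a browser that implements Web Locks, usage might look like:
//   await withWriteLock(navigator.locks, 'mydb', () => commitTransaction());
// where commitTransaction is a hypothetical function.
```

Passing the lock manager in as a parameter keeps the helper usable in environments that lack `navigator.locks`, where a stand-in object can be supplied.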

> b. Memory measurement support: how much memory am I using for this JS object?
> c. Fast serialization/deserialization between blobs of memory and JS objects

> …In C [programs, fine control of pages in memory] is doable: malloc() gets us raw bytes, free() returns them, and one can easily cast blobs of memory into C structs. For C++, the general idea is the same, and we have placement new and nicer casts to do so. In JavaScript, there is no reliable way to detect how much memory an object uses, not to mention quickly serialize/deserialize it. Without that, I can't implement an efficient page store / swap, and the same compromise I made for Lovefield will still stay: just load everything into memory.

The proposed Typed Objects feature of JavaScript may eventually support this use case, at least partially. You are probably already aware of this, but just in case: there was an old TypedObject proposal for JavaScript from before ES2015 that had gone dormant. It was recently revived in a new proposal by Dmitry Lomov of Google, perhaps in large part to better support WebAssembly–JavaScript interaction. There is a preliminary implementation in Mozilla SpiderMonkey (see its mention on MDN and Mozilla bug 1336740). From what I understand, Typed Objects’ structs are intended to eventually be allocatable from JavaScript within TypedArrays.

Of course, they are not here now; they would come in the future. But even though their absence blocks RDB from being implemented today, it may still be valuable to submit feedback to the people working on Typed Objects, making them aware of RDB’s use case. (Of course, this doesn’t apply to arbitrary JavaScript objects, which will probably never support precise memory measurement.)

> a. …Transactional, blocking file I/O, just like the POSIX API…

> To make things more challenging, I still need to piggyback on IndexedDB and hope that it will do 4K blob I/O efficiently. That's not the case. IndexedDB does too much as a file API and too little as a database, and therefore does no good for either. A pure LevelDB would be good, but POSIX-like file I/O would be even better.

This is probably the toughest low-level feature that a layered RDB would require. As you undoubtedly already know, a web API for synchronous/blocking file IO would probably never happen—at least not without very strict limitations. I’m reminded of the deprecation of sync XMLHttpRequest (see whatwg/xhr#20). This requirement may be the most difficult to add to the web; I certainly know of no one working on it yet.

Perhaps the problem could be solved by somehow extending IndexedDB to better support efficient blob serializing and loading, though I don’t yet see how that would happen. Or, as a more complicated option, perhaps a new type of worklet would be worth proposing: one that would support limited blocking file IO and that might also be long running. It depends on how important IO synchronicity would be to a layered RDB. It would be a lot of work to propose one of these APIs. But perhaps it would be worth trying, provided that such a low-level primitive is useful for other use cases and is more palatable for browser vendors to implement.

In any case, thank you for the detailed gap analysis. Cross-process mutexes and memory-managed structs will probably come in the future. But the efficient/synchronous blob serialization problem is more serious and would require a lot of work on a new low-level API.

Hopefully, whether or not it could be a layered API, RDB’s prospects will improve in the future.


3 participants