Oracle data loader example #1346

Open · Tracked by #1304
mbostock opened this issue May 14, 2024 · 10 comments
Labels: documentation (Improvements or additions to documentation)

Comments

@mbostock (Member)

No description provided.

@mbostock mentioned this issue May 14, 2024
@mbostock added the documentation label May 14, 2024
@rinie commented May 29, 2024

The loader examples seem messy. I looked at the Snowflake/Postgres/Databricks examples, and edit.csv.ts is a combination of a simple JavaScript template (process.stdout.write(csvFormat(...))) and a SQL query. Why not factor out the query, so you have a standard connector with just the query as the variant?
Browsing DuckDB solutions I came across https://evidence.dev/blog/why-we-built-usql/ and https://github.com/rinie/evidence-connector-oracledb...
Your markdown usage and delta data updates seem easier; 'their' SQL-in-development mode and connectors seem easier to use.

If both solutions use a data loader/source to get data into Parquet or Arrow and use DuckDB-Wasm on the frontend, why not enable their connectors in Framework?

How is the process.stdout.write output processed into Parquet/DuckDB?
Would it be possible to bypass that and generate Parquet directly, or is that slower?

I would like to play around with Oracle and Framework by just providing a connection and some queries, not code...

@mbostock (Member Author)

> Why not factor out the query, so you have a standard connector with just the query as the variant?

There are lots of answers to this, but for a start, so you can:

  • use any off-the-shelf connector or SDK (rather than being limited to what Framework directly integrates)
  • write a data loader in any programming language (rather than being limited to SQL)
  • interpolate dynamic variables into queries (rather than being limited to static queries)
  • post-process the results in code (rather than forcing all logic to be in SQL)
  • save the results in whatever format you like (rather than being limited to Parquet or whatever)

Data loaders as SQL files would eliminate a bit of boilerplate, but you’d give up a whole ton of functionality in exchange.
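
For the Oracle case this issue is about, a data loader along those lines might look roughly like the sketch below. This is a hedged illustration rather than the eventual example: it assumes the oracledb and d3-dsv npm packages, and the table, columns, and environment variable names are made up.

```js
// orders.csv.js: hypothetical Oracle data loader; writes CSV to stdout for Framework to cache.
import oracledb from "oracledb";
import {csvFormat} from "d3-dsv";

// A dynamic variable interpolated into the query as a bind parameter (one of the points above).
const start = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);

const connection = await oracledb.getConnection({
  user: process.env.ORACLE_USER,
  password: process.env.ORACLE_PASSWORD,
  connectString: process.env.ORACLE_CONNECT_STRING
});

const {rows} = await connection.execute(
  `SELECT order_id, customer_id, amount, order_date
     FROM orders
    WHERE order_date >= :start`,
  {start},
  {outFormat: oracledb.OUT_FORMAT_OBJECT}
);

await connection.close();

// Post-process the rows in code here if needed, then emit CSV on stdout.
process.stdout.write(csvFormat(rows));
```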

Also, you could totally implement a SQL interpreter if you want to write data loaders this way. Use the interpreters config option to register .sql as a data loader extension, and point it at a program which reads the file and runs the query. (We could conceivably register a SQL interpreter automatically if you configure a database, but I don’t consider it a priority because we want to retain the flexibility to talk to multiple databases, support multiple languages, and allow post-processing of the query results in code. Maybe we’d do it if the SQL interpreter approach proved popular.)
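
To make that concrete, here is a minimal sketch of what such a setup could look like. Neither file is something Framework ships: the .sql registration, the run-sql.js runner, and the Oracle connection details are all hypothetical.

```js
// observablehq.config.js: register .sql files as data loaders via the interpreters option.
export default {
  interpreters: {
    ".sql": ["node", "./run-sql.js"] // hypothetical runner; the .sql path is passed as the last argument
  }
};
```

```js
// run-sql.js: read the .sql file, run it against Oracle, write CSV to stdout.
import {readFile} from "node:fs/promises";
import oracledb from "oracledb";
import {csvFormat} from "d3-dsv";

const sql = await readFile(process.argv.at(-1), "utf-8");

const connection = await oracledb.getConnection({
  user: process.env.ORACLE_USER,
  password: process.env.ORACLE_PASSWORD,
  connectString: process.env.ORACLE_CONNECT_STRING
});

const {rows} = await connection.execute(sql, [], {outFormat: oracledb.OUT_FORMAT_OBJECT});
await connection.close();

process.stdout.write(csvFormat(rows));
```

With something like that in place, a file such as src/data/orders.csv.sql containing only a query would behave like any other data loader, under the assumptions above.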

As to your other questions, maybe these examples will help?

https://observablehq.observablehq.cloud/framework-example-loader-parquet/
https://observablehq.observablehq.cloud/framework-example-loader-duckdb/

If you have more open-ended questions — rather than comments specific to the Oracle data loader — please open a discussion rather than commenting here. Thank you!

@rinie commented May 30, 2024

Thanks. Would you be OK if I added an evidence.dev loader, so I could reuse my simple Oracle code? That way, reusing their SQL files separately from the 'connector' approach would fit my Oracle needs.

I am attracted to Framework because of the markdown/SQL/Parquet/Arrow use on the frontend.
For me, defaulting to that on the backend would also make sense.
A static website that can refresh data for the current day every 5 minutes, while using static data for older periods, is what I am trying to achieve.

The examples I see write csvFormat output to standard out; does that end up in the cache as Parquet/Arrow?
CSV/JSON/TSV seem appropriate only for smaller datasets.

@Fil (Contributor) commented May 30, 2024

Here's an example that outputs an Arrow table:
https://github.com/observablehq/framework/blob/main/examples/loader-arrow/src/data/samples.arrow.js

This one outputs a Parquet file (from JavaScript):
https://observablehq.observablehq.cloud/framework-example-loader-parquet/

This one outputs a Parquet file from shell/DuckDB:
https://observablehq.observablehq.cloud/framework-example-loader-duckdb/
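
The core of an Arrow loader along those lines is quite short. Here is a hedged sketch using the apache-arrow package, with made-up columns rather than the exact code from the linked example:

```js
// samples.arrow.js: generate an Arrow table and write it as an IPC stream to stdout.
import {tableFromArrays, tableToIPC} from "apache-arrow";

const n = 1000;
const table = tableFromArrays({
  x: Float64Array.from({length: n}, () => Math.random()),
  y: Float64Array.from({length: n}, () => Math.random())
});

// Framework caches whatever bytes the loader emits; here that is an Arrow IPC stream.
process.stdout.write(tableToIPC(table));
```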

@rinie commented May 31, 2024

Thanks for the examples, but:

```js
// Output the Apache Arrow table as an IPC stream to stdout.
process.stdout.write(Arrow.tableToIPC(table));
```

and then `npm run build`, and then ending up in the cache as parquet, seems like a big detour?

```
├─ src              # source root
│  ├─ .observablehq
│  │  ├─ cache      # data loader cache
```

How about a delta update to the cache,
or multiple Parquet files for one data source?

@mbostock (Member Author)

> ending up in the cache as parquet seems like a big detour

I think you’re mixing your examples. The first one Fil linked generates an Arrow IPC file. The other ones generate a Parquet file. What ends up in the cache is determined by what the data loader generates. Framework doesn’t convert between file formats.

> How about a delta update to the cache

Yes, you can do this in a data loader; read from the cache in your data loader.
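
As a rough illustration of that idea (the cache path below is an assumption about the current layout, not a documented API, and loadRowsSince stands in for a real database query):

```js
// events.csv.js: refresh only today's rows, reusing older rows from the previous cached output.
import {readFile} from "node:fs/promises";
import {csvFormat, csvParse} from "d3-dsv";

// Hypothetical: replace with a real query that fetches rows on or after the given date.
async function loadRowsSince(date) {
  return []; // placeholder
}

const cachePath = "src/.observablehq/cache/data/events.csv"; // assumed location of this loader's cached output

let previous = [];
try {
  previous = csvParse(await readFile(cachePath, "utf-8"));
} catch {
  // No cached output yet; previous stays empty and everything is loaded below.
}

const today = new Date().toISOString().slice(0, 10);
const kept = previous.filter((d) => d.date < today); // keep older, already-static rows
const fresh = await loadRowsSince(previous.length ? today : "1970-01-01");

process.stdout.write(csvFormat([...kept, ...fresh]));
```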

> or multiple Parquet files for one data source

See the archives section of the data loader documentation.
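
For reference, the archive approach boils down to a single loader that emits several files. Here is a hedged sketch using JSZip; the file names and the loadDaily/loadHourly helpers are made up, and the members are CSV to keep it short, though they could just as well be Parquet files produced by another tool.

```js
// measurements.zip.js: one data loader that emits multiple files as a ZIP archive.
import JSZip from "jszip";
import {csvFormat} from "d3-dsv";

// Hypothetical query helpers; replace with real database calls.
async function loadDaily() { return []; }
async function loadHourly() { return []; }

const zip = new JSZip();
zip.file("daily.csv", csvFormat(await loadDaily()));
zip.file("hourly.csv", csvFormat(await loadHourly()));

// Pages can then reference the individual members, e.g. FileAttachment("measurements/daily.csv").
zip.generateNodeStream().pipe(process.stdout);
```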

@rinie commented May 31, 2024

OK, my misunderstanding, but isn't writing to stdout and then to the cache slower than a direct write?
And won't Databricks or Postgres doing process.stdout.write(csvFormat(...)) instead of Arrow/Parquet be very slow for large tables?

So CSV/JSON/TSV end up in the cache in that format, not processed by DuckDB?
So SQL to Parquet/Arrow would be a smarter default?

@mbostock (Member Author)

> isn't writing to stdout and then to the cache slower than a direct write

No. Also, data loaders only run at build time, so even a slow data loader won’t affect the performance of your site; the only thing that affects page performance is how big the files are (and how many files and libraries etc. you load on the page).

> So CSV/JSON/TSV end up in the cache in that format, not processed by DuckDB?

Data loaders generate files. That’s it. Whether or not you use DuckDB (either in a data loader, or in the client using DuckDB-Wasm) is up to you. We’ve pointed you at the DuckDB data loader example, and you can find the DuckDB-Wasm examples in our docs.

> So SQL to Parquet/Arrow would be a smarter default?

The Parquet file format is a good choice for efficient representation of tabular data, yes. CSV is also a good choice if you have smaller data sets.

@rinie commented May 31, 2024

Thanks for clarifying this. I mistakenly thought large datasets would be automatically served as Parquet:

> Framework includes built-in support for client-side SQL powered by DuckDB. You can use SQL to query data from CSV, TSV, JSON, Apache Arrow, Apache Parquet, and DuckDB database files, which can either be static or generated by data loaders.

instead of DuckDB processing the generated CSV/TSV/JSON.
I will try to make an Oracle-to-Parquet example, or server-side post-processing of CSV/TSV/JSON to Parquet using DuckDB.

@mbostock (Member Author)

Right, Framework’s client-side SQL code blocks require data files to be in one of several formats supported by DuckDB-Wasm. But you don’t have to use client-side SQL; you can read files in JavaScript in whatever format you want. For example, if you just want to render a chart, you can load a FileAttachment and pass the result directly to Plot.plot.
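
For instance, in a page’s JavaScript code block (the file and column names here are made up):

```js
// FileAttachment, Plot, and display are provided by Framework in page code blocks.
// Load a CSV produced by a data loader and chart it directly with Observable Plot.
const orders = await FileAttachment("data/orders.csv").csv({typed: true});

display(Plot.plot({
  marks: [Plot.lineY(orders, {x: "order_date", y: "amount"})]
}));
```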
