Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
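
To make "implicit data parallelism and fault tolerance" concrete, here is a minimal plain-Python sketch of the idea. It does not use Spark or its API: the helper names (`split_into_partitions`, `run_with_retry`, `parallel_word_count`) are hypothetical, a thread pool stands in for a cluster, and a simple retry stands in for recomputing lost work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, num_partitions):
    """Split a list into roughly equal chunks (hypothetical helper)."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Per-partition work: count the words in every line of the chunk."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def run_with_retry(task, partition, max_attempts=3):
    """Re-run a failed task from its input, a crude stand-in for fault tolerance."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(partition)
        except Exception:
            if attempt == max_attempts:
                raise


def parallel_word_count(lines, num_partitions=4):
    """The caller only describes the computation; partitioning and scheduling are hidden."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = pool.map(lambda p: run_with_retry(count_words, p), partitions)
        total = Counter()
        for partial in partial_counts:
            total.update(partial)  # merge the per-partition results
    return total


lines = ["spark runs on clusters", "spark handles failures", "clusters run spark"]
result = parallel_word_count(lines)
```

The caller never mentions threads or partitions when asking for the word count, which is the sense in which the parallelism is implicit. A real cluster framework additionally distributes partitions across machines, which this sketch does not attempt.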

Here are 8,264 public repositories matching this topic...

marsfoundation / spark-app

spark ethereum dapp dai makerdao defi

Updated May 16, 2024
TypeScript

xl-xueling / xl-lighthouse

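XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency. Typical use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call counts, error counts, and latency statistics; and monitoring of server operations metrics. The system supports multidimensional statistics and a wide range of complex filter conditions and logical checks. It deploys with one click and integrates with one line of code, making real-time statistics over massive data easy to implement, and it helps companies build a data metrics system quickly and at lower cost.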

statistics big-data spark analytics clickhouse flink digital-solutions

Updated May 16, 2024
Java

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated May 16, 2024
C++

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated May 16, 2024
Scala

gchq / Gaffer

A large-scale entity and relation database supporting aggregation of properties

big-data spark hadoop graph accumulo hbase graph-database parquet aggregation

Updated May 16, 2024
Java

apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

spark bigdata shuffle

Updated May 16, 2024
Java

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated May 16, 2024
Java

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated May 16, 2024
Java

exacaster / lighter

REST API for Apache Spark on K8S or YARN

spark apache-spark yarn jupyter k8s livy sparkmagic

Updated May 16, 2024
Java

cerndb / spark-dashboard

Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.

docker kubernetes spark monitoring helm grafana grafana-dashboard

Updated May 16, 2024
Dockerfile

cdapio / cdap

An open source framework for building data analytic applications.

python java platform middleware spark integration dataset spark-streaming java-8 unified mapreduce cdap

Updated May 16, 2024
Java

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated May 16, 2024
Python

apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

rss spark mapreduce shuffle tez remote-shuffle-service

Updated May 16, 2024
Java

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated May 16, 2024
Java

AbsaOSS / pramen

Resilient data pipeline framework running on Apache Spark

scala big-data spark etl hacktoberfest data-pipeline

Updated May 16, 2024
Scala

Shankar-Anumula / data-engineer

java scala spark spark-sql

Updated May 16, 2024
Scala

NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

big-data spark gpu rapids

Updated May 16, 2024
Scala

apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator

rust spark arrow datafusion

Updated May 16, 2024
Rust

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated May 16, 2024
Scala

tobymao / sqlglot

Python SQL Parser and Transpiler

mysql python bigquery parser postgres sql spark presto hive clickhouse sqlite snowflake optimizer transpiler redshift databricks tsql trino sqlparser duckdb

Updated May 16, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 414
Repository: apache/spark
Website: spark.apache.org
Wikipedia: Wikipedia
