Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
There are 8,264 public repositories matching this topic.
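XL-LightHouse is a general-purpose streaming big-data statistics system that supports extremely large data volumes and very high concurrency. Common use cases include PV and UV statistics; e-commerce sales revenue and ordering-user counts; log volume statistics; API call counts, exception counts, and latency statistics; and server operations metric monitoring. The system supports multi-dimensional statistics with complex conditional filtering and logical checks, offers one-click deployment and one-line-of-code integration, and makes real-time statistics over massive data easy, helping enterprises build a metrics system quickly and at lower cost while cutting costs and improving efficiency.
Updated May 16, 2024 - Java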
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated May 16, 2024 - C++
A large-scale entity and relation database supporting aggregation of properties
Updated May 16, 2024 - Java
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Updated May 16, 2024 - Java
REST API for Apache Spark on K8S or YARN
Updated May 16, 2024 - Java
Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using containers technology.
Updated May 16, 2024 - Dockerfile
An open source framework for building data analytic applications.
Updated May 16, 2024 - Java
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated May 16, 2024 - Python
Resilient data pipeline framework running on Apache Spark
Updated May 16, 2024 - Scala
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs.
Updated May 16, 2024 - Scala
Created by Matei Zaharia
Released May 26, 2014
- Followers: 414
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia