
Commit

Adding some detail on how to set the number of pipelines (#5057)
* Adding some detail on how to set the number of pipelines

* Apply suggestions from code review

Co-authored-by: Adrien Guillo <[email protected]>

---------

Co-authored-by: Adrien Guillo <[email protected]>
fulmicoton and guilload committed Jun 3, 2024
1 parent ac37d08 commit 9774c8a
Showing 1 changed file with 16 additions and 3 deletions.
19 changes: 16 additions & 3 deletions docs/configuration/source-config.md
@@ -165,11 +165,24 @@ EOF

## Number of pipelines

-`num_pipelines` parameter is only available for sources that can be distributed: Kafka, GCP PubSub and Pulsar (coming soon).
+The `num_pipelines` parameter is only available for distributed sources like Kafka, GCP PubSub, and Pulsar.

 It defines the number of pipelines to run on a cluster for the source. The actual placement of these pipelines on the different indexer
-will be decided by the control plane. Note that distributions of a source like Kafka is done by assigning a set of partitions to different pipelines.
-As a result, it is recommended to make sure the number of partitions is a multiple of the number of `num_pipelines`.
+will be decided by the control plane.
+
+:::info
+
+Note that distributing the indexing load of partitioned sources like Kafka is done by assigning the different partitions to different pipelines. As a result, it is important to ensure that the number of partitions is a multiple of `num_pipelines`.
+
+Also, assuming you are only indexing a single Kafka source in your Quickwit cluster, you should set the number of pipelines to a multiple of the number of indexers. Finally, if your indexing throughput is high, you should provision between 2 and 4 vCPUs per pipeline.
+
+For instance, assume you want to index a 60-partition topic, with each partition receiving a throughput of 10 MB/s. If you measured that Quickwit can index your data at a pace of 40MB/s per pipeline, a possible setting could be:
+- 5 indexers with 8 vCPUs each
+- 15 pipelines
+
+Each indexer will then be in charge of 3 pipelines, and each pipeline will cover 4 partitions.
+:::

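The sizing arithmetic in the note can be checked with a short script. This is only a sketch of the reasoning in the added text, not part of Quickwit: the function `plan_pipelines` and its parameter names are hypothetical, and it assumes a single partitioned source with evenly loaded partitions.

```python
import math


def plan_pipelines(num_partitions, partition_mb_per_s, pipeline_mb_per_s, num_indexers):
    """Hypothetical helper (not a Quickwit API) that picks a pipeline count.

    It returns the smallest number of pipelines that:
      - can absorb the total throughput,
      - is a multiple of the number of indexers,
      - evenly divides the number of partitions.
    Returns None when no such count exists.
    """
    total_mb_per_s = num_partitions * partition_mb_per_s
    min_pipelines = math.ceil(total_mb_per_s / pipeline_mb_per_s)
    for num_pipelines in range(min_pipelines, num_partitions + 1):
        if num_pipelines % num_indexers == 0 and num_partitions % num_pipelines == 0:
            pipelines_per_indexer = num_pipelines // num_indexers
            return {
                "num_pipelines": num_pipelines,
                "pipelines_per_indexer": pipelines_per_indexer,
                "partitions_per_pipeline": num_partitions // num_pipelines,
                # 2 to 4 vCPUs per pipeline, as suggested in the note.
                "vcpus_per_indexer_range": (2 * pipelines_per_indexer, 4 * pipelines_per_indexer),
            }
    return None


# The example from the note: 60 partitions at 10 MB/s each,
# 40 MB/s per pipeline, 5 indexers.
plan = plan_pipelines(60, 10, 40, 5)
print(plan)
```

With the numbers from the note, this yields 15 pipelines, 3 pipelines per indexer, and 4 partitions per pipeline. The suggested 8 vCPUs per indexer falls inside the resulting 6 to 12 vCPU range.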

## Transform parameters
