spring-ai-vespa-embedding-sample

This repo is the result of my experiments with running LLM models using spring-ai, ollama and vespa. I wanted to run everything locally and not rely on any online services.

This is a simple RAG Spring AI service, running entirely locally, that uses Vespa as the VectorStore and an ollama model for building embeddings and prompting.

The repo has two (spring-boot) applications:

  • PopulateVespaVectorStore - Batch job that fetches a number of news articles via RSS feeds, calls ollama to generate an embedding vector for each article, and inserts them into Vespa for the RAG.
  • RagSampleService - Service that uses Vespa to do a similarity search and provide a set of documents for the PromptTemplate. The service uses this template.

Overview

This code is built on top of these samples:

Remember that spring-ai is still in development. Please check out these for updates:

Build

# if you use asdf then set jdk version to 17+
asdf local java corretto-17.0.6.10.1

# Results go into target/spring-ai-vespa-embedding-sample-0.0.1-SNAPSHOT-assembly/
mvn clean package
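
After the build, you can sanity-check that the assembly contains the start scripts used in the Usage section below:

# List the generated start scripts
ls target/spring-ai-vespa-embedding-sample-0.0.1-SNAPSHOT-assembly/bin/
# Should include populate-vespa-cluster.sh and rag-service.sh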

Runtime Requirements

Ollama model running locally

Install ollama

The default configuration uses the mistral LLM:

ollama pull mistral:latest
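
To quickly verify that the model is downloaded and responding (a one-off smoke test, not required by the sample):

# One-off prompt to check the model responds
ollama run mistral:latest "Say hello"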

To list your local ollama models:

ollama list

# For more details on the models do:
curl -s localhost:11434/api/tags | jq .

Vespa

Start Vespa cluster

You need to start a Vespa version 8 cluster:

docker run --detach \
  --name vespa \
  --hostname vespa-tutorial \
  --publish 8080:8080 \
  --publish 19071:19071 \
  --publish 19092:19092 \
  --publish 19050:19050 \
  vespaengine/vespa:8

Note: the 19050 port is not strictly necessary, but it serves a nice status page for the Vespa cluster once you have your Vespa doc-types in place.
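
Before deploying, you can wait for the Vespa config server to come up using the standard Vespa state API on the 19071 port published above:

# Reports status "up" once the config server is ready
curl -s http://localhost:19071/state/v1/health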

Deploy application

Install the vespa-cli if needed:

brew install vespa-cli

Run from the root of this repo:

vespa deploy --wait 300 vespa
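
Once deployed, the container endpoint on port 8080 should report healthy (again via the standard state API, or with the vespa-cli):

# Check the query/feed container is up
curl -s http://localhost:8080/state/v1/health

# Or use the vespa-cli
vespa status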

If you used the above docker command to expose the 19050 port, then you can monitor the cluster status on this page: http://127.0.0.1:19050/clustercontroller-status/v1/llm

Stopping Vespa

To kill the Vespa cluster (and delete all its data) just run:

docker rm -f vespa

Usage

The examples below use bash scripts to start the applications, as the "normal" way of building a spring-boot application, with the spring-boot-maven-plugin, does not allow you to have multiple applications. So I'm using the maven-assembly-plugin to build a distribution with start scripts.

NOTE: The scripts have only been tested on Linux (Ubuntu 22.04), so your mileage may vary. You can always start the applications in IntelliJ, if you use that.

Populate Vespa with your favorite news

./target/spring-ai-vespa-embedding-sample-0.0.1-SNAPSHOT-assembly/bin/populate-vespa-cluster.sh \
   http://www.svt.se/nyheter/rss.xml
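
To sanity-check that documents actually landed in Vespa, you can run a catch-all query with the vespa-cli (assuming the default endpoint on localhost:8080):

# Inspect the fed documents
vespa query 'select * from sources * where true'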

Start the RAG-Service

./target/spring-ai-vespa-embedding-sample-0.0.1-SNAPSHOT-assembly/bin/rag-service.sh

Once the service is up and running, you can ask a question:

curl localhost:8082/ask \
  -H "Content-type: application/json" \
  -d '{"question": "What are the top 5 news?"}'

Configuration

Vespa

If you need to change the vespa config, please make sure that your vespa.yaml config aligns with the deployed vespa schema.

Ollama

If you need to change the ollama config, please make sure your application.properties aligns with your downloaded model (ollama list).
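
To see the exact model names to reference in application.properties, query the local ollama API (the jq expression assumes the standard /api/tags response format):

# List locally available model names
curl -s localhost:11434/api/tags | jq -r '.models[].name'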

Misc

The image above is created using PlantUML from the spring-ai-vespa-embedding-sample.puml file.
