Evaluate – LLM evals framework

Note, this article is written by me, the human who wrote the code!

If you use LLMs you need to check their output, or you risk hallucinations, wrong answers, and unhappy customers. The code is all open source, aka “free”.

I built “evaluate” as a server that makes the calls to the LLMs. You can call the server's endpoints directly, or use the Python SDK, which abstracts some of the API syntax.

[Image: actual screenshots from the Evaluate GUI dashboard]

Supply ground truth to compare against the LLM output, and let a second LLM act as the “judge”.

  • Each evaluation gets a Pass or Fail verdict
  • Eval history is stored in a SQLite database
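
To make the judge idea concrete, here is a minimal sketch of the flow in Python. It is not the code Evaluate runs: the judge is a stub that does a simple string comparison where the real system would call a second LLM, and the table name and columns (`evals`, `prompt`, `expected`, `actual`, `verdict`) are assumptions made up for illustration.

```python
import sqlite3
from typing import Callable


def stub_judge(expected: str, actual: str) -> bool:
    """Stand-in for the second LLM: passes on a case-insensitive exact match."""
    return expected.strip().lower() == actual.strip().lower()


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with a hypothetical history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS evals ("
        "id INTEGER PRIMARY KEY, prompt TEXT, expected TEXT, "
        "actual TEXT, verdict TEXT)"
    )
    return conn


def run_eval(
    conn: sqlite3.Connection,
    prompt: str,
    expected: str,
    actual: str,
    judge: Callable[[str, str], bool] = stub_judge,
) -> str:
    """Judge one LLM output against the ground truth and record the verdict."""
    verdict = "Pass" if judge(expected, actual) else "Fail"
    conn.execute(
        "INSERT INTO evals (prompt, expected, actual, verdict) VALUES (?, ?, ?, ?)",
        (prompt, expected, actual, verdict),
    )
    conn.commit()
    return verdict
```

Swapping `stub_judge` for a function that asks a second model whether `actual` agrees with `expected` gives the LLM-as-judge pattern; the rest of the flow stays the same.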

Installation

Get started by setting up your development environment.

Or try Evaluate immediately by cloning from GitHub.

What you’ll need

  • Rust version 1.70 or above
  • Node.js version 20.0 or above (for the documentation site)
  • A Gemini API key from Google AI Studio
  • Git for version control

Clone and Setup

Clone the Evaluate repository and set up your environment:

# 1. Clone the repository
git clone git@github.com:RGGH/evaluate.git

# 2. Navigate into the project directory
cd evaluate

Create a .env file in the root directory with your configuration:

DATABASE_URL=sqlite:data/evals.db
api_base = "https://generativelanguage.googleapis.com"
api_key = "AIzaSyAkQnssdafsdfasdfasxxxxxxxxxxxxxxxxxxx"
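
Before starting the server it can help to confirm that the .env file contains the three keys shown above. The following is a rough Python sketch of such a check. The parser is deliberately simplified and is an assumption, not the loader the server itself uses; `parse_env` and `missing_keys` are hypothetical helper names.

```python
REQUIRED_KEYS = ("DATABASE_URL", "api_base", "api_key")


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, tolerating spaces around '=' and quoted values."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(open(".env").read())` from the project root returns an empty list when all three keys are present.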

Cargo downloads and compiles all necessary dependencies automatically when you build the project.

Start your application

Run the development server:

cargo run

The cargo run command builds your Rust application and starts the evaluation server locally at http://127.0.0.1:8080/.

You should see output similar to:

                   _
                  | |               _
 _____ _   _ _____| | _   _ _____ _| |_ _____
| ___ | | | (____ | || | | (____ (_   _) ___ |
| ____|\ V // ___ | || |_| / ___ | | |_| ____|
|_____) \_/ \_____|\_)____/\_____|  \__)_____)


LLM Evaluation & Testing Framework

✅ DATABASE_URL set to: sqlite:data/evals.db
✅ Created database directory: data
📦 Database file path: /home/pop/rust/evaluate/data/evals.db
📦 Connecting to: sqlite:///home/pop/rust/evaluate/data/evals.db?mode=rwc
✅ Database connected successfully
✅ Database migrations completed
🚀 Starting server...
📊 Frontend available at http://127.0.0.1:8080
[2025-10-14T15:40:17Z INFO actix_server::builder] starting 22 workers
[2025-10-14T15:40:17Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
[2025-10-14T15:40:17Z INFO actix_server::server] starting service: "actix-web-service-0.0.0.0:8080", workers: 22, listening on: 0.0.0.0:8080

Open your browser and navigate to http://127.0.0.1:8080 to access the built-in GUI. You can now start running evaluations and the database automatically saves your history.
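
If you would rather check from a script that the server is up, a small Python sketch like the one below requests the front page using only the standard library. It assumes the root URL answers with HTTP 200 once the frontend is being served; `server_is_up` is a hypothetical helper name, not part of Evaluate or its SDK.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://127.0.0.1:8080/", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```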

Next Steps

Now that your server is running, you can:

  • Test the API with sample curl commands
  • Use the web interface to run single evaluations
  • Submit batch evaluations using JSON files
  • View your evaluation history in the GUI
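
As a rough illustration of preparing a batch file, the Python sketch below writes a list of test cases to JSON. The field names (`prompt`, `expected`) and the helper names are assumptions for illustration only; check the repository for the schema the server actually accepts before submitting a file.

```python
import json
from pathlib import Path


def build_batch(cases):
    """Turn (prompt, expected) pairs into a list of JSON-ready dicts.

    The field names here are hypothetical, not Evaluate's confirmed schema.
    """
    return [{"prompt": prompt, "expected": expected} for prompt, expected in cases]


def write_batch(path, cases):
    """Write the batch to a JSON file and return the number of cases written."""
    batch = build_batch(cases)
    Path(path).write_text(json.dumps(batch, indent=2), encoding="utf-8")
    return len(batch)
```

For example, `write_batch("batch.json", [("Capital of France?", "Paris")])` produces a one-item JSON array on disk.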
