
November 6, 2025 / Last updated: November 6, 2025 / admin

Evaluate – LLM evals framework

Note, this article is written by me, the human who wrote the code!

If you use LLMs, you need to check their output. Otherwise you get hallucinations, wrong answers, and unhappy customers. Note that this code is all open source, aka “Free”.

I built “evaluate” as the server that makes the calls to the LLMs. You can call the server's endpoints directly, or use the Python SDK, which abstracts some of the API syntax.

evaluate

Supply ground truth to compare against the LLM output, and let a second LLM act as the “judge”

Each evaluation results in a Pass or Fail
Eval history is stored in a SQLite database
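
To make the judge idea concrete, here is a minimal Python sketch of the pass/fail flow. It is not the code inside evaluate: the names `EvalCase`, `build_judge_prompt`, `parse_verdict` and `run_eval`, and the prompt wording, are all hypothetical. The judge is passed in as a plain function, and `fake_judge` stands in for a real LLM call so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One evaluation: a prompt, the expected answer, and the model's answer."""
    prompt: str
    expected: str      # ground truth supplied by you
    model_output: str  # what the LLM under test returned


def build_judge_prompt(case: EvalCase) -> str:
    """Build the text sent to the judge LLM (hypothetical wording)."""
    return (
        "You are grading an answer.\n"
        f"Question: {case.prompt}\n"
        f"Expected answer: {case.expected}\n"
        f"Actual answer: {case.model_output}\n"
        "Reply with exactly one word: PASS or FAIL."
    )


def parse_verdict(judge_reply: str) -> bool:
    """Treat the reply as a pass only if it starts with PASS."""
    return judge_reply.strip().upper().startswith("PASS")


def run_eval(case: EvalCase, judge: Callable[[str], str]) -> bool:
    """Ask the judge for a verdict; True means pass, False means fail."""
    return parse_verdict(judge(build_judge_prompt(case)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real judge LLM: passes if the expected text
    appears inside the actual answer (case-insensitive)."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    expected = fields["Expected answer"].lower()
    actual = fields["Actual answer"].lower()
    return "PASS" if expected in actual else "FAIL"


if __name__ == "__main__":
    case = EvalCase(
        prompt="What is the capital of France?",
        expected="Paris",
        model_output="The capital is Paris.",
    )
    print("Pass" if run_eval(case, fake_judge) else "Fail")
```

Passing the judge in as a function keeps the comparison logic testable on its own. Swapping in a real model only means replacing `fake_judge` with a function that sends the prompt to an LLM and returns its reply.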

Installation

Get started by setting up your development environment.

Or try Evaluate immediately by cloning from GitHub.

What you’ll need

Rust version 1.70 or above
Node.js version 20.0 or above (for the documentation site)
A Gemini API key from Google AI Studio
Git for version control
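
Before cloning, it can save time to confirm the tools above are installed and new enough. The following is a rough Python sketch, not part of the Evaluate project: `parse_version`, `installed_version` and `check_prerequisites` are hypothetical helpers, and the check simply reads the first `MAJOR.MINOR` pair from each tool's `--version` output. It does not check the Gemini API key.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional, Tuple

# Minimum versions taken from the list above; no minimum is stated for Git.
REQUIRED = {
    "rustc": (1, 70),
    "node": (20, 0),
    "git": (0, 0),
}


def parse_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract the first MAJOR.MINOR pair from a `--version` string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_version(tool: str) -> Optional[Tuple[int, int]]:
    """Return the tool's (major, minor) version, or None if it is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


def check_prerequisites() -> Dict[str, bool]:
    """Map each tool name to True if it is installed and new enough."""
    report = {}
    for tool, minimum in REQUIRED.items():
        found = installed_version(tool)
        report[tool] = found is not None and found >= minimum
    return report


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```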

Clone and Setup

Clone the Evaluate repository and set up your environment:

# 1. Clone the repository
git clone git@github.com:RGGH/evaluate.git

# 2. Navigate into the project directory
cd evaluate

Create a .env file in the root directory with your configuration:

DATABASE_URL=sqlite:data/evals.db
api_base = "https://generativelanguage.googleapis.com"
api_key = "YOUR_GEMINI_API_KEY"
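
The file above mixes two styles: `KEY=value` for the database URL, and `key = "value"` with spaces and quotes for the API settings. If you want to sanity-check the file before starting the server, the following Python sketch reads both styles and reports missing keys. It is an illustration only: `parse_env` and `missing_keys` are hypothetical helpers, and the article does not show how the Rust server itself loads these values.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, tolerating spaces around '=' and quoted values."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(
    values: Dict[str, str],
    required: Iterable[str] = ("DATABASE_URL", "api_base", "api_key"),
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        env = parse_env(handle.read())
    print("Missing keys:", missing_keys(env) or "none")
```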

Cargo downloads and compiles all necessary dependencies the first time you build the project.

Start your application

Run the development server:

cargo run

The cargo run command builds your Rust application and starts the evaluation server locally at http://127.0.0.1:8080/.

You should see output similar to:

                   _                          
                  | |               _         
 _____ _   _ _____| | _   _ _____ _| |_ _____ 
| ___ | | | (____ | || | | (____ (_   _) ___ |
| ____|\ V // ___ | || |_| / ___ | | |_| ____|
|_____) \_/ \_____|\_)____/\_____|  \__)_____)
                                                                                                                                                                                                       

    LLM Evaluation & Testing Framework

✅ DATABASE_URL set to: sqlite:data/evals.db
✅ Created database directory: data
📦 Database file path: /home/pop/rust/evaluate/data/evals.db
📦 Connecting to: sqlite:///home/pop/rust/evaluate/data/evals.db?mode=rwc
✅ Database connected successfully
✅ Database migrations completed
🚀 Starting server...
📊 Frontend available at http://127.0.0.1:8080
[2025-10-14T15:40:17Z INFO  actix_server::builder] starting 22 workers
[2025-10-14T15:40:17Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
[2025-10-14T15:40:17Z INFO  actix_server::server] starting service: "actix-web-service-0.0.0.0:8080", workers: 22, listening on: 0.0.0.0:8080

Open your browser and navigate to http://127.0.0.1:8080 to access the built-in GUI. You can now start running evaluations; the database saves your history automatically.
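
If you would rather confirm from a script that the server is answering, a small check like the one below can help. This is a sketch in Python using only the standard library. `server_is_up` is a hypothetical helper, and it only requests the root URL that serves the GUI, since the API endpoints are not listed here.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://127.0.0.1:8080/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Bypass any system proxy settings: the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing usable is listening.
        return False


if __name__ == "__main__":
    print("server is up" if server_is_up() else "server is not reachable")
```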

Next Steps

Now that your server is running, you can:

Test the API with sample curl commands
Use the web interface to run single evaluations
Submit batch evaluations using JSON files
View your evaluation history in the GUI
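
Besides the GUI, the history lives in the SQLite file configured in `DATABASE_URL`, so it can also be inspected directly. The table names are not documented in this article, so the Python sketch below makes no assumptions about the schema: it lists whatever tables exist and counts their rows. `table_row_counts` is a hypothetical helper, and the database is opened read-only so a running server's data is not modified.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    # Build a file: URI and open it read-only (mode=ro).
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Path taken from DATABASE_URL=sqlite:data/evals.db in the .env file.
    for table, count in table_row_counts("data/evals.db").items():
        print(f"{table}: {count} rows")
```

Any bookkeeping tables created by the database migrations will show up in the output alongside the evaluation data.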
