Structure, test, and observe every AI agent, model, and pipeline you ship. Ryva gives AI engineers the rigor that backend teams have had for decades.
$ pip install ryva && ryva init my-project
Project initialized: 1 agent · 1 prompt · 1 tool
$ ryva compile
Compiled: 1 agent · 1 pipeline · 0 errors
$ ryva test
4/4 tests passed (schema, latency, regression, adversarial)
$ ryva traces list
RUN ID AGENT STATUS DURATION
f87951e4 summarizer_agent success 2201ms
8
Test types built in
15
Fuzz input categories
4
Standard benchmarks
2 min
To first passing test
One CLI. All the tooling backend engineers take for granted, purpose-built for LLM-powered systems.
Every agent, prompt, tool, and pipeline is a versioned YAML file. Dependencies are declared. Everything compiles before it runs, catching broken references before they hit production.
Schema validation, regression comparison, adversarial probing, memory retention, RAG faithfulness, hallucination detection, fuzz testing, and fine-tune evaluation. All from one CLI command.
Every agent run is traced automatically. See the exact prompt sent, the response received, the model used, and the latency in milliseconds. Inspect any run with ryva traces show.
Track token spend per agent, forecast budget exhaustion at current usage, and surface the cheapest model that still passes all your tests. ryva forecast runs in under a second.
Register every model your project uses as a versioned entry. Centralize provider credentials, aliases, and capability metadata. One source of truth queried with ryva registry list.
Score your agents against four built-in benchmark suites: summarization quality, question answering accuracy, classification F1, and code correctness. Catch regressions before users do.
Eight test types built in. No configuration required to get started. Each test runs in isolation and reports individually, so failures are easy to trace to their source.
Add ryva test to your CI pipeline and it runs automatically on every push. Test results are stored alongside traces so you can jump from a failure directly to the run that caused it.
15/15 fuzz tests passed
empty, whitespace, very_long, special_chars,
unicode, sql_injection, prompt_injection,
null_bytes, newlines, numbers_only, json_input,
html_tags, repeat_chars, mixed_case,
negative_number
ryva testValidates output schema, measures latency against configured thresholds, and runs regression assertions against recorded baseline outputs.
ryva test --fuzzFires 15 fuzz categories at your agent: empty strings, unicode, SQL injection patterns, prompt injection payloads, null bytes, HTML tags, and more.
ryva test --memoryRuns multi-turn conversations and verifies the agent correctly retains and references context from earlier turns.
ryva test --ragTests retrieval pipelines for relevance and faithfulness. Verifies answers are grounded in retrieved documents, not hallucinated.
ryva test --hallucinationSubmits prompts with known facts and detects when the model generates plausible-sounding but factually incorrect claims.
ryva test --regressionCompares current outputs to a saved baseline. Flags any response that diverges beyond the configured similarity threshold.
ryva test --adversarialProbes the agent with crafted inputs: prompt injections, jailbreak attempts, contradictory instructions, and boundary edge cases.
ryva test --finetuneRuns the same test suite against both the base model and your fine-tuned variant, reporting accuracy delta and regression count.
Trace: f87951e4
Agent: summarizer_agent
Model: claude-sonnet-4-5 (anthropic)
Status: success
Duration: 2201ms
Step 1 - Prompt
"You are a precise summarization assistant..."
Step 2 - Response
{"summary": "...", "word_count": 8}
Every time an agent runs, Ryva records the full execution trace. The exact prompt sent, the complete response, the model and provider, latency in milliseconds, and token cost. Nothing is summarized or omitted.
Traces are indexed by run ID, agent name, and timestamp. Inspect any run from the CLI or browse all runs in Ryva Cloud. When a test fails, you can jump directly to the trace that triggered it.
ryva traces listList all recent runs, sorted by timeryva traces showFull step-by-step breakdown of a single runRyva tracks token usage per agent run and aggregates it into per-agent cost profiles. ryva forecast projects your monthly spend at current call volume and tells you exactly when you will hit your budget.
It also surfaces cheaper model alternatives. If a smaller model passes all your existing tests, Ryva will surface it with the projected savings, so you can switch with confidence.
ryva forecastProject monthly spend at current usage, by agentryva benchmarkScore agents against four standard evaluation suitesryva registry listView all registered models and provider configurationsForecast: my-project (last 30 days)
AGENT CALLS COST/CALL MONTHLY
summarizer_agent 1,240 $0.0018 $2.23
classifier_agent 890 $0.0009 $0.80
qa_extractor 320 $0.0011 $0.35
Total: $3.38 of $25.00 budget (13.5% used)
At current pace: budget lasts 7.4 months
Cheaper alternatives that pass all tests:
claude-haiku-3 saves $1.21/mo all 4 tests pass
Ryva is open source and free to use. The CLI runs locally, your traces stay on your machine, and nothing is sent to an external server unless you choose Ryva Cloud.
Open Source
Free
Forever. No account required.
Ryva Cloud
Early access
Hosted traces, team dashboard, CI integrations.
Open source. No account required. Your first agent test passes before you finish reading this page.
$ pip install ryva
$ ryva init my-project
$ ryva test
4/4 tests passed (schema, latency, regression, adversarial)