Evaluation tasks

Task
Creating and evaluating tasks

Solvers

generate()
Convert a chat to a solver function

Scorers

Store, view, deploy, and compare evaluation logs

vitals_log_dir() vitals_log_dir_set()
The log directory
vitals_view()
Interactively view local evaluation logs
vitals_bundle()
Prepare logs for deployment
vitals_bind()
Concatenate task samples for analysis

Example Evals

are
An R Eval