Agent Evaluation Framework

Name: Agent Evaluation Framework
Author: NeoLabHQ

A comprehensive evaluation framework for Claude Code agents with multi-dimensional scoring

What is it?

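A multi-dimensional evaluation framework for Claude Code agents. It supports the LLM-as-Judge pattern and research-backed analysis of performance variance.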

How to use it?

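Once the skill is installed, Claude applies it automatically when it detects an agent evaluation task. You can also invoke it directly by referencing its name in a prompt.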

Key Features

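Multi-dimensional agent evaluation scoring
LLM-as-Judge evaluation pattern
Research-backed performance variance analysis
Automated scoring against evaluation criteria
Checklist to assist human evaluation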
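
To make the scoring and variance features above more concrete, here is a minimal Python sketch of weighted multi-dimensional scoring across repeated runs. It is not taken from the skill itself: the dimension names, the weights, and the functions `weighted_score` and `summarize_runs` are all hypothetical, and the per-dimension scores are assumed to come from some judge (an LLM or a human checklist) on a 0 to 1 scale.

```python
from statistics import mean, pstdev

# Hypothetical rubric: dimension names and weights are illustrative only,
# not the ones defined by the skill.
WEIGHTS = {"correctness": 0.5, "completeness": 0.3, "efficiency": 0.2}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-dimension scores (each in [0, 1]) into one weighted score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def summarize_runs(runs):
    """Score repeated runs of the same task and report the mean and spread."""
    totals = [weighted_score(run) for run in runs]
    return {
        "mean": mean(totals),
        "stdev": pstdev(totals),  # population standard deviation across runs
        "min": min(totals),
        "max": max(totals),
    }


if __name__ == "__main__":
    # Two made-up runs of one task, as a judge might have scored them.
    runs = [
        {"correctness": 1.0, "completeness": 0.5, "efficiency": 0.5},
        {"correctness": 0.5, "completeness": 0.5, "efficiency": 1.0},
    ]
    print(summarize_runs(runs))
```

Reporting the spread alongside the mean is one simple way to see how much an agent's score varies between runs of the same task, which a single run cannot show.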

GitHub Stats

Author: NeoLabHQ
License: GPL-3.0
Version: 1.0.0

Related Skills

Context Engineering Guide

Comprehensive context engineering tutorial covering attention mechanics, progressive disclosure, context budget management, and quality vs quantity trade-offs for AI agent development

433 NeoLabHQ

AI & ML

Developer Tools

Multi-Perspective Critique

Multi-perspective review system using Multi-Agent Debate and LLM-as-Judge patterns with 3 specialized judges, debate rounds, and consensus building

433 NeoLabHQ

AI & ML

Developer Tools

Create Claude Code Agent

Complete guide for creating Claude Code agents with YAML frontmatter structure, agent file format, trigger condition design, and system prompt writing

433 NeoLabHQ

AI & ML

Developer Tools
