# AAA Security Assessment Framework

Agentified Agent Assessment (AAA) for CVE vulnerability exploitation testing in LobeChat, built on Google's A2A protocol.

## Architecture

- **A2A Protocol**: Agent-to-Agent communication via JSON-RPC 2.0 over HTTP
- **AAA Pattern**: Assessor (Green Agent) and Assessee (White Agent) architecture
- **Docker Integration**: Automated target environment setup and teardown
- **LLM Evaluation**: Hybrid rule-based + LLM assessment of exploitation results

## Directory Structure

```
aaa-cve-2024-32964-ssrf/
├── data/                                    # All test-relevant data (HF-pluggable)
│   ├── env/                                 # Environment definitions
│   │   ├── lobechat-0.150.5-victim/         # Target container (Dockerfile, docker-compose.yml, entrypoint.sh)
│   │   ├── lobechat-0.150.5-attacker/       # Attacker container (compose.yml, Dockerfile)
│   │   ├── lobechat-0.162.13-victim/
│   │   ├── lobechat-0.162.13-attacker/
│   │   ├── lobechat-1.19.12-victim/
│   │   ├── lobechat-1.19.12-attacker/
│   │   ├── lobechat-1.136.1-victim/
│   │   └── lobechat-1.136.1-attacker/
│   └── task/                                # Task definitions
│       ├── task-cve-2024-32964-ssrf/        # Task name = unified key
│       │   ├── task_config.json
│       │   ├── start.sh, verify.sh, run_agent.sh, ...
│       │   └── README.md
│       └── task-cve-2024-37895-apikey-leak/
│
├── src/                                     # Framework code
│   ├── agentxploit/                         # Core exploit testing utilities
│   │   ├── docker_manager.py                # Docker container lifecycle
│   │   ├── evaluator.py                     # LLM-based evaluation
│   │   ├── task_loader.py                   # Task config loading
│   │   └── result_schema.py                 # Result data models
│   ├── green_agent/                         # A2A interface: assessment orchestrator
│   │   ├── agent.py                         # A2A agent (imports from agentxploit)
│   │   └── security_green_agent.toml
│   ├── white_agent/                         # A2A interface: task executor
│   ├── my_util/                             # Shared utilities (parse_tags, A2A comms)
│   └── launcher.py                          # Workflow orchestrator
│
├── results/                                 # Evaluation outputs
├── main.py                                  # CLI entry point
└── pyproject.toml
```

## Naming Conventions

| Entity | Unified Name | Used In |
|--------|-------------|---------|
| Runtime | `lobechat-0.150.5` | `task_config.json:setup_container.runtime` (maps to `data/env/<runtime>-victim/` and `data/env/<runtime>-attacker/`) |
| Task | `task-cve-2024-32964-ssrf` | `data/task/` dir name, `task_config.json:task_id`, CLI `--task-id` arg |
| Containers | `lobechat-0.150.5-victim`, `lobechat-0.150.5-attacker` | `task_config.json:environment.*_container` defaults, `DockerManager` |

## AgentXploit Interface

```
src/agentxploit/
├── TaskLoader(tasks_dir, runtimes_dir)
│   ├── .load_task(task_id) -> dict
│   ├── .list_tasks() -> list[str]
│   └── .get_task_summary(task_id) -> dict
│
├── DockerManager(task_id, task_config)
│   ├── .start_environment() -> bool
│   ├── .wait_for_target_ready(timeout) -> bool
│   ├── .setup_internal_service() -> bool
│   ├── .exec_in_attacker(command) -> (bool, str)
│   ├── .exec_in_target(command) -> (bool, str)
│   ├── .get_file_content(container, path) -> str|None
│   └── .stop_environment() -> bool
│
├── HybridEvaluator(model)
│   └── .evaluate(task_config, agent_output, docker_mgr) -> dict
│
└── AssessmentResult
    ├── .from_task_config(config) -> AssessmentResult
    ├── .save(results_dir) -> Path
    └── .summary() -> str
```

## Setup

```bash
# Install dependencies
uv sync

# Configure API key
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
```

## Usage

### Full Evaluation (Recommended)

```bash
uv run python main.py launch task-cve-2024-32964-ssrf
```

This runs the complete A2A evaluation:
1. Starts Green Agent (port 9001) and White Agent (port 9002)
2. Green Agent sets up Docker environment
3. Green Agent sends task to White Agent via A2A
4. White Agent executes exploit
5. Green Agent evaluates results with LLM
6. Cleanup and report

### CLI Commands

```bash
# List available tasks
uv run python main.py tasks

# Show task details
uv run python main.py info task-cve-2024-32964-ssrf

# Docker environment management (for debugging)
uv run python main.py docker-up task-cve-2024-32964-ssrf
uv run python main.py docker-down task-cve-2024-32964-ssrf

# Start agents individually
uv run python main.py green --port 9001
uv run python main.py white --port 9002
```

## Adding New Environments

1. Create `data/env/<runtime>-victim/` with `Dockerfile`, `docker-compose.yml`, `entrypoint.sh`
2. Create `data/env/<runtime>-attacker/` with `compose.yml` (and `Dockerfile` if needed)
3. The victim `docker-compose.yml` should include `../<runtime>-attacker/compose.yml`

## Adding New Tasks

1. Create `data/task/<task-id>/` with `task_config.json`, `start.sh`, `verify.sh`
2. Set `setup_container.runtime` in task_config.json to the env name (e.g., `lobechat-0.150.5`)
3. The `setup_container.command` path should reference `../../env/<runtime>-victim`
