[Bug] Speculative decoding backends are unreachable, silent cache mismatches, and multiple config/doc inconsistencies in cloud-edge LLM example #361

@hharshhsaini

Description

Summary

While reproducing the cloud-edge-collaborative-inference-for-llm benchmark end-to-end inside a Docker container (following the official Quick Start), I encountered a chain of interconnected bugs that collectively make the speculative decoding feature unusable and introduce silent reproducibility failures for all users. These issues are distinct from those reported in #357 (Windows compatibility), #360 (missing retry + broken LadeSpecDecLLM import), and #356 (HTTP 400 crash).

The core theme: the example advertises 5 edge model backends (huggingface, vllm, api, EagleSpecDec, LadeSpecDec), but only 3 actually work, and when the other 2 fail, the error messages are misleading or absent entirely.

Full RUNLOG with every command and output: RUNLOG.md
Generated artifacts (logs, configs, fixes): Ianvs_Generated_Artifacts

Environment

| Item | Details |
| --- | --- |
| Host OS | Windows 11 (Build 26100), x86_64 |
| Docker | Docker Desktop for Windows (Linux containers via WSL2) |
| Docker Base Image | continuumio/miniconda3:latest |
| Container OS | Linux (kernel 6.6.87.2-microsoft-standard-WSL2) |
| Python | 3.8.20 (inside conda environment ianvs-experiment) |
| Conda | 25.11.1 |
| Ianvs | v0.1.0 (installed from source via pip install -e .) |
| Sedna | Installed from bundled .whl (sedna-0.6.0.1) |
| Datasets | GPQA + MMLU-5-shot (pre-downloaded from Kaggle) |
| GPU | Not available - no GPU passthrough to Docker container |
| Docker Image Size | ~24.5 GB (includes datasets + model caches) |

Note: Since no GPU was available in the Docker container, all inference was done through pre-cached results stored in the workspace directories (workspace-gpqa/ and workspace-mmlu/). The cache mechanism in Ianvs matches the current run configuration against stored cache entries and returns cached responses if found, avoiding the need to actually load and run the LLM models.


Problem 1 - Backend validation in edge_model.py rejects speculative decoding backends before they can load

Location

edge_model.py -> EdgeModel.__init__(), line 47

Error

ValueError: Unsupported backend: EagleSpecDec. Supported options are: 'huggingface', 'vllm', 'api'.

Root Cause

__init__() validates the backend against only 3 values:

# edge_model.py, line 47
if self.backend not in ["huggingface", "vllm", "api"]:
    raise ValueError(
        f"Unsupported backend: {self.backend}. Supported options are: 'huggingface', 'vllm', 'api'."
    )

But load() (same file, lines 67-77) expects 5 backends:

# edge_model.py, lines 67-77
def load(self, **kwargs):
    if self.backend == "huggingface":
        self.model = HuggingfaceLLM(**self.kwargs)
    elif self.backend == "vllm":
        self.model = VllmLLM(**self.kwargs)
    elif self.backend == "api":
        self.model = APIBasedLLM(**self.kwargs)
    elif self.backend == "EagleSpecDec":            # ← never reached
        self.model = EagleSpecDecModel(**self.kwargs)
    elif self.backend == "LadeSpecDec":              # ← never reached
        self.model = LadeSpecDecLLM(**self.kwargs)

Result: If a user sets backend: "EagleSpecDec" or backend: "LadeSpecDec" in test_queryrouting.yaml (as the YAML comments suggest they can), __init__() raises ValueError before load() is ever called. The speculative decoding code paths are dead code.

Impact

  • Speculative decoding is a headline feature of this example (mentioned in README title, architecture diagram, and results table)
  • The README Results section (lines 412-423) shows EagleSpecDec benchmark results, implying this backend works - but it cannot be selected
  • Users following the YAML documentation comments will hit this crash immediately

Reproduction Steps

# In test_queryrouting.yaml, uncomment EagleSpecDec:
#   - backend:
#       values:
#         - "EagleSpecDec"

ianvs -f examples/cloud-edge-collaborative-inference-for-llm/benchmarkingjob.yaml
# -> ValueError: Unsupported backend: EagleSpecDec

Proposed Fix

# edge_model.py, line 47
- if self.backend not in ["huggingface", "vllm", "api"]:
+ if self.backend not in ["huggingface", "vllm", "api", "EagleSpecDec", "LadeSpecDec"]:
      raise ValueError(
-         f"Unsupported backend: {self.backend}. Supported options are: 'huggingface', 'vllm', 'api'."
+         f"Unsupported backend: {self.backend}. Supported options are: 'huggingface', 'vllm', 'api', 'EagleSpecDec', 'LadeSpecDec'."
      )
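Beyond widening the list, a more robust pattern is a single backend registry shared by `__init__()` and `load()`, so the validation list and the dispatch branches cannot drift apart again. The sketch below is illustrative, not the actual module; the model classes are empty stand-ins for the real ones in edge_model.py:

```python
# Hypothetical sketch: one registry drives both validation and dispatch.
# The five classes below are stand-ins for the real model wrappers.

class HuggingfaceLLM: pass
class VllmLLM: pass
class APIBasedLLM: pass
class EagleSpecDecModel: pass
class LadeSpecDecLLM: pass

BACKENDS = {
    "huggingface": HuggingfaceLLM,
    "vllm": VllmLLM,
    "api": APIBasedLLM,
    "EagleSpecDec": EagleSpecDecModel,
    "LadeSpecDec": LadeSpecDecLLM,
}

class EdgeModel:
    def __init__(self, backend, **kwargs):
        # Validation and dispatch share the same source of truth.
        if backend not in BACKENDS:
            raise ValueError(
                f"Unsupported backend: {backend}. "
                f"Supported options are: {', '.join(BACKENDS)}."
            )
        self.backend = backend
        self.kwargs = kwargs

    def load(self):
        self.model = BACKENDS[self.backend](**self.kwargs)

edge = EdgeModel("EagleSpecDec")
edge.load()
print(type(edge.model).__name__)  # EagleSpecDecModel
```

With this shape, adding a sixth backend is a one-line change to `BACKENDS` instead of two edits that can fall out of sync.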

Problem 2 - Typo in test_queryrouting.yaml: LadeSepcDec instead of LadeSpecDec

Location

test_queryrouting.yaml, line 32

Details

# Current (line 32):
#  5> "LadeSepcDec": Lookahead Decoding framework;
#        ^^^^
#     Typo: "Sepc" should be "Spec"

# Should be:
#  5> "LadeSpecDec": Lookahead Decoding framework;

Impact

  • Users who copy the backend name from the YAML comments will use "LadeSepcDec" (wrong), which won't match the load() code that checks for "LadeSpecDec" (correct)
  • Even after fixing Problem 1, users who follow the comments will still hit a silent failure - the backend won't match any elif branch in load(), and the model will never be initialized (no error raised, just AttributeError later when self.model is accessed)

Proposed Fix

# test_queryrouting.yaml, line 32
-             #  5> "LadeSepcDec": Lookahead Decoding framework;
+             #  5> "LadeSpecDec": Lookahead Decoding framework;
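Independent of the comment fix, the silent-failure mode described above can be hardened by making `load()` fail loudly on an unrecognized backend instead of leaving `self.model` unset. A minimal sketch with a stand-in model class (hypothetical, not the real edge_model.py):

```python
class DummyLLM:
    """Stand-in for the real LadeSpecDecLLM wrapper."""
    pass

class EdgeModel:
    """Trimmed-down stand-in for edge_model.py's EdgeModel."""
    def __init__(self, backend):
        self.backend = backend
        self.model = None

    def load(self):
        if self.backend == "LadeSpecDec":
            self.model = DummyLLM()
        else:
            # Previously: silent fall-through left self.model as None,
            # and the failure surfaced much later as AttributeError.
            raise ValueError(f"load() has no branch for backend {self.backend!r}")

EdgeModel("LadeSpecDec").load()      # correct spelling: loads fine
try:
    EdgeModel("LadeSepcDec").load()  # the YAML typo now fails immediately
except ValueError as e:
    print(e)
```

The typo'd backend now produces an immediate, descriptive error at load time rather than a confusing crash during inference.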

Problem 3 - Cache config mismatch silently skips 14,000+ cached results with no warning

Location

base_llm.py -> BaseLLM._load_cache(), line 251

Details

The cache lookup uses exact config dict equality:

# base_llm.py, line 251
for cache in self.cache_models:
    if cache["config"] == self.config:   # ← strict equality, no logging on mismatch
        self.cache = cache
        self.cache_hash = {item["query"]:item['response'] for item in cache["result"]}

When the cached results were generated with backend: "vllm" but the user's current config has backend: "huggingface" (common when running on a CPU-only machine), the comparison silently fails. There is:

  • No log message saying "cache found but config mismatch"
  • No indication of what differed (backend? model? temperature?)
  • No hint that 14,042 cached results exist but aren't being used

The user's only clue is that the benchmark suddenly tries to download a 14GB+ model from HuggingFace instead of using cached results - with no explanation of why.

Impact

  • First-time users waste hours downloading large models they don't need
  • The silence makes debugging extremely difficult - was it a path issue? A config issue? A cache format issue?
  • This directly affected our MMLU benchmark run, where the provided workspace-mmlu cache had backend: "vllm" but our Docker setup used backend: "huggingface"

Proposed Fix

# base_llm.py, _load_cache() method
  for cache in self.cache_models:
      if cache["config"] == self.config:
          self.cache = cache
          self.cache_hash = {item["query"]:item['response'] for item in cache["result"]}
+         break
+ else:
+     for cache in self.cache_models:
+         diff_keys = [k for k in set(cache["config"]) | set(self.config)
+                      if cache["config"].get(k) != self.config.get(k)]
+         LOGGER.warning(
+             "Cache entry found but config mismatch on keys: %s. "
+             "Cached: %s vs Current: %s. %d cached results skipped.",
+             diff_keys,
+             {k: cache["config"].get(k) for k in diff_keys},
+             {k: self.config.get(k) for k in diff_keys},
+             len(cache.get("result", []))
+         )

(The for/else ensures the warning fires only when no cache entry matched, instead of once per non-matching entry.)
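The config-diff logic in the proposed fix can also be pulled out into a standalone helper, which makes it trivially unit-testable. A sketch (the key names and values below are illustrative, not taken from a real cache file):

```python
def config_diff(cached: dict, current: dict) -> list:
    """Return the sorted keys whose values differ between two config dicts.

    Keys present in only one dict also count as differing, since
    dict.get() returns None for the missing side.
    """
    return sorted(k for k in set(cached) | set(current)
                  if cached.get(k) != current.get(k))

# Illustrative configs: cache built with vllm, current run on huggingface.
cached = {"backend": "vllm", "model": "qwen2-7b", "temperature": 0.8}
current = {"backend": "huggingface", "model": "qwen2-7b", "temperature": 0.8}
print(config_diff(cached, current))  # ['backend']
```

A warning built on this helper would have pointed straight at the `backend` mismatch described above, instead of leaving the user to guess.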

Problem 4 - Dockerfile does not install the retry dependency

Location

Dockerfile, line 25 and requirements.txt

Details

The requirements.txt for this example contains:

vllm
transformers
openai
accelerate
datamodel_code_generator
kaggle
groq

But base_llm.py line 7 imports:

from retry import retry

The retry package is not listed in requirements.txt. When following the Docker Quick Start, pip install -r requirements.txt succeeds, but the benchmark crashes at runtime with:

ModuleNotFoundError: No module named 'retry'

Note: This specific missing dependency is also reported in #360. I include it here because the Dockerfile workflow (the primary recommended setup path) is directly affected, and the fix should be part of the same requirements.txt patch.

Proposed Fix

# requirements.txt
  vllm
  transformers
  openai
  accelerate
  datamodel_code_generator
  kaggle
  groq
+ retry

Problem 5 - README inconsistencies and undocumented configuration gaps

5a. README contradicts itself on Joint Inference mode

Line 65-67:

In this example, we will rely on Ianvs' Joint Inference Paradigm using the inference-then-mining mode to implement a Query Routing strategy.

But benchmarkingjob.yaml line 8:

hard_example_mining_mode: "mining-then-inference"

The README says inference-then-mining, the actual config uses mining-then-inference. One of them is wrong. Based on the query routing design (first decide routing, then infer), mining-then-inference in the YAML appears correct, and the README should be updated.

5b. README recommends Python 3.8, Dockerfile uses Python 3.10

  • README line 162: conda create -n ianvs-experiment python=3.8
  • Dockerfile line 5: PYTHON_VERSION=3.10
  • README line 91: Python 3.8+ environment

While both work, the inconsistency is confusing. The Dockerfile (the recommended quick start) uses 3.10, but a user following the "Detailed Setup Guide" would install 3.8.

5c. No MMLU-specific benchmarkingjob.yaml or testenv.yaml provided

The README mentions both GPQA and MMLU-5-shot datasets (line 186), and the Dockerfile copies both datasets (lines 29-32):

COPY dataset-gpqa/ /ianvs/dataset/gpqa/
COPY workspace-gpqa/ /ianvs/workspace-gpqa/
COPY dataset-mmlu/ /ianvs/dataset/mmlu-5-shot/
COPY workspace-mmlu/ /ianvs/workspace-mmlu/

But only GPQA configs are provided in the repository:

  • benchmarkingjob.yaml -> hardcodes workspace: "./workspace-gpqa"
  • testenv.yaml -> hardcodes train_data: "./dataset/gpqa/train_data/data.json"

To run MMLU, users must manually create benchmarkingjob-mmlu.yaml and testenv-mmlu.yaml with the correct paths - but this is not documented anywhere. During our benchmark run, we had to reverse-engineer the correct config by examining the cache structure.
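For reference, a minimal sketch of what the missing MMLU configs might look like, showing only the paths that differ from the GPQA versions. This assumes the MMLU dataset lands at `./dataset/mmlu-5-shot/` (as the Dockerfile COPY suggests) and mirrors the GPQA layout; the exact schema and directory structure would need to be confirmed against the real configs:

```yaml
# benchmarkingjob-mmlu.yaml (sketch - fields beyond these follow benchmarkingjob.yaml)
benchmarkingjob:
  workspace: "./workspace-mmlu"
  testenv: "./testenv-mmlu.yaml"

# testenv-mmlu.yaml (sketch - assumes the same train_data layout as GPQA)
testenv:
  dataset:
    train_data: "./dataset/mmlu-5-shot/train_data/data.json"
```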

5d. Dockerfile COPY paths are undocumented

The Dockerfile (lines 29-32) expects four directories (dataset-gpqa/, workspace-gpqa/, dataset-mmlu/, workspace-mmlu/) to exist alongside it, but the README's Docker Quick Start (lines 98-145) never mentions preparing these directories. The user is told to download datasets inside the container after it's running, which contradicts the Dockerfile's COPY instructions.

Proposed Fix

  • Update README line 67: change inference-then-mining -> mining-then-inference
  • Standardize Python version to 3.10 in both README and Dockerfile
  • Add benchmarkingjob-mmlu.yaml and testenv-mmlu.yaml (or a parametric config that accepts dataset name)
  • Document the required directory structure for docker build in the Docker Quick Start section

Summary Table

| # | Bug | File(s) | Severity | Affects |
| --- | --- | --- | --- | --- |
| 1 | Backend validation rejects EagleSpecDec/LadeSpecDec | edge_model.py:47 | 🔴 Critical - feature completely broken | All platforms |
| 2 | Typo LadeSepcDec in YAML comments | test_queryrouting.yaml:32 | 🟡 Medium - misleads users who copy from comments | All platforms |
| 3 | Silent cache config mismatch (no warning) | base_llm.py:251 | 🟠 High - wastes hours, impossible to debug | All platforms |
| 4 | Missing retry in requirements.txt | requirements.txt | 🔴 Critical - immediate crash | All platforms |
| 5a | README wrong Joint Inference mode | README.md:67 | 🟡 Medium - documentation error | All platforms |
| 5b | Python version inconsistency | README.md:162, Dockerfile:5 | 🟢 Low - both work but confusing | All platforms |
| 5c | No MMLU benchmark configs provided | Missing files | 🟠 High - second dataset unusable without manual config | All platforms |
| 5d | Dockerfile COPY paths undocumented | Dockerfile:29-32, README.md | 🟡 Medium - Docker build fails without prep | Docker users |

Acceptance Criteria

  • backend: "EagleSpecDec" no longer raises ValueError in edge_model.py
  • YAML comment typo LadeSepcDec -> LadeSpecDec in test_queryrouting.yaml
  • Cache mismatch logs a WARNING with the differing keys and number of skipped results
  • retry added to requirements.txt
  • README Joint Inference mode matches benchmarkingjob.yaml
  • MMLU-specific config files provided (or config parameterized)
  • Dockerfile COPY prerequisites documented in README

Labels: kind/bug