Key Findings | ShellSentry

LLM assistance requires policy gates.

Even with prompt hygiene and cleaned model output, deterministic whitelist, blacklist, read-only, and normalization rules remain essential before any remote execution.
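
As a minimal sketch of such a gate, the Python below normalizes a generated command, applies a blacklist of patterns, then a whitelist of binaries, and finally a read-only rule. The binary names, patterns, and the `check_command` function are illustrative assumptions, not ShellSentry's actual policy set.

```python
import re
import shlex

# Hypothetical policy values, chosen for illustration only.
ALLOWED_BINARIES = {"ls", "cat", "df", "uptime", "ss", "systemctl"}
BLOCKED_PATTERNS = [
    re.compile(p)
    for p in (r"\brm\b", r"\bmkfs", r"\bdd\b", r"[;&|`]", r"\$\(", r">")
]
READ_ONLY_SYSTEMCTL = {"status", "is-active", "list-units"}


def normalize(command: str) -> str:
    """Collapse runs of whitespace so later checks see one canonical form."""
    return " ".join(command.split())


def check_command(command: str, read_only: bool = True) -> tuple[bool, str]:
    """Return (allowed, reason). Every check is deterministic."""
    cmd = normalize(command)
    if not cmd:
        return False, "empty command"
    # Blacklist first: reject destructive tools and shell chaining outright.
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(cmd):
            return False, f"blocked pattern: {pattern.pattern}"
    try:
        tokens = shlex.split(cmd)
    except ValueError as exc:
        return False, f"unparseable: {exc}"
    # Whitelist second: only known binaries may run at all.
    if not tokens or tokens[0] not in ALLOWED_BINARIES:
        return False, "binary not allowed"
    # Read-only rule: restrict systemctl to subcommands that do not change state.
    if read_only and tokens[0] == "systemctl":
        if len(tokens) < 2 or tokens[1] not in READ_ONLY_SYSTEMCTL:
            return False, "systemctl subcommand not read-only"
    return True, "ok"
```

Under these assumed rules, `check_command("df   -h")` passes, while `check_command("systemctl restart nginx")` and `check_command("ls; rm -rf /")` are rejected before anything reaches a remote host.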

Host context materially improves relevance.

Parallel probes that capture OS, services, and sockets per target reduce generic or hallucinated commands compared with single-prompt, host-agnostic generation.
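
One way to picture this is the sketch below, which runs three read-only probes per host concurrently and folds the results into the prompt. The probe commands, the `run` callable (a stand-in for whatever remote transport is used), and both function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical read-only probe commands; the real probe set is not given here.
PROBES = {
    "os": "cat /etc/os-release",
    "services": "systemctl list-units --type=service --state=running --no-pager",
    "sockets": "ss -tulpn",
}


def probe_host(host: str, run: Callable[[str, str], str]) -> Dict[str, str]:
    """Run every probe against one host concurrently and collect the output.

    `run(host, command)` is an assumed transport function (for example an
    SSH wrapper) that returns the command's stdout as text.
    """
    with ThreadPoolExecutor(max_workers=len(PROBES)) as pool:
        futures = {name: pool.submit(run, host, cmd) for name, cmd in PROBES.items()}
        return {name: fut.result() for name, fut in futures.items()}


def build_prompt(request: str, context: Dict[str, str]) -> str:
    """Prefix the user's request with the captured host context."""
    sections = [f"[{name}]\n{output.strip()}" for name, output in context.items()]
    return "Host context:\n" + "\n\n".join(sections) + f"\n\nTask: {request}"
```

Passing the transport in as a parameter keeps the sketch testable with a fake `run` function and leaves the choice of SSH library open.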

Reuse needs its own threat model.

Archiving scripts and supporting cron scheduling stayed safe only when filename checks, existence checks, managed tags, and re-validation of stored content were treated as first-class controls rather than add-ons.
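
The sketch below shows how those four controls might fit together for an archive of stored scripts. The filename pattern, the `MANAGED_TAG` header line, and all function names are hypothetical; in particular, placing the tag on the first line of each file is an assumption about how managed content could be marked.

```python
import re
from pathlib import Path
from typing import Callable

# Hypothetical conventions for this sketch.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}\.sh")
MANAGED_TAG = "# managed-by: shellsentry"


def archive_path(archive_dir: Path, name: str) -> Path:
    """Filename control: reject separators, dots-only names, and odd characters."""
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe filename: {name!r}")
    return archive_dir / name


def save_script(archive_dir: Path, name: str, body: str) -> Path:
    """Existence control: never overwrite a script that is already archived."""
    path = archive_path(archive_dir, name)
    if path.exists():
        raise FileExistsError(f"refusing to overwrite {path.name}")
    path.write_text(f"{MANAGED_TAG}\n{body}\n")
    return path


def load_for_reuse(
    archive_dir: Path, name: str, is_allowed: Callable[[str], bool]
) -> list[str]:
    """Tag and re-validation controls: stored content is not trusted on reuse."""
    path = archive_path(archive_dir, name)
    if not path.is_file():
        raise FileNotFoundError(name)
    lines = path.read_text().splitlines()
    if not lines or lines[0] != MANAGED_TAG:
        raise PermissionError("script is not tagged as managed")
    commands = [ln for ln in lines[1:] if ln.strip()]
    # Run every stored command through the current policy again.
    rejected = [ln for ln in commands if not is_allowed(ln)]
    if rejected:
        raise PermissionError(f"stored content failed re-validation: {rejected[0]!r}")
    return commands
```

Re-validating at load time matters because a script that passed policy when it was archived may have been edited on disk, or the policy itself may have tightened since.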

Defense in depth is observable end-to-end.

Authentication, input validation, intent routing, generation, validation, execution, structured reporting, and audit logging each catch different failure modes; no single layer is sufficient on its own.
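
To illustrate why the layers complement each other, the sketch below chains a few of them as independent stages and records each outcome in an audit trail. The stage bodies are deliberately trivial stand-ins (the `generate` stage fakes the LLM call), and every name and rule here is an assumption rather than the project's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class Rejected(Exception):
    """Raised by a stage to stop the pipeline."""


@dataclass
class Request:
    user: str
    text: str
    command: str = ""
    audit: List[str] = field(default_factory=list)


Stage = Tuple[str, Callable[[Request], None]]


def authenticate(req: Request) -> None:
    if req.user not in {"alice"}:  # hypothetical user store
        raise Rejected("unknown user")


def validate_input(req: Request) -> None:
    if len(req.text) > 200 or "\n" in req.text:
        raise Rejected("malformed request")


def generate(req: Request) -> None:
    # Stand-in for the LLM call; it may return something unsafe.
    req.command = "df -h" if "disk" in req.text else "rm -rf /tmp/x"


def validate_command(req: Request) -> None:
    if req.command.split()[0] not in {"df", "ls", "uptime"}:
        raise Rejected("command not whitelisted")


STAGES: List[Stage] = [
    ("authentication", authenticate),
    ("input validation", validate_input),
    ("generation", generate),
    ("command validation", validate_command),
]


def run_pipeline(req: Request, stages: List[Stage] = STAGES) -> bool:
    """Run stages in order; log every outcome and stop at the first rejection."""
    for name, stage in stages:
        try:
            stage(req)
        except Rejected as exc:
            req.audit.append(f"{name}: rejected ({exc})")
            return False
        req.audit.append(f"{name}: ok")
    return True
```

In this toy setup an unknown user is stopped at authentication, whereas an authenticated user whose request yields an unsafe command is stopped only at command validation, which is the sense in which each layer catches a different failure mode.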

Probabilistic components keep residual risk.

Policies mitigate most abuse, yet LLM behavior remains non-deterministic; broader production use would still demand hardened secrets management, TLS, SIEM feeds, and stricter policy profiles as called out in the project scope notes.