Abstract
ShellSentry is a secure, web-based prototype that translates administrator
requests written in natural language into validated Bash commands and
executes them on remote Linux systems. The project
tackles the challenges of manual shell management and the execution of
commands generated by AI, using a combination of authenticated access,
retrieval-augmented generation (RAG), host-aware context collection,
deterministic command validation, and SSH-based multi-server orchestration.
The system was implemented with Python, Flask, Paramiko, FAISS, and an
OpenAI-compatible large language model (LLM). Functional, security, and
performance testing was carried out in a controlled virtual lab environment.
The results demonstrated authenticated execution, multi-host command
orchestration, blocking of unsafe commands, and structured audit logging
with reliable reporting. ShellSentry shows that AI-driven shell automation
can be made considerably more secure through multi-level validation,
execution policies, and accountability.
Project Overview
ShellSentry is a web-based system that accepts user requests in natural
language, uses an LLM to generate Linux commands, validates those commands
with security rules, then executes approved commands on remote servers via
SSH.
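The validation step can be illustrated with a minimal sketch. It assumes a simple policy of an allow-list of binaries plus a rejection of shell metacharacters; the names `ALLOWED_BINARIES`, `SHELL_METACHARACTERS`, and `validate_command` are hypothetical and are not taken from the ShellSentry code base, whose actual rule set may differ.

```python
import re
import shlex
from typing import Tuple

# Hypothetical policy values, chosen only for illustration.
ALLOWED_BINARIES = {"ls", "df", "uptime", "cat", "grep", "systemctl"}

# Characters that enable chaining, piping, substitution, or redirection.
SHELL_METACHARACTERS = re.compile(r"[;&|`$<>]")


def validate_command(command: str) -> Tuple[bool, str]:
    """Return (approved, reason) for one LLM-generated command string."""
    command = command.strip()
    if not command:
        return False, "empty command"
    if SHELL_METACHARACTERS.search(command):
        return False, "shell metacharacter not permitted"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # for example, unbalanced quotes
        return False, f"unparseable command: {exc}"
    if not tokens or tokens[0] not in ALLOWED_BINARIES:
        return False, "binary not on allow-list"
    return True, "ok"
```

Because the check is purely rule-based, the same input always yields the same decision, which is what makes the gate deterministic regardless of how the model phrased the command.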
The Problem
Administrators and security teams run repetitive Linux commands across
many servers. That creates a usability gap (strong CLI expertise is
required) and a security gap (mistakes, unsafe commands, or misuse can
impact critical systems). LLMs reduce the usability barrier, but
executing raw model output amplifies risks such as prompt injection,
privilege misuse, and unpredictable behavior across hosts.
Our Approach
ShellSentry closes that gap with defense in depth: validated inputs,
policy-gated commands, host context before generation, auditable
multi-host execution, and controlled reuse through a remote script
archive and safe managed scheduling, all without replacing human judgment
for production-grade deployments.
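As a rough sketch of what auditable multi-host execution might look like, the example below runs one approved command on several hosts and emits one structured record per host. The `runner` callable, the record fields, and the function names are assumptions made for illustration; in a real deployment the runner could wrap an SSH client such as Paramiko, but here it is injected so the sketch runs without network access.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# Hypothetical runner signature: (host, command) -> (exit_code, output).
Runner = Callable[[str, str], Tuple[int, str]]


def run_on_hosts(user: str, command: str, hosts: List[str],
                 runner: Runner) -> List[Dict]:
    """Run one command on each host and return one audit record per host."""
    records = []
    for host in hosts:
        try:
            exit_code, output = runner(host, command)
            status = "ok" if exit_code == 0 else "failed"
        except Exception as exc:  # a failure on one host must not stop the rest
            exit_code, output, status = None, str(exc), "error"
        records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "host": host,
            "command": command,
            "exit_code": exit_code,
            "status": status,
            "output": output,
        })
    return records


def to_audit_lines(records: List[Dict]) -> List[str]:
    """Serialize records as JSON lines suitable for an append-only log."""
    return [json.dumps(record, sort_keys=True) for record in records]
```

Recording the user, host, command, and outcome together in each entry is one way to keep every execution attributable, including executions that fail on only some hosts.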