rand[om]

med ∩ ml

Taming your shell for LLMs

I recently got frustrated with Codex’s command permissions. They don’t let you configure which commands should always be allowed or denied. There’s an issue about it, but it’s been open for almost 4 months as of writing this.

Codex’s sandboxing is also less convenient than what tools like Claude Code offer, since it’s more of an all-or-nothing approach. So I decided to build my own solution.

The Problem

To make LLM CLI agents useful, you need to let them run commands more freely, including ones with network access. But you still want control over what they can execute. I think Claude Code has a very good user experience for this.

Codex doesn’t have this feature. After some thinking, I decided I would need to somehow override my shell, but only while the LLM CLI agent is running, so that it doesn’t interfere with my regular terminal usage.

First Attempts (That Failed)

I tried several approaches:

Python wrapper script

I read Codex’s source code to see how it picks a shell. It looks up the user’s login shell with the getpwuid function, so you’d need to change your login shell with chsh to make it pick a new one. That’s too invasive for my tmux workflow. I still gave it a shot: I wrote a Python script that forwards its arguments to my real shell (bash) and set that script as my login shell. Overall it worked better than expected, but it was a bit inconvenient.
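The same passwd-database lookup can be reproduced from Python's standard library. This is a minimal sketch of the query, not Codex's actual code, and `login_shell` is a hypothetical helper name.

```python
import os
import pwd
from typing import Optional


def login_shell(uid: Optional[int] = None) -> Optional[str]:
    """Return the login shell recorded in the passwd database for uid."""
    if uid is None:
        uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_shell
    except KeyError:
        # No passwd entry for this uid (can happen in some containers).
        return None


if __name__ == "__main__":
    print(login_shell())
```

Because the value comes from the passwd entry rather than from `$SHELL`, exporting a different `$SHELL` in the environment does not change what this lookup returns.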

I also tried creating a single folder with symlinks to “white-listed” binaries and using that folder as the only PATH entry. But it was a bit messy and didn’t work well.
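For illustration, the symlink folder can be built in a few lines. This is a rough sketch under assumptions: the `ALLOWED` list and the `build_allowlist_dir` helper are hypothetical names, and the folder is created in a temporary location.

```python
import os
import shutil
import tempfile
from pathlib import Path

ALLOWED = ["cat", "ls", "grep"]  # example allowlist, adjust to taste


def build_allowlist_dir(names, target_dir):
    """Symlink each allowed binary found on the current PATH into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in names:
        real = shutil.which(name)
        if real is None:
            continue  # binary not installed, skip it
        link = target / name
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(real)
        linked.append(name)
    return linked


if __name__ == "__main__":
    bin_dir = tempfile.mkdtemp(prefix="llm-bin-")
    print(build_allowlist_dir(ALLOWED, bin_dir))
    # Environment a child process would get: only the symlink folder on PATH.
    restricted_env = dict(os.environ, PATH=bin_dir)
    print(restricted_env["PATH"])
```

Part of the messiness is that PATH only governs name lookup: shell builtins such as `echo` or `cd`, and anything invoked by absolute path, are unaffected by it.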

Both options were inconvenient and incompatible with my workflow.

The Solution: DEBUG Trap

After the initial failed attempts, the answer came from diving deeper into bash itself.

Bash has a DEBUG trap that runs shell code before every simple command executes. This was exactly what I needed: a way to intercept and filter commands at the shell level.

Inside the trap, the $BASH_COMMAND variable holds the command that’s about to be executed, so the trap can inspect it.

trap 'python3 /some/script.py "$BASH_COMMAND" || exit 1' DEBUG

That line will run python3 /some/script.py "$BASH_COMMAND" before every command executes. If the script returns a non-zero exit code, the || exit 1 part makes the shell exit right there, so the command never runs.
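To watch the mechanism in isolation, here is a small sketch that drives bash from Python. It assumes bash is on the PATH, and it swaps the external script for a hypothetical inline test that refuses exactly one command, `echo blocked`.

```python
import subprocess

# Hypothetical inline filter: exit the shell when the next command is "echo blocked".
TRAP = """trap '[ "$BASH_COMMAND" != "echo blocked" ] || exit 1' DEBUG"""

SCRIPT = TRAP + "\necho allowed\necho blocked\necho never-reached\n"


def run_demo():
    """Run SCRIPT in a fresh bash process and return (exit code, stdout)."""
    result = subprocess.run(
        ["bash", "-c", SCRIPT],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    code, out = run_demo()
    print(code, repr(out))
```

The first echo passes the check and runs. The second one trips the trap, the shell exits with status 1, and the third command is never reached.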

How It Works

  1. Write a Python script that acts as a DEBUG trap
  2. The script receives $BASH_COMMAND as input
  3. Filter commands however you want:
    • Use an allowlist
    • Block dangerous commands
    • Even ask another LLM if the command is safe

The Command Monitor Script

Here’s the complete command-monitor script:

#!/usr/bin/env python3

import sys

# Use as:
# trap 'command-monitor "$BASH_COMMAND" || exit 1' DEBUG

# Allowed commands configuration
ALLOWED_COMMANDS = [
    "cat",
    "cut",
    "date",
    "df",
    "diff",
    "du",
    "echo",
    "file",
    "grep",
    "head",
    "ls",
    "rg",
    "sort",
]


def is_command_allowed(cmd):
    """Check if a command is allowed based on the allowlist."""
    cmd = cmd.strip()
    if not cmd:
        return False

    for allowed in ALLOWED_COMMANDS:
        if cmd.startswith(allowed):
            return True

    return False


def handle_command_check(cmd):
    """Check if command is allowed and exit parent shell if not."""
    if not is_command_allowed(cmd):
        print(
            f"[command-monitor] Command not allowed: {cmd}",
            file=sys.stderr,
        )
        # Exit with code 1 to signal parent shell to exit
        sys.exit(1)

    # Command is allowed, continue normally
    return 0


def main():
    """Main function to handle trapped bash commands."""

    # Get the command from the first argument (passed by the trap)
    if len(sys.argv) < 2:
        print(
            "[command-monitor] Error: No command provided",
            file=sys.stderr,
        )
        sys.exit(1)

    cmd = sys.argv[1]
    handle_command_check(cmd)
    # If we reach here, command is allowed - exit normally
    sys.exit(0)


if __name__ == "__main__":
    main()

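The script is simple but effective. It checks each command against the allowlist using startswith() matching. Note that prefix matching is loose: an entry like ls also matches any longer command name that begins with it, such as lsof.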
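A stricter variant, sketched here as an alternative rather than as the script from this post, tokenizes the command with shlex and compares only the first word for exact equality. The `first_word` helper and the shortened allowlist are illustrative.

```python
import shlex

ALLOWED_COMMANDS = {"cat", "grep", "ls"}  # shortened example allowlist


def first_word(cmd):
    """Return the first word of a command, or None if it cannot be parsed."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if not tokens:
        return None
    return tokens[0]


def is_command_allowed(cmd):
    """Allow a command only if its first word exactly matches an allowlist entry."""
    word = first_word(cmd)
    return word is not None and word in ALLOWED_COMMANDS


if __name__ == "__main__":
    for candidate in ["cat notes.txt", "catastrophe --now", "/tmp/evil/cat x"]:
        print(candidate, "->", is_command_allowed(candidate))
```

Exact matching is deliberately conservative: a command given by path (`/tmp/evil/cat`) or prefixed with a variable assignment (`FOO=1 cat`) is rejected, because its first word is not literally on the list.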

Setting It Up

The tricky part was making Codex use this setup.

Then I found the BASH_ENV variable. When Bash starts non-interactively, it reads and executes the file named by $BASH_ENV before running the script. It’s like .bashrc but for non-interactive shells (exactly what Codex uses).

Here’s my workflow:

  1. Create a wrapper script for Codex (I already had one)
  2. Generate a BASH_ENV file on the fly with the DEBUG trap
  3. Set Codex’s shell environment policy to always set this variable
  4. The DEBUG trap kicks in for every command and the script checks the command against the allowlist

The Wrapper Configuration

The wrapper generates the BASH_ENV file with the DEBUG trap:

bash_env_content = (
    "trap 'command-monitor \"$BASH_COMMAND\" || exit 1' DEBUG\n"
)

bash_env_file.write_text(bash_env_content)

# ...
# Then pass it to Codex
# ...

args.extend([
    "--config",
    f'shell_environment_policy.set = {{ BASH_ENV = "{bash_env_file}" }}',
])
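The BASH_ENV half of this can be tried without Codex. In the sketch below, a plain `bash -c` stands in for the agent's shell, `run_with_bash_env` is a hypothetical helper, and an inline pattern test replaces command-monitor so the example stays self-contained. It assumes bash is on the PATH.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def run_with_bash_env(trap_line, command):
    """Write trap_line to a temporary BASH_ENV file and run command under bash."""
    with tempfile.TemporaryDirectory() as tmp:
        bash_env_file = Path(tmp) / "bash_env"
        bash_env_file.write_text(trap_line + "\n")
        # Non-interactive bash sources the file named by BASH_ENV at startup.
        env = dict(os.environ, BASH_ENV=str(bash_env_file))
        result = subprocess.run(
            ["bash", "-c", command],
            env=env,
            capture_output=True,
            text=True,
        )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in for command-monitor: allow only commands starting with echo.
    trap_line = """trap '[[ "$BASH_COMMAND" == echo* ]] || exit 1' DEBUG"""
    print(run_with_bash_env(trap_line, "echo hello; uname; echo unreachable"))
```

The command string itself never mentions the trap. It is picked up only because the child bash sources the BASH_ENV file before running anything, which is the same hook the wrapper relies on.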

Results

It works great. Codex can execute commands freely, but I control what actually runs.

Commands get allowed or denied based on my rules. At this point, I could extend it further: read the allowlist from a file, use another small LLM to filter unsafe commands, or check for command injection. Not that this makes things 100% safe, but it’s a good start.
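Reading the allowlist from a file is the smallest of those extensions. A minimal sketch, assuming a hypothetical file format of one command name per line with `#` comments:

```python
import tempfile
from pathlib import Path


def load_allowlist(path):
    """Read one command name per line, skipping blank lines and # comments."""
    allowed = set()
    for line in Path(path).read_text().splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            allowed.add(entry)
    return allowed


if __name__ == "__main__":
    # Demo with a throwaway file; a real setup would point at a fixed path.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "allowlist.txt"
        demo.write_text("# read-only tools\ncat\nls  # listing\n")
        print(sorted(load_allowlist(demo)))
```

The returned set could replace the hard-coded ALLOWED_COMMANDS list, so changing the rules no longer means editing the script.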

Limitations

It’s not as polished as Claude Code’s approach. Claude Code lets you:

  • Whitelist specific commands
  • Ask for approval on unknown commands

You could hack this functionality into the Python script. I just haven’t implemented it yet.
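One way the approval step might look is sketched below. It assumes the shell running the trap has a controlling terminal, which may not hold for every agent, and it denies by default when there is none. `ask_on_tty` and `decide` are hypothetical names.

```python
ALLOWED_COMMANDS = {"cat", "ls"}  # shortened example allowlist


def ask_on_tty(cmd):
    """Ask the human on the controlling terminal; deny if there is none."""
    try:
        # Separate handles for writing and reading the terminal.
        with open("/dev/tty", "w") as out, open("/dev/tty") as inp:
            out.write(f"[command-monitor] Allow '{cmd}'? [y/N] ")
            out.flush()
            return inp.readline().strip().lower() in ("y", "yes")
    except OSError:
        return False


def decide(cmd, ask=ask_on_tty):
    """Allow listed commands outright and defer everything else to ask()."""
    words = cmd.split()
    if not words:
        return False
    if words[0] in ALLOWED_COMMANDS:
        return True
    return ask(cmd)


if __name__ == "__main__":
    print(decide("ls -la"))
    print(decide("rm -rf build"))
```

Passing the prompt function as a parameter keeps the decision logic testable without a terminal, and leaves room to swap in a different approval channel later.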

Beyond Codex

The cool thing about this technique is that it works with any LLM agent that runs its commands through bash, although you may need something different if the agent invokes bash differently.

Setup Options

To go one step further, you can use an environment variable to trigger the DEBUG trap.

  • Configure your LLM CLI agent to always set an environment variable
  • Check for that specific environment variable in the DEBUG trap
  • Only run the command filter when that var is set

For example, configure your LLM CLI agent to always set the environment variable INSIDE_LLM_AGENT=true. Then the DEBUG trap can look like:

trap 'if [ "$INSIDE_LLM_AGENT" = "true" ]; then python3 /some/script.py "$BASH_COMMAND" || exit 1; fi' DEBUG
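The same gate could instead live inside the Python script, which keeps the trap line short at the cost of starting Python for every command, agent or not. A minimal sketch, assuming the variable is exported so child processes see it (`filter_enabled` is a hypothetical helper):

```python
import os


def filter_enabled(environ=None):
    """Return True only when the agent marker variable is set to 'true'."""
    if environ is None:
        environ = os.environ
    return environ.get("INSIDE_LLM_AGENT") == "true"


if __name__ == "__main__":
    if not filter_enabled():
        # Outside the agent: let every command through untouched.
        raise SystemExit(0)
    print("filter active")
```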

The exact setup depends on the configuration options provided by whichever LLM CLI agent tool you’re using.