rand[om]

med ∩ ml

Use a subprocess instead of a dependency

Sometimes calling a subprocess is better than using a dependency/package. At least in Python, once you add a third-party dependency, distribution becomes slightly harder. I like writing automation scripts in a single .py file. If that script doesn’t use any third-party dependencies, distributing it is as easy as copying the file to the machine. Otherwise you need to package your project and deal with virtual environments, PyPI, pipx, etc. I don’t think any of those tasks is hard, but rsync’ing a file is easier.

Here are some examples and thoughts.

aws CLI

You want to do some AWS operations from inside the script. You know that the environment running the script will always have the aws CLI installed. Instead of using boto3, you can call the aws CLI directly.
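A minimal sketch of what that could look like, assuming the aws CLI is on the PATH and already has credentials configured. The helper names run_json and caller_identity are made up for this example; the command itself (aws sts get-caller-identity) just returns the identity the CLI is running as.

```python
import json
import subprocess


def run_json(cmd):
    """Run a command and parse its stdout as JSON."""
    p = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(p.stdout)


def caller_identity():
    # Assumption: the aws CLI is installed and has credentials configured.
    return run_json(["aws", "sts", "get-caller-identity", "--output", "json"])
```

Because run_json only takes a list of arguments, the same helper works for any other CLI that can print JSON.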

Network requests

Let’s imagine that urllib doesn’t exist in the Python standard library, but you want to do a network request. You can install requests or httpx, but now your script requires pip install-ing a dependency, creating a virtual environment, etc. On the other hand, most systems have curl installed. By using a subprocess.run call to curl, we can have a script that can be distributed as-is, without the need to install any third-party dependencies.
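Something like the following would do, as a rough sketch. It assumes curl is installed on the machine; curl_command and fetch are names I picked for the example, and the 10-second timeout is an arbitrary default.

```python
import subprocess


def curl_command(url, timeout_s=10):
    """Build the curl invocation as an argument list (no shell involved)."""
    return [
        "curl",
        "--silent",      # no progress meter
        "--show-error",  # but still print errors to stderr
        "--fail",        # non-zero exit code on HTTP errors (4xx/5xx)
        "--location",    # follow redirects
        "--max-time",
        str(timeout_s),
        url,
    ]


def fetch(url, timeout_s=10):
    # check=True turns a non-zero curl exit code into CalledProcessError.
    p = subprocess.run(
        curl_command(url, timeout_s),
        check=True,
        capture_output=True,
        text=True,
    )
    return p.stdout
```

Keeping the command in its own function means you can print it or test it without touching the network.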

Reproducing errors

If your script is just calling a subprocess, it’s very easy to reproduce and share errors. You can share the command that the script is running. Now other people from your team can just run the command in their terminals to try to reproduce the error.[1]
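One way to get a copy-pasteable command into the script’s output is to log it with shlex.join (Python 3.8+) right before running it. This is only a sketch; format_command and run_logged are hypothetical helper names, and the "$ " prefix is just a convention.

```python
import shlex
import subprocess
import sys


def format_command(cmd):
    """Quote each argument so the result can be pasted into a POSIX shell."""
    return shlex.join(cmd)


def run_logged(cmd):
    # Log to stderr so the command doesn't mix with the script's real output.
    print(f"$ {format_command(cmd)}", file=sys.stderr)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Arguments with spaces, such as a header value, come out quoted, so the logged line can be pasted into a terminal as-is.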

Elasticsearch client

You have a script that needs to do a quick Elasticsearch query (and you don’t really need the capabilities of the Python client). Instead of using the Elasticsearch Python client, you can use curl in a subprocess. For example:

import json
import subprocess

ELASTICSEARCH_URL = "elasticsearch-cluster.foobar.io"
ELASTICSEARCH_INDEX = "my_index"

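def get_by_id(elasticsearch_id):
    # Search for the document whose _id matches the given ID.
    payload = {"query": {"terms": {"_id": [elasticsearch_id]}}}
    p = subprocess.run(
        [
            "curl",
            "-Ss",  # silent, but still show errors
            "-X",
            "GET",
            f"http://{ELASTICSEARCH_URL}:9200/{ELASTICSEARCH_INDEX}/_search",
            "-H",
            "Content-Type: application/json",
            "-d",
            json.dumps(payload),
        ],
        check=True,  # raise CalledProcessError if curl exits with an error
        capture_output=True,
        text=True,
    )
    d = json.loads(p.stdout)

    # Raises IndexError if no document matches the ID.
    return d["hits"]["hits"][0]["_source"]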

  1. This requires some code in place so that the commands are properly logged in the script’s output. See Python automation utils.