Use a subprocess instead of a dependency
Sometimes calling a subprocess is better than using a dependency/package. At least in Python, once you add a third-party dependency, distribution becomes slightly harder. I like writing automation scripts in a single .py file. If that script doesn't use any third-party dependencies, distributing it is as easy as copying the file to the machine. Otherwise you need to package your project, deal with virtual environments, PyPI, pipx, etc. I don't think all of those tasks are hard, but rsync'ing a file is easier.
Here are some examples and thoughts.
aws CLI
You want to do some AWS operations from inside the script. You know that the environment running the script will always have the aws CLI installed. Instead of using boto3, you can call the aws CLI directly.
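For example, a minimal sketch that lists the objects in an S3 bucket by shelling out to the aws CLI (the bucket name is a placeholder, and it assumes the CLI is already configured with credentials):

import json
import subprocess


def list_objects(bucket):
    # Shell out to the aws CLI instead of importing boto3.
    # --output json makes the result easy to parse with the standard library.
    p = subprocess.run(
        ["aws", "s3api", "list-objects-v2", "--bucket", bucket, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(p.stdout)


objects = list_objects("my-bucket")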
Network requests
Let's imagine that urllib doesn't exist in the Python standard library, but you want to do a network request. You could install requests or httpx, but now your script requires pip install'ing a dependency, creating a virtual environment, etc. On the other hand, most systems have curl installed. By using a subprocess.run call to curl, we can have a script that can be distributed as-is, without the need to install any third-party dependencies.
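A minimal sketch of what that could look like (the URL is just a placeholder):

import subprocess


def http_get(url):
    # Use curl in a subprocess instead of requests/httpx.
    # -sS: silent, but still show errors; --fail: exit non-zero on HTTP errors.
    p = subprocess.run(
        ["curl", "-sS", "--fail", url],
        check=True,
        capture_output=True,
        text=True,
    )
    return p.stdout


body = http_get("https://example.com/api/status")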
Reproducing errors
If your script is just calling a subprocess, it's very easy to reproduce and share errors: you can share the exact command that the script is running, and other people on your team can run it in their terminals to try to reproduce the error. 1
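For this to work, the script needs to print the command it runs in a copy-pasteable form. A minimal sketch of such a helper (the run wrapper is just an illustration), using shlex.join so the logged line can be pasted straight into a shell:

import shlex
import subprocess


def run(cmd):
    # Log the exact command before running it, so anyone can copy-paste it
    # into a terminal to reproduce an error.
    print(f"+ {shlex.join(cmd)}")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


run(["curl", "-sS", "--fail", "https://example.com/api/status"])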
Elasticsearch client
You have a script that needs to do a quick Elasticsearch query (and you don't really need the capabilities of the Python client). Instead of using the Elasticsearch Python client, you can use curl in a subprocess. E.g.:
import json
import subprocess

ELASTICSEARCH_URL = "elasticsearch-cluster.foobar.io"
ELASTICSEARCH_INDEX = "my_index"


def get_by_id(elasticsearch_id):
    # Query the index for a single document by _id, using curl instead of the
    # official Python client.
    payload = {"query": {"terms": {"_id": [elasticsearch_id]}}}
    p = subprocess.run(
        [
            "curl",
            "-Ss",
            "-X",
            "GET",
            f"http://{ELASTICSEARCH_URL}:9200/{ELASTICSEARCH_INDEX}/_search",
            "-H",
            "Content-Type: application/json",
            "-d",
            json.dumps(payload),
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    d = json.loads(p.stdout)
    return d["hits"]["hits"][0]["_source"]
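Usage is then a single call, e.g. doc = get_by_id("some-document-id") (the id is just a placeholder). And if the query ever fails, the equivalent curl command is exactly what you would share with a teammate to reproduce the problem.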
1. This requires some code in place so that the commands are properly logged in the script's output. See Python automation utils. ↩︎