rand[om]

med ∩ ml

Building HTML components from Python functions

This post could also be called “Or how to do React in Python” or “HTML as a function of state”.

Most people use templating libraries like jinja2 to render HTML. I think that’s probably the best way to do it in production. However, for very simple / internal / proof-of-concept apps, I wanted to generate the HTML directly from Python functions to avoid needing extra files. I tried using f-strings to do that, but it gets messy pretty quickly. I recently found a nice way to render HTML using lxml. As a nice side effect, the overall architecture is similar to React, where functions become UI components. At the same time, it allows rendering only individual components easily. This can be especially useful when used together with HTMX.

A basic component, rendering strings

lxml already comes with a class and some utilities to generate HTML elements and serialize them to a string.

from lxml.html import HtmlElement
from lxml.html import tostring
from lxml.html.builder import E as e

def s(tree: HtmlElement) -> str:
    """
    Serialize LXML tree to unicode string. Using DOCTYPE html.
    """
    return tostring(tree, encoding="unicode", doctype="<!DOCTYPE html>", pretty_print=True)


def head(title: str):
    return e.head(
        e.meta(charset="utf-8"),
        e.meta(name="viewport", content="width=device-width, initial-scale=1"),
        e.title(title),
    )

tree = head("Hello")
print(s(tree))

This will generate this HTML (in a real-life scenario, you can remove the pretty_print=True argument):

<!DOCTYPE html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Hello</title>
</head>

We now have simple but valid HTML generated from Python objects.

Converting Python objects to HTML

Normally, you will have some kind of state or context and render the HTML based on that context. We can use any Python object to generate the HTML. Here, we will convert a list of strings to a <ul> element.

from lxml.html import HtmlElement
from lxml.html import tostring
from lxml.html.builder import E as e

def s(tree: HtmlElement) -> str:
    """
    Serialize LXML tree to unicode string. Using DOCTYPE html.
    """
    return tostring(tree, encoding="unicode", doctype="<!DOCTYPE html>", pretty_print=True)


def list_items(items: list[str]):
    return e.ul(*[e.li(item) for item in items])

tree = list_items(["foo", "bar", "baz"])
print(s(tree))

Which generates (we can ignore the DOCTYPE here):

<!DOCTYPE html>
<ul>
<li>foo</li>
<li>bar</li>
<li>baz</li>
</ul>
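The same pattern carries over to other Python objects. As a minimal sketch (the definition_list helper and the sample data are made up for illustration), a dictionary can be mapped to a <dl> element with one <dt>/<dd> pair per entry:

```python
from lxml.html import tostring
from lxml.html.builder import E as e


def definition_list(mapping: dict[str, str]):
    # Hypothetical helper: one <dt>/<dd> pair per dictionary entry.
    children = []
    for term, description in mapping.items():
        children.append(e.dt(term))
        children.append(e.dd(description))
    return e.dl(*children)


tree = definition_list({"lxml": "XML and HTML toolkit", "jinja2": "Template engine"})
html = tostring(tree, encoding="unicode")
print(html)
```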

Creating our first view

Now we can create an index view that combines the <head> element, built by its own function, with a list generated from a Python object. Here, I’m creating a FastAPI app to render the contents.

import random

import uvicorn
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from lxml.html import HtmlElement
from lxml.html import tostring
from lxml.html.builder import E as e

app = FastAPI()


def s(tree: HtmlElement) -> str:
    """
    Serialize LXML tree to unicode string. Using DOCTYPE html.
    """
    return tostring(tree, encoding="unicode", doctype="<!DOCTYPE html>")


def head(title: str):
    return e.head(
        e.meta(charset="utf-8"),
        e.meta(name="viewport", content="width=device-width, initial-scale=1"),
        e.title(title),
    )


def list_items(items: list[str]):
    return e.ul(*[e.li(item) for item in items])


def index(items: list[str]):
    return e.html(
        # generate <head> element by calling a python function
        head("Home"),
        e.body(
            e.h1("Hello, world!"),
            list_items(items),
        ),
    )


@app.get("/", response_class=HTMLResponse)
def get():
    items = [str(random.randint(0, 100)) for _ in range(10)]
    tree = index(items)
    html = s(tree)
    return html


if __name__ == "__main__":
    # run app with uvicorn
    uvicorn.run(
        f'{__file__.split("/")[-1].replace(".py", "")}:app',
        host="127.0.0.1",
        port=8000,
        reload=True,
        workers=1,
    )

After installing FastAPI, uvicorn and lxml, you can run your app (replace file.py with the name of your Python script):

python3 file.py

The page shows a “Hello, world!” heading followed by a bulleted list of ten random numbers.
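Since the view is just a function that returns an lxml tree, it can also be checked without starting the server. The sketch below uses a trimmed-down copy of index (the <head> is reduced to a title to keep it short) and queries the result with XPath:

```python
from lxml.html.builder import E as e


def list_items(items: list[str]):
    return e.ul(*[e.li(item) for item in items])


def index(items: list[str]):
    # Trimmed-down version of the view above, for illustration only.
    return e.html(
        e.head(e.title("Home")),
        e.body(e.h1("Hello, world!"), list_items(items)),
    )


tree = index(["1", "2", "3"])
# The result is an ordinary lxml tree, so XPath queries work on it.
headings = tree.xpath("//h1/text()")
entries = tree.xpath("//li/text()")
print(headings, entries)
```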

Adding more utilities

lxml comes with some functions to add attributes to elements, but I decided to write my own to have better ergonomics.

# handle some Python / HTML keywords.
def replace_attr_name(name: str) -> str:
    if name == "_class":
        return "class"
    elif name == "_for":
        return "for"
    return name


def ATTR(**kwargs):
    # Use str() to convert values to strings, so non-string values such as
    # True can be passed directly (note that True is rendered as "True").
    return {replace_attr_name(k): str(v) for k, v in kwargs.items()}

With those functions, we can now build elements like this:

e.html(
    ATTR(lang="en"),
    head("Hello"),
    e.body(
        # we use `_class` because `class` is a Python keyword
        e.main(ATTR(id="main", _class="container")),
    ),
)
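One caveat of the str() conversion is that True becomes the string "True". The sketch below is a hypothetical variant (attrs is not part of lxml or of the code in this post) that follows the HTML convention for boolean attributes instead: True yields an empty value, and False or None drops the attribute.

```python
def replace_attr_name(name: str) -> str:
    # Strip the leading underscore used to dodge Python keywords.
    if name in ("_class", "_for"):
        return name[1:]
    return name


def attrs(**kwargs) -> dict[str, str]:
    # Hypothetical variant of ATTR with explicit boolean handling.
    result = {}
    for key, value in kwargs.items():
        if value is False or value is None:
            continue  # omit the attribute entirely
        result[replace_attr_name(key)] = "" if value is True else str(value)
    return result


example = attrs(_class="container", required=True, disabled=False, tabindex=0)
print(example)
```

The returned dictionary can be passed to the builder the same way as the output of ATTR.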

Adding more components and state

We have all the basic pieces in place. We can start building more components and composing them together. In this example, instead of passing all the element arguments, I will generate a single state dictionary and pass it around[1][2]. I will also add picocss to the <head> for styling. I will show all the code with some comments, and then we will look at specific parts:

import random

# import MappingProxyType for "frozen dict"
from types import MappingProxyType

import uvicorn
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from lxml.html import HtmlElement
from lxml.html import tostring
from lxml.html.builder import E as e

app = FastAPI()

# Type alias. State can be a dict or a MappingProxyType.
State = dict | MappingProxyType


def replace_attr_name(name: str) -> str:
    if name == "_class":
        return "class"
    elif name == "_for":
        return "for"
    return name


def ATTR(**kwargs):
    # Use str() to convert values to strings, so non-string values such as
    # True can be passed directly (note that True is rendered as "True").
    return {replace_attr_name(k): str(v) for k, v in kwargs.items()}


def s(tree: HtmlElement) -> str:
    """
    Serialize LXML tree to unicode string. Using DOCTYPE html.
    """
    return tostring(tree, encoding="unicode", doctype="<!DOCTYPE html>")


def base(*children: HtmlElement, state: State):
    return e.html(
        ATTR(lang="en"),
        head(state),
        e.body(
            e.main(ATTR(id="main", _class="container"), *children),
        ),
    )


def head(state: State):
    return e.head(
        e.meta(charset="utf-8"),
        e.title(state.get("title", "Home")),
        e.meta(name="viewport", content="width=device-width, initial-scale=1"),
        e.meta(name="description", content="Welcome."),
        e.meta(name="author", content="@polyrand"),
        e.link(
            rel="stylesheet",
            href="https://cdn.jsdelivr.net/npm/@picocss/pico@1/css/pico.min.css",
        ),
    )


def login_form(state: State):
    return e.article(
        ATTR(**{"aria-label": "log-in form"}),
        e.p(
            e.strong(ATTR(style="color: red"), "Wrong credentials!")
            if state.get("error")
            else f"{state.get('user', 'You')} will receive an email with a link to log in."
        ),
        e.form(
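            # `for` must go through ATTR; a plain `_for=` keyword argument
            # would be emitted as a literal "_for" attribute.
            e.label(ATTR(_for="email"), "Email"),
            e.input(
                ATTR(
                    id="email",  # matches the label's `for` attribute
                    placeholder="Your email",
                    type="email",
                    name="email",
                    required=True,
                )
            ),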
            e.button("Log In"),
            action="/login",
            method="post",
        ),
    )


def view_index(state: State):
    return base(
        e.section(
            e.h1("Page built using lxml"),
            e.p("This is some text."),
        ),
        list_items(state),
        login_form(state),
        state=state,
    )


def list_items(state: State):
    return e.ul(*[e.li(item) for item in state["items"]])


@app.get("/", response_class=HTMLResponse)
def idx(error: bool = False):
    items = [str(random.randint(0, 100)) for _ in range(4)]
    state = {
        "title": "Some title",
        "items": items,
        "user": "@polyrand",
    }
    if error:
        state["error"] = True
    tree = view_index(MappingProxyType(state))
    html = s(tree)
    return html


if __name__ == "__main__":
    uvicorn.run(
        f'{__file__.split("/")[-1].replace(".py", "")}:app',
        host="127.0.0.1",
        port=8000,
        reload=True,
        workers=1,
    )

Let’s look at some parts.

    return e.article(
        ATTR(**{"aria-label": "log-in form"}),
        e.p(
            e.strong(ATTR(style="color: red"), "Wrong credentials!")
            if state.get("error")
            else f"{state.get('user', 'You')} will receive an email with a link to log in."
        ),

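Here we are setting the attribute aria-label="log-in form" on the <article> element. Because aria-label contains a hyphen, it cannot be written as a Python keyword argument, so we pass it by unpacking a dictionary. Then, we render the paragraph text based on the state (see screenshots below).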

    return base(
        e.section(
            e.h1("Page built using lxml"),
            e.p("This is some text."),
        ),
        list_items(state),
        login_form(state),
        state=state,
    )

Here we are rendering our base template and passing some child objects. Notice how each element comes from a Python function (list_items and login_form).

    tree = view_index(MappingProxyType(state))
    html = s(tree)

We use this code to render the HTML string. The best part about this is that we could render only the Log In form by using this code instead:

    tree = login_form(MappingProxyType(state))
    html = s(tree)

And now we can return partial chunks of HTML.
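One detail to watch when returning partials: the s() helper always prepends a DOCTYPE, which only belongs at the top of a full document. A minimal sketch of a fragment serializer follows. The names fragment and login_message are hypothetical, and login_message is a simplified stand-in for the message part of login_form.

```python
from lxml.html import HtmlElement, tostring
from lxml.html.builder import E as e


def fragment(tree: HtmlElement) -> str:
    # Hypothetical helper: like s(), but without the DOCTYPE.
    return tostring(tree, encoding="unicode")


def login_message(state: dict):
    # Simplified stand-in for the message part of login_form().
    if state.get("error"):
        return e.p(e.strong("Wrong credentials!"))
    return e.p(f"{state.get('user', 'You')} will receive an email with a link to log in.")


error_html = fragment(login_message({"error": True}))
ok_html = fragment(login_message({"user": "@polyrand"}))
print(error_html)
print(ok_html)
```

An endpoint could return a string like this on its own, for example as the response to an HTMX request that swaps a single element.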

Here’s how the page looks now. The numbers should change every time you refresh it:

And if we add /?error=1 as a URL parameter, the state dictionary will contain "error": True, and the form shows the “Wrong credentials!” message instead[3]:

Escaping

When building HTML, you should be careful when passing user-generated data to the templates. You can use MarkupSafe to escape the HTML values you need. You could modify the lxml.html.builder.E class to escape all the string values[4]. Jinja2 does not escape by default.
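It helps to check what the builder does with an untrusted string before adding extra escaping. In the small experiment below (the user_input value is made up), a string passed as a child becomes a text node, and the angle brackets come out as entities when the tree is serialized. This only covers ordinary text content. It says nothing about other contexts such as URLs in href or content placed inside <script>, so the advice above still applies.

```python
from lxml.html import tostring
from lxml.html.builder import E as e

# Made-up example of untrusted input.
user_input = "<script>alert(1)</script>"

# The string is stored as a text node, not parsed as markup.
tree = e.p(user_input)
html = tostring(tree, encoding="unicode")
print(html)
```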

Architecture

At this point, there are different ways you could architect your Python-HTML components. For example, you could put all the component functions inside a class. The class can then hold the state dictionary as an attribute, so you don’t have to pass it around. This keeps all the UI functions in a separate namespace while still keeping all the code in a single file[5]. I built the same app using this approach; here is the source code.
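To give a rough idea of the class-based layout, here is a minimal sketch. It is not the linked source code; the Components class and its method names are assumptions made for illustration.

```python
from lxml.html import tostring
from lxml.html.builder import E as e


class Components:
    # Hypothetical sketch: the state lives on the instance, so component
    # methods do not need a `state` parameter.
    def __init__(self, state: dict):
        self.state = state

    def list_items(self):
        return e.ul(*[e.li(item) for item in self.state["items"]])

    def page(self):
        title = self.state.get("title", "Home")
        return e.html(
            e.head(e.title(title)),
            e.body(e.h1(title), self.list_items()),
        )


ui = Components({"title": "Some title", "items": ["foo", "bar"]})
html = tostring(ui.page(), encoding="unicode", doctype="<!DOCTYPE html>")
print(html)
```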

Or maybe you want each function to explicitly list all the required arguments, although this would probably turn into “Prop Drilling”, as it’s called in the React world.

Performance compared to Jinja2

I ran a simple benchmark that generates an HTML list based on a Python list. Using jinja2 was faster than using LXML, although the performance difference may not be as relevant compared to other parts of the application. Since jinja2 caches the templates after parsing them for the first time, I also benchmarked a function that re-creates the template every time it’s called (that’s what the LXML approach is doing). Then I also created a (not very convenient to use) function that uses LXML to generate the elements, but which caches each generated element after it’s first created.

These are the results:

Jinja
16.4 µs ± 51.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Jinja recreate template
353 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

LXML
180 µs ± 744 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

LXML cached builder
22.2 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Summary:

Technique          Average execution time (µs)
Jinja2             16.4
Jinja2 recreate    353
LXML               180
LXML cached        22.2

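jinja2 is definitely faster: with its cached template it is about 11 times faster than the plain LXML version (16.4 µs vs. 180 µs). Caching the LXML elements closes most of that gap (22.2 µs).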
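For reference, here is one simple way to add caching to the LXML approach. This is not the code used in the benchmark, which cached the generated elements. The sketch caches the serialized string instead, keyed by the input, using functools.lru_cache; render_list is a hypothetical name.

```python
from functools import lru_cache

from lxml.html import tostring
from lxml.html.builder import E as e


@lru_cache(maxsize=1024)
def render_list(items: tuple[str, ...]) -> str:
    # Cache the serialized string rather than the element: strings are
    # immutable and safe to reuse, while an lxml element can only have
    # one parent at a time.
    tree = e.ul(*[e.li(item) for item in items])
    return tostring(tree, encoding="unicode")


first = render_list(("foo", "bar", "baz"))   # builds and serializes the tree
second = render_list(("foo", "bar", "baz"))  # served from the cache
info = render_list.cache_info()
print(info)
```

The argument has to be hashable, which is why the items are passed as a tuple instead of a list.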

This post is mostly about sharing an easy approach I use to generate HTML from Python. For simple apps, I like it more than having jinja2 templates as strings, either inside my script or as separate files[5]. But since we are using LXML, which already builds an object tree for us, we could get a lot fancier and create some tree-diff function that only renders the modified elements, use the object tree to post-process certain elements before serializing them as a string, etc.
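As a small illustration of the post-processing idea, the sketch below walks the tree before serializing it and adds a rel attribute to every link that opens in a new tab. The markup and the rule itself are made up for the example.

```python
from lxml.html import tostring
from lxml.html.builder import E as e

tree = e.div(
    e.a("Docs", href="https://example.com/docs", target="_blank"),
    e.a("Home", href="/"),
)

# Post-process the tree before serializing: every link that opens a new
# tab gets rel="noopener noreferrer".
for link in tree.iter("a"):
    if link.get("target") == "_blank":
        link.set("rel", "noopener noreferrer")

html = tostring(tree, encoding="unicode")
print(html)
```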

The LXML / Python functions approach to generating HTML makes it straightforward to do template fragments.

Some alternative tools I’ve seen (but haven’t tried) that do something similar:

  • py-htmltools from the Shiny team, looks a bit under-documented.
  • domonic. It seemed cool the first time I discovered it. I feel it comes with more features than I need.
  • hype-html

Update 1: I just learned about hotmeal and even though I haven’t tried it, I really like the approach, probably more than using LXML.

Update 2: I recently found out about htpy, which also seems like a package I would use, especially since it gives you type-checked elements and setting attributes is easier than in lxml.

Update 3: After using htpy for a couple of larger projects, I can say it’s my favourite tool so far.


  1. You can wrap the dictionary in a MappingProxyType to make it immutable. ↩︎

  2. This is similar to the context you usually pass when rendering a jinja2 template. ↩︎

  3. This is one of the reasons I like returning just HTML. We can store a lot of things in the state dictionary to “declaratively” generate HTML, but then we just send HTML to the client. We don’t have to worry as much about the size of our state compared to other client-side approaches. See also the HTMX HATEOAS essay. ↩︎

  4. Here’s a simple example I built that does that. ↩︎

  5. See Single file applications ↩︎ ↩︎