In June 2021 I purchased a shiny new MacBook Air M1, packing 4 efficiency cores, 4 performance cores and 16GB of RAM. The performance is pretty dang incredible. Waiting to buy was a good choice, as pretty much everything had been ported and was working by the time I purchased it, and as of the end of 2022 almost everything is native ARM64.
At my day job, the data team primarily uses Python for our pipelines, as Python has some fantastic libraries that make Data Engineering a lot easier, so we can focus on the business instead of building and maintaining our own internal tools.
However, a pain point on M1 and Linux ARM64 is that wheels (what Python calls precompiled packages) don’t exist for a lot of “older” packages, or the maintainers just don’t want to add them.
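To make "a wheel doesn't exist" concrete: a wheel's filename ends in three tags (Python, ABI, platform), so you can tell from a list of release files whether your machine is covered. Here is a rough sketch of that check. The function names and the `example_pkg` filenames are made up for illustration, and it deliberately ignores details such as `universal2` wheels and minimum OS versions that a real installer considers.

```python
def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3], parts[-2], parts[-1]


def has_wheel_for(filenames: list[str], python_tag: str, arch: str) -> bool:
    """True if any wheel in the list targets this Python tag and CPU architecture."""
    for name in filenames:
        if not name.endswith(".whl"):
            continue  # skip sdists such as .tar.gz
        py, _abi, plat = parse_wheel_tags(name)
        # Tags can be "compressed" sets joined by dots, e.g. py2.py3
        if python_tag in py.split(".") and any(
            p.endswith(arch) for p in plat.split(".")
        ):
            return True
    return False


# Hypothetical release files for an imaginary package
files = [
    "example_pkg-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl",
    "example_pkg-1.0.0-cp311-cp311-macosx_11_0_arm64.whl",
    "example_pkg-1.0.0.tar.gz",
]
print(has_wheel_for(files, "cp310", "arm64"))  # no arm64 wheel for 3.10
print(has_wheel_for(files, "cp311", "arm64"))  # arm64 wheel exists for 3.11
```

When the check comes back false, pip falls back to the source distribution and compiles it locally, which is exactly the slow path described here.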
With the release of Python 3.11, a lot more packages now support these platforms, but plenty still don’t, or popular projects have pinned older versions that lack wheels. For example, Dagster relies on grpcio, and it’s not a great install experience on Apple Silicon.
Constantly rebuilding packages is such a PITA: builds take time, the build dependencies are annoying, and it slows down both our development process and our deployment CI process.
So during this holiday break, I decided to do something about it.
The result is a builder and mirror for Python packages that don’t have wheels for platforms such as Apple Silicon (aarch64/ARM64, if you have an M1/M2 processor) or Linux ARM64 (AWS Graviton, etc.). It also back-builds wheels for older Python and package versions.
The aim of this project is to provide reproducible builds using GitHub Actions, and then mirror them on python.build as well as on the releases page.
Right now I’m focusing on a small subset of packages, to ensure that the process works well.
How it works
Packages are built with GitHub Actions, using cibuildwheel to build the wheels for the following platforms:
- macOS x86_64 & arm64
And with the following planned platforms:
- Linux x86_64 & arm64
The runners are self-hosted on dedicated machines, because building these wheels there is much faster than on GitHub-hosted runners. For example, a build of grpcio takes around 70 seconds, compared to 20-25 minutes on GitHub Actions hosted runners.
GitHub Actions also has a monthly limit of 2000 minutes, so this is a way to get around that.
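To put those numbers side by side, here is some back-of-the-envelope arithmetic. It assumes every build is about as heavy as grpcio and that each hosted build counts its full duration against the monthly limit, which is a simplification.

```python
MONTHLY_LIMIT_MIN = 2000      # hosted-runner minutes per month (from above)
HOSTED_BUILD_MIN = (20, 25)   # grpcio build time on hosted runners, in minutes
SELF_HOSTED_BUILD_SEC = 70    # grpcio build time on a dedicated machine


def builds_per_month(limit_min: int, minutes_per_build: int) -> int:
    """How many whole builds fit inside the monthly minute budget."""
    return limit_min // minutes_per_build


def speedup(hosted_min: int, self_hosted_sec: int) -> float:
    """How many times faster the self-hosted build is."""
    return hosted_min * 60 / self_hosted_sec


fast, slow = HOSTED_BUILD_MIN
print(builds_per_month(MONTHLY_LIMIT_MIN, slow))       # 80
print(builds_per_month(MONTHLY_LIMIT_MIN, fast))       # 100
print(round(speedup(fast, SELF_HOSTED_BUILD_SEC), 1))  # 17.1
print(round(speedup(slow, SELF_HOSTED_BUILD_SEC), 1))  # 21.4
```

So the hosted budget covers roughly 80 to 100 grpcio-sized builds a month, which disappears quickly once you multiply packages by Python versions and platforms.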
The python.build website is pretty straightforward. It goes through the following steps:
- Have a list of packages we support, using the Simple Repository API (grpcio, grpcio-tools, lxml)
- For wheels that have been built, it links to the package.
- For everything else, it links back to the officially hosted files on pypi.org
- For packages we don’t support, 301 redirect to pypi.org’s simple API. (python.build/simple/numpy -> pypi.org/simple/numpy)
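The steps above can be sketched roughly like this. It is not the actual python.build code: the `Response` class, the `resolve` function and the example URLs are all made up for illustration, and only the redirect target and the supported package list come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Optional

PYPI_SIMPLE = "https://pypi.org/simple"
SUPPORTED = {"grpcio", "grpcio-tools", "lxml"}


@dataclass
class Response:
    """Hypothetical stand-in for whatever the web framework returns."""
    status: int
    location: Optional[str] = None
    links: Optional[dict] = None  # filename -> download URL


def normalize(name: str) -> str:
    """Normalize a project name the way the Simple Repository API expects."""
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve(project: str, built: dict, upstream: dict) -> Response:
    """Decide what /simple/<project>/ should return.

    built:    filename -> mirror URL for wheels we have built
    upstream: filename -> URL of the official file on pypi.org
    """
    name = normalize(project)
    if name not in SUPPORTED:
        # Unsupported package: hand the request straight to PyPI
        return Response(301, location=f"{PYPI_SIMPLE}/{name}/")
    # Supported package: start from the official files, then add our wheels
    links = dict(upstream)
    links.update(built)
    return Response(200, links=links)


# Illustrative data only; these URLs are placeholders
upstream = {"lxml-4.9.2.tar.gz": "https://files.example/lxml-4.9.2.tar.gz"}
built = {
    "lxml-4.9.2-cp39-cp39-macosx_11_0_arm64.whl":
        "https://mirror.example/lxml-4.9.2-cp39-cp39-macosx_11_0_arm64.whl"
}
print(resolve("numpy", {}, {}).location)
print(sorted(resolve("lxml", built, upstream).links))
```

Because the page for a supported package lists both the mirror's wheels and the official files, an installer can pick whichever file matches the current platform without knowing which site it comes from.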
This allows setting python.build as a marker in Poetry, so packages on the defined platforms will download the wheels from python.build, and everything else downloads from pypi.org.
You can view full instructions, including setup instructions for Poetry, in the README.
Thanks to pietrodn, who created a GitHub Action for grpcio. This gave me the idea to create a generic repository that also mirrored it, so people can either download the wheels from the mirror in their pyproject.toml or vendor it themselves.