LazyFile How To Guides

How to lazily extract a file from a remote wheel

The original motivation for this library was to extract the metadata file from a wheel, without downloading the whole wheel file (which could potentially be very large).

Wheel files are zip files, and can be read using the standard library zipfile module. The ZipFile constructor takes a file-like object that must be seekable and must return bytes when read. A LazyFile is ideal for this.
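As a minimal sketch of that requirement, the following builds a tiny wheel-like archive in memory and reads it back through io.BytesIO, a seekable binary file object from the standard library. The archive contents and the name demo-1.0.dist-info/METADATA are made up for illustration and do not come from a real wheel.

```python
import io
from zipfile import ZipFile

# Build a small wheel-like zip archive in memory (contents are illustrative).
buf = io.BytesIO()
with ZipFile(buf, "w") as z:
    z.writestr("demo/__init__.py", "")
    z.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")

# Any seekable object whose read() returns bytes can be handed to ZipFile.
source = io.BytesIO(buf.getvalue())
with ZipFile(source) as z:
    metadata = z.read("demo-1.0.dist-info/METADATA")

print(metadata.decode())
```

ZipFile only relies on the file object's seek, tell and read behaviour here, which is why an object that fetches its bytes on demand can stand in for BytesIO.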

First, we need two functions: one to get the length of the file, and one to fetch a block of bytes from it.

from urllib.request import Request, urlopen

# Get the file size with a HEAD request
def content_len(url):
    req = Request(url, method="HEAD")
    with urlopen(req) as f:
        return int(f.headers["Content-Length"])

# Get a block of bytes from the URL.
def get_url_range(url, lo, hi):
    # HTTP byte ranges are inclusive, but hi is "past the end", so request lo..hi-1
    req = Request(url, headers={"Range": f"bytes={lo}-{hi-1}"})
    with urlopen(req) as f:
        data = f.read()
        if len(data) != hi-lo:
            raise ValueError(f"Failed to read {hi-lo} bytes")

    return data
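The half-open range convention used by get_url_range can be checked without a network connection. The sketch below is an illustration only: range_header and make_bytes_getter are hypothetical helper names, not part of lazyfile, and the (lo, hi) contract is assumed from the "past the end" comment in the code above.

```python
def range_header(lo, hi):
    """Build a Range header value for the half-open byte interval [lo, hi)."""
    if not 0 <= lo < hi:
        raise ValueError(f"Invalid range: {lo}..{hi}")
    # HTTP byte ranges include the last position, so subtract one from hi.
    return f"bytes={lo}-{hi - 1}"


def make_bytes_getter(data):
    """Return a getter with the same (lo, hi) contract, backed by in-memory bytes."""
    def getter(lo, hi):
        block = data[lo:hi]
        if len(block) != hi - lo:
            raise ValueError(f"Failed to read {hi - lo} bytes")
        return block
    return getter


getter = make_bytes_getter(b"abcdef")
print(range_header(0, 100))
print(getter(2, 4))
```

A getter like this can be handy as a stand-in for the HTTP version when testing code that only needs "give me bytes lo up to hi".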

With these helpers, we can open a URL lazily:

from lazyfile import LazyFile
from functools import partial

def open_url(url):
    url_getter = partial(get_url_range, url)
    file_size = content_len(url)
    return LazyFile(file_size, url_getter)

And that’s all we need to process the file as a zipfile and extract the metadata:

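from zipfile import ZipFile

def extract_metadata(url):
    f = open_url(url)
    with ZipFile(f) as z:
        for name in z.namelist():
            if name.endswith(".dist-info/METADATA"):
                # Return the raw bytes of the metadata file
                return z.read(name)
    # No METADATA file was found in the archive
    return None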
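To see why this access pattern suits a lazy file, the sketch below records every read that ZipFile makes against an in-memory archive. It does not use lazyfile itself: RecordingBytesIO is a hypothetical helper written for this illustration, and the archive contents are made up. The exact reads are an implementation detail of zipfile and may differ between Python versions.

```python
import io
from zipfile import ZipFile


class RecordingBytesIO(io.BytesIO):
    """BytesIO that records (offset, length) for every read() call."""

    def __init__(self, data):
        super().__init__(data)
        self.reads = []

    def read(self, size=-1):
        pos = self.tell()
        data = super().read(size)
        self.reads.append((pos, len(data)))
        return data


# Build an archive with one large member and a small METADATA file.
buf = io.BytesIO()
with ZipFile(buf, "w") as z:
    z.writestr("demo/big.bin", b"\0" * 1_000_000)
    z.writestr("demo-1.0.dist-info/METADATA", "Name: demo\n")
archive = buf.getvalue()

# Read only the metadata and count how many bytes were actually requested.
f = RecordingBytesIO(archive)
with ZipFile(f) as z:
    metadata = z.read("demo-1.0.dist-info/METADATA")

bytes_read = sum(length for _, length in f.reads)
print(f"read {bytes_read} of {len(archive)} bytes in {len(f.reads)} calls")
```

ZipFile reads the directory at the end of the archive and then the requested member, so the large member is never touched. With a remote wheel, each of those reads would correspond to a ranged request rather than a full download.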