LazyFile How To Guides¶
How to lazily extract a file from a remote wheel¶
The original motivation for this library was to extract the metadata file from a wheel, without downloading the whole wheel file (which could potentially be very large).
Wheel files are structured as zip files, and can be read using
the standard library zipfile
module. The ZipFile
constructor
takes a file-like object which must return bytes, and be seekable.
A LazyFile is ideal for this.
First, we need to implement methods to get the length of the file, and to get a block of bytes from the file.
from urllib.request import Request, urlopen
# Get the file size with a HEAD request
def content_len(url):
req = Request(url, method="HEAD")
with urlopen(req) as f:
return int(f.headers["Content-Length"])
# Get a block of bytes from the URL.
def get_url_range(url, lo, hi):
# Adjust the range, as hi is "past the end"
req = Request(url, headers={"Range": f"bytes={lo}-{hi-1}"})
with urlopen(req) as f:
data = f.read()
if len(data) != hi-lo:
raise ValueError(f"Failed to read {hi-lo} bytes")
return data
With these helpers, we can open a URL lazily
from lazyfile import LazyFile
from functools import partial
def open_url(url):
url_getter = partial(get_url_range, url)
file_size = content_len(url)
return LazyFile(file_size, url_getter)
And that’s all we need to process the file as a zipfile and extract the metadata
def extract_metadata(url)
f = open_url(url)
z = ZipFile(f)
for name in z.namelist():
if name.endswith(".dist-info/METADATA"):
metadata = z.read(name)
return metadata