Writing Benchmarks#
Have an idea for how to speed up reading from or writing to a local or remote NWB file?
This section explains how to write your own benchmark to prove it robustly across platforms, architectures, and environments.
Standard Prefixes#
Just as pytest automatically detects and runs any function or method whose name starts with test_, AirSpeed Velocity runs timing tests for anything prefixed with time_, tracks peak memory for anything prefixed with peakmem_, and tracks custom values (such as our functions for network traffic) for anything prefixed with track_ (which must return the value being tracked). There are many other prefixes; check out the full listing in the primary AirSpeed Velocity documentation.
A single tracking function should perform only the minimal operations you wish to time. It is also capable of tracking only a single value. The philosophy for this is to avoid interference from cross-measurements; that is, the act of tracking memory of the operation may impact how much overall time it takes that process to complete, so you would not want to simultaneously measure both time and memory.
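As a minimal sketch of how these prefixes look in practice, the hypothetical class below defines one function per prefix. The class and method names are made up for illustration; only the time_, peakmem_, and track_ prefixes come from the text above.

```python
class PrefixExampleBenchmark:
    """Hypothetical class; only the method prefixes matter to AirSpeed Velocity."""

    def time_sum_of_squares(self):
        # Timing benchmark: the body holds only the operation of interest.
        self._temp = sum(value * value for value in range(10_000))

    def peakmem_build_list(self):
        # Peak memory benchmark: kept separate from the timing function above.
        self._temp = list(range(100_000))

    def track_list_length(self):
        # Tracking benchmark: the returned number is the single tracked value.
        return len(list(range(100_000)))
```

Each function measures one thing, which keeps the time and memory measurements from interfering with each other.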
Note
One caveat with using track_ is that if you want it to return the custom values as samples, similar to the timing tests, the return value must be a dictionary with the required outer keys samples (holding the values) and number (set to None).
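The dictionary described in this note could look roughly like the sketch below. The class name, method name, and measured values are hypothetical placeholders; only the samples and number keys come from the note.

```python
class TrackSamplesBenchmark:
    """Hypothetical benchmark returning several measurements as samples."""

    def track_payload_sizes(self):
        # Stand-in measurements; a real benchmark would compute these.
        measured_values = [1024, 1100, 998]
        # Outer keys as described in the note: `samples` and `number=None`.
        return dict(samples=measured_values, number=None)
```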
Class Structure#
A single benchmark suite is a file within the benchmarks folder. It contains one or more benchmark classes. The word ‘Benchmark’ does not need to appear in the name of the class; only the prefix on the function matters.
Every benchmark function has a timeout attribute, which specifies the maximum number of seconds the process has to complete; otherwise it will count as a failure. The global value for this is set in the .asv.conf.json file.
Similar to unittest.TestCase classes, all benchmark classes have setup and teardown methods. Set any values that need to be established before the main benchmark cases run as attributes on the instance during setup; likewise, if you save anything to disk and you don’t want it to persist, remove it in the teardown step.
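A minimal sketch of this pattern, using only the standard library, is shown below. The class name, file name, and timeout value are hypothetical; a real benchmark would prepare an NWB file instead of a block of zero bytes.

```python
import pathlib
import tempfile


class LocalFileReadBenchmark:
    """Hypothetical benchmark illustrating setup, teardown, and timeout."""

    timeout = 60  # Seconds allowed before the benchmark counts as a failure

    def setup(self):
        # Establish the required state and attach it to the instance.
        self.temporary_directory = tempfile.TemporaryDirectory()
        self.file_path = pathlib.Path(self.temporary_directory.name) / "example.bin"
        self.file_path.write_bytes(bytes(1_000_000))

    def teardown(self):
        # Remove what was written to disk so it does not persist.
        self.temporary_directory.cleanup()

    def time_read_bytes(self):
        self._temp = self.file_path.read_bytes()
```

The file creation stays out of the timed function, so only the read itself is measured.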
Timing benchmarks have several special attributes, the most important of which are rounds and repeat. All timing functions in a class can be repeated in round-robin fashion using rounds > 1; the philosophy here is to ‘average out’ variation on the system over time, and increasing it may not always be relevant. Each function in a suite is repeated repeat number of times to get an estimate of the standard deviation of the operation. Every function in the suite has at most timeout number of seconds to complete; otherwise it will count as a failure.
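These attributes can be set directly on the class, as in the sketch below. The specific values and the sorting operation are arbitrary choices for illustration, not recommendations.

```python
class RepeatedTimingBenchmark:
    """Hypothetical timing benchmark with explicit repetition settings."""

    rounds = 2  # Cycle through all timing functions of the class twice
    repeat = 3  # Number of repeated measurements per function
    timeout = 120  # Seconds allowed before counting as a failure

    def time_sort_descending_range(self):
        self._temp = sorted(range(50_000, 0, -1))
```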
For timing functions, setup and teardown will be called before and after execution of every count of rounds and repeat for every tracking function (such as timing) in the class. setup should therefore be as light as possible since it will be repeated so often, though sometimes even a minimal setup can still take time (such as reading a large remote NWB file using a suboptimal method). In some cases, a setup_cache method can be defined; it runs only once per class to precompute some operation, such as the creation of a fake dataset for testing on local disk.
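A rough sketch of the setup_cache idea follows. The file name and class are hypothetical, and the sketch assumes that the benchmark functions run in the same working directory in which setup_cache wrote the file.

```python
import pathlib

# Hypothetical file name, relative to the working directory of the benchmark run
CACHED_FILE_PATH = pathlib.Path("fake_dataset.bin")


class CachedDatasetBenchmark:
    """Hypothetical benchmark that prepares a fake dataset a single time."""

    def setup_cache(self):
        # Runs once, so the expensive write is not repeated for every measurement.
        CACHED_FILE_PATH.write_bytes(bytes(1_000_000))

    def time_read_cached_file(self):
        self._temp = CACHED_FILE_PATH.read_bytes()
```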
Note
Be careful to assign objects fetched by operations within the tracking functions; otherwise, you may unintentionally track the garbage collection step triggered when the reference count of the return value reaches zero in the namespace. For relatively heavy I/O operations this can be non-negligible.
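The difference can be sketched with two hypothetical timing functions. In the first, the result is never assigned, so it is released before the function returns and the deallocation falls inside the timed region. In the second, the result is stored on the instance and outlives the call.

```python
class GarbageCollectionExampleBenchmark:
    """Hypothetical benchmark contrasting discarded and retained results."""

    def setup(self):
        self.source = bytes(10_000_000)

    def time_copy_discarded(self):
        # The copy is freed as soon as this statement completes,
        # so its cleanup is timed together with the copy itself.
        bytearray(self.source)

    def time_copy_kept(self):
        # Holding a reference on the instance keeps the copy alive past the call.
        self._temp = bytearray(self.source)
```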
Finally, you can leverage params and param_names on all benchmark types to perform a structured iteration over many inputs to the operations. param_names is a list of length equal to the number of inputs you wish to pass to an operation. params is a list of lists; the outer list has length equal to the number of inputs, and each inner list has length equal to the number of different cases to iterate over for that input.
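The sketch below shows this structure for two inputs with two cases each. The class and parameter names are hypothetical. It also enumerates the combinations with itertools.product, on the understanding that AirSpeed Velocity runs the benchmark once for every combination of the inner lists.

```python
import array
import itertools


class ArrayCopyBenchmark:
    """Hypothetical parametrized benchmark with two inputs."""

    param_names = ["number_of_items", "type_code"]
    params = (
        [1_000, 100_000],  # Cases for number_of_items
        ["b", "d"],  # Cases for type_code (signed char, double)
    )

    def setup(self, number_of_items, type_code):
        self.data = array.array(type_code, [0] * number_of_items)

    def time_copy(self, number_of_items, type_code):
        self._temp = self.data[:]


# Every combination of the inner lists, in the order given by param_names
cases = list(itertools.product(*ArrayCopyBenchmark.params))
```

Note that setup and the benchmark function both receive the parameters as arguments, in the order given by param_names.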
Note
This structure for params can be very inconvenient to specify; if you desire a helper function that would instead take a flat list of dictionaries to serve as keyword arguments for all the iteration cases, please request it on our issues board.
For more advanced details, refer to the primary AirSpeed Velocity documentation.
Philosophy#
In the spirit of PEP8, it was decided from PRs 12, 19, 20, and 21 and ensuing meetings that we should adopt an explicit functionally-based approach to structuring these classes and their methods. This will help make the project much easier to understand for people outside the team and will even reduce the amount of time it takes our main developers to read benchmarks they have not seen before or have forgotten about over an extended period of time.
This approach means relying as little on inheritance and mixins as possible to reduce the amount of implicit knowledge required to understand a benchmark just by looking at it - instead, all instance methods of the benchmark class should be explicitly defined.
To reduce duplicated code, it is suggested to write standalone helper functions in the core submodule and then call those functions within the benchmarks. This does mean that some redirection is still required to understand exactly how a given helper function operates, but this was deemed worth it to keep the actual size of benchmarks from inflating.
An example of this philosophy in practice would be as follows. In this example we wish to test how long it takes to both read a small remote NWB file (from the s3_url) using the remfile method…
from nwb_benchmarks.core import read_hdf5_nwbfile_remfile


class NWBFileReadBenchmark:
    param_names = ["s3_url"]
    params = [
        "https://dandiarchive.s3.amazonaws.com/ros3test.nwb",  # The original small test NWB file
    ]

    def time_read_hdf5_nwbfile_remfile(self, s3_url: str):
        self.nwbfile, self.io, self.file, self.bytestream = read_hdf5_nwbfile_remfile(s3_url=s3_url)
as well as how long it takes to slice ~20 MB of data from the contents of a remote NWB file that has a large amount of series data…
from typing import Tuple

# Note: get_object_by_name is assumed to live in nwb_benchmarks.core alongside the other helpers
from nwb_benchmarks.core import get_object_by_name, get_s3_url, read_hdf5_nwbfile_remfile


class RemfileContinuousSliceBenchmark:
    param_names = ["s3_url", "object_name", "slice_range"]
    params = (
        [
            get_s3_url(  # Yet another helper function for making the NWB file input easier to read
                dandiset_id="000717",
                dandi_path="sub-IBL-ecephys/sub-IBL-ecephys_ses-3e7ae7c0_desc-18000000-frames-13653-by-384-chunking.nwb",
            )
        ],
        ["ElectricalSeriesAp"],
        [(slice(0, 30_000), slice(0, 384))],  # ~23 MB
    )

    def setup(self, s3_url: str, object_name: str, slice_range: Tuple[slice, ...]):
        self.nwbfile, self.io, self.file, self.bytestream = read_hdf5_nwbfile_remfile(s3_url=s3_url)
        # Use the parametrized object name rather than a hard-coded string
        self.neurodata_object = get_object_by_name(nwbfile=self.nwbfile, object_name=object_name)
        self.data_to_slice = self.neurodata_object.data

    def time_slice(self, s3_url: str, object_name: str, slice_range: Tuple[slice, ...]):
        """Note: store as self._temp to avoid tracking garbage collection as well."""
        self._temp = self.data_to_slice[slice_range]
Notice how the read_hdf5_nwbfile_remfile function (which reads an HDF5-backend pynwb.NWBFile object into memory using the remfile method) was used as the main operation being timed in the first case, then reused in the setup of the second. By following the redirection of the function to its definition, we find it is itself a compound of another helper function for remfile usage…
# In nwb_benchmarks/core/_streaming.py
from typing import Tuple

import h5py
import pynwb
import remfile


def read_hdf5_remfile(s3_url: str) -> Tuple[h5py.File, remfile.File]:
    """Load the raw HDF5 file from an S3 URL using remfile; does not formally read the NWB file."""
    byte_stream = remfile.File(url=s3_url)
    file = h5py.File(name=byte_stream)
    return (file, byte_stream)


def read_hdf5_nwbfile_remfile(s3_url: str) -> Tuple[pynwb.NWBFile, pynwb.NWBHDF5IO, h5py.File, remfile.File]:
    """Read an HDF5 NWB file from an S3 URL using remfile."""
    (file, byte_stream) = read_hdf5_remfile(s3_url=s3_url)
    io = pynwb.NWBHDF5IO(file=file, load_namespaces=True)
    nwbfile = io.read()
    return (nwbfile, io, file, byte_stream)
and so we managed to save ~5 lines of code for every occurrence of this logic in the benchmarks. Good choices of function names are critical to effectively communicating the actions being undertaken. Thorough annotation of signatures is likewise critical to understanding input/output relationships for these functions.
Writing a network tracking benchmark#
Functions that require network access, such as reading a file from S3, are often a black box, with functions in other libraries (e.g., h5py, fsspec, etc.) managing the access to the remote resources. The runtime performance of such functions is often driven by how they use the network to access those resources. It is therefore important to profile the network traffic being generated in order to understand, for example, the amount of data downloaded and uploaded and the number of requests sent and received.
To simplify the implementation of benchmarks for tracking network statistics, we implemented various helper classes and functions in the nwb_benchmarks.core module. The network tracking functionality is designed to track the network traffic generated by the main Python process running our tests during a user-defined period of time. The network_activity_tracker context manager can be used to track the network traffic generated by the code within the context. A basic network benchmark then looks as follows:
from nwb_benchmarks import TSHARK_PATH
from nwb_benchmarks.core import network_activity_tracker
import requests  # Only used here for illustration purposes


class SimpleNetworkBenchmark:
    def track_network_activity_uri_request(self):
        with network_activity_tracker(tshark_path=TSHARK_PATH) as network_tracker:
            x = requests.get('https://nwb-benchmarks.readthedocs.io/en/latest/setup.html')
        return network_tracker.asv_network_statistics
In cases where a context manager may not be sufficient, we can alternatively use the NetworkTracker class directly to explicitly control when to start and stop the tracking.
from nwb_benchmarks import TSHARK_PATH
from nwb_benchmarks.core import NetworkTracker
import requests  # Only used here for illustration purposes


class SimpleNetworkBenchmark:
    def track_network_activity_uri_request(self):
        tracker = NetworkTracker()
        tracker.start_network_capture(tshark_path=TSHARK_PATH)
        x = requests.get('https://nwb-benchmarks.readthedocs.io/en/latest/setup.html')
        tracker.stop_network_capture()
        return tracker.asv_network_statistics
By default, the NetworkTracker and network_activity_tracker track the network activity of the current process ID (i.e., os.getpid()), but the PID to track can also be set explicitly if a different process needs to be monitored.