The miscellaneous routines in SciPy (`scipy.misc`) used to provide a handful of convenience functions. This article covers the removal of this feature.
Table of contents
- Why was _Miscellaneous routines_ removed from
- A Very Brief Introduction to
- What does this implement/fix?
- Pooch and scipy.datasets partnership
Why was _Miscellaneous routines_ removed from
In the old days, the miscellaneous routines of SciPy carried some weight. By 2022, however, the module held only five functions, among them `electrocardiogram`. Most of what it once offered has moved to other modules or submodules, and many users have complained about the usefulness and computational inefficiency of this submodule. For example:
Stephan Hoyer: "I would vote for removing them entirely, I haven't used either of them, it just came up in a search for finite differences in Python"
`scipy.misc` is a submodule with only five functions. Keeping it around increases the package size of the library and, critics argued, compromises optimization and overall processing speed.
Given all of these reasons, people had grown frustrated with the module.
In 2018, Warren Weckesser, a SciPy enthusiast, opened a pull request to remove `.misc` from SciPy and introduce a new idea, `scipy.datasets`. For some unfortunate reasons it did not proceed.
This year, Anirudh Dagar picked up the idea and eventually convinced the SciPy maintainers to add `scipy.datasets` as a submodule (which includes all methods of
A Very Brief Introduction to
    >>> from scipy import datasets
    >>> # Example ascent dataset loading with the new module
    >>> datasets.ascent()
    array([[ 83,  83,  83, ..., 117, 117, 117],
           [ 82,  82,  83, ..., 117, 117, 117],
           [ 80,  81,  83, ..., 117, 117, 117],
           ...,
           [178, 178, 178, ...,  57,  59,  57],
           [178, 178, 178, ...,  56,  57,  57],
           [178, 178, 178, ...,  57,  57,  58]])
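The same pattern applies to the other sample datasets. The sketch below assumes SciPy 1.10 or later with Pooch installed, and network access on the first call so the file can be downloaded. The helper `load_sample` and the tuple `KNOWN_DATASETS` are hypothetical names introduced for illustration; only `ascent`, `face`, and `electrocardiogram` are functions of `scipy.datasets`.

```python
# Sketch: look up a sample dataset by name through scipy.datasets.
# Assumes SciPy >= 1.10 with Pooch installed; the first call to a loader
# downloads its file, later calls read it from the local cache.

KNOWN_DATASETS = ("ascent", "face", "electrocardiogram")


def load_sample(name):
    """Return the named sample dataset as a NumPy array (hypothetical helper)."""
    if name not in KNOWN_DATASETS:
        raise ValueError(
            f"unknown dataset {name!r}; expected one of {KNOWN_DATASETS}"
        )
    # Imported lazily so the name check above runs even without SciPy.
    from scipy import datasets

    # Each dataset is exposed as a function of the same name.
    return getattr(datasets, name)()
```

Calling `load_sample("face")` would then be equivalent to `datasets.face()`, while an unknown name fails early with a `ValueError` instead of an `AttributeError`.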
What does this implement/fix?
With gh-8707, in 2018, SciPy wanted to introduce the `datasets` submodule and move a handful of dataset functions from the current `misc` module to it. A big thanks to @WarrenWeckesser for coming up with this idea. With this PR (indeed inspired by gh-8707), Anirudh and Ralf resume those efforts after making some improvements (explained below) and move away from the `scipy.misc` module, finally deprecating it in a separate PR (#15901).
- Utilize pooch to handle the dataset downloading and caching.
- Enable meson support for
- Move all dataset files (e.g. `scipy.stats` has its own test datasets within the repository) to their respective new repository (explained below). This can be done after this PR lands, once a concrete datasets API and an approach for adding new datasets are defined.
- Deprecate the misc module (DEP: Deprecate scipy.misc in favour of scipy.datasets #15901)
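To make the deprecation step concrete, here is a minimal sketch of how an old entry point can keep working while steering users to its replacement. This is a generic Python pattern and is not taken from the SciPy source. The names `deprecated_alias`, `new_ascent`, and `old_ascent` are hypothetical, and the tiny nested list stands in for a real dataset.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name, new_name):
    """Wrap new_func so that calls through the old name emit a warning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper


def new_ascent():
    """Hypothetical stand-in for a loader living in the new module."""
    return [[83, 83, 83], [82, 82, 83]]


# The old entry point forwards to the new one and warns on every call.
old_ascent = deprecated_alias(new_ascent, "misc.ascent", "datasets.ascent")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = old_ascent()

print(caught[0].message)
```

The old name returns exactly what the new function returns, so existing code keeps running during the deprecation period while the warning tells users where to migrate.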
Pooch and scipy.datasets partnership
Pooch manages data registrations by downloading your data files from a server only when needed and storing them locally in a data cache (a folder on your computer).
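The download-once-then-cache idea can be sketched with the standard library alone. The code below is an illustration of the technique under stated assumptions, not Pooch's implementation or API: `fetch`, `sha256_of`, and `fake_download` are hypothetical names, the registry is assumed to map file names to SHA-256 digests, and the downloader is a stand-in that returns fixed bytes instead of contacting a server.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch(name, registry, cache_dir, download):
    """Return the local path of `name`, downloading it only when needed.

    registry: dict mapping file names to expected SHA-256 hex digests.
    download: callable taking a file name and returning its bytes.
    """
    target = Path(cache_dir) / name
    expected = registry[name]
    # Cache hit: the file is already stored locally and is intact.
    if target.exists() and sha256_of(target) == expected:
        return target
    data = download(name)
    # Refuse to cache a file that does not match the registry.
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"hash mismatch for {name!r}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


# Offline demonstration: a fake downloader stands in for the server.
payload = b"83 83 83 117 117 117\n"
registry = {"ascent.dat": hashlib.sha256(payload).hexdigest()}
calls = []


def fake_download(name):
    calls.append(name)
    return payload


with tempfile.TemporaryDirectory() as tmp:
    first = fetch("ascent.dat", registry, tmp, fake_download)
    second = fetch("ascent.dat", registry, tmp, fake_download)
    cached_bytes = second.read_bytes()

print(len(calls))  # prints 1: the second fetch was served from the cache
```

Passing the downloader in as a parameter keeps the sketch testable without a network connection; a real tool would issue an HTTP request at that point.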
With Pooch you can easily decouple the datasets that are currently present within the SciPy repository and move them to their own repository. For example, see https://github.com/scipy-datasets, where each dataset has its own repository. This will lead to a lightweight SciPy package, decreasing the download size for future releases. Whether to keep the datasets in individual repositories or in a single `scipy-datasets` repository is still a point of discussion.
- Dependency: Pooch is an extremely light package with only a few dependencies of its own, so adding it as a new dependency keeps things small and does not pull in a lot of sub-dependencies.