A very well written article! I admire the analysis done by the author regarding the difficulties of Python packaging.
With the advent of uv, I'm finally feeling like Python packaging is solved. As mentioned in the article, being able to have inline dependencies in a single-file Python script and running it naturally is just beautiful.
#!/usr/bin/env -S uv run
# /// script
# dependencies = ['requests', 'beautifulsoup4']
# ///
import requests
from bs4 import BeautifulSoup
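For anyone trying this: the `-S` flag to `env` (GNU coreutils 8.30+, also available on the BSDs/macOS) is what splits `uv run` into separate shebang arguments, so with uv on PATH you can `chmod +x` the script and run it directly, or simply invoke `uv run script.py`.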
After getting used to this workflow, I've been thinking that a dedicated syntax for inline dependencies would be great, similar to JavaScript's `import ObjectName from 'module-name';` syntax. Python promoted type hints from comment-based to syntax-based, so a similar approach seems feasible.
> It used to be that either you avoided dependencies in small Python scripts, or you had some cumbersome workaround to make them work for you. Personally, I used to manage a gigantic venv just for my local scripts, which I had to kill and clean every year.
I had the same fear for adding dependencies, and did exactly the same thing.
> This is the kind of thing that changes completely how you work. I used to have one big test venv that I destroyed regularly. I used to avoid testing some stuff because it would be too cumbersome. I used to avoid some tooling or pay the price for using them because they were so big or not useful enough to justify the setup. And so on, and so on.
One other key part of this is freezing a timestamp with your dependency list, because Python packages are absolutely terrible at maintaining compatibility a year or three or five later as PyPI fills up with newer and newer versions. The special TOML incantation is `[tool.uv]` with `exclude-newer`:
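Concretely, in the PEP 723 block of a single-file script that looks something like this (the timestamp is just an illustrative value):
# /// script
# dependencies = ['requests', 'beautifulsoup4']
# [tool.uv]
# exclude-newer = "2023-10-16T00:00:00Z"
# ///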
This has also let me easily reconstruct some older environments in less than a minute, when I've been version hunting for 30-60 minutes in the past. The speed of uv environment building helps a ton too.
Maybe I'm missing something, but why wouldn't you just pin to an exact version of `requests` (or whatever) instead? I think that would be equivalent in practice to limiting resolutions by release date, except that it would express your intent directly ("resolve these known working things") rather than indirectly ("resolve things from when I know they worked").
Pinning deps is a good thing, but it won't necessarily solve the issue of transitive dependencies (i.e. the dependencies of requests itself, for example), which will not be pinned themselves, given that you don't have a lock file.
To be clear, a lock file is strictly the better option—but for single file scripts it's a bit overkill.
If there's a language that does this right, I'm all ears. But I haven't seen one.
The use case described is a small one-off script for use in CI, or a single-file script you send to a colleague over Slack. Very, very common scenario for many of us. If your script depends on
a => c
b => c
You can pin versions of those direct dependencies like "a" and "b" easily enough, but 2 years later you may not get the same version of "c", unless the authors of "a" and "b" handle their dependency constraints perfectly. In practice that's really hard and never happens.
The timestamp approach described above isn't perfect, but it would produce the same dep graph, and the same results, 99% of the time.
Try Scala with an Ammonite script like https://ammonite.io/#ScalaScripts . The JVM ecosystem does dependencies right, there's no need to "pin" in the first place because dependency resolution is deterministic to start with. (Upgrading to e.g. all newer patch versions of your current dependencies is easy, but you have to make an explicit action to do so, it will never happen "magically")
> 1 file, 2 files, N files, why does it matter how many files?
One file is better for sharing than N: you can post it in a messenger program like Slack and easily copy-and-paste (which becomes annoying with more than one file), or upload it somewhere without needing to compress, etc.
> I can't think of any other language where "I want my script to use dependencies from the Internet, pinned to precise versions" is a thing.
This is the same issue you would have in any other programming language. If possible future breakage is acceptable you don't need to do it, but I can understand the use case for it.
I think it's a general principle across all of software engineering that, given the choice, it's better for fewer disparate locations in the codebase to need correlated changes.
Documentation is hard enough, and that's often right there at exactly the same location.
I'm not a Python packaging expert or anything, but an issue I run into with lock files is that they can become machine-dependent (for example, different flavors of torch on some machines vs. others).
One could indicate implicit time-based pinning of transitive dependencies, using the time point at which the depended-on versions were released. Not a perfect solution, but it's a possible approach.
I think OP was saying to look at when the package was built instead of explicitly adding a timestamp. Of course, this would only work if you specified `requests@1.2.3` instead of just `requests`.
This looks like a good strategy, but I wouldn't want it by default, since it would be very weird to suddenly have a script pull dependencies from 1999 with no explanation of why.
Except that, at least for the initial run, the date-based approach is the one closer to my intent, as I don't know what specific versions I need, just that this script used to work around a specific date.
Well, of course you should, but it’s easy to forget as it’s not required. It also used to be recommended to not save it, so some people put it in their gitignore.
For completeness, there's also a script.py.lock file that can be checked into version control, but then you have twice as many files to maintain, and they potentially get out of sync as people forget about the lock file or don't know what to do with it.
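(If I recall correctly, recent uv versions can generate that lock file with something like `uv lock --script script.py`, and `uv run script.py` will then respect it.)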
A major part of the point of PEP 723 (and the original competing design in PEP 722) is that the information a) is contained in the same physical file and b) can be produced by less sophisticated users.
Where would binary search come into it? In the example, the version solver just sees the world as though no versions released after `2023-10-16T00:00:00Z` existed.
My feeling, sadly, is that because uv is the new thing, it hasn't had to handle anything but the common cases. This kinda gets a mention in the article, but is very much glossed over. There are still some sharp edges, and assumptions which aren't true in general (but are for the easy cases), and this is only going to make things worse, because now there's a new set of issues people run into.
PEP 751 is defining a new lockfile standard for the ecosystem, and tools including uv look committed to collaborating on the design and implementing whatever results. From what I've been able to tell of the surrounding discussion, the standard is intended to address this use case - or rather, to be powerful enough that tools can express the necessary per-architecture locking.
The point of the PEP 723 comment style in the OP is that it's human-writable with relatively little thought. Cases like yours are always going to require actually doing the package resolution ahead of time, which isn't feasible by hand. So a separate lock file is necessary if you want resolved dependencies.
If you use this kind of inline script metadata and just specify the Python dependency version, the resolution process is deferred. So you won't have the same kind of control as the script author, but instead the user's tooling can automatically do what's needed for the user's machine. There's inherently a trade-off there.
I think this is an awesome feature and will probably be a great alternative to my use of nix to do similar things for scripts/Python, if nothing else because it's way less overhead to get it running and start playing with something.
Nix, for all its benefits here, can be quite slow and otherwise pretty annoying to use as a shebang in my experience, versus just writing a package/derivation to add to your shell environment (i.e. it's already fully "built" and wrapped, but that also requires a lot more ceremony plus "switching" either the OS or HM configs).
`nix-shell` (which is what the OP seems to be referring to) is always slow-ish (not really that slow if you're used to e.g. Java CLI commands, but definitely slower than I would like) because it doesn't cache evaluations, AFAIK.
Flakes have caching, but support for `nix shell` as a shebang is relatively new (Nix 2.19) and not widespread.
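For comparison, the nix-shell shebang pattern being described looks roughly like this (a sketch from memory; the package expression is illustrative):
#!/usr/bin/env nix-shell
#!nix-shell -i python3 -p "python3.withPackages (ps: [ ps.requests ])"
import requests
print(requests.get("https://example.com").status_code)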
Agreed. I did the exact same thing with that giant script venv and it was a constant source of pain because some scripts would require conflicting dependencies. Now with uv shebang and metadata, it’s trivial.
Before uv I avoided writing any scripts that depended on ML altogether, which is now unlocked.
You know what we need? In both Python and JS, and every other scripting language, we should be able to import packages from a URL, but with a sha384 integrity check like the one that exists in HTML. Not sure why they didn't adopt this into JS or Deno.
Otherwise installing random scripts is a security risk
Python has fully-hashed requirements[1], which is what you'd use to assert the integrity of your dependencies. These work with both `pip` and `uv`. You can't use them to directly import the package, but that's more because "packages" aren't really part of Python's import machinery at all.
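As a sketch of what that looks like (the hash value is a placeholder): a hash-pinned requirements file can be generated with e.g. `uv pip compile --generate-hashes requirements.in` or pip-tools, and pip enforces it at install time with `--require-hashes`:
requests==2.32.3 \
    --hash=sha256:<sha256 of the exact wheel or sdist goes here>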
(Note that hashes themselves don't make "random scripts" not a security risk, since asserting the hash of malware doesn't make it not-malware. You still need to establish a trust relationship with the hash itself, which decomposes to the basic problem of trust and identity distribution.)
Right, still a security risk, but at least if I come back to a project after a year or two I can know that even if some malicious group took over a project, they at least didn't backport a crypto-miner or worse into my script.
The code that you obtain for a Python "package" does not have any inherent mapping to a "package" that you import in the code. The name overload is recognized as unfortunate; the documentation writing community has been promoting the terms "distribution package" and "import package" as a result.
While you could of course put an actual Python code file at a URL, that wouldn't solve the problem for anything involving compiled extensions in C, Fortran etc. You can't feasibly support NumPy this way, for example.
That said, there are sufficient hooks in Python's `import` machinery that you can make `import foo` programmatically compute a URL (assuming that the name `foo` is enough information to determine the URL), download the code, and create and import the necessary `module` object; and you can add this with appropriate priority to the standard set of strategies Python uses for importing modules. A full description of this process is out of scope for a HN comment, but the relevant documentation is under `importlib` (import hooks and meta path finders) in the standard library reference.
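As an illustrative sketch of that idea (the module name and URL are made up; this skips caching, error handling, and any security checks):
import importlib.abc
import importlib.util
import sys
import urllib.request

class URLFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Map selected module names to URLs and import their source on demand."""
    def __init__(self, registry):
        self.registry = registry  # e.g. {"foo": "https://example.com/foo.py"}

    def find_spec(self, name, path=None, target=None):
        if name in self.registry:
            return importlib.util.spec_from_loader(name, self)
        return None  # let the normal import machinery handle everything else

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        url = self.registry[module.__name__]
        source = urllib.request.urlopen(url).read()
        exec(compile(source, url, "exec"), module.__dict__)

sys.meta_path.append(URLFinder({"foo": "https://example.com/foo.py"}))
# "import foo" would now fetch and execute the remote (pure-Python) code.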
Anything that makes it easier to make a script that I wrote run on a colleagues machine without having to give them a 45 minute crash course of the current state of python environment setup and package management is a huge win in my book.
There's about 50 different versions of "production" for Python, and if this particular tool doesn't appear useful to it, you're probably using Python in a very very different way than those of us who find it useful. One of the great things about Python is that it can be used in such diverse ways by people with very very very different needs and use cases.
What does "production" look like in your environment, and why would this be terrible for it?
> As mentioned in the article, being able to have inline dependencies in a single-file Python script and running it naturally is just beautiful.
The syntax for this (https://peps.python.org/pep-0723/) isn't uv's work, nor are they first to implement it (https://iscinumpy.dev/post/pep723/). A shebang line like this requires the tool to be installed first, of course; I've repeatedly heard about how people want tooling to be able to bootstrap the Python version, but somehow it's not any more of a problem for users to bootstrap the tooling themselves.
And some pessimism: packaging is still not seen as the core team's responsibility, and uv realistically won't enjoy even the level of special support that Pip has any time soon. As such, tutorials will continue to recommend Pip (along with inferior use patterns for it) for quite some time.
> I have been thinking that a dedicated syntax for inline dependencies would be great, similar to JavaScript's `import ObjectName from 'module-name';` syntax. Python promoted type hints from comment-based to syntax-based, so a similar approach seems feasible.
First off, Python did no such thing. Type annotations are one possible use for an annotation system that was added all the way back in 3.0 (https://peps.python.org/pep-3107/); the original design explicitly contemplated other uses for annotations besides type-checking. When it worked out that people were really only using them for type-checking, standard library support was added (https://peps.python.org/pep-0484/) and expanded upon (https://peps.python.org/pep-0526/ etc.); but this had nothing to do with any specific prior comment-based syntax (which individual tools had up until then had to devise for themselves).
Python doesn't have existing syntax to annotate import statements; it would have to be designed specifically for the purpose. It's not possible in general (as your example shows) to infer a PyPI name from the `import` name; but not only that, dependency names don't map one-to-one to imports (anything that you install from PyPI may validly define zero or more importable top-level names, and of course the code might directly use a sub-package or an attribute of some module, which doesn't even have to be a class). So there wouldn't be a clear place to put such names except in a separate block by themselves, which the existing comment syntax already does.
Finally, promoting the syntax to an actual part of the language doesn't seem to solve a problem. Using annotations instead of comments for types allows the type information to be discovered at runtime (e.g. through the `__annotations__` attribute of functions). What problem would it solve for packaging? It's already possible for tools to use a PEP 723 comment, and it's also possible (through the standard library - https://docs.python.org/3/library/importlib.metadata.html) to introspect the metadata of installed packages at runtime.
Any flow that does not state checksums/hashsums is not ready for production, and anything but beautiful. But I haven't used uv yet, so maybe it is possible to specify the dependencies with hashsums in the same file too?
Actually, the order of the import statement is one of the things that Python does better than JS. It makes completions much less costly to calculate as you type the code: an IDE or other tool only has to check one module or package for its contents, rather than checking whether any module has a binding of the name so-and-so. If I understand correctly, you are talking about an additional syntax, though.
When mentioning a gigantic venv ... why did they do that? Why not have smaller venvs for separate projects? It is really not that hard to do, and it avoids dependency conflicts between projects which have nothing to do with each other. Using one giant venv basically tells me that they either did not understand dependency conflicts, or did not care enough about their dependencies, so that one script can run with one set of dependencies one day and another set of deps another day, because a new project's deps have been added to the mix in the meantime.
Avoiding deps for small scripts is a good thing! If possible.
To me it just reads like a user now having a new tool that allows them to continue the lazy ways of not properly managing dependencies. I mean, all deps in one huge venv? Who does that? No wonder they had issues with that. Can't even keep deps separated, let alone properly having a lock file with checksums. Yeah, no surprise they'll run into issues with that workflow.
And while we are relating to the JS world: one may complain in many ways about how NPM works, but it has had an automatic lock file for aaages, being the default tool in the ecosystem. And its competitors had it too. At least that part they got right for a long time, compared to pip, which does nothing of the sort without extra effort.
> Why did they do that? Why not have smaller venvs for separate projects?
What's a 'project'? If you count every throwaway data processing script and one-off exploratory Jupyter notebook, that can easily be 100 projects. Certainly before uv, having one huge venv or conda environment with 'everything' installed made it much faster and easier to get that sort of work done.
In what kind of scope are these data processing scripts? If they are in some kind of pipeline used in production, I would very much expect them to have reproducible dependencies.
I can understand it for an exploratory Jupyter notebook. But only in the truly exploratory stage. Say, for example, you are writing a paper. Reproducibility crisis. Exploring is fine, but when it gets to actually writing the paper, one needs to make one's setup reproducible, or lose credibility right away. Most academics are not aware of, don't know how, or don't care to make things reproducible, leading to non-reproducible research.
I would be lying if I claimed that I personally always set up a lock file with hashsums for every script. Of course there can be scripts and things we care so little about that we don't make them reproducible.
For the (niche) Python library that I co-develop, we use this for demo scripts that live in an example/ directory in our repo. These scripts will never be run in production, but it’s nice to allow users to try them out and get a feel for how the library works before committing to installing dependencies and setting up a virtual environment.
In other words, of course, in most long-term cases, it’s better to create a real project - this is the main uv flow for a reason. But there’s value in being able to easily specify requirements for quick one-off scripts.
> Why not have smaller venvs for separate projects?
Because they are annoying and unnecessary additional work. If I write something, I won't know the dependencies at the beginning. And if it's a personal tool/script or even a throwaway one-shot, then why bother with managing unnecessary parts? I just manage my personal stack of dependencies for my own tools in a giant env, and pull imports from it or not, depending on the moment. This allows me to move fast. Of course it is a liability, but not one which usually bites me. Every few years, some dependency goes wrong, and I either fix it or remove it, but in the end the time saved far outweighs the time I would lose from micromanaging small separate envs.
Managing dependencies is for production and important things. Big messy envs are good enough for everything else. I have hundreds of scripts and tools; micromanaging them on that level has no benefit. And it seems uv now offers some options for making small envs effortless without costing much time, so it's a net benefit in that area, but it's not something world-shattering which will turn my world upside down.
Well, if generating a lock file and installing dependencies is "unnecessary", then you obviously don't have any kind of need for a production-ready project. Any project serious about managing its dependencies will mandate hashsums for each dependency, to avoid things breaking a week or a month later without any change to the project.
If you do have a project that needs to manage its dependencies well and you still don't store hashsums and use them to install your dependencies, then you basically forfeit any credibility when complaining about things going wrong: bugs appearing, or the behavior of the code changing without the code itself changing, and similar things.
This can all be fine if it is just your personal project that gets shit done. I am not saying you must properly manage dependencies for such a personal project. It's just not something ready for production.
I for one find it quite easy to make venvs per project. I have my Makefiles, which I slightly adapt to the needs of the project, and then I run one single command and get all set up with dependencies in a project-specific venv, with hashsums and reproducibility. Not that much to it really, and not at all annoying to me. It can also be sourced from any other script when that script uses another project. One could also use any other task-runner thingy; it doesn't have to be GNU Make if one doesn't like it.
>Any flow that does not state checksums/hashsums is not ready for production
It's not designed nor intended for such. There are tons of Python users out there who have no concept of what you would call "production"; they wrote something that requires NumPy to be installed and they want to communicate this as cleanly and simply (and machine-readably) as possible, so that they can give a single Python file to associates and have them be able to use it in an appropriate environment. It's explicitly designed for users who are not planning to package the code properly in a wheel and put it up on PyPI (or a private index) or anything like that.
>and all but beautiful
De gustibus non est disputandum. The point is to have something simple, human-writable and machine-readable, for those to whom it applies. If you need to make a wheel, make one. If you need a proper lock file, use one. Standardization for lock files is finally on the horizon in the ecosystem (https://peps.python.org/pep-0751/).
>Actually the order of import statement is one of the things, that Python does better than JS.
Import statements are effectively unrelated to package management. Each installed package ("distribution package") may validly define zero or more top-level names (of "import packages") which don't necessarily bear any relationship to each other, and the `import` syntax can validly import one or more sub-packages and/or attributes of a package or module (a false distinction, anyway; packages are modules), and rename them.
>An IDE or other tool only has to check one module or package for its contents
The `import` syntax serves these tools by telling them about names defined in installed code, yes. The PEP 723 syntax is completely unrelated: it tells different tools (package managers, environment managers and package installers) about names used for installing code.
>Why not have smaller venvs for separate projects? It is really not that hard to do
It isn't, but it introduces book-keeping (Which venv am I supposed to use for this project? Where is it? Did I put the right things in it already? Should I perhaps remove some stuff from it that I'm no longer using? What will other people need in a venv after I share my code with them?) that some people would prefer to delegate to other tooling.
Historically, creating venvs has been really slow. People have noticed that `uv` solves this problem, and come up with a variety of explanations, most of which are missing the mark. The biggest problem, at least on Linux, is the default expectation of bootstrapping Pip into the new venv; of course uv doesn't do this by default, because it's already there to install packages for you. (This workflow is equally possible with modern versions of Pip, but you have to know some tricks; I describe some of this in https://zahlman.github.io/posts/2025/01/07/python-packaging-... . And it doesn't solve other problems with Pip, of course.) Anyway, the point is that people will make single "sandbox" venvs because it's faster and easier to think about - until the first actual conflict occurs, or the first attempt to package a project and accurately convey its dependencies.
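One such trick, sketched (it assumes a reasonably recent pip, since `--python` arrived in pip 22.3): create the venv without bootstrapping pip into it, and drive it from an external pip:
python -m venv --without-pip .venv
pip install --python .venv/bin/python requests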
> Avoiding deps for small scripts is a good thing! If possible.
I'd like to agree, but that just isn't going to accommodate the entire existing communities of people writing random 100-line analysis scripts with Pandas.
>One may complain in many ways about how NPM works, but it has had automatic lock file for aaages.
Cool, but the issues with Python's packaging system are really not comparable to those of other modern languages. NPM isn't really retrofitted to JavaScript; it's retrofitted to the Node.JS environment, which existed for only months before NPM was introduced. Pip has to support all Python users, and Python is about 18 years older than Pip (19 years older than NPM). NPM was able to do this because Node was a new project that was being specifically designed to enable JavaScript development in a new environment (i.e., places that aren't the user's browser sandbox). By contrast, every time any incremental improvement has been introduced for Python packaging, there have been massive backwards-compatibility concerns. PyPI didn't stop accepting "egg" uploads until August 1 2023 (https://blog.pypi.org/posts/2023-06-26-deprecate-egg-uploads...), for example.
But more importantly, npm doesn't have to worry about extensions to JavaScript code written in arbitrary other languages (for Python, C is common, but by no means exclusive; NumPy is heavily dependent on Fortran, for example) which are expected to be compiled on the user's machine (through a process automatically orchestrated by the installer) with users complaining to anyone they can get to listen (with no attempt at debugging, nor at understanding whose fault the failure was this time) when it doesn't work.
There are many things wrong with the process, and I'm happy to criticize them (and explain them at length). But "everyone else can get this right" is usually a very short-sighted line of argument, even if it's true.
> It's not designed nor intended for such. There are tons of Python users out there who have no concept of what you would call "production"; they wrote something that requires NumPy to be installed and they want to communicate this as cleanly and simply (and machine-readably) as possible, so that they can give a single Python file to associates and have them be able to use it in an appropriate environment. It's explicitly designed for users who are not planning to package the code properly in a wheel and put it up on PyPI (or a private index) or anything like that.
Thus my warning about its use. And we, as part of the population, need to learn and be educated about dependency management, so that we do not keep running into the same issues, over and over again, that come from non-reproducible software.
> Import statements are effectively unrelated to package management. Each installed package ("distribution package") may validly define zero or more top-level names (of "import packages") which don't necessarily bear any relationship to each other, and the `import` syntax can validly import one or more sub-packages and/or attributes of a package or module (a false distinction, anyway; packages are modules), and rename them.
I did not claim them to be related to package management, and I agree. I was making an assertion, trying to guess the meaning of what the other poster wrote about some "import bla from blub" statement.
> The `import` syntax serves these tools by telling them about names defined in installed code, yes. The PEP 723 syntax is completely unrelated: it tells different tools (package managers, environment managers and package installers) about names used for installing code.
If you had read my comment a bit more closely, you would have seen that this is the assertion I made one phrase later.
> It isn't, but it introduces book-keeping (Which venv am I supposed to use for this project? Where is it? Did I put the right things in it already? Should I perhaps remove some stuff from it that I'm no longer using? What will other people need in a venv after I share my code with them?) that some people would prefer to delegate to other tooling.
I understand that. The issue is that people keep complaining about things that can be solved in rather simple ways. For example:
> Which venv am I supposed to use for this project?
Well, the one in the directory of the project, of course.
> Where is it?
In the project directory of course.
> Did I put the right things in it already?
If it exists, it should have the dependencies installed. If you change the dependencies, then update the venv right away. You are always in a valid state this way. Simple.
> Should I perhaps remove some stuff from it that I'm no longer using?
That is done in the "update the venv" step mentioned above. Whether you delete the venv and re-create it, or have a dependency-managing tool that removes unused dependencies, I don't care, but you will know which it is when you use such a tool. If you don't use such a tool, just recreate the venv. Nothing complicated so far.
> What will other people need in a venv after I share my code with them?
One does not share a venv itself, one shares the reproducible way to recreate it on another machine. Thus others will have just what you have, once they create the same venv. Reproducibility is key, if you want your code to run elsewhere reliably.
All of those have rather simple answers. Granted, some of these answers one learns over time, after dealing with these questions many times. However, none of it has to be made difficult.
> I'd like to agree, but that just isn't going to accommodate the entire existing communities of people writing random 100-line analysis scripts with Pandas.
True, but those apparently have a need for Pandas. Then installing dependencies cannot be avoided. Then it depends on whether their stuff is one-off stuff that no one will ever need to run again later, or part of some pipeline that needs to be reliable. The use case changes the requirements with regard to reproducibility.
---
About the NPM vs. pip comparison: sure, there may be differences. None of them, however, justify not having hashsums of dependencies where they can be had. And if there is a C thing? Well, you will still download it in some tarball or archive when you install it as a dependency. Easy to get a checksum of that. Store the checksum.
I was merely pointing out a basic facility of NPM, one that has been there for as long as I remember using NPM, and that still does not exist with pip except by using some additional packages to facilitate it (I think hashtools or something like that was required). I am not holding up NPM as the shining star that we all should follow. It has its own ugly corners. I was pointing out that specific aspect of dependency management. Any artifact downloaded from anywhere can have its hashes calculated. There are no excuses for not having the hashes of artifacts.
That Pip is 19 years older than NPM doesn't have to be a negative. Those are 19 more years to have worked on the issues. In those 19 years no one had issues with non-reproducible builds? I find that hard to believe. If anything, the many people complaining about not being able to install some dependency in some scenario tell us that reproducible builds are key to avoiding these issues.
>I did not claim them to be related to package management, and I agree.
Sure, but TFA is about installation, and I wanted to make sure we're all on the same page.
>I understand that. The issue is, that people keep complaining about things that can be solved in rather simple ways.
Can be. But there are many comparably simple ways, none of which is obvious. For example, using the most basic level of tooling, I put my venvs within a `.local` directory which contains other things I don't want to put in my repo nor mention in .gitignore. Other workflow managers put them in an entirely separate directory and maintain their own mapping.
>Whether you delete the venv and re-create it, or have a dependency managing tool, that removes unused dependencies, I don't care, but you will know it, when you use such a tool.
Well, yes. That's the entire point. When people are accustomed to using a single venv, it's because they haven't previously seen the point of separating things out. When they realize the error of their ways, they may "prefer to delegate to other tooling", as I said. Because it represents a pretty radical change to their workflow.
> That Pip is 19 years older than NPM doesn't have to be a negative. Those are 19 years more time to have worked on the issues as well.
In those 19 years people worked out ways to use Python and share code that bear no resemblance to anything that people mean today when they use the term "ecosystem". And they will be very upset if they're forced to adapt. Reading the Packaging section of the Python Discourse forum (https://discuss.python.org/c/packaging/14) is enlightening in this regard.
> In those 19 years no one had issues with non-reproducible builds?
Of course they have. That's one of the reasons why uv is the N+1th competitor in its niche; why Conda exists; why meson-python (https://mesonbuild.com/meson-python/index.html) exists; why https://pypackaging-native.github.io/ exists; etc. Pip isn't in a position to solve these kinds of problems because of a) the core Python team's attitude towards packaging; b) Pip's intended and declared scope; and c) the sheer range of needs of the entire Python community. (Pip doesn't actually even do builds; it delegates to whichever build backend is declared in the project metadata, defaulting to Setuptools.)
But it sounds more like you're talking about lockfiles with hashes. In which case, please just see https://peps.python.org/pep-0751/ and the corresponding discussion ("Post-History" links there).
But... the 86GB python dependency download cache on my primary SSD, most of which can be attributed to the 50 different versions of torch, is testament to the fact that even uv cannot salvage the mess that is pip.
Never felt this much rage at the state of a language/build system in the 25 years that I have been programming. And I had to deal with Scala's SBT ("Simple Build Tool") in another life.
I don't think pip is to blame for that. PyTorch is sadly an enormous space hog.
I just started a fresh virtual environment with "python -m venv venv" - running "du -h" showed it to be 21MB. After running "venv/bin/pip install torch" it's now 431MB.
I use uv pip to install dependencies for any LLM software I run. I am not sure if uv re-implements the pip logic or hands resolution over to pip. But it does not change the fact that I have multiple versions of torch, plus multiple installations of the same version of torch, in the cache.
Compare this to the way something like maven/gradle handles this and you have to wonder WTF is going on here.
There is a "stable ABI" which is a subset of the full ABI, but no requirement to stick to it. The ABI effectively changes with every minor Python version - because they're constantly trying to improve the Python VM, which often involves re-working the internal representations of built-in types, etc. (Consider for example the improvements made to dictionaries in Python 3.6 - https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compa... .) Of course they try to make proper abstracted interfaces for those C structs, but this is a 34 year old project and design decisions get re-thought all the time and there are a huge variety of tiny details which could change and countless people with legacy code using deprecated interfaces.
The bytecode also changes with every minor Python version (and several times during the development of each). The bytecode file format is versioned for this reason, and .pyc caches need to be regenerated. (And every now and then you'll hit a speed bump, like old code using `async` as an identifier which subsequently becomes a keyword. That hit TensorFlow once: https://stackoverflow.com/questions/51337939 .)
Very different way of doing things compared to the JVM which is what I have most experience with.
Was some kind of FFI using dlopen and sharing memory across the vm boundary ever considered in the past, instead of having to compile extensions alongside a particular version of python?
I remember seeing some FFI library, probably on PyPI. But I don't think it is part of standard Python.
You can in fact use `dlopen`, via the support provided in the `ctypes` standard library. `freetype-py` (https://github.com/rougier/freetype-py) is an example of a project that works this way.
To my understanding, though, it's less performant. And you still need a stable ABI layer to call into. FFI can't save you if the C code decides in version N+1 that it expects the "memory shared across the vm boundary" to have a different layout.
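For illustration, the basic ctypes pattern looks like this (a sketch using libm as a stand-in for a real dependency; it assumes the system math library can be found):
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # dlopen() under the hood
libm.cos.restype = ctypes.c_double    # declare the C signature by hand
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0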
> Something to do with breakage in ABI perhaps. Was looking at the way python implements extensions the other day. Very weird.
Yes, it's essentially that: CPython doesn't guarantee exact ABI stability between versions unless the extension (and its enclosing package) intentionally build against the stable ABI[1].
The courteous thing to do in the Python packaging ecosystem is to build "abi3" wheels that are stable and therefore don't need to be duplicated as many times (either on the index or on the installing client). Torch doesn't build these wheels for whatever reason, so you end up with multiple slightly different but functionally identical builds for each version of Python you're using.
TL;DR: This happens because of an interaction between two patterns that Python makes very easy: using multiple Python versions, and building/installing binary extensions. In a sense, it's a symptom of Python's success: other ecosystems don't have these problems because they have far fewer people running multiple configurations simultaneously.
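For context, opting a setuptools-built extension into the stable ABI looks roughly like this (a sketch with made-up names; exact options vary by build backend and version, and whether this applies to Torch specifically is a separate question):
from setuptools import Extension, setup

setup(
    name="mymodule",
    ext_modules=[
        Extension(
            "mymodule._ext",
            sources=["src/ext.c"],
            define_macros=[("Py_LIMITED_API", "0x03080000")],  # target 3.8+
            py_limited_api=True,
        )
    ],
    # tags the wheel as cp38-abi3, so one build covers 3.8 through current
    options={"bdist_wheel": {"py_limited_api": "cp38"}},
)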
My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.
I am planning to shift some of my stuff to pypy (so a "fast" python exists, kind of). But some dependencies can be problematic, I have heard.
Neither Java nor JS encourages the use of native extensions to the same degree that Python does. So some of it is a fundamental difference in approach: Python has gotten very far by offloading hot paths into native code instead of optimizing the interpreter itself.
(Recent positive developments in Python’s interpreted performance have subverted this informal tendency.)
Node also introduced a stable extension API that people could build native code against relatively early in its history compared to Python. That and the general velocity of the V8 interpreter and its complex API kept developers from reaching in like they did with Python, or leaving tons of libraries in the ecosystem that are too critical to drop.
Yeah, I think it's mostly about complexity: CPython's APIs also change quite a bit, but they're pretty simple (in the "simple enough to hang yourself with" sense).
> Neither Java nor JS encourages the use of native extensions to the same degree that Python does.
You already had billions of lines of Java and JS code that HAD to be sped up. So they had no alternative. If python had gone down the same route, speeding it up without caveats would have been that much easier.
I don't think that's the reason. All three ecosystems had the same inflection point, and chose different solutions to it. Python's was especially "easy" since the C API was already widely used and there were no other particular constraints (WORA for Java, pervasive async for JS) that impeded it.
>> My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.
Improving Python performance has been a topic as far back as 2008, when I attended my first PyCon. A quick detour on Python 3, because there is some historical revisionism, since many people online weren't around in the earlier days.
Back then, the big migration to Python 3 was in front of the community. The timeline concerns that popped up when Python really picked up steam in the industry between 2012 and 2015 weren't as huge a concern yet. You can refer to Guido's talks from PyCon 2008 and 2009, if they are available somewhere, to get the vibe on the urgency. Python 3 was impactful because it changed the language and platform while requiring a massive amount of effort.
Back to perf. Around 2008, there was a feeling that an alternative to CPython might be the future. Candidates included IronPython, Jython, and PyPy. Others like Unladen Swallow wanted to make major changes to CPython (https://peps.python.org/pep-3146/).
All of these alternative implementations of Python from this time period have basically failed at the goal of replacing CPython. IronPython was a Python 2 implementation, and updating to Python 3 while trying to grow enough to challenge CPython was impossible. Eventually, Microsoft lost interest and that was that. Similar things happened to the others.
GIL removal was a constant topic from 2008 until recently. Compatibility of extensions was a major concern causing inertia, and Python's popularity meant even more C/C++/Rust code relying on a GIL. The option to disable it (https://peps.python.org/pep-0703/) only happened because the groundwork was eventually done properly to help the community move.
The JVM has very clearly defined interfaces and specs similar to the CLR which make optimization viable. JS doesn't have the compatibility concerns.
That was just a rough overview but many of the stories of Python woes miss a lot of this context. Many discussions about perf over the years have descended into a GIL discussion without any data to show the GIL would change performance. People love to talk about it but turn out to be IO-bound when you profile code.
A bit baffling, IMO, the focus on the GIL over actual Python performance, particularly when there were so many examples of language virtual machines improving performance in that era. So many lost opportunities.
They don't want to throw away the extensions and ecosystem. Let's say Jython, or some other modern implementation became the successor. All of the extensions need to be updated (frequently rewritten) to be compatible with and exploit the characteristics of that platform.
It was expected that extension maintainers would respond negatively to this. In many cases it presents a decision: do I port this to the new platform, or move away from Python completely? You have to remember, the impactful decisions leading us down this path were made closer to 2008 than today, when dropping Python, or making it the second option to help people migrate, would have been viable for a lot of these extensions. There was also a lot of potential for people to follow a fork of the traditional CPython interpreter.
There were no great options because there are many variables to consider. Perf is only one of them. Pushing ahead only on perf is hard when it's unclear if it'll actually impact people in the way they think it will when they can't characterize their actual perf problem beyond "GIL bad".
As a long time Pythonista I was going to push back against your suggestion that Python didn't have much momentum until recently, but then I looked at the historic graph on https://www.tiobe.com/tiobe-index/ and yeah, Python's current huge rise in popularity didn't really get started until around 2018.
Yes, TIOBE is garbage. The biggest problem is that because they're coy about methodology we don't even know what we're talking about. Rust's "Most Loved" Stack Overflow numbers were at least a specific thing where you can say OK that doesn't mean there's more Rust software or that Rust programmers get paid more, apparently the people programming in Rust really like Rust, more so than say, Python programmers loved Python - so that's good to know, but it's that and not anything else.
From what I can tell it wasn't as prominent as it has been recently, when it became a popular pick for random projects that weren't just gluing things together. The big companies that used it were perfectly happy specializing the interpreter to their use case instead of upstreaming general improvements.
The reason why people don't always use abi3 is because not everything that can be done with the full API is even possible with the limited one, and some things that are possible carry a significant perf hit.
I think that's a reason, but I don't think it's the main one: the main one is that native builds don't generally default to abi3, so people (1) publish larger matrices than they actually need to, and (2) end up depending on non-abi3 constructs when abi3 ones are available.
(I don't know if this is the reason in Torch's case or not, but I know from experience that it's the reason for many other popular Python packages.)
Yes, you're right; I should have clarified my comment with, "people who know the difference to begin with", which is something one needs to learn first (and very few tutorials etc on Python native modules even mention the limited API).
That is very cool! I would imagine that even if they didn’t implement explicit cloning, APFS would still clone the files.
`nix-store --optimize` is a little different because it looks for duplicates and hardlinks those files across all of the packages. I don’t know how much additional savings that would yield with a typical uv site-packages, but it guarantees no duplicate files.
It probably is 5.5GB. uv caches versions of packages and symlinks to them, but pytorch is infamous for having many different versions, especially for different features.
And since it's a compiled dependency, even if uv were to attempt the more complicated method of symlinking to use a single version of identical files, it wouldn't help much. You'd probably need to store binary diffs, or chunks of files that are binary-identical; at that point your code would start to resemble a filesystem in user space, and the time to switch to a particular version of the files (i.e. to create them as actual files in the filesystem) would be much higher.
Also, I believe uv's cache is separate from the pip cache, so you could have different copies in both.
I think there's a `uv cache prune` command. Arguably it should offer to install a cron job to run it periodically.
Testing this out locally, I repro'd pretty similar numbers on macOS ARM and Docker. Unfortunately the CPU-only build isn't really any smaller either; I thought it was.
That problem is very much not pip's fault (pip is only the installer); the issue is:
* We have a conflict between being easy to use (people don't need to work out which version of CUDA / which GPU settings/libraries/etc. to use) vs. install size (it's basically the x86 vs ARM issue, except at least 10-fold larger). Rather than making it the end user's problem, packages bundle all possible options into a single artifact (macOS does the same, but see the 10-fold-larger issue).
* There are almost fundamental assumptions that Python packaging makes (and the newer Python packaging tools, including uv, very much rely on these) about how a system should act (basically: frozen, with no detection available apart from the "OS"), which do not align with having hardware/software that must be detected. One could do this via sdists, but Windows plus the issues around dynamic metadata make this a non-starter (and hence tools like conda, spack and others from the more scientific side of the ecosystem have been created; notably, on the more webby side, these problems are solved either by vendoring non-Python libraries, or by making it someone/something else's problem, hence Docker, or the cloud for databases and other companion services).
* Frankly, more and more developers have no idea how systems are built (and this isn't just a Python issue). Docker lets people hide their sins, with magical invocations that just work (and static linking in many cases sadly does the same). There are tools out of the hyperscalers which are designed to solve these problems, but they solve it by creating tools with which experts can wrangle many systems, and hence they imply you have a team which can do the wrangling.
Can this be solved? Maybe, but not by a new tool (on its own). It would require a lot of devs who may not see much improvement to their own workflow to change that workflow for others (to newer ones which remove the assumptions built into the current workflows), plus a bunch of work by key stakeholders (and maybe even the open-sourcing of some crown jewels), and I don't see that happening.
> people don't need to work out which version of cuda/which gpu settings/libraries/etc. to use
This is not true in my case. The regular pytorch does not work on my system. I had to download a version specific to my system from the pytorch website using --index-url.
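(For reference, that usually means something like `pip install torch --index-url https://download.pytorch.org/whl/cpu`, or a CUDA-specific variant such as https://download.pytorch.org/whl/cu121, picked to match the machine.)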
> packages bundle all possible options into a single artifact
Cross-platform Java apps do it too. For example, see https://github.com/xerial/sqlite-jdbc. But it does not become a clusterfuck like it does with Python. After downloading gigabytes and gigabytes of dependencies repeatedly, the Python tool you are trying to run will still refuse to work for random reasons.
You cannot serve end-users a shit-sandwich of this kind.
The python ecosystem is a big mess and, outside of a few projects like uv, I don't see anyone trying to build a sensible solution that tries to improve both speed/performance and packaging/distribution.
That's a PyTorch issue. The solution is, as always, to build from source. You will understand how the system is assembled, and then you can build a minimal version meeting your specific needs (which, given that wheels are a well-defined thing, you can then store on a server for reuse).
Cross-OS (especially with a VM like Java or JS) is relatively easy compared to needing specific versions for every single sub-architecture of a CPU and GPU system (and that's ignoring all the other bespoke hardware that's out there).
Cross platform Java doesn't have the issue because the JVM is handling all of that for you. But if you want native extensions written in C you get back to the same problem pretty quickly.
The SQLite project I linked to is a JDBC driver that makes use of the C version of the library appropriate to each OS. LWJGL (https://repo1.maven.org/maven2/org/lwjgl/lwjgl/3.3.6/) is another project which heavily relies on native code. But distributing these, or using these as dependencies, does not result in hair-pulling like it does with python.
There's native code like SQLite which, assuming a sensible file system and C compiler, is quite portable, and then there's native code which cares about exact compiler versions, driver versions, and the exact model of your CPU, GPU and NIC. My suggestion is to go look at how to program a GPU using naive Vulkan/Metal, and then look for the dark magic that is used to make GPUs run fast. It's the latter you're encountering with the ML Python projects.
Sure, I was just curious, since you mentioned not wanting to use ZFS without kernel support and BTRFS does have that. Being familiar with ZFS is, I guess, a decent explanation.
When the topic of backups came up last year, I talked about my current solution: https://news.ycombinator.com/item?id=41042790. Someone suggested a workaround in the form of zfsbootmenu but I decided to stick to the simple way of doing things.
The main issue with symlinks is needing to choose the source of truth: one needs to be the real file, and the other points to it. You also need to make sure they have the same lifetimes, to prevent dangling links.
Hardlinks are somewhat better because both point to the same inode, but they also won't work if the file needs different permissions or needs to be independently mutable from different locations.
Reflinks hit the sweet spot where they can have different permissions, updates trigger CoW (preventing confusing mutations), and all while still reducing total disk usage.
I don't disagree, but I think some of these problems could potentially be solved by having a somewhat bird's-nest-like filesystem layout for large blobs, e.g.
/blobs/<sha256_sum>/filename.zip
and then symlinking/reflinking filename.zip to wherever it needs to be in the source tree...
It's more portable than hardlinks, solves your "source of truth" problem and has pretty wide platform support.
Platforms that don't support symlinks/reflinks could copy the files to where they need to be, then delete the blob store at the end, and be no worse off than they are now.
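A rough Python sketch of that layout (the paths and the copy fallback are assumptions, not a recommendation):
import hashlib
import shutil
from pathlib import Path

BLOBS = Path("/blobs")  # hypothetical blob store root

def store(src: Path) -> Path:
    """Copy src into the store under its SHA-256, once, and return the blob path."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dest = BLOBS / digest / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        shutil.copy2(src, dest)
    return dest

def link_into(blob: Path, target: Path) -> None:
    """Symlink the blob to where the tree expects it; fall back to a plain copy."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() or target.is_symlink():
        target.unlink()
    try:
        target.symlink_to(blob)
    except OSError:
        shutil.copy2(blob, target)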
Anyway, I'm just a netizen making a drive-by comment.
Not sure. The simplest solution is to store all files under a hashed name and sym/hardlink on a case-by-case basis. But some applications tend to behave weirdly with such files. Windows has its own implementation of symlinks and hardlinks; they simply call it something else. Perhaps portability could be an issue.
It defines a DSL for your build that looks roughly like Scala code. But… it’s not! And there is a confusing “resolution” system for build tasks/settings. It’s also slow as shit. See https://www.lihaoyi.com/post/SowhatswrongwithSBT.html for a comprehensive takedown. If you’re interested in just playing around with scala I would use
Like so many other articles that make some offhand remarks about conda, this article raves about a bunch of "new" features that conda has had for years.
> Being independent from Python bootstrapping
Yep, conda.
> Being capable of installing and running Python in one unified congruent way across all situations and platforms.
Yep, conda.
> Having a very strong dependency resolver.
Yep, conda (or mamba).
The main thing conda doesn't seem to have which uv has is all the "project management" stuff. Which is fine, it's clear people want that. But it's weird to me to see these articles that are so excited about being able to install Python easily when that's been doable with conda for ages. (And conda has additional features not present in uv or other tools.)
The pro and con of tools like uv is that they layer over the base-level tools like pip. The pro of that is that they interoperate well with pip. The con is that they inherit the limitations of that packaging model (notably the inability to distribute non-Python dependencies separately).
That's not to say uv is bad. It seems like a cool tool and I'm intrigued to see where it goes.
These are good points. But I think there needs to be an explanation why conda hasn't taken off more. Especially since it can handle other languages too. I've tried to get conda to work for me for more than a decade, at least once a year. What happens to me:
1) I can't solve for the tools I need and I don't know what to do. I try another tool, it works, I can move forward and don't go back to conda
2) it takes 20-60 minutes to solve, if it ever does. I quit and don't come back. I hear this doesn't happen anymore, but to this day I shudder before I hit enter on a conda install command
3) I spoil my base environment with an accidental install of something, and get annoyed and switch away.
On top of that, the commands are opaque, unintuitive, and mysterious. Do I use the conda env command or just the conda command? Do I need a -n? The basics are difficult, and at this point I'm too ashamed to ask which of the many, many docs explain it, and I know I will forget within two months.
I have had zero of these problems with uv. If I screw up or it doesn't work it tells me right away. I don't need to wait for a couple minutes before pressing y to continue, I just get what I need in at most seconds, if my connection is slow.
If you're in a controlled environment and need audited packages, I would definitely put up with conda. But for open source, personal throwaway projects, and anything that doesn't need a security clearance, I'm not going to deal with that beast.
Conda is the dreaded solution to the dreadful ML/scientific Python works-on-my-computer dependency spaghetti projects. One has to be crazy to suggest it for anything else.
uv hardly occupies the same problem space. It elevates DX with disciplined projects to new heights, but still falls short with undisciplined projects with tons of undeclared/poorly declared external dependencies, often transitive — commonly seen in ML (now AI) and scientific computing. Not its fault of course. I was pulling my hair out with one such project the other day, and uv didn’t help that much beyond being a turbo-charged pip and pyenv.
Eh, ML/scientific Python is large and not homogeneous. For code that should work on cluster, I would lean towards a Docker/container solution. For simpler dependancy use cases, pyenv/venv duo is alright. For some specific lib that have a conda package, it might be better to use conda, _might be_.
One illustration is the CUDA toolkit with a torch install on conda. If you need a basic setup, it would work (and takes ages). But if you need some other specific tools in the suite, or need it to be more lightweight for whatever reason, then good luck.
btw, I do not see much interest in uv. pyenv/pip/venv/hatch are simple enough to me. No need for another layer of abstraction between my machine and my env. I will still keep an eye on uv.
I always enjoyed the "one-stop" solution with conda/mamba that installed the right version of cudatoolkit along with pytorch. How do you handle that without conda? (I'm genuinely asking because I never had to actually care about it.) If I manually install it, it looks like it is going to be a mess if I have to juggle multiple versions.
Add to that the licensing of conda. In my company, we are not allowed to use conda because the company would rather not pay, so might as well use some other tool which does things faster.
conda (the package) is open source, it's the installer from Anaconda Corp (nee ContinuumIO) and their package index that are a problem. If you use the installer from https://conda-forge.org/download/, you get the conda-forge index instead, which avoids the license issues.
We've been working on all the shortcomings in `pixi`: pixi.sh
It's very fast, comes with lockfiles and a project-based approach. Also comes with a `global` mode where you can install tools into sandboxed environments.
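Roughly, the flow looks like this (going from the pixi docs; exact commands may have changed, and the project name is just a placeholder):
$ pixi init my-project && cd my-project
$ pixi add python=3.12 numpy           # solved and recorded in pixi.toml + pixi.lock
$ pixi run python -c "import numpy"    # runs inside the project's environment
$ pixi global install ripgrep          # sandboxed global tool, outside any project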
My completely unvarnished thoughts, in the hope that they are useful: I had one JIRA-ticket-worth of stuff to do on a conda environment, and was going to try to use pixi, but IIRC I got confused about how to use the environment.yml and went back to conda grudgingly. I still have pixi installed on my machine, and when I look through the list of subcommands, it does seem to probably have a better UX than conda.
When I go to https://prefix.dev, the "Get Started Quickly" section has what looks like a terminal window, but the text inside is inscrutable. What do the various lines mean? There are directories, maybe commands, check boxes... I don't get it. It doesn't look like a shell despite the Terminal wrapping box.
Below that I see that there's a pixi.toml, but I don't really want a new toml or yml file, there's enough repository lice to confuse new people on projects already.
Any time spent educating on packaging is time not spent on discovery, and is an impediment to onboarding.
I'd probably check this out in my home lab but as a corporate user the offering of discord as a support channel makes me nervous.
Discord is predominantly blocked on corporate networks. Artifactory (& Nexus) are very common in corporate environments. Corporate proxies are even more common. This is why I'd hesitate. These are common use cases (albeit corporate) that may not be readily covered in the docs.
Have you used the contemporary tooling in this space? `mamba` (and therefore `pixi`) is fast, and you can turn off the base environment. The UX is nicer, too!
Conda might have all these features, but it's kinda moot when no one can get them to work. My experience with conda is pulling a project, trying to install it, and then watching it spend hours trying to resolve dependencies. And any change would often break the whole environment.
Yes, conda has a lot more features on paper. And it supports non-Python dependencies which is super important in some contexts.
However, after using conda for over three years I can confidently say I don't like using it. I find it to be slow and annoying, often creating more problems than it solves. Mamba is markedly better but still manages to confuse itself.
uv just works, if your desktop environment is relatively modern. That's its biggest selling point, and why I'm hooked on it.
Besides being much slower, and taking up much more space per environment, than uv, conda also has a nasty habit of causing unrelated things to break in weird ways. I've mostly stopped using it at this point, for that reason, tho I've still had to reach for it on occasion. Maybe pixi can replace those use cases. I really should give it a try.
There's also the issue of the license for using the repos, which makes it risky to rely on conda/anaconda. See e.g. https://stackoverflow.com/a/74762864
Not sure what you mean about space. Conda uses hardlinks for the most part, so environment size is shared (although disk usage tools don't always correctly report this).
>Like so many other articles that make some offhand remarks about conda, this article raves about a bunch of "new" features that conda has had for years.
Agreed. (I'm also tired of seeing advances like PEP 723 attributed to uv, or uv's benefits being attributed to it being written in Rust, or at least to it not being written in Python, in cases where that doesn't really hold up to scrutiny.)
> The pro and con of tools like uv is that they layer over the base-level tools like pip. The pro of that is that they interoperate well with pip.
It's a pretty big pro ;) But I would say it's at least as much about "layering over the base-level tools" like venv.
> The con is that they inherit the limitations of that packaging model (notably the inability to distribute non-Python dependencies separately).
I still haven't found anything that requires packages to contain any Python code (aside from any build system configuration). In principle you can make a wheel today that just dumps a platform-appropriate shared library file for, e.g. OpenBLAS into the user's `site-packages`; and others could make wheels declaring yours as a dependency. The only reason they wouldn't connect up - that I can think of, anyway - is because their own Python wrappers currently don't hard-code the right relative path, and current build systems wouldn't make it easy to fix that. (Although, I guess SWIG-style wrappers would have to somehow link against the installed dependency at their own install time, and this would be a problem when using build isolation.)
> The only reason they wouldn't connect up - that I can think of, anyway - is because their own Python wrappers currently don't hard-code the right relative path
It's not just that, it's that you can't specify them as dependencies in a coordinated way as you can with Python libs. You can dump a DLL somewhere but if it's the wrong version for some other library, it will break, and there's no way for packages to tell each other what versions of those shared libraries they need. With conda you can directly specify the version constraints on non-Python packages. Now, yeah, they still need to be built in a consistent manner to work, but that's what conda-forge handles.
Ah, right, I forgot about those issues (I'm thankful I don't write that sort of code myself - I can't say I ever enjoyed C even if I used to use it regularly many years ago). I guess PEP 725 is meant to address this sort of thing, too (as well as build-time requirements like compilers)... ?
I guess one possible workaround is to automate making a wheel for each version of the compiled library, and have the wheel version move in lockstep. Then you just specify the exact wheel versions in your dependencies, and infer the paths according to the wheel package names... it certainly doesn't sound pleasant, though. And, C being what it is, I'm sure that still overlooks something.
> I still haven't found anything that requires packages to contain any Python code (aside from any build system configuration). In principle you can make a wheel today that just...
Ah, I forgot the best illustration of this: uv itself is available this way - and you can trivially install it with Pipx as a result. (I actually did this a while back, and forgot about it until I wanted to test venv creation for another comment...)
> But it's weird to me to see these articles that are so excited about being able to install Python easily when that's been doable with conda for ages. (And conda has additional features not present in uv or other tools.)
I used conda for a while around 2018. My environment became borked multiple times and I eventually gave up on it. After switching away, I never had issues with my environment becoming corrupted. I knew several other people who had the same issues, and it stopped after they switched away from conda.
I've heard it's better now, but that experience burned me so I haven't kept up with it.
Having the features is not nearly as much use if the whole thing's too slow to use. I frequently get mamba taking multiple minutes to figure out how to install a package. I use and like Anaconda and miniforge, but their speed for package management is really frustrating.
Thanks for bringing up conda. We're definitely trying to paint this vision as well with `pixi` (https://pixi.sh) - which is a modern package manager, written in Rust, but using the Conda ecosystem under the hood.
It follows more of a project based approach, comes with lockfiles and a lightweight task system. But we're building it up for much bigger tasks as well (`pixi build` will be a bit like Bazel for cross-platform, cross-language software building tasks).
While I agree that conda has many shortcomings, the fundamental packages are alright and there is a huge community keeping the fully open source (conda-forge) distribution running nicely.
I just want to give a hearty thank you for pixi. It's been an absolute godsend for us. I can't express how much of a headache it was to deal with conda environments with student coursework and research projects in ML, especially when they leave and another student builds upon their work. There was no telling if the environment.yml in a student's repo was actually up to date or not, and most often didn't include actual version constraints for dependencies. We also provide an HPC cluster for students, which brings along its own set of headaches.
Now, I just give students a pixi.toml and pixi.lock, and a few commands in the README to get them started. It'll even prevent students from running their projects, adding packages, or installing environments when working on our cluster unless they're on a node with GPUs. My inbox used to be flooded with questions from students asking why packages weren't installing or why their code was failing with errors about CUDA, and more often than not, it was because they didn't allocate any GPUs to their HPC job.
And, as an added bonus, it lets me install tools that I use often with the global install command without needing to inundate our HPC IT group with requests.
It's worth noting that uv does not use pip, and it is entirely possible (as demonstrated by uv's existence) to write a new installer that uses PyPI. The conflicts between pip, conda, and any other installers are all about one (or more) installers not having a complete view of the system, typically by having different packaging formats/indices/metadata.
> The main thing conda doesn't seem to have which uv has is all the "project management" stuff.
Pixi[1] is an alternative conda package manager (as in it still uses conda repositories; conda-forge by default) that bridges this gap. It even uses uv for PyPI packages if you can't find what you need in conda repositories.
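For example (assuming I remember the flag right; the package names are just placeholders):
$ pixi add numpy          # resolved from conda-forge
$ pixi add --pypi rich    # falls back to PyPI, resolved via uv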
I come from the viewpoint that I don't want my build tool to install my Python for me.
In the same vein, I don't want Gradle or Maven to install my JVM for me.
In JVM land I use SDKMAN! (Yes, that's what the amazingly awesome (in the original sense of awe: "an emotion variously combining dread, veneration, and wonder") concretion of Bash scripts is called).
In Python land I use pyenv.
And I expect my build tool to respect the JVM/Python versions I've set (looking at you Poetry...) and fail if they can't find them (You know what you did, Poetry. But you're still so much better than Pipenv)
Suppose conda had projects. Still, it is somewhat incredible to see uv resolve + install in 2 seconds what takes conda 10 minutes. It immediately made me want to replace conda with uv whenever possible.
(I have actively used conda for years, and don’t see myself stopping entirely because of non Python support, but I do see myself switching primarily to uv.)
It's true conda used to be slow, but that was mostly at a time when pip had no real dependency resolver at all. Since I started using mamba, I haven't noticed meaningful speed problems. I confess I'm always a bit puzzled at how much people seem to care about speed for things like install. Like, yes, 10 minutes is a problem, but these days mamba often takes like 15 seconds or so. Okay, that could be faster, but installing isn't something I do super often so I don't see it as a huge problem.
The near instant install speed is just such a productivity boost. It's not the time you save, it's how it enables you to stay in flow.
In my previous job we had a massive internal library hosted on Azure that took like 5 minutes to install with pip or conda. Those on my team not using uv either resorted to using a single global environment for everything, which they dreaded experimenting with, or made a new environment once in the project's history and avoided installing new dependencies like the plague. uv took less than 30 seconds to install the packages, so it freed up a way better workflow of having disposable envs that I could just nuke and start over if they went bad.
Agreed. When I tried uv first I was immediately taken aback by the sheer speed. The functionality seemed OK, but the speed - woah. Inspiring. So I kept using it. Got used to it now.
In my mind, conda has always been a purpose-specific environment manager for people who live in data science / matplotlib / numpy / jupyter all day every day.
Ruby-the-language is now inseparable from Rails because the venn diagram of the “Ruby” community and the “rails” community is nearly a circle. It can be hard to find help with plain Ruby because 99% of people will assume you have the rails stdlib monkeypatches.
In a similar way, conda and data science seem to be conjoined, and I don’t really see anybody using conda as a general-purpose Python environment manager.
That is the license for the anaconda package channel, not conda. The page you linked explains that conda and conda-forge are not subject to those licensing issues.
> To bootstrap a conda installation, use a minimal installer such as Miniconda or Miniforge.
> Conda is also included in the Anaconda Distribution.
Bam, you’ve already lost me there. Good luck getting this approved on our locked down laptops.
No pip compatibility? No venv compatibility? Into the trash it goes, it’s not standard. The beauty of uv is that it mostly looks like glue (even though it is more) for standard tooling.
True, and uv will probably never bring non-Python deps to the table.
But anaconda doesn't do inline deps, isn't a consistent experience (the typical conda project doesn't exist), is its own island incompatible with most of the Python ecosystem, is super slow, has a very quirky YAML config, and is badly documented with poor ergonomics.
In short, anaconda solves many of those problems but brings other ones to the table.
I think at this point, the only question that remains is how Astral will make money. But if they can package some sort of enterprise package index with some security bells and whistles, it seems an easy sell into a ton of orgs.
before you can install the package, you first have to install some other package whose only purpose is to break pip so it uses nvidia's package registry. This does not work with uv, even with the `uv pip` interface, because uv rightly doesn't put up with that shit.
This is of course not Astral's fault, I don't expect them to handle this, but uv has spoiled me so much it makes anything else even more painful than it was before uv.
>whose only purpose is to break pip so it uses nvidia's package registry. This does not work with uv, even with the `uv pip` interface, because uv rightly doesn't put up with that shit.
I guess you're really talking about `nvidia-pyindex`. This works by leveraging the legacy Setuptools build system to "build from source" on the user's machine, but really just running arbitrary code. From what I can tell, it could be made to work just as well with any build system that supports actually orchestrating the build (i.e., not Flit, which is designed for pure Python projects), and with the modern `pyproject.toml` based standards. It's not that it "doesn't work with uv"; it works specifically with Pip, by trying to run the current (i.e.: target for installation) Python environment's copy of Pip, calling undocumented internal APIs (`from pip._internal.configuration import get_configuration_files`) to locate Pip's config, and then parsing and editing those files. If it doesn't work with `uv pip`, I'm assuming that's because uv is using a vendored Pip that isn't in that environment and thus can't be run that way.
Nothing prevents you, incidentally, from setting up a global Pip that's separate from all your venvs, and manually creating venvs that don't contain Pip (which makes that creation much faster): https://zahlman.github.io/posts/2025/01/07/python-packaging-... But it does, presumably, interfere with hacks like this one. Pip doesn't expose a programmatic API, and there's no reason why it should be in the environment if you haven't explicitly declared it as a dependency - people just assume it will be there, because "the user installed my code and presumably that was done using Pip, so of course it's in the environment".
yes, but if you’re not in their carefully directed “nemo environment” the nemo2riva command fails complaining about some hydra dependency. and on it goes…
I think the biggest praise I can give uv is that as a non Python dev, it makes Python a lot more accessible. The ecosystem can be really confusing to approach as an outsider. There’s like 5 different ways to create virtual environments. With uv, you don’t have to care about any of that. The venv and your Python install are just handled for you by ‘uv run’, which is magic.
Can someone explain a non-project based workflow/configuration for uv? I get creating a bespoke folder, repo, and uv venv for certain long-lived projects (like creating different apps?).
But most of my work, since I adopted conda 7ish years ago, involves using the same ML environment across any number of folders or even throw-away notebooks on the desktop, for instance. I’ll create the environment and sometimes add new packages, but rarely update it, unless I feel like a spring cleaning. And I like knowing that I have the same environment across all my machines, so I don’t have to think about if I’m running the same script or notebook on a different machine today.
The idea of a new environment for each of my related “projects” just doesn’t make sense to me. But, I’m open to learning a new workflow.
Addition: I don’t run other’s code, like pretrained models built with specific package requirements.
`uv` isn't great for that; I've been specifying and rebuilding my environments for each "project".
My one-off notebooks I'm going to set up to be similar to the scripts; that will require some mods.
It does take up a lot more space, but it is quite a bit faster.
However, you could use the workspace concept for this I believe, and have the dependencies for all the projects described in one root folder and then all sub-folders will use the environment.
But I mean, our use case is very different than yours; it's not necessary to use uv.
FYI, for anyone else that stumbles upon this: I decided to do a quick check on PyTorch (the most problem-prone dependency I've had), and noticed that they specifically recommend no longer using conda—and have done so since last November.
I personally have a "sandbox" directory that I put one-off and prototype projects in. My rule is that git repos never go in any dir there. I can (and do) go in almost any time and rm anything older than 12 months.
In your case, I guess one thing you could do is have one git repo containing your most commonly-used dependencies and put your sub-projects as directories beneath that? Or even keep a branch for each sub-project?
One thing about `uv` is that dependency resolution is very fast, so updating your venv to switch between "projects" is probably no big deal.
> The idea of a new environment for each of my related “projects” just doesn’t make sense to me. But, I’m open to learning a new workflow.
First, let me try to make sense of it for you -
One of uv's big ideas is that it has a much better approach to caching downloaded packages, which lets it create those environments much more quickly. (I guess things like "written in Rust", parallelism etc. help, but as far as I can tell most of the work is stuff like hard-linking files, so it's still limited by system calls.) It also hard-links duplicates, so that you aren't wasting tons of space by having multiple environments with common dependencies.
A big part of the point of making separate environments is that you can track what each project is dependent on separately. In combination with Python ecosystem standards (like `pyproject.toml`, the inline script metadata described by https://peps.python.org/pep-0723/, the upcoming lock file standard in https://peps.python.org/pep-0751/, etc.) you become able to reproduce a minimal environment, automate that reproduction, and create an installable sharable package for the code (a "wheel", generally) which you can publish on PyPI - allowing others to install the code into an environment which is automatically updated to have the needed dependencies. Of course, none of this is new with `uv`, nor depends on it.
The installer and venv management tool I'm developing (https://github.com/zahlman/paper) is intended to address use cases like yours more directly. It isn't a workflow tool, but it's intended to make it easier to set up new venvs, install packages into venvs (and say which venv to install it into) and then you can just activate the venv you want normally.
(I'm thinking of having it maintain a mapping of symbolic names for the venvs it creates, and a command to look them up - so you could do things like "source `paper env-path foo`/bin/activate", or maybe put a thin wrapper around that. But I want to try very hard to avoid creating the impression of implementing any kind of integrated development tool - it's an integrated user tool, for setting up applications and libraries.)
That's my main use case not-yet-supported by uv. It should not be too difficult to add a feature or wrapper to uv so that it works like pew/virtualenvwrapper.
E.g. calling that wrapper uvv, something like
1. uvv new <venv-name> --python=... ...# venvs stored in a central location
2. uvv workon <venv-name> # now you are in the virtualenv
3. deactivate # now you get out of the virtualenv
You could imagine additional features such as keeping a log of the installed packages inside the venv so that you could revert to arbitrary state, etc. as goodies given how much faster uv is.
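A minimal sketch of what such a wrapper could look like as shell functions around uv (the names and the ~/.venvs layout are made up):
uvv-new()    { uv venv --python "${2:-3.12}" "$HOME/.venvs/$1"; }   # uvv-new myenv 3.11
uvv-workon() { . "$HOME/.venvs/$1/bin/activate"; }                  # uvv-workon myenv
# plain 'deactivate' still gets you back out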
I've worked like you described for years and it mostly works. Although I've recently started to experiment with a new uv based workflow that looks like this:
To open a notebook I run (via an alias)
uv tool run jupyter lab
and then in the first cell of each notebook I have
!uv pip install my-dependencies
This takes care of all the venv management stuff and makes sure that I always have the dependencies I need for each notebook. Only been doing this for a few weeks, but so far so good.
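I believe you can also front-load the dependencies into the ephemeral environment, something like:
$ uvx --with pandas --with matplotlib jupyter lab
though I haven't compared the two approaches.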
Why not just copy your last env into the next dir? If you need to change any of the package versions, or add something specific, you can do that without risking any breakages in your last project(s). From what I understand uv has a global package cache so the disk usage shouldn't be crazy.
Yeah, this is how I feel too. A lot of the movement in Python packaging seems to be more in managing projects than managing packages or even environments. I tend to not want to think about a "project" until very late in the game, after I've already written a bunch of code. I don't want "make a project" to be something I'm required or even encouraged to do at the outset.
I have the opposite feeling, and that's why I like uv. I don't want to deal with "environments". When I run a Python project I want its PYTHONPATH to have whatever libraries its config file says it should have, and I don't want to have to worry about how they get there.
I set up a "sandbox" project as an early step of setting up a new PC.
Sadly for certain types of projects like GIS, ML, scientific computing, the dependencies tend to be mutually incompatible and I've learned the hard way to set up new projects for each separate task when using those packages. `uv init; uv add <dependencies>` is a small amount of work to avoid the headaches of Torch etc.
Since this seems to be a love fest let me offer a contrarian view. I use conda for environment management and pip for package management. This neatly separates the concerns into two tools that are good at what they do. I'm afraid that uv is another round of "Let's fix everything" just to create another soon to be dead set of patterns. I find nothing innovative or pleasing in its design, nor do I feel that it is particularly intuitive or usable.
You don't have to love uv, and there are plenty of reasons not to.
Dozens of threads of people praising how performant and easy uv is, how it builds on standards and current tooling instead of inventing new incompatible set of crap, and every time one comment pops up with “akshually my mix of conda, pyenv, pipx, poetry can already do that in record time of 5 minutes, why do you need uv? Its going to be dead soon”.
To be fair here: conda was praised as the solution to everything by many when it was new. It did have its own standards of course. Now most people hate it.
Every packaging PEP is also hailed as the solution to everything, only to be superseded by a new and incompatible PEP within two years.
Naive take. https://gwern.net/holy-war counsels that, in fact, becoming the One True Package Manager for a very popular programming language is an extremely valuable thing to aim towards. This is even outside of the fact that `uv` is backed by a profit-seeking company (cf https://astral.sh/about). I'm all for people choosing what works best for them, but I'm also staunchly pro-arguing over it.
>Which is why when random people start acting like it’s important, you have to wonder why it’s important to them.
I don't use uv because I don't currently trust that it will be maintained on the timescales I care about. I stick with pip and venv, because I expect they will still be around 10 years from now, because they have much deeper pools of interested people to draw contributors and maintainers from, because - wait for it - they are really popular. Your theory about random people being corporate shills for anything they keep an eye on the popularity of can be explained much more parsimoniously like that.
> I find nothing innovative or pleasing in its design, nor do I feel that it is particularly intuitive or usable.
TFA offers a myriad innovative and pleasing examples. It would have been nice if you actually commented on any of those, or otherwise explained why you think otherwise.
conda user for 10 years and uv skeptic for 18 months.
I get it! I loved my long-lived curated conda envs.
I finally tried uv to manage an environment and it's got me hooked. That a project's dependencies can be so declarative and separated from the venv really sings for me! No more meticulous tracking of an env.yml or requirements.txt, just `uv add` and `uv sync` and that's it! I just don't think about it anymore.
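The whole loop, at least as I use it, is roughly this (requests is just an example):
$ uv add requests    # records it in pyproject.toml and updates uv.lock
$ uv sync            # makes .venv match the lock file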
I'm also a long time conda user and have recently switched to pixi (https://pixi.sh/), which gives a very similar experience for conda packages (and uses uv under the hood if you want to mix in dependencies from PyPI). It's been great and also has a `pixi global` mode, similar to `pipx`, that makes it easy to grab general tools like ripgrep, ruff etc. and make them widely available, but still managed.
Pip installs packages, but it provides rather limited functionality for actually managing what it installed. It won't directly spit out a dependency graph, won't help you figure out which of the packages installed in the current environment are actually needed for the current project, leaves any kind of locking up to you...
I agree that uv is the N+1th competing standard for a workflow tool, and I don't like workflow tools anyway, preferring to do my own integration. But the installer it provides does solve a lot of real problems that Pip has.
What does the additional flexibility get you? It's just one less thing to worry about when coordinating with a team, and it's easy to shift between different versions for a project if need be.
uv is so much better than everything else; I'm just afraid they can't keep the team going. Time will tell, but I just use uv and ruff in every project now tbh.
A familiar tale: Joe is hesitant about switching to UV and isn't particularly excited about it. Eventually, he gives it a try and becomes a fan. Soon, Joe is recommending UV to everyone he knows.
I know good naming is hard, and there are an awful lot of project names that clash, but naming a project uv is unfortunate due to the ubiquitous nature of libuv
I don't think it's particularly problematic, uv the concurrency library and uv the Python tool cover such non-overlapping domains that opportunities for confusion are minimal.
(The principle is recognized in trademark law -- some may remember Apple the record label and Apple the computer company. They eventually clashed, but I don't see either of the uv's encroaching on the other's territory.)
Sure, there are so few backend Node.js engineers. Let alone game engine developers and Blender users with their UV mapping tools. None of these people will ever encounter Python in their daily lives.
> I don't think it's particularly problematic, uv the concurrency library and uv the Python tool cover such non-overlapping domains that opportunities for confusion are minimal.
Google returns mixed results. You may assert it's not problematic, but this is a source of noise that projects with distinct names don't have.
I'm not sure that's true. uvloop, built on libuv, is a pretty popular alternative event loop for async Python, much faster than the built-in. It certainly confused me at first to see a tool called "uv" that had nothing to do with that, because I'd been using libuv with Python for years before it came out.
A big thing that trips people up until they try to use a public project (from source) or an older project, is the concept of a dependencies file and a lock file.
The dependency file (what requirements.txt is supposed to be), just documents the things you depend on directly, and possibly known version constraints. A lock file captures the exact version of your direct and indirect dependencies at the moment in time it's generated.
When you go to use the project, it will read the lock file, if it exists, and match those versions for anything listed directly or indirectly in the dependency file. It's like keeping a snapshot of the exact last-working dependency configuration. You can always tell it to update the lock file and it will try to recalculate everything from the latest versions that meet your dependency constraints in the dependency file, but if something doesn't work with that you'll presumably have your old lock file to fall back on _that will still work_.
It's a standard issue/pattern in all dependency managers, but it's only been getting attention for a handful of years with the focus on reproducibility for supply chain verification/security. It has the side effect of helping old projects keep working much longer though.
Python has had multiple competing options for solutions, and only in the last couple of years did they pick a format winner.
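With uv, for instance, the split looks roughly like this (requests is just a placeholder):
$ uv add "requests>=2.31"   # direct dependency + constraint, goes into pyproject.toml
$ uv lock                   # exact pins for the whole graph, goes into uv.lock
$ uv sync --frozen          # reproduce exactly what the lock file says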
If the dependency file is wrong, and describes versions that are incompatible with the project, it should be fixed. Duplicating that information elsewhere is wrong.
Lockfiles have a very obvious use case: Replicable builds across machines in CI. You want to ensure that all the builds in the farm are testing the same thing across multiple runs, and that new behaviors aren't introduced because numpy got revved in the middle of the process. When that collective testing process is over, the lockfile is discarded.
You should not use lockfiles as a "backup" to pyproject.toml. The version constraints in pyproject.toml should be correct. If you need to restrict to a single specific version, do so, "== 2.2.9" works fine.
Dependency files - whether the project's requirements (or optional requirements, or in the future, other arbitrary dependency groups) in `pyproject.toml`, or a list in a `requirements.txt` file (the filename here is actually arbitrary) don't describe versions at all, in general. Their purpose is to describe what's needed to support the current code: its direct dependencies, with only as much restriction on versions as is required. The base assumption is that if a new version of a dependency comes out, it's still expected to work (unless a cap is set explicitly), and has a good chance of improving things in general (better UI, more performant, whatever). This is suitable for library development: when others will cite your code as a dependency, you avoid placing unnecessary restrictions on their environment.
Lockfiles are meant to describe the exact version of everything that should be in the environment to have exact reproducible behaviour (not just "working"), including transitive dependencies. The base assumption is that any change to anything in the environment introduces an unacceptable risk; this is the tested configuration. This is suitable for application development: your project is necessarily the end of the line, so you expect others to be maximally conservative in meeting your specific needs.
You could also take this as an application of Postel's Law.
>Lockfiles have a very obvious use case: Replicable builds across machines in CI.
There are others who'd like to replicate their builds: application developers who don't want to risk getting bug reports for problems that turn out to be caused by upstream updates.
> You should not use lockfiles as a "backup" to pyproject.toml. The version constraints in pyproject.toml should be correct. If you need to restrict to a single specific version, do so, "== 2.2.9" works fine.
In principle, if you need a lockfile, you aren't distributing a library package anyway. But the Python ecosystem is still geared around the idea that "applications" would be distributed the same way as libraries - as wheels on PyPI, which get set up in an environment, using the entry points specified in `pyproject.toml` to create executable wrappers. Pipx implements this (and rejects installation when no entry points are defined); but the installation will still ignore any `requirements.txt` file (again, the filename is arbitrary; but also, Pipx is delegating to Pip's ordinary library installation process, not passing `-r`).
You can pin every version in `pyproject.toml`. Your transitive dependencies still won't be pinned that way. You can explicitly pin those, if you've done the resolution. You still won't have hashes or any other supply-chain info in `pyproject.toml`, because there's nowhere to put it. (Previous suggestions of including actual lockfile data in `pyproject.toml` have been strongly rejected - IIRC, Hatch developer Ofek Lev was especially opposed to this.)
Perhaps in the post-PEP 751 future, this could change. PEP 751 specifies both a standard lockfile format (with all the sorts of metadata that various tools might want) and a standard filename (or at least filename pattern). A future version of Pipx could treat `pylock.toml` as the "compiled" version of the "source" dependencies in `pyproject.toml`, much like Pip (and other installers) treat `PKG-INFO` (in an sdist, or `METADATA` in a wheel) as the "compiled" version (dependency resolution notwithstanding!) of other metadata.
Just two reasons (there are more): 1) uv is vastly faster than pip. Just using uv pip install -r requirements.txt and nothing else is a win. 2) uv can handle things like downloading the correct Python version, creating a venv (or activating an existing venv if one exists), and essentially all the other cognitive load in a way that's completely transparent to the user. It means you can give someone a Python project and a single command to run it, and you can have confidence it will work regardless of the platform or a dozen other little variables that trip people up.
I create a .venv directory for each project(even for those test projects named pytest, djangotest). And each project has its own requirements file. Personally, Python packaging has never been a problem.
What do you do when you accidentally run pip install -r requirements.txt with the wrong .venv activated?
If your answer is "delete the venv and recreate it", what do you do when your code now has a bunch of errors it didn't have before?
If your answer is "ignore it", what do you do when you try to run the project on a new system and find half the imports are missing?
None of these problems are insurmountable of course. But they're niggling irritations. And of course they become a lot harder when you try to work with someone else's project, or come back to a project from a couple of years ago and find it doesn't work.
>What do you do when you accidentally run pip install -r requirements.txt with the wrong .venv activated?
As someone with a similar approach (not using requirements.txt, but using all the basic tools and not using any kind of workflow tool or sophisticated package manager), I don't understand the question. I just have a workflow where this isn't feasible.
Why would the wrong venv be activated?
I activate a venv according to the project I'm currently working on. If the venv for my current code isn't active, it's because nothing is active. And I use my one global Pip through a wrapper, which (politely and tersely) bonks me if I don't have a virtual environment active. (Other users could rely on the distro bonking them, assuming Python>=3.11. But my global Pip is actually the Pipx-vendored one, so I protect myself from installing into its environment.)
You might as well be asking Poetry or uv users: "what do you do when you 'accidentally' manually copy another project's pyproject.toml over the current one and then try to update?" I'm pretty sure they won't be able to protect you from that.
>If your answer is "delete the venv and recreate it", what do you do when your code now has a bunch of errors it didn't have before?
If it did somehow happen, that would be the approach - but the code simply wouldn't have those errors. Because that venv has its own up-to-date listing of requirements; so when I recreated the venv, it would naturally just contain what it needs to. If the listing were somehow out of date, I would have to fix that anyway, and this would be a prompt to do so. Do tools like Poetry and uv scan my source code and somehow figure out what dependencies (and versions) I need? If not, I'm not any further behind here.
>And of course they become a lot harder when you try to work with someone else's project, or come back to a project from a couple of years ago and find it doesn't work.
I spent this morning exploring ways to install Pip 0.2 in a Python 2.7 virtual environment, "cleanly" (i.e. without directly editing/moving/copying stuff) starting from scratch with system Python 3.12. (It can't be done directly, for a variety of reasons; the simplest approach is to let a specific version of `virtualenv` make the environment with an "up-to-date" 20.3.4 Pip bootstrap, and then have that Pip downgrade itself.)
I can deal with someone else's (or past me's) requirements.txt being a little wonky.
Yeah, this is where I've been for a while. Maybe it helps that I don't do any ML work with lots of C or Fortran libraries that depend on exact versions of Python or whatever. But for just writing an application in Python, venv and pip are fine. I'll probably still try uv eventually if everyone really decides they're adopting it, but I won't rush.
if you come back to a project you haven't worked on for a year or two, you'll end up with new versions of dependencies that don't work with your code or environment any more.
you can solve this with constraints, pip-tools etc., but the argument is uv does this better
I write internal tools using Python at work. These tools are often used by non-Python devs. I am so, so tired of adding a blurb on creating a venv.
(Of course, the alternative—"install this software you've never heard of"—isn't fantastic either. But once they do have it, it'd be pretty neat to be able to tell them to just "uvx <whatever>".)
Is it not the same blurb every time that you could copy and paste?
Or you can make sure you have an entry point - probably a better UX for your coworkers anyway - and run them through a `pipx` install.
Or you could supply your own Bash script or whatever.
Or since you could use a simple packager like pex (https://docs.pex-tool.org/). (That one even allows you to embed a Python executable, if you need to and if you don't have to worry about different platforms.) Maybe even the standard library `zipapp` works for your needs.
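For the zipapp route, a rough sketch (the myapp/cli.py names are made up, and dependencies have to be vendored into the source tree first):
$ pip install -r requirements.txt --target myapp/      # vendor deps next to myapp/cli.py
$ python -m zipapp myapp/ -m "cli:main" -p "/usr/bin/env python3" -o myapp.pyz
$ ./myapp.pyz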
The problem I have with uv is that it is not opinionated enough, or complete enough. It still needs a backend for building the package, and you still have a choice of backends.
In other words, it is a nice frontend to hide the mess that is the Python packaging ecosystem, but the mess of an ecosystem is still there, and you still have to deal with it. You'll still have to go through hatchling's docs to figure out how to do x/y/z. You'll still have to switch from hatchling to flit/pdm/setuptools/... if you run into a limitation of hatchling. As a package author, you're never using uv, you're using uv+hatchling (or uv+something) and a big part of your pyproject.toml are not uv's configuration, it is hatchling configuration.
I'm sticking with Poetry for now, which has a more streamlined workflow. Things work together. Every Poetry project uses the same configuration syntax (there are no Poetry+X and Poetry+Y projects). Issues in Poetry can be fixed by Poetry rather than having to work with the backend.
I understand that uv is still young and I am sure this will improve. Maybe they'll even pick a specific backend and put a halt to this. But of course Poetry might catch up before then.
> I had a friend who decided to not use uv, because the first time he used it, it was on a 15 years old codebase that had just been migrated to Python 3. It was standing on a pile of never cleaned up pip freeze exports, and uv could not make it work.
This is my only gripe with uv, despite how the author decided to depict it, this really turns into a headache fast as soon as you have ~4-5 in-house packages.
I don't think it's that bad that uv is so unforgiving in those case because it leads to better overall project quality/cohesion, but I wish there was a way to more progressively onboard and downgrade minor version mismatch to warnings.
Did they run into a hard blocker, or was it just that using version overrides was possible but painful? I started looking seriously at uv/pdm once poetry made it entirely clear they didn't intend to support version overrides [1]. uv's support for overrides seems serviceable if unsophisticated [2][3].
It's pretty understandable from the linked poetry issue why they aren't going to support it. I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built-in specification for an indirect dependency.
Point blank, that's a packaging failure and the solution is, and always has been, to immediately yank the offending package.
That said, the Python case has pretty limited ability to specify dependency package versions, which makes it nigh impossible to handle it downstream by blocklisting specific versions from an otherwise contiguous range.
Take for example the werkzeug package that released a breaking API regression in a patch release version. It didn't affect everyone, but notably did affect certain Flask(?) use cases that used werkzeug as a dependency. In a sane system, either werkzeug immediately removes the last released version as buggy (and optionally re-releases it as a non-backwards-compatible SemVer change), or everyone starts looking for an alternative to non-compliant werkzeug. Pragmatically though, Python dependency specification syntax should have a way for Flask to specify, in a patch release of its own, that werkzeug up to the next minor version, _but excluding a specific range of patch versions_, is a dependency, allowing them to monkey patch the problem in the short term.
It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though, which is what was requested from the poetry tool.
> I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built in specification for an indirect dependency.
npm and yarn both let you do it. PDM and uv think about it differently, but both allow overrides.
> It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though, which is what was requested from the poetry tool.
I'm jealous of your upstreams. I just want to use Django package XYZ that says it's only compatible with Django 3.X on Django 4. Works just fine, but poetry won't let it happen. Upstream seems like they might literally be dead in some cases, with an unmerged unanswered PR open for years. In other cases a PR was merged but no new PyPI release was ever made because I allowed for more liberal requirements for a 0.7.X release last made in 2019 and they're on version 4.X or whatever these days.
On one decade old application I have a half dozen forks of old packages with only alterations to dependency specifications specifically to please poetry. It's really annoying as opposed to just being able to say "I know better than what this package says" like in npm and yarn.
This is exactly what half the comments in poetry's "please allow overrides" issue are saying.
Package com.foo.Something pins a dependency on crap.bollocks.SomethingElse v1.1.0?
But I want to use crap.bollocks.SomethingElse v1.1.5? And I know that they're compatible?
Then I can configure a dependency exclusion.
I really really miss this feature in every non-JVM build tool.
It's another one of those things that the JVM ecosystem did right that everyone else forgot to copy.
(The other massive one being packages having verifiable namespaces. Can't really typosquat Guava because its namespace is com.google and they can prove it.)
> I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built in specification for an indirect dependency.
You can do it with [patch] in cargo (I think), or .exclude in SBT. In Maven you can use <dependencyManagement>. In fact I can't think of a package manager that doesn't support it, it's something I'd always expect to be possible.
> Point blank, that's a packaging failure and the solution is, and always has been, to immediately yank the offending package.
Be that as it may, PyPI won't.
> It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though
It "shouldn't", but sometimes the user will find themselves in that situation. The only choice is whether you give them the tools to work around it or you don't.
By default, pip will give you a warning that package X requires package Y ==1.0 but you have 1.1 which is incompatible, instead of just failing. That's the feature that I'd like to have, basically a "--just-warn" flag or a way to set a package as "version does not matter".
> What’s a typical or better way of handling in-house packages?
Fixing your dependencies properly, but on some older codebases that also pull in old dependencies this can be a headache.
For example, "Pillow", a Python image library, is a dependency in just about everything that manipulates images. This means that one package might have >=9.6<9.7, some package will have ==9.8 and another will have >=10<11. In practice it never matters and any of those versions would work, but you have a "version deadlock" and now you need to bump the version in packages that you may not actually own. Having some override of "this project uses Pillow==10; if some package asks for something else, ignore it" is something that pip does that uv doesn't.
> You don't even need to know there is a venv, or what activation means.
> All those commands update the lock file automatically and transparently.... It's all taken care of.
When is the python community going to realize that simple is the opposite of easy? I don't see how hiding these aspects is desirable at all; I want to know how my programming tools work!
With all due respect to the author, I don't like the assumption that all programmers want magic tools that hide everything under the rug. Some programmers still prefer simplicity, ie understanding exactly what every part of the system does.
Nothing against uv, it seems like a fine tool. And I'm sure one could make a case for it on other technical merits. But choosing it specifically to avoid critical thinking is self-defeating.
There is simplicity of interface and then of implementation. If you try uv you will find it is both convenient and easier to understand than competing solutions, because everything _just works_ and you find the proof of how it works waiting in your `git status`. You can be assisted in knowing that it just works because its install takes no time and no system setup. It is slick.
I think that when you, as most HN commenters are wont to do, are trying to achieve a certain level of mastery in your craft, you're going to want to dive deep, understand the abstractions, have full control, be able to start with nothing but a UDP socket and a steady hand, but that approach misses a huge number of users of these languages. You don't want scientists to have to worry about setting up a venv, you want them to analyze their data and move on with their lives. Sure, people like you or me will be able to set up and source a venv in no time, without expending much mental energy, but we're not who this product is for. It's for the rest of the users, the 99% who aren't even aware places like this exist.
All that said, I'm pretty skeptical of using uv until their monetization strategy is clear. The current setup is making me think we're in for a Docker-like license change.
uv may be an improvement, but the Python packaging hell is a cultural problem that will not be solved without changing culture. And the main cultural issue is: 1. Depending on small and huge packages for trivial things. 2. A culture of breaking API compatibility. The two things combined create the mess we see.
This is not packaging hell, this is you having a personal problem with how other developers work.
Python has come an immensely long way in the world of packaging, the modern era of PEP 517/518 and the tooling that has come along with it is a game changer. There are very few language communities as old as Python with packaging ecosystems this healthy.
I've had conversations with members of SG15, the C++ tooling subgroup, where Python's packaging ecosystem and interfaces are looked on enviously as systems to steal ideas from.
I am a casual python user, and for that I love uv. Something I haven't quite figured out yet is integration with the pyright lsp - when I edit random projects in neovim, any imports have red squiggles. Does anyone know of a good way to resolve imports for the lsp via uv?
I start a shell with "uv run bash" and start neovim from there. I'm sure there's other ways but it's a quick fix and doesn't involve mucking around with neovim config.
I really like uv and I have successfully got rid of miniconda but :
- I wish there was a global virtual environment which could be referenced and activated from the terminal. Not every new script needs its own .venv in its respective folder. uv takes the route of being project-centered and based on the file system; this works for me most of the time but sometimes it doesn't.
- I wish we could avoid the .python_version file and bundle it in the pyproject.toml file.
I've been using Hermit to install uv, then pointing scripts at $REPO_ROOT/bin/uv. That gives you a repo where the scripts can be run directly after cloning (Hermit is smart enough to install itself if necessary).
Unfortunately, Hermit doesn't do Windows, although I'm pretty sure that's because the devs don't have Windows machines: PRs welcome.
Is there a conda to uv migration tutorial written by anyone?
I have installed miniconda system-wide. For any Python package that I use a lot, like ipython, I install it in the base environment, and in other environments as well.
For every new project, I create a conda environment, and install everything in it. Upon finishing/writing my patch, I remove that environment and clean the caches. For my own projects, I create an environment.yaml and move on.
Everything works just fine. Now, the solving with mamba is fast. I can just hand someone the code and environment.yaml, and it runs on other platforms.
Can someone say why using uv is a good idea? Has anyone written a migration guide for such use cases?
I am mightily impressed by one line dependency declaration in a file. But I don't know (yet) where the caches are stored, how to get rid of them later, etc.
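From what I can tell, uv has cache subcommands for exactly that, e.g.:
$ uv cache dir      # show where the cache lives
$ uv cache clean    # wipe it
but I haven't dug into them yet.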
It does replace pipenv. They might not mention it because it's not widely used these days; most pipenv users switched to poetry sometime over the last half-decade or so.
There is a request for `uv shell` or similar[0], but it's trickier than it looks, and even poetry gave up `poetry shell` in their recent 2.0 release.
Use direnv for automatically creating, activating, and deactivating your virtual environments. I don't know how anyone lives any other way. Just put this in a .envrc file (substituting whichever python version you are using):
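Something along these lines, using direnv's built-in layout function (swap in your interpreter):
layout python python3.12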
While working in Django projects, one would prefer to have an environment activated to perform all kinds of django-admin commands; I certainly wouldn't want to do that via `uv run`.
Also, `nvim` is started with an environment activated if you want all the LSP goodies.
`uv run` is good for some things, but I prefer to have my venv activated as well.
But still, it's not good enough for Django as there are too many management commands and I don't want to configure them in pyproject.toml file, especially since some of them take additional arguments... There is no point in using anything but django-admin command (I do have a wrapper around it, but the point remains) and that requires activated venv.
How exactly does `uv` determine which is "the" venv? Is it simply based on the working directory like the `direnv`/`autoenv`/etc. workflows others are describing here?
It does seem like people have use cases for running code in a different environment vs. the one being actively used to develop the package.
Honest question: is uv more reproducible/portable than cramming your Python project into a Docker container? I've used pyenv, pip, venv, and a couple of other things, and they all work fine, at first, in simple scenarios.
Docker isn't a project manager, so I'm struggling to see the comparison. If you have an app (api/web etc.) you would use uv to manage dependencies, lock files and a local virtual environment for development, and then you could install the same dependencies and the project in a Docker image also for deployment.
People certainly use Docker for this purpose. Need a new package? Add a pip install line and rebuild the image.
I agree it isn’t the best use of Docker, but with the hell that is conda (and I say this as someone who likes conda more than most other options) and what can feel like insanity managing python environments, Docker isn’t the worst solution.
All that said, I moved to uv last year and have been loving it.
Yeah, I also used Docker (actually, Podman) as an alternative Python package manager and it worked well enough. Most of all, it felt somewhat cleaner and more reproducible than using plain virtualenv.
> Honest question: is uv more reproducible/portable than cramming your Python project into a Docker container?
Yes (unless you use uv in your Dockerfile). I mean, a Docker container will freeze one set of dependencies, but as soon as you change one dependency you've got to run your Dockerfile again and will end up with completely different versions of all your transitive dependencies.
even if you go through the hassle of using docker for your local dev environment, you still need something to install dependencies in a reproducible way when you rebuild the image.
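As an illustration of that combination, here is a minimal sketch of a Dockerfile that uses uv to install exactly what the lock file says on every rebuild; it assumes a project with a pyproject.toml and uv.lock, and the entry point name is a placeholder:
# Sketch: bring in the uv binary, then sync from the lock file
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# Install locked dependencies first so this layer caches well
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev
# Then copy the source and install the project itself
COPY . .
RUN uv sync --frozen --no-dev
CMD ["uv", "run", "python", "main.py"]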
I can't speak to uv from an app dev perspective (I use Go for that), but as someone who dips into the Python world for data science reasons only, uv is great and I'm thankful for it.
Just going to plug https://mise.jdx.dev as a perfect accompaniment to uv. It simplifies installing tooling across languages and projects. I even install uv via mise, and it uses uv under the hood for Python related things.
Adding to the article (which I agree with): Lack of `uv pip install --user` has made transitioning our existing python environment a bit more challenging than I'd like, but not a deal breaker.
Out of curiosity, how does `--user` fall in your use case? It got me confused because this flag makes it install to a central location within the user home directory and not to a virtual environment.
Now we should just figure out why we should stop here. Why not write everything in Rust? Recently I have moved all my projects to Rust from Python and never looked back. Of course we need projects like Torch and we are not yet there, but for simpler projects that do not require GPU libraries, Rust is great.
"It's written in Rust" is not responsible for most of the improvements on offer here. (TFA barely mentions Rust, to its credit.) Many of them come from algorithmic improvements, better design decisions, and simply just not having the tool reside in the same environment as the installation target. (It is perfectly possible to use Pip cross-environment like this, too. People just don't do it, because a) they don't know and b) the standard library `venv` and `ensurepip` tools are designed to bootstrap Pip into new virtual environments by default. My recent blog post https://zahlman.github.io/posts/2025/01/07/python-packaging-... offers relevant advice here, and upcoming posts are in the works and/or planned about design issues in Pip.)
If your purpose is to denigrate Python as a language, then uv isn't solving problems for you anyway. But I will say that the kind of evangelism you're doing here is counterproductive, and is the exact sort of thing I'd point to when trying to explain why the project of integrating Rust code into the Linux kernel has been so tumultuous.
I don't suppose it could be as fast as uv is, but it could be much closer to that than where it is now.
One immediate speed-up that requires no code changes: when uv creates a venv, it doesn't have to install Pip in that venv. You can trivially pass `--without-pip` to the standard library venv to do this manually. On my system:
$ time uv venv uv-test
Using CPython 3.12.3 interpreter at: /usr/bin/python
Creating virtual environment at: uv-test
Activate with: source uv-test/bin/activate
real 0m0.106s
user 0m0.046s
sys 0m0.021s
$ time python -m venv --without-pip venv-test
real 0m0.053s
user 0m0.044s
sys 0m0.009s
For comparison:
$ time python -m venv venv-test
real 0m3.308s
user 0m3.031s
sys 0m0.234s
(which is around twice as long as Pip actually takes to install itself; I plan to investigate this in more detail for a future blog post.)
To install in this environment, I use a globally installed pip (actually the one vendored by pipx), simply passing the `--python` argument to tell it which venv to install into. I have a few simple wrappers around this; see https://zahlman.github.io/posts/2025/01/07/python-packaging-... for details.
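To illustrate the pattern being described (my own sketch, not the author's exact wrappers; it assumes pip 22.3 or newer for the `--python` option):
$ python -m venv --without-pip .venv
$ pip --python .venv/bin/python install requests
$ .venv/bin/python -c "import requests; print(requests.__version__)"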
In my own project, Paper, I see the potential for many immediate wins. In particular, Pip's caching strategy is atrocious. It's only intended to avoid the cost of actually hitting the Internet, and basically simulates an Internet connection to its own file-database cache in order to reuse code paths. Every time it installs from this cache, it has to parse some saved HTTP-session artifacts to get the actual wheel file, unpack the wheel into the new environment, generate script wrappers etc. (It also eagerly pre-compiles everything to .pyc files in the install directory, which really isn't necessary a lot of the time.) Whereas it could just take an existing unpacked cache and hard-link everything into the new environment.
If you check the TCO then it probably pays off. I am not sure about how much harder it is. I do not use lifetimes and I clone a lot. Still, the performance and reliability a Rust project has vs. Python is insane.
That's the thing: you don't actually need Rust. uv has simply chosen it as an implementation language, but there isn't anything about Python that inherently prevents it from being used to write tools better than Pip. The problems with Pip are problems with Pip, not problems with Python.
(And many of them are completely fake, anyway. You don't actually need to spend an extra 15MB of space, and however many seconds of creation time, on a separate copy of Pip in each venv just so that Pip can install into that venv. You just need the `--python` flag. Which is a hack, but an effective one.)
(Last I checked, the uv compiled binary is something like 35MB. So sticking with a properly maintained Pip cuts down on that. And Pip is horrendously bloated, as Python code goes, especially if you only have the common use cases.)
Interesting. So pip's install times, and results that only sometimes work, did not make them kill each other, but somehow waiting on cargo build triggers them.
I tried out uv a bit ago and dropped it. But about two weeks ago, I switched to it and migrated two projects with no issues.
Things like per-dependency PyPI sources are finally there.
I still find rough points (as many others pointed out, especially with non sandboxed installs), that are problematic, but on the whole it’s better than Mamba for my use.
Is uv better than micromamba? I tried using uv once and got some big ugly error I don't remember, and that was the end of that, whereas mm just worked (perhaps due to my familiarity). It was a project with the usual situation, i.e., torch, numpy, cuda support, nvcc, all had to play nicely together and satisfy requirements.txt.
It just hit its 12 month birthday a few days ago and has evolved a LOT on those past 12 months. One of the problems I ran into with it was patched out within days of me first hitting it. https://simonwillison.net/2024/Nov/8/uv/
Just a few weeks ago. Sadly I can't remember the specifics but meta-management toolchains are always a hard sell if what you currently have works and is Good Enough. mm is quite fast compared to anaconda, though not perfect. Maybe I'd also benefit from uv having an environment import feature, since I have like 30 *conda environments by now.
I used uv for the first time a few months ago and it was a total revelation (previously I used venv and pip). I would be happy if pip itself and conda went away (although that won't happen).
Really like uv too but surprised he doesn’t mention the lack of conda compliance. Some scientific packages only being available on conda is the only reason I can’t use uv (but micromamba) for some projects.
conda compliance is nearly impossible to get because the typical anaconda project doesn't exist: it's a separate ecosystem with huge variability that is by design incompatible with everything else and that no two teams use in the same way.
Rye is softly being sunset in favor of `uv` (though still officially supported, and I haven't heard of any plans to change that). As it says on https://rye.astral.sh/,
> If you're getting started with Rye, consider uv, the successor project from the same maintainers.
> While Rye is actively maintained, uv offers a more stable and feature-complete experience, and is the recommended choice for new projects.
uv has been fantastic for most of my personal projects. It feels so much smoother than any other Python tooling combo I have tried in the past. That said, it just does not work well behind corporate proxies. This is the single most annoying thing that has stopped me from recommending it at work.
Works fine. But for people on GitHub, for now I recommend using uv only for building distributions and using the official PyPA GitHub Action for publishing them to PyPI. This way you can take advantage of attestation, something not yet supported by uv.
How are they handling the returns VCs expect through this free software? If it's so easy to deploy, surely we should expect a Docker-like licensing model in the near future?
All else equal, I prefer julia to python, but uv makes the python experience so much nicer. I'd love it if julia copied uv and replaced the Pkg system with it.
Yes. The main benefits for me and my coworkers are speed and automatic fetching of the Python interpreter. It's so fast that it's a pleasure to use (the same could be said about ruff vs. black). And the fact that it downloads the right Python interpreter specified in the project's pyproject.toml means my coworkers don't have to care about installing and managing Python versions. They don't even need to have a Python installed in the first place.
Otherwise, it works pretty much like Poetry. Unfortunately Poetry is not standards-compliant with the pyproject.toml, so you'll have to rewrite it. There are tools for this, never bothered with them though.
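For a rough idea of what that looks like, the relevant part is just standard pyproject.toml metadata (the name and versions below are placeholders):
[project]
name = "example-app"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["requests>=2.31"]
With that in place, `uv sync` or `uv run` will fetch a matching CPython build automatically if no suitable interpreter is already installed.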
In addition to all the benefits others have mentioned, biggest benefit for me was that with uv you can easily bring your own build system (scikit-build-core in my case). Poetry comes with its own build system that didn't work for my needs and working around that was a massive pain and quite fragile. With uv I can use the build system that works best for me. On the whole uv is less opinionated than poetry. By default it will do sensible things, but if that thing doesn't work in your particular weird case, it is much easier to make uv work for you than poetry. Poetry gets very angry if you try to hold it wrong.
I can't find any way to get that working in uv without some pretty major refactoring of my internal structure and import declarations. Maybe I've accidentally cornered myself in a terrible and ill-advised structure?
Assuming you have src/example.py with a function called hello, then "uv run hello" will call that function. I think you also need to have a (empty) src/__init__.py file.
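For what it's worth, as far as I know `uv run hello` resolves `hello` as a console-script entry point, so the pyproject.toml also needs something along these lines (the names here just mirror the example above):
[project]
name = "example"
version = "0.1.0"

[project.scripts]
hello = "example:hello"   # run the hello() function from the example module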
I just reviewed uv for my team and there is one more reason against it, which isn't negligible for production-grade projects: GitHub Dependabot doesn't yet handle uv lock files. Supply chain management and vulnerability detection is such an important thing that it prevents the use of uv until this is resolved (the open GitHub issue mentions the first quarter of 2025!)
Armin Ronacher, author of rye (later uv), has very clearly highlighted that exact xkcd when he started working on rye. But he still decided that it was worth a try and, as it turns out, rye/uv has become something that has a realistic chance of becoming the way to use Python for most use-cases.
Can anyone explain to a non-python developer why python infrastructure is so much broken around the version management?
It looks to me that every new minor python release is a separate additional install because realistically you cannot replace python 3.11 with python 3.12 and expect things to work. How did they put themselves in such a mess?
Python code rarely breaks between minor version releases in my experience. The main exceptions I can think of: Python 3.5 introduced the `async` keyword (PEP 492), and Python 3.7 changed `StopIteration` handling in generators (PEP 479).
I have seen references to using uv for Python package management before and been thoroughly confused. I never realized it was not the same thing as the very nice asynchronous cross-platform library libuv (https://libuv.org/) and I could never figure out what that library had to do with Python package management (answer: nothing).
Maybe we need a Geographic Names Board to deconflict open source project names, or at least the ones that are only two or three characters long.
A very well written article! I admire the analysis done by the author regarding the difficulties of Python packaging.
With the advent of uv, I'm finally feeling like Python packaging is solved. As mentioned in the article, being able to have inline dependencies in a single-file Python script and running it naturally is just beautiful.
After being used to this workflow, I have been thinking that a dedicated syntax for inline dependencies would be great, similar to JavaScript's `import ObjectName from 'module-name';` syntax. Python promoted type hints from comment-based to syntax-based, so a similar approach seems feasible.
> It used to be that either you avoided dependencies in small Python script, or you had some cumbersome workaround to make them work for you. Personally, I used to manage a gigantic venv just for my local scripts, which I had to kill and clean every year.
I had the same fear for adding dependencies, and did exactly the same thing.
> This is the kind of thing that changes completely how you work. I used to have one big test venv that I destroyed regularly. I used to avoid testing some stuff because it would be too cumbersome. I used to avoid some tooling or pay the price for using them because they were so big or not useful enough to justify the setup. And so on, and so on.
I 100% sympathize with this.
One other key part of this is freezing a timestamp with your dependency list, because Python packages are absolutely terrible at maintaining compatibility a year or three or five later as PyPI populates with newer and newer versions. The special toml incantation is [tool.uv] exclude-newer:
https://docs.astral.sh/uv/guides/scripts/#improving-reproduc...
This has also let me easily reconstruct some older environments in less than a minute, when I've been version hunting for 30-60 minutes in the past. The speed of uv environment building helps a ton too.
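For anyone looking for the concrete form, the same setting can go straight into a script's PEP 723 block (the date below is purely illustrative):
# /// script
# dependencies = ["requests"]
# [tool.uv]
# exclude-newer = "2023-10-16T00:00:00Z"
# ///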
Maybe I'm missing something, but why wouldn't you just pin to an exact version of `requests` (or whatever) instead? I think that would be equivalent in practice to limiting resolutions by release date, except that it would express your intent directly ("resolve these known working things") rather than indirectly ("resolve things from when I know they worked").
Pinning deps is a good thing, but it won't necessarily solve the issue of transitive dependencies (ie: the dependencies of requests itself for example), which will not be pinned themselves, given you don't have a lock file.
To be clear, a lock file is strictly the better option—but for single file scripts it's a bit overkill.
1 file, 2 files, N files, why does it matter how many files?
Use a lock file if you want transitive dependencies pinned.
I can't think of any other language where "I want my script to use dependencies from the Internet, pinned to precise versions" is a thing.
If there's a language that does this right, all ears? But I havn't seen it -
The use case described is for a small one off script for use in CI, or a single file script you send off to a colleague over Slack. Very, very common scenario for many of us. If your script depends on
You can pin versions of those direct dependencies like "a" and "b" easy enough, but 2 years later you may not get the same version of "c", unless the authors of "a" and "b" handle their dependency constraints perfectly. In practice that's really hard and never happens.
The timestamp approach described above isn't perfect, but would result in the same dep graph, and results, 99% of the time.
Try Scala with an Ammonite script like https://ammonite.io/#ScalaScripts . The JVM ecosystem does dependencies right, there's no need to "pin" in the first place because dependency resolution is deterministic to start with. (Upgrading to e.g. all newer patch versions of your current dependencies is easy, but you have to make an explicit action to do so, it will never happen "magically")
Rust tends to handle this well. It'll share "c" if possible, or split dependencies. Cargo.lock preserves the exact resolution.
> 1 file, 2 files, N files, why does it matter how many files?
One file is better for sharing than N: you can post it in a messenger program like Slack and easily copy-and-paste (this becomes annoying with more than one file), or upload it somewhere without needing to compress, etc.
> I can't think of any other language where "I want my script to use dependencies from the Internet, pinned to precise versions" is a thing.
This is the same issue you would have in any other programming language. If it is fine for possibly having breakage in the future you don't need to do it, but I can understand the use case for it.
> why does it matter how many files?
Because this is for scripts in ~/bin, not projects.
They need to be self-contained.
I think it's a general principle across all of software engineering that, when given the choice, fewer disparate locations in the codebase need to have correlated changes.
Documentation is hard enough, and that's often right there at exactly the same location.
For N scripts, you will need N lock files littering your directories and then need venvs for all of them.
Sometimes, the lock files can be larger than the scripts themselves...
I'm not a python packaging expert or anything but an issue I run into with lock files is they can become machine dependent (for example different flavors of torch on some machines vs others).
Oh yeah, I completely forgot about transitive dependencies. That makes perfect sense, then! Very thoughtful design/inclusion from `uv`.
One could indicate implicit time-based pinning of transitive dependencies, using the time point at which the depended-on versions were released. Not a perfect solution, but it's a possible approach.
isn't that quite exactly what the above does?
I think OP was saying to look at when the package was built instead of explicitly adding a timestamp. Of course, this would only work if you specified `requests@1.2.3` instead of just `requests`.
This looks like a good strategy, but I wouldn't want it by default, since it would be very weird to suddenly have a script pull dependencies from 1999 without an explanation why.
Except that, at least for the initial run, the date-based approach is the one closer to my intent, as I don't know what specific versions I need, just that this script used to work around a specific date.
Oh that's neat!
I've just gotten into the habit of using only the dependencies I really must, because python culture around compatibility is so awful
This is the feature I would most like added to rust, if you don’t save a lock file it is horrible trying to get back to the same versions of packages.
Why wouldn't you save the lock file?
Well, of course you should, but it’s easy to forget as it’s not required. It also used to be recommended to not save it, so some people put it in their gitignore.
For example, here is a post saying it was previously recommended to not save it for libraries: https://blog.rust-lang.org/2023/08/29/committing-lockfiles.h...
Gosh, thanks for sharing! This is the remaining piece I felt I was missing.
For completeness, there's also a script.py.lock file that can be checked into version control, but then you have twice as many files to maintain, and they potentially fall out of sync as people forget about the lock file or don't know what to do with it.
Wow, this is such an insanely useful tip. Thanks!
Why didn't you create a lock file with the versions and of course hashsums in it? No version hunting needed.
Because the aim is to have a single file, fairly short, script. Even if we glued the lock file in somehow, it would be huge!
I prefer this myself, as almost all lock files are in practice “the version of packages at this time and date”, so why not be explicit about that?
A major part of the point of PEP 723 (and the original competing design in PEP 722) is that the information a) is contained in the same physical file and b) can be produced by less sophisticated users.
That's fantastic, that's exactly what I need to revive a bit-rotten python project I am working with.
Oooh! Do you end up doing a binary search by hand and/or does uv provide tools for that?
Where would binary search come into it? In the example, the version solver just sees the world as though no versions released after `2023-10-16T00:00:00Z` existed.
I mean a binary search or a bisect over dates.
My feeling, sadly, is that because uv is the new thing, it hasn't had to handle anything but the common cases. This kinda gets a mention in the article, but is very much glossed over. There are still some sharp edges, and assumptions which aren't true in general (but are for the easy cases), and this is only going to make things worse, because now there's a new set of issues people run into.
As an example of an edge case - you have Python dependencies that wrap C libs that come in x86-64 flavour and arm-64.
Pipenv, when you create a lockfile, will only specify the architecture specific lib that your machine runs on.
So if you're developing on an ARM Macbook, but deploying on an Ubuntu x86-64 box, the Pipenv lockfile will break.
Whereas a Poetry lockfile will work fine.
And I've not found any documentation about how uv handles this: is it the Pipenv way or the Poetry way?
PEP 751 is defining a new lockfile standard for the ecosystem, and tools including uv look committed to collaborating on the design and implementing whatever results. From what I've been able to tell of the surrounding discussion, the standard is intended to address this use case - rather, to be powerful enough that tools can express the necessary per-architecture locking.
The point of the PEP 723 comment style in the OP is that it's human-writable with relatively little thought. Cases like yours are always going to require actually doing the package resolution ahead of time, which isn't feasible by hand. So a separate lock file is necessary if you want resolved dependencies.
If you use this kind of inline script metadata and just specify the Python dependency version, the resolution process is deferred. So you won't have the same kind of control as the script author, but instead the user's tooling can automatically do what's needed for the user's machine. There's inherently a trade-off there.
Your reply is unrelated to my query - is a uv lockfile able to handle multiple arches like a Poetry lockfile?
uv works like poetry, rather than pipenv, in this regard.
Yeah, uv uses a platform-independent resolution for its lockfiles and supports features that Poetry does not (see the sketch after the links below), like
- Specifying a subset of platforms to resolve for
- Requiring wheel coverage for specific platforms
- Conflicting optional dependencies
https://docs.astral.sh/uv/concepts/resolution/#universal-res...
https://docs.astral.sh/uv/concepts/projects/config/#conflict...
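As a rough sketch of the first of those bullets, uv lets you narrow the universal resolution with environment markers in pyproject.toml (the markers below are just examples):
[tool.uv]
environments = [
    "sys_platform == 'darwin'",
    "sys_platform == 'linux'",
]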
I think this is an awesome feature and will probably be a great alternative to my use of nix to do similar things for scripts/python, if nothing else because it's way less overhead to get it running and playing with something.
Nix, for all its benefits here, can be quite slow and otherwise pretty annoying to use as a shebang in my experience, versus just writing a package/derivation to add to your shell environment (i.e. it's already fully "built" and wrapped, but also requires a lot more ceremony + "switching" either the OS or HM configs).
It's not a feature that's exclusive to uv. It's a PEP, and other tools will eventually support it if they don't already.
Will nix be slow after the first run? I guess it will have to build the deps, but a second run should be fast, no?
`nix-shell` (which is what the OP seems to be referring to) is always slow-ish (not really that slow if you are used to e.g. Java CLI commands, but definitely slower than I would like) because it doesn't cache evaluations AFAIK.
Flakes have caching, but support for `nix shell` as a shebang is relatively new (Nix 2.19) and not widespread.
Agreed. I did the exact same thing with that giant script venv and it was a constant source of pain because some scripts would require conflicting dependencies. Now with uv shebang and metadata, it’s trivial.
Before uv I avoided writing any scripts that depended on ML altogether, which is now unlocked.
You know what we need? In both python and JS, and every other scripting language, we should be able to import packages from a url, but with a sha384 integrity check like exists in HTML. Not sure why they didn't adopt this into JS or Deno. Otherwise installing random scripts is a security risk
Python has fully-hashed requirements[1], which is what you'd use to assert the integrity of your dependencies. These work with both `pip` and `uv`. You can't use them to directly import the package, but that's more because "packages" aren't really part of Python's import machinery at all.
(Note that hashes themselves don't make "random scripts" not a security risk, since asserting the hash of malware doesn't make it not-malware. You still need to establish a trust relationship with the hash itself, which decomposes to the basic problem of trust and identity distribution.)
[1]: https://pip.pypa.io/en/stable/topics/secure-installs/
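For reference, a hash-pinned requirements file looks roughly like this; the digest below is a placeholder, and in practice it is copied from the index or generated with `pip hash`:
# requirements.txt (sketch)
requests==2.32.3 \
    --hash=sha256:0123456789abcdef...   # placeholder digest, not a real hash
Installing with `pip install --require-hashes -r requirements.txt` (uv's pip interface accepts the same flag) then fails if any downloaded artifact doesn't match its recorded hash.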
Good point, but it's still a very useful way to ensure it doesn't get swapped out underneath you.
Transitive dependencies are still a problem though. You kind of fall back to needing a lock file or specifying everything explicitly.
Right, still a security risk, but at least if I come back to a project after a year or two I can know that even if some malicious group took over a project, they at least didn't backport a crypto-miner or worse into my script.
The code that you obtain for a Python "package" does not have any inherent mapping to a "package" that you import in the code. The name overload is recognized as unfortunate; the documentation writing community has been promoting the terms "distribution package" and "import package" as a result.
https://packaging.python.org/en/latest/discussions/distribut...
https://zahlman.github.io/posts/2024/12/24/python-packaging-...
While you could of course put an actual Python code file at a URL, that wouldn't solve the problem for anything involving compiled extensions in C, Fortran etc. You can't feasibly support NumPy this way, for example.
That said, there are sufficient hooks in Python's `import` machinery that you can make `import foo` programmatically compute a URL (assuming that the name `foo` is enough information to determine the URL), download the code and create and import the necessary `module` object; and you can add this with appropriate priority to the standard set of strategies Python uses for importing modules. A full description of this process is out of scope for a HN comment, but relevant documentation:
https://docs.python.org/3/library/importlib.html
https://docs.python.org/3/library/sys.html#sys.meta_path
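To make that concrete, here is a rough sketch of such a hook (mine, not the commenter's): a finder/loader on `sys.meta_path` that maps a module name to a URL, downloads the source, verifies a sha384 digest as discussed upthread, and executes it as a module. The registry contents are entirely illustrative.
import hashlib
import importlib.abc
import importlib.util
import sys
import urllib.request

class URLFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import hook: fetch registered module names from URLs, verifying a sha384 digest."""

    def __init__(self, registry):
        # registry: {"foo": ("https://example.com/foo.py", "<expected sha384 hex digest>")}
        self.registry = registry

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.registry:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the normal import machinery for everything else

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        url, expected = self.registry[module.__name__]
        with urllib.request.urlopen(url) as resp:
            source = resp.read()
        if hashlib.sha384(source).hexdigest() != expected:
            raise ImportError(f"integrity check failed for {module.__name__}")
        exec(compile(source, url, "exec"), module.__dict__)

sys.meta_path.append(URLFinder({"foo": ("https://example.com/foo.py", "0123...")}))
# import foo   # would now fetch, verify, and execute foo.py from the URL above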
Deno and npm both store the hashes of all the dependencies you use in a lock file and verify them on future reinstalls.
The lockfile is good, but I'm talking about this inline dependency syntax,
And likewise, Deno can import by URL. Neither includes an integrity hash. For JS, I'd suggest something which mirrors https://developer.mozilla.org/en-US/docs/Web/Security/Subres... and https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
The Python/UV thing will have to come up with some syntax, I don't know what. Not sure if there's a precedent for attributes.
Where do you initially get the magical sha384 hash that proves the integrity of the package the first time it's imported?
Same way we do in JS-land: https://developer.mozilla.org/en-US/docs/Web/Security/Subres...
tl;dr use `openssl` on command-line to compute the hash.
Ideally, any package repositories ought to publish the hash for your convenience.
This of course does nothing to prove that the package is safe to use, just that it won't change out from under your nose.
This is a nice feature, but I've not found it to be useful, because my IDE won't recognize these dependencies.
Or is it a skill issue?
What exactly do you imagine that such "recognition" would entail? Are you expecting the IDE to provide its own package manager, for example?
Generally it means "my inspections and autocomplete works as expected".
It can't possibly autocomplete or inspect based off code it doesn't actually have.
That’s the point. Many modern IDEs are in fact capable of downloading their own copy off some source and parsing it.
No, it's the fact that it's a rather new PEP, and our IDEs don't yet support it, because, rather new.
This looks horrible for anything but personal scripts/projects. For anything close to production purposes, this seems like a nightmare.
Anything that makes it easier to make a script that I wrote run on a colleagues machine without having to give them a 45 minute crash course of the current state of python environment setup and package management is a huge win in my book.
There's about 50 different versions of "production" for Python, and if this particular tool doesn't appear useful to it, you're probably using Python in a very very different way than those of us who find it useful. One of the great things about Python is that it can be used in such diverse ways by people with very very very different needs and use cases.
What does "production" look like in your environment, and why would this be terrible for it?
It's not meant for production.
Don’t use it in production, problem solved.
I find this feature amazing for one-off scripts. It’s removing a cognitive burden I was unconsciously ignoring.
> As mentioned in the article, being able to have inline dependencies in a single-file Python script and running it naturally is just beautiful.
The syntax for this (https://peps.python.org/pep-0723/) isn't uv's work, nor are they first to implement it (https://iscinumpy.dev/post/pep723/). A shebang line like this requires the tool to be installed first, of course; I've repeatedly heard about how people want tooling to be able to bootstrap the Python version, but somehow it's not any more of a problem for users to bootstrap the tooling themselves.
And some pessimism: packaging is still not seen as the core team's responsibility, and uv realistically won't enjoy even the level of special support that Pip has any time soon. As such, tutorials will continue to recommend Pip (along with inferior use patterns for it) for quite some time.
> I have been thinking that a dedicated syntax for inline dependencies would be great, similar to JavaScript's `import ObjectName from 'module-name';` syntax. Python promoted type hints from comment-based to syntax-based, so a similar approach seems feasible.
First off, Python did no such thing. Type annotations are one possible use for an annotation system that was added all the way back in 3.0 (https://peps.python.org/pep-3107/); the original design explicitly contemplated other uses for annotations besides type-checking. When it worked out that people were really only using them for type-checking, standard library support was added (https://peps.python.org/pep-0484/) and expanded upon (https://peps.python.org/pep-0526/ etc.); but this had nothing to do with any specific prior comment-based syntax (which individual tools had up until then had to devise for themselves).
Python doesn't have existing syntax to annotate import statements; it would have to be designed specifically for the purpose. It's not possible in general (as your example shows) to infer a PyPI name from the `import` name; but not only that, dependency names don't map one-to-one to imports (anything that you install from PyPI may validly define zero or more importable top-level names, and of course the code might directly use a sub-package or an attribute of some module, which doesn't even have to be a class). So there wouldn't be a clear place to put such names except in a separate block by themselves, which the existing comment syntax already does.
Finally, promoting the syntax to an actual part of the language doesn't seem to solve a problem. Using annotations instead of comments for types allows the type information to be discovered at runtime (e.g. through the `__annotations__` attribute of functions). What problem would it solve for packaging? It's already possible for tools to use a PEP 723 comment, and it's also possible (through the standard library - https://docs.python.org/3/library/importlib.metadata.html) to introspect the metadata of installed packages at runtime.
> I'm finally feeling like Python packaging is solved
Not really: https://github.com/astral-sh/uv/issues/5190
> by the author
And the author is?
Any flow that does not state checksums/hashsums is not ready for production and is anything but beautiful. But I haven't used uv yet, so maybe it is possible to specify the dependencies with hashsums in the same file too?
Actually, the order of import statements is one of the things that Python does better than JS. It makes completions much less costly to calculate when you type the code. An IDE or other tool only has to check one module or package for its contents, rather than whether any module has a binding of the name so and so. If I understand correctly, you are talking about an additional syntax though.
When mentioning a gigantic venv ... Why did they do that? Why not have smaller venvs for separate projects? It is really not that hard to do and avoids dependency conflicts between projects, which have nothing to do with each other. Using one giant venv is basically telling me that they either did not understand dependency conflicts, or did not care enough about their dependencies, so that one script can run with one set of dependencies one day, and another set of deps the other day, because a new project's deps have been added to the mix in the meantime.
Avoiding deps for small scripts is a good thing! If possible.
To me it just reads like a user now having a new tool allowing them to continue the lazy ways of not properly managing dependencies. I mean all deps in one huge venv? Who does that?? No wonder they had issues with that. Can't even keep deps separated, let alone properly having a lock file with checksums. Yeah no surprise they'll run into issues with that workflow.
And while we are relating to the JS world: One may complain in many ways about how NPM works, but it has had automatic lock file for aaages. Being the default tool in the ecosystem. And its competitors had it too. At least that part they got right for a long time, compared to pip, which does nothing of the sort without extra effort.
> Why did they do that? Why not have smaller venvs for separate projects?
What's a 'project'? If you count every throwaway data processing script and one-off exploratory Jupyter notebook, that can easily be 100 projects. Certainly before uv, having one huge venv or conda environment with 'everything' installed made it much faster and easier to get that sort of work done.
In what kind of scope are these data processing scripts? If they are part of some kind of pipeline used in production, I would very much expect them to have reproducible dependencies.
I can understand it for an exploratory Jupyter notebook. But only in the truly exploratory stage. Say for example you are writing a paper. Remember the reproducibility crisis. Exploring is fine, but when it gets to actually writing the paper, one needs to make one's setup reproducible, or lose credibility right away. Most academics are not aware of, or don't know how to, or don't care to, make things reproducible, leading to non-reproducible research.
I would be lying if I claimed that I personally always set up a lock file with hashsums for every script. Of course there can be scripts and things we care so little about that we don't make them reproducible.
For the (niche) Python library that I co-develop, we use this for demo scripts that live in an example/ directory in our repo. These scripts will never be run in production, but it’s nice to allow users to try them out and get a feel for how the library works before committing to installing dependencies and setting up a virtual environment.
In other words, of course, in most long-term cases, it’s better to create a real project - this is the main uv flow for a reason. But there’s value in being able to easily specify requirements for quick one-off scripts.
> Why not have smaller venvs for separate projects?
Because they are annoying and unnecessary additional work. If I write something, I won't know the dependencies in the beginning. And if it's a personal tool/script or even a throwaway one-shot, then why bother with managing unnecessary parts? I just manage my personal stack of dependencies for my own tools in a giant env, and pull imports from it or not, depending on the moment. This allows me to move fast. Of course it is a liability, but not one which usually bites me. Every few years, some dependency goes wrong, and I either fix it or remove it, but in the end the time I save far outweighs the time I would lose from micromanaging small separate envs.
Managing dependencies is for production and important things. A big messy env is good enough for everything else. I have hundreds of scripts and tools; micromanaging them on that level has no benefit. And it seems uv now offers some options for making small envs effortless without costing much time, so it's a net benefit in that area, but it's not something world-shattering which will turn my world upside down.
Well, if generating a lock file and installing dependencies is "unnecessary", then you obviously don't have any kind of production-ready project. Any project serious about managing its dependencies will mandate hashsums for each dependency, to avoid issues a week or a month later without any change to the project.
If you do have a project that needs to manage its dependencies well and you still don't store hashsums and install your dependencies based on them, then you basically forfeit any credibility when complaining about things going wrong: bugs happening, or the behavior of the code changing without changing the code itself, and similar things.
This can all be fine if it is just your personal project that gets shit done. I am not saying you must properly manage dependencies for such a personal project. It's just not something ready for production.
I for one find it quite easy to make venvs per project. I have my Makefiles, which I slightly adapt to the needs of the project and then I run 1 single command, and get all set up with dependencies in a project specific venv, hashsums, reproducibility. Not that much to it really, and not at all annoying to me. Also can be sourced from any other script when that script uses another project. Could also use any other task runner thingy, doesn't have to be GNU Make, if one doesn't like it.
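Not the actual Makefile, but the target being described presumably wraps a shell sequence along these lines, assuming pip-tools produces the hash-pinned lock file:
# Sketch: project-specific venv + hash-pinned, reproducible installs
python3 -m venv .venv
.venv/bin/pip install pip-tools
.venv/bin/pip-compile --generate-hashes --output-file requirements.txt requirements.in
.venv/bin/pip install --require-hashes -r requirements.txt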
>Any flow that does not state checksums/hashsums is not ready for production
It's not designed nor intended for such. There are tons of Python users out there who have no concept of what you would call "production"; they wrote something that requires NumPy to be installed and they want to communicate this as cleanly and simply (and machine-readably) as possible, so that they can give a single Python file to associates and have them be able to use it in an appropriate environment. It's explicitly designed for users who are not planning to package the code properly in a wheel and put it up on PyPI (or a private index) or anything like that.
>and is anything but beautiful
De gustibus non est disputandum. The point is to have something simple, human-writable and machine-readable, for those to whom it applies. If you need to make a wheel, make one. If you need a proper lock file, use one. Standardization for lock files is finally on the horizon in the ecosystem (https://peps.python.org/pep-0751/).
>Actually the order of import statement is one of the things, that Python does better than JS.
Import statements are effectively unrelated to package management. Each installed package ("distribution package") may validly define zero or more top-level names (of "import packages") which don't necessarily bear any relationship to each other, and the `import` syntax can validly import one or more sub-packages and/or attributes of a package or module (a false distinction, anyway; packages are modules), and rename them.
>An IDE or other tool only has to check one module or package for its contents
The `import` syntax serves these tools by telling them about names defined in installed code, yes. The PEP 723 syntax is completely unrelated: it tells different tools (package managers, environment managers and package installers) about names used for installing code.
>Why not have smaller venvs for separate projects? It is really not that hard to do
It isn't, but it introduces book-keeping (Which venv am I supposed to use for this project? Where is it? Did I put the right things in it already? Should I perhaps remove some stuff from it that I'm no longer using? What will other people need in a venv after I share my code with them?) that some people would prefer to delegate to other tooling.
Historically, creating venvs has been really slow. People have noticed that `uv` solves this problem, and come up with a variety of explanations, most of which are missing the mark. The biggest problem, at least on Linux, is the default expectation of bootstrapping Pip into the new venv; of course uv doesn't do this by default, because it's already there to install packages for you. (This workflow is equally possible with modern versions of Pip, but you have to know some tricks; I describe some of this in https://zahlman.github.io/posts/2025/01/07/python-packaging-... . And it doesn't solve other problems with Pip, of course.) Anyway, the point is that people will make single "sandbox" venvs because it's faster and easier to think about - until the first actual conflict occurs, or the first attempt to package a project and accurately convey its dependencies.
> Avoiding deps for small scripts is a good thing! If possible.
I'd like to agree, but that just isn't going to accommodate the entire existing communities of people writing random 100-line analysis scripts with Pandas.
>One may complain in many ways about how NPM works, but it has had automatic lock file for aaages.
Cool, but the issues with Python's packaging system are really not comparable to those of other modern languages. NPM isn't really retrofitted to JavaScript; it's retrofitted to the Node.JS environment, which existed for only months before NPM was introduced. Pip has to support all Python users, and Python is about 18 years older than Pip (19 years older than NPM). NPM was able to do this because Node was a new project that was being specifically designed to enable JavaScript development in a new environment (i.e., places that aren't the user's browser sandbox). By contrast, every time any incremental improvement has been introduced for Python packaging, there have been massive backwards-compatibility concerns. PyPI didn't stop accepting "egg" uploads until August 1 2023 (https://blog.pypi.org/posts/2023-06-26-deprecate-egg-uploads...), for example.
But more importantly, npm doesn't have to worry about extensions to JavaScript code written in arbitrary other languages (for Python, C is common, but by no means exclusive; NumPy is heavily dependent on Fortran, for example) which are expected to be compiled on the user's machine (through a process automatically orchestrated by the installer) with users complaining to anyone they can get to listen (with no attempt at debugging, nor at understanding whose fault the failure was this time) when it doesn't work.
There are many things wrong with the process, and I'm happy to criticize them (and explain them at length). But "everyone else can get this right" is usually a very short-sighted line of argument, even if it's true.
> It's not designed nor intended for such. There are tons of Python users out there who have no concept of what you would call "production"; they wrote something that requires NumPy to be installed and they want to communicate this as cleanly and simply (and machine-readably) as possible, so that they can give a single Python file to associates and have them be able to use it in an appropriate environment. It's explicitly designed for users who are not planning to package the code properly in a wheel and put it up on PyPI (or a private index) or anything like that.
Thus my warning about its use. And we, as part of the population, need to learn and be educated about dependency management, so that we do not keep running into the same issues over and over again that stem from non-reproducible software.
> Import statements are effectively unrelated to package management. Each installed package ("distribution package") may validly define zero or more top-level names (of "import packages") which don't necessarily bear any relationship to each other, and the `import` syntax can validly import one or more sub-packages and/or attributes of a package or module (a false distinction, anyway; packages are modules), and rename them.
I did not claim them to be related to package management, and I agree. I was making an assertion, trying to guess the meaning of what the other poster wrote about some "import bla from blub" statement.
> The `import` syntax serves these tools by telling them about names defined in installed code, yes. The PEP 723 syntax is completely unrelated: it tells different tools (package managers, environment managers and package installers) about names used for installing code.
If you had read my comment a bit more closely, you would have seen that this is the assertion I made one phrase later.
> It isn't, but it introduces book-keeping (Which venv am I supposed to use for this project? Where is it? Did I put the right things in it already? Should I perhaps remove some stuff from it that I'm no longer using? What will other people need in a venv after I share my code with them?) that some people would prefer to delegate to other tooling.
I understand that. The issue is, that people keep complaining about things that can be solved in rather simple ways. For example:
> Which venv am I supposed to use for this project?
Well, the one in the directory of the project, of course.
> Where is it?
In the project directory of course.
> Did I put the right things in it already?
If it exists, it should have the dependencies installed. If you change the dependencies, then update the venv right away. You are always in a valid state this way. Simple.
> Should I perhaps remove some stuff from it that I'm no longer using?
That is done in the "update the venv" step mentioned above. Whether you delete the venv and re-create it, or have a dependency managing tool, that removes unused dependencies, I don't care, but you will know it, when you use such a tool. If you don't use such a tool, just recreate the venv. Nothing complicated so far.
> What will other people need in a venv after I share my code with them?
One does not share a venv itself, one shares the reproducible way to recreate it on another machine. Thus others will have just what you have, once they create the same venv. Reproducibility is key, if you want your code to run elsewhere reliably.
All of those have rather simple answers. I grant, some of these answers one learns over time, when dealing with these questions many times. However, none of it must be made difficult.
> I'd like to agree, but that just isn't going to accommodate the entire existing communities of people writing random 100-line analysis scripts with Pandas.
True, but those apparently need Pandas, so installing dependencies cannot be avoided. Then it depends on whether their stuff is one-off stuff that no one will ever need to run again later, or part of some pipeline that needs to be reliable. The use-case changes the requirements with regard to reproducibility.
---
About the NPM - PIP comparison. Sure there may be differences. None of those however justify not having hashsums of dependencies where they can be had. And if there is a C thing? Well, you will still download that in some tarball or archive when you install it as a dependency. Easy to get a checksum of that. Store the checksum.
I was merely pointing out a basic facility of NPM, that is there for as long as I remember using NPM, that is still not existent with PIP, except for using some additional packages to facilitate it (I think hashtools or something like that was required). I am not holding up NPM as the shining star, that we all should follow. It has its own ugly corners. I was pointing out that specific aspect of dependency management. Any artifact downloaded from anywhere one can calculate the hashes of. There are no excuses for not having the hashes of artifacts.
That Pip is 19 years older than NPM doesn't have to be a negative. Those are 19 years more time to have worked on the issues as well. In those 19 years no one had issues with non-reproducible builds? I find that hard to believe. If anything the many people complaining about not being able to install some dependency in some scenario tell us, that reproducible builds are key, to avoid these issues.
>I did not claim them to be related to package management, and I agree.
Sure, but TFA is about installation, and I wanted to make sure we're all on the same page.
>I understand that. The issue is, that people keep complaining about things that can be solved in rather simple ways.
Can be. But there are many comparably simple ways, none of which is obvious. For example, using the most basic level of tooling, I put my venvs within a `.local` directory which contains other things I don't want to put in my repo nor mention in .gitignore. Other workflow managers put them in an entirely separate directory and maintain their own mapping.
>Whether you delete the venv and re-create it, or have a dependency managing tool, that removes unused dependencies, I don't care, but you will know it, when you use such a tool.
Well, yes. That's the entire point. When people are accustomed to using a single venv, it's because they haven't previously seen the point of separating things out. When they realize the error of their ways, they may "prefer to delegate to other tooling", as I said. Because it represents a pretty radical change to their workflow.
> That Pip is 19 years older than NPM doesn't have to be a negative. Those are 19 years more time to have worked on the issues as well.
In those 19 years people worked out ways to use Python and share code that bear no resemblance to anything that people mean today when they use the term "ecosystem". And they will be very upset if they're forced to adapt. Reading the Packaging section of the Python Discourse forum (https://discuss.python.org/c/packaging/14) is enlightening in this regard.
> In those 19 years no one had issues with non-reproducible builds?
Of course they have. That's one of the reasons why uv is the N+1th competitor in its niche; why Conda exists; why meson-python (https://mesonbuild.com/meson-python/index.html) exists; why https://pypackaging-native.github.io/ exists; etc. Pip isn't in a position to solve these kinds of problems because of a) the core Python team's attitude towards packaging; b) Pip's intended and declared scope; and c) the sheer range of needs of the entire Python community. (Pip doesn't actually even do builds; it delegates to whichever build backend is declared in the project metadata, defaulting to Setuptools.)
But it sounds more like you're talking about lockfiles with hashes. In which case, please just see https://peps.python.org/pep-0751/ and the corresponding discussion ("Post-History" links there).
Well, big fan of uv.
But... the 86GB python dependency download cache on my primary SSD, most of which can be attributed to the 50 different versions of torch, is testament to the fact that even uv cannot salvage the mess that is pip.
Never felt this much rage at the state of a language/build system in the 25 years that I have been programming. And I had to deal with Scala's SBT ("Simple Build Tool") in another life.
I don't think pip is to blame for that. PyTorch is sadly an enormous space hog.
I just started a fresh virtual environment with "python -m venv venv" - running "du -h" showed it to be 21MB. After running "venv/bin/pip install torch" it's now 431MB.
The largest file in there is this one:
There's a whole section of the uv manual dedicated just to PyTorch: https://docs.astral.sh/uv/guides/integration/pytorch/
(I just used find to locate as many libtorch_cpu.dylib files as possible on my laptop and deleted 5.5GB of them)
I use uv pip to install dependencies for any LLM software I run. I am not sure if uv re-implements the pip logic or hands over resolution to pip. But it does not change the fact that I have multiple versions of torch + multiple installations of the same version of torch in the cache.
Compare this to the way something like maven/gradle handles this and you have to wonder WTF is going on here.
uv implements its own resolution logic independently of pip.
Maybe your various LLM libraries are pinning different versions of Torch?
Different Python versions each need their own separate Torch binaries as well.
At least with uv you don't end up with separate duplicate copies of PyTorch in each of the virtual environments for each of your different projects!
> Different Python versions each need their own separate Torch binaries as well
Found this the hard way. Something to do with breakage in ABI perhaps. Was looking at the way python implements extensions the other day. Very weird.
>Something to do with breakage in ABI perhaps.
There is a "stable ABI" which is a subset of the full ABI, but no requirement to stick to it. The ABI effectively changes with every minor Python version - because they're constantly trying to improve the Python VM, which often involves re-working the internal representations of built-in types, etc. (Consider for example the improvements made to dictionaries in Python 3.6 - https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compa... .) Of course they try to make proper abstracted interfaces for those C structs, but this is a 34 year old project and design decisions get re-thought all the time and there are a huge variety of tiny details which could change and countless people with legacy code using deprecated interfaces.
The bytecode also changes with every minor Python version (and several times during the development of each). The bytecode file format is versioned for this reason, and .pyc caches need to be regenerated. (And every now and then you'll hit a speed bump, like old code using `async` as an identifier which subsequently becomes a keyword. That hit TensorFlow once: https://stackoverflow.com/questions/51337939 .)
Very different way of doing things compared to the JVM which is what I have most experience with.
Was some kind of FFI using dlopen and sharing memory across the vm boundary ever considered in the past, instead of having to compile extensions alongside a particular version of python?
I remember seeing some ffi library, probably on pypi. But I don't think it is part of standard python.
You can in fact use `dlopen`, via the support provided in the `ctypes` standard library. `freetype-py` (https://github.com/rougier/freetype-py) is an example of a project that works this way.
To my understanding, though, it's less performant. And you still need a stable ABI layer to call into. FFI can't save you if the C code decides in version N+1 that it expects the "memory shared across the vm boundary" to have a different layout.
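A tiny sketch of that approach using only the standard library; libm is just a convenient example of an already-installed shared object (the lookup below is POSIX-flavoured):
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # dlopen() under the hood
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0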
> Something to do with breakage in ABI perhaps. Was looking at the way python implements extensions the other day. Very weird.
Yes, it's essentially that: CPython doesn't guarantee exact ABI stability between versions unless the extension (and its enclosing package) intentionally build against the stable ABI[1].
The courteous thing to do in the Python packaging ecosystem is to build "abi3" wheels that are stable and therefore don't need to be duplicated as many times (either on the index or on the installing client). Torch doesn't build these wheels for whatever reason, so you end up with multiple slightly different but functionally identical builds for each version of Python you're using.
TL;DR: This happens because of an interaction between two patterns that Python makes very easy: using multiple Python versions, and building/installing binary extensions. In a sense, it's a symptom of Python's success: other ecosystems don't have these problems because they have far fewer people running multiple configurations simultaneously.
[1]: https://docs.python.org/3/c-api/stable.html
My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.
I am planning to shift some of my stuff to pypy (so a "fast" python exists, kind of). But some dependencies can be problematic, I have heard.
Neither Java nor JS encourages the use of native extensions to the same degree that Python does. So some of it is a fundamental difference in approach: Python has gotten very far by offloading hot paths into native code instead of optimizing the interpreter itself.
(Recent positive developments in Python’s interpreted performance have subverted this informal tendency.)
Node also introduced a stable extension API that people could build native code against relatively early in its history compared to Python. That and the general velocity of the V8 interpreter and its complex API kept developers from reaching in like they did with Python, or leaving tons of libraries in the ecosystem that are too critical to drop.
Yeah, I think it's mostly about complexity: CPython's APIs also change quite a bit, but they're pretty simple (in the "simple enough to hang yourself with" sense).
> Neither Java nor JS encourages the use of native extensions to the same degree that Python does.
You already had billions of lines of Java and JS code that HAD to be sped up. So they had no alternative. If python had gone down the same route, speeding it up without caveats would have been that much easier.
I don't think that's the reason. All three ecosystems had the same inflection point, and chose different solutions to it. Python's was especially "easy" since the C API was already widely used and there were no other particular constraints (WORA for Java, pervasive async for JS) that impeded it.
For scientific stuff and ML, it's because people already had libraries written in C/Fortran/C++ and so calling it directly just made sense.
In other languages that didn't happen and you don't have anywhere near as good scientific/ML packages as a result.
>> My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.
Improving Python performance has been a topic as far back as 2008, when I attended my first PyCon. A quick detour on Python 3 first, because there is some historical revisionism, since many people online weren't around in the earlier days.
Back then, the big migration to Python 3 was still in front of the community. The timeline concerns that popped up when Python really gained steam in the industry between 2012 and 2015 weren't as pressing yet. You can refer to Guido's talks from PyCon 2008 and 2009, if they are available somewhere, to get a sense of the urgency at the time. Python 3 was impactful because it changed the language and platform while requiring a massive amount of migration effort.
Back to perf. Around 2008, there was a feeling that an alternative to CPython might be the future. Candidates included IronPython, Jython, and PyPy. Others like Unladen Swallow wanted to make major changes to CPython (https://peps.python.org/pep-3146/).
Removing the GIL was another direction people wanted to take because it seemed simpler in a way. This is a well researched area with David Beazley having many talks like this oldie (https://www.youtube.com/watch?v=ph374fJqFPE). The idea is much older (https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...).
All of these alternative implementations of Python from that time period have basically failed at the goal of replacing CPython. IronPython was a Python 2 implementation, and updating to Python 3 while trying to grow enough to challenge CPython was impossible. Eventually, Microsoft lost interest and that was that. Similar things happened to the others.
GIL removal was a constant topic from 2008 until recently. Compatibility of extensions was a major concern causing inertia, and Python's popularity meant even more C/C++/Rust code relying on the GIL. The option to disable it (https://peps.python.org/pep-0703/) only happened because the groundwork was eventually done properly to help the community move.
The JVM has very clearly defined interfaces and specs similar to the CLR which make optimization viable. JS doesn't have the compatibility concerns.
That was just a rough overview but many of the stories of Python woes miss a lot of this context. Many discussions about perf over the years have descended into a GIL discussion without any data to show the GIL would change performance. People love to talk about it but turn out to be IO-bound when you profile code.
The focus on the GIL over actual Python performance is a bit baffling, IMO, particularly when there were so many examples of language virtual machines improving performance in that era. So many lost opportunities.
They don't want to throw away the extensions and ecosystem. Let's say Jython, or some other modern implementation became the successor. All of the extensions need to be updated (frequently rewritten) to be compatible with and exploit the characteristics of that platform.
It was expected that extension maintainers would respond negatively to this. In many cases it presents a decision: do I port this to the new platform, or move away from Python completely? You have to remember that the impactful decisions leading us down this path were made closer to 2008 than to today, when dropping Python, or making it the second option to help people migrate, would have been viable for a lot of these extensions. There was also a lot of potential for people to follow a fork of the traditional CPython interpreter.
There were no great options because there are many variables to consider. Perf is only one of them. Pushing ahead only on perf is hard when it's unclear if it'll actually impact people in the way they think it will when they can't characterize their actual perf problem beyond "GIL bad".
Python just didn't have much momentum until relatively recently, despite its age. There are efforts to speed it up going on now, backed by Microsoft.
PyPy is in a weird spot, as the things it does fast are the ones you'd usually just offload to a module implemented in C.
As a long time Pythonista I was going to push back against your suggestion that Python didn't have much momentum until recently, but then I looked at the historic graph on https://www.tiobe.com/tiobe-index/ and yeah, Python's current huge rise in popularity didn't really get started until around 2018.
(TIOBE's methodology is a bit questionable though, as far as I can tell it's almost entirely based on how many search engine hits they get for "X programming". https://www.tiobe.com/tiobe-index/programminglanguages_defin...)
Yes, TIOBE is garbage. The biggest problem is that because they're coy about methodology we don't even know what we're talking about. Rust's "Most Loved" Stack Overflow numbers were at least a specific thing where you can say OK that doesn't mean there's more Rust software or that Rust programmers get paid more, apparently the people programming in Rust really like Rust, more so than say, Python programmers loved Python - so that's good to know, but it's that and not anything else.
Tiobe is garbage. I remember Python making waves since 2005 with Google using it and such.
From what I can tell it wasn't as prominent as it has been recently, when it became a popular pick for random projects that aren't just gluing things together. The big companies that used it were perfectly happy specializing the interpreter for their use case instead of upstreaming general improvements.
> There are efforts to speed it up
Well, the extensions are going to complicate this a lot.
A fast JIT interpreter cannot reach into a library and do its magic there the way HotSpot/V8 can with native Java/JS code.
The reason why people don't always use abi3 is because not everything that can be done with the full API is even possible with the limited one, and some things that are possible carry a significant perf hit.
I think that's a reason, but I don't think it's the main one: the main one is that native builds don't generally default to abi3, so people (1) publish larger matrices than they actually need to, and (2) end up depending on non-abi3 constructs when abi3 ones are available.
(I don't know if this is the reason in Torch's case or not, but I know from experience that it's the reason for many other popular Python packages.)
Yes, you're right; I should have clarified my comment with, "people who know the difference to begin with", which is something one needs to learn first (and very few tutorials etc on Python native modules even mention the limited API).
uv should hard link files if they’re identical like Nix does
If a package manager stores more than it needs to, it is a package manager problem.
You're about to be pleasantly surprised then.
https://docs.astral.sh/uv/reference/settings/#link-mode
It's even the default. Here's where it's implemented if you're curious https://github.com/astral-sh/uv/blob/f394f7245377b6368b9412d...
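If you want to pin it explicitly (or opt into copy-on-write clones where the filesystem supports them), the documented `link-mode` setting is the knob; a minimal sketch:
[tool.uv]
link-mode = "hardlink"   # or "clone" (copy-on-write), or "copy"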
That is very cool! I would imagine that even if they didn’t implement explicit cloning, APFS would still clone the files.
`nix-store --optimize` is a little different because it looks for duplicates and hardlinks those files across all of the packages. I don’t know how much additional savings that would yield with a typical uv site-packages, but it guarantees no duplicate files.
What makes you think it doesn't?
This from above:
> I just used find to locate as many libtorch_cpu.dylib files as possible on my laptop and deleted 5.5GB of them
but maybe it wasn’t actually 5.5 GB!
It was like a dozen different versions of libtorch_cpu.dylib. So hardlink does not help.
It probably is 5.5GB. UV caches versions of packages and symlinks to them but pytorch is infamous for many different versions especially for different features.
And since it's a compiled dependency, even if uv were to attempt the more complicated approach of symlinking to share identical files across versions, it wouldn't help much. You'd probably need to store binary diffs or chunks of files that are binary-identical; at that point your code would start to resemble a file system in user space, and the time to switch to a particular version of the files (i.e. create them as actual files in the filesystem) would be much higher.
Also, I believe uv's cache is separate from the pip cache, so you could have different copies in both.
I think there's a uv cache prune command. Arguably it should offer to install a cron job to run it periodically.
Testing this out locally, I repro'd pretty similar numbers on macOS ARM and Docker. Unfortunately the CPU-only build isn't really any smaller either; I thought it would be.
You can specify platform specific wheels in your pyproject.toml
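Roughly, with uv's index pinning it can look like this (a sketch: the URL is PyTorch's public CPU-only wheel index, and the index name is arbitrary):
[project]
dependencies = ["torch"]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu" }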
If you are using uv:
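For example, the standard cache-maintenance commands (how aggressive you want to be is up to you):
uv cache prune   # drop unreachable/stale entries from uv's cache
uv cache clean   # remove the uv cache entirely
pip cache purge  # the equivalent for pip's separate cache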
Take your pick and schedule it to run weekly/monthly.
Light user of uv here. `prune` just saved me 1.1GiB. Thanks!
That problem is very much not pip (pip is only the installer). The issue is:
* We have a conflict between being easy to use (people don't need to work out which version of CUDA, which GPU settings, libraries, etc. to use) vs. install size (it's basically the x86 vs. ARM issue, except at least tenfold larger). Rather than making it the end user's problem, packages bundle all possible options into a single artifact (macOS does the same, but see the tenfold-larger issue).
* There are near-fundamental assumptions that Python packaging makes (and the newer packaging tools, including uv, very much rely on them) about how a system should act: basically frozen, with no detection available apart from the "OS". That does not align with hardware/software that must be detected. One could do this via sdists, but Windows plus the issues around dynamic metadata make this a non-starter (hence tools like conda, spack and others from the more scientific side of the ecosystem were created; notably, on the more webby side, these problems are solved either by vendoring non-Python libraries or by making them someone/something else's problem, hence Docker, or the cloud for databases and other companion services).
* Frankly, more and more developers have no idea how systems are built (and this isn't just a Python issue). Docker lets people hide their sins with magical invocations that just work (and static linking in many cases sadly does the same). There are tools out of the hyperscalers designed to solve these problems, but they solve them by letting experts wrangle many systems, and hence imply you have a team that can do the wrangling.
Can this be solved? Maybe, but not by a new tool (on its own). It would require a lot of devs who may not see much improvement to their own workflow to change it for the benefit of others (switching to newer workflows that drop the assumptions baked into the current ones), plus a bunch of work by key stakeholders (and maybe even the open-sourcing of some crown jewels), and I don't see that happening.
> people don't need to work out which version of cuda/which gpu settings/libraries/etc. to use
This is not true in my case. The regular pytorch does not work on my system. I had to download a version specific to my system from the pytorch website using --index-url.
> packages bundle all possible options into a single artifact
Cross-platform Java apps do it too. For example, see https://github.com/xerial/sqlite-jdbc. But it does not become a clusterfuck like it does with Python. After downloading gigabytes and gigabytes of dependencies repeatedly, the Python tool you are trying to run will still refuse to run for random reasons.
You cannot serve end-users a shit-sandwich of this kind.
The python ecosystem is a big mess and, outside of a few projects like uv, I don't see anyone trying to build a sensible solution that tries to improve both speed/performance and packaging/distribution.
That's a pytorch issue. The solution is, as always, build from source. You will understand how the system is assembled, then you can build a minimal version meeting your specific needs (which, given wheels are a well-defined thing, you can then store on a server for reuse).
Cross-OS (especially with a VM like Java or JS) is relatively easy compared to needing specific versions for every single sub-architecture of a CPU and GPU system (and that's ignoring all the other bespoke hardware that's out there).
Cross platform Java doesn't have the issue because the JVM is handling all of that for you. But if you want native extensions written in C you get back to the same problem pretty quickly.
> if you want native extensions written in C
The SQLite project I linked to is a JDBC driver that makes use of the C version of the library appropriate to each OS. LWJGL (https://repo1.maven.org/maven2/org/lwjgl/lwjgl/3.3.6/) is another project which heavily relies on native code. But distributing these, or using these as dependencies, does not result in hair-pulling like it does with python.
There's native code like SQLite which assuming a sensible file system and C compiler is quite portable, and then there's native code which cares about exact compiler versions, driver version, and the exact model of your CPU, GPU and NIC. My suggestion is go look at how to program a GPU using naive vulkan/metal, and then look for the dark magic that is used to make GPUs run fast. It's the latter you're encountering with the ML python projects.
A lot of that is cuda blobs, right?
ROCm. About 40%. But there is duplication there as well. Two 16GB folders containing the exact same version.
Run rmlint on it, it will replace duplicate files with reflinks (if your fs supports them — xfs and btrfs do), or hardlinks if not.
Thanks! Hearing about this for the first time. Never felt the need before.
Sounds like a great use case for ZFS’s deduplication at block level.
I use ZFS everywhere EXCEPT on this drive. Not willing to have ZFS on the primary drive till native support lands in the kernel (so, never).
Have you tried borg [0]? Also, why not BTRFS?
[0] https://borgbackup.readthedocs.io/en/stable/index.html
Have been using ZFS for the past thirteen years and all my workflows including backup are based on it. It just works.
Sure, I was just curious, since you mentioned not wanting to use ZFS without kernel support and BTRFS does have that. Being familiar with ZFS is, I guess, a decent explanation.
When the topic of backups came up last year, I talked about my current solution: https://news.ycombinator.com/item?id=41042790. Someone suggested a workaround in the form of zfsbootmenu but I decided to stick to the simple way of doing things.
or, you know.. symlinks
The main issue with symlinks is needing to choose the source of truth: one needs to be the real file, and the other points to it. You also need to make sure they have the same lifetimes to prevent dangling links.
Hardlink is somewhat better because both point to the same inode, but will also not work if the file needs different permissions or needs to be independently mutable from different locations.
Reflink hits the sweetspot where it can have different permissions, updates trigger CoW preventing confusing mutations, and all while still reducing total disk usage.
I don't disagree but I think some of these problems could potentially be solved by having somewhat of a birds nest of a filesystem for large blobs, eg.
/blobs/<sha256_sum>/filename.zip
and then symlinking/reflinking filename.zip to wherever it needs to be in the source tree...
It's more portable than hardlinks, solves your "source of truth" problem and has pretty wide platform support.
Platforms that don't support symlinks/reflinks could copy the files to where they need to be then delete the blob store at the end and be no worse off than they are now.
Anyway, I'm just a netizen making a drive-by comment.
Does uv have any plans for symlink/hardlink deduplication?
Not sure. The simplest solution is to store all files under a hashed name and sym/hardlink on a case to case basis. But some applications tend to behave weirdly with such files. Windows has its own implementation of symlinks and hardlinks. They simply call it something else. Perhaps portability could be an issue.
The article says it already hard links duplicates. But likely not able to help if you are using multiple versions of interpreter and lib.
> torch
Ah found the issue.
> And I had to deal with Scala's SBT ("Simple Build Tool") in another life.
I feel you.
For someone who was just about to give Scala a try, what's wrong with it and are there alternative build tools?
It defines a DSL for your build that looks roughly like Scala code. But… it’s not! And there is a confusing “resolution” system for build tasks/settings. It’s also slow as shit. See https://www.lihaoyi.com/post/SowhatswrongwithSBT.html for a comprehensive takedown. If you’re interested in just playing around with scala I would use
https://scala-cli.virtuslab.org
Or for larger projects, the thing the author of the linked article is plugging (mill).
Try building uv itself. Cargo used something like 40GiB of disk space somehow.
I am only criticizing python because I am actively using it.
The node ecosystem and the rust one seem to have their own issues. I have zero interest in either of them so I haven't looked into them in detail.
However, I have to deal with node on occasion because a lot of JS/CSS tooling is written using it. It has a HUGE transitive dependency problem.
Like so many other articles that make some offhand remarks about conda, this article raves about a bunch of "new" features that conda has had for years.
> Being independent from Python bootstrapping
Yep, conda.
> Being capable of installing and running Python in one unified congruent way across all situations and platforms.
Yep, conda.
> Having a very strong dependency resolver.
Yep, conda (or mamba).
The main thing conda doesn't seem to have which uv has is all the "project management" stuff. Which is fine, it's clear people want that. But it's weird to me to see these articles that are so excited about being able to install Python easily when that's been doable with conda for ages. (And conda has additional features not present in uv or other tools.)
The pro and con of tools like uv is that they layer over the base-level tools like pip. The pro of that is that they interoperate well with pip. The con is that they inherit the limitations of that packaging model (notably the inability to distribute non-Python dependencies separately).
That's not to say uv is bad. It seems like a cool tool and I'm intrigued to see where it goes.
These are good points. But I think there needs to be an explanation why conda hasn't taken off more. Especially since it can handle other languages too. I've tried to get conda to work for me for more than a decade, at least once a year. What happens to me:
1) I can't solve for the tools I need and I don't know what to do. I try another tool, it works, I can move forward and don't go back to conda
2) it takes 20-60 minutes to solve, if it ever does. I quit and don't come back. I hear this doesn't happen anymore, but to this day I shudder before I hit enter on a conda install command
3) I spoil my base environment with an accidental install of something, and get annoyed and switch away.
On top of that the commands are opaque, unintuitive, and mysterious. Do I do conda env command or just conda command? Do I need a -n? The basics are difficult and at this point I'm too ashamed to ask which of the many many docs explain it, and I know I will forget within two months.
I have had zero of these problems with uv. If I screw up or it doesn't work it tells me right away. I don't need to wait for a couple minutes before pressing y to continue, I just get what I need in at most seconds, if my connection is slow.
If you're in a controlled environment and need audited packages, I would definitely put up with conda. But for open source, personal throwaway projects, and anything that doesn't need a security clearance, I'm not going to deal with that beast.
Conda is the dreaded solution to the dreadful ML/scientific Python works-on-my-computer dependency spaghetti projects. One has to be crazy to suggest it for anything else.
uv hardly occupies the same problem space. It elevates DX with disciplined projects to new heights, but still falls short with undisciplined projects with tons of undeclared/poorly declared external dependencies, often transitive — commonly seen in ML (now AI) and scientific computing. Not its fault of course. I was pulling my hair out with one such project the other day, and uv didn’t help that much beyond being a turbo-charged pip and pyenv.
Eh, ML/scientific Python is large and not homogeneous. For code that should work on a cluster, I would lean towards a Docker/container solution. For simpler dependency use cases, the pyenv/venv duo is alright. For some specific lib that has a conda package, it might be better to use conda, _might be_.
One illustration is the CUDA toolkit with a torch install on conda. If you need a basic setup, it would work (and takes ages). But if you need some other specific tools in the suite, or need it to be more lightweight for whatever reason, then good luck.
btw, I do not see much interest in uv. pyenv/pip/venv/hatch are simple enough to me. No need for another layer of abstraction between my machine and my env. I will still keep an eye on uv.
Oh I'm not saying conda is a good solution for all ML/scientific. I was making the assertion that it's terrible for basically everything else.
My bad! As you said, this part of Python code bases is chaotic (:
I always enjoyed the "one-stop" solution with conda/mamba that installed the right version of cudatoolkit along with pytorch. How do you handle that without conda? (I'm genuinely asking because I never had to actually care about it.) If I manually install it, it looks like it is going to be a mess if I have to juggle multiple versions.
Add to that the licensing of conda. In my company, we are not allowed to use conda because the company would rather not pay, so might as well use some other tool which does things faster.
conda (the package) is open source, it's the installer from Anaconda Corp (nee ContinuumIO) and their package index that are a problem. If you use the installer from https://conda-forge.org/download/, you get the conda-forge index instead, which avoids the license issues.
We've been working on addressing all these shortcomings with `pixi`: pixi.sh
It's very fast, comes with lockfiles and a project-based approach. Also comes with a `global` mode where you can install tools into sandboxed environments.
My completely unvarnished thoughts, in the hope that they are useful: I had one JIRA-ticket-worth of stuff to do on a conda environment, and was going to try to use pixi, but IIRC I got confused about how to use the environment.yml and went back to conda grudgingly. I still have pixi installed on my machine, and when I look through the list of subcommands, it does seem to probably have a better UX than conda.
When I go to https://prefix.dev, the "Get Started Quickly" section has what looks like a terminal window, but the text inside is inscrutable. What do they various lines mean? There's directories, maybe commands, check boxes... I don't get it. It doesn't look like a shell despite the Terminal wrapping box.
Below that I see that there's a pixi.toml, but I don't really want a new toml or yml file, there's enough repository lice to confuse new people on projects already.
Any time spent educating on packaging is time not spent on discovery, and is an impediment to onboarding.
I am trying to configure Pixi to use it with an Artifactory proxy in a corporate environment, and still could not figure out how to configure it.
Join our discord and we can help! There are lots of users that run it with Artifactory :)
I'd probably check this out in my home lab but as a corporate user the offering of discord as a support channel makes me nervous.
Discord is predominantly blocked on corporate networks. Artifactory (& Nexus) are very common in corporate environments. Corporate proxies are even more common. This is why I'd hesitate. These are common use cases (albeit corporate) that may not be readily available in the docs.
Have you used the contemporary tooling in this space? `mamba` (and ~therefore, `pixi`) is fast, and you can turn off the base environment. The UX is nicer, too!
Conda might have all these features, but it's kinda moot when no one can get them to work. My experience with conda is pulling a project, trying to install it, and then it spending hours trying to resolve dependencies. And any change would often break the whole environment.
uv "just works". Which is a feature in itself.
Yes, conda has a lot more features on paper. And it supports non-Python dependencies which is super important in some contexts.
However, after using conda for over three years I can confidently say I don't like using it. I find it to be slow and annoying, often creating more problems than it solves. Mamba is markedly better but still manages to confuse itself.
uv just works, if your desktop environment is relatively modern. That's its biggest selling point, and why I'm hooked on it.
Besides being much slower, and taking up much more space per environment, than uv, conda also has a nasty habit of causing unrelated things to break in weird ways. I've mostly stopped using it at this point, for that reason, tho I've still had to reach for it on occasion. Maybe pixi can replace those use cases. I really should give it a try.
There's also the issue of the license for using the repos, which makes it risky to rely on conda/anaconda. See e.g. https://stackoverflow.com/a/74762864
Not sure what you mean about space. Conda uses hardlinks for the most part, so environment size is shared (although disk usage tools don't always correctly report this).
As far as I understand, the conda-forge distribution and the channel solve a lot of issues. But it might not have the tools you need.
Good to see you again.
>Like so many other articles that make some offhand remarks about conda, this article raves about a bunch of "new" features that conda has had for years.
Agreed. (I'm also tired of seeing advances like PEP 723 attributed to uv, or uv's benefits being attributed to it being written in Rust, or at least to it not being written in Python, in cases where that doesn't really hold up to scrutiny.)
> The pro and con of tools like uv is that they layer over the base-level tools like pip. The pro of that is that they interoperate well with pip.
It's a pretty big pro ;) But I would say it's at least as much about "layering over the base-level tools" like venv.
> The con is that they inherit the limitations of that packaging model (notably the inability to distribute non-Python dependencies separately).
I still haven't found anything that requires packages to contain any Python code (aside from any build system configuration). In principle you can make a wheel today that just dumps a platform-appropriate shared library file for, e.g. OpenBLAS into the user's `site-packages`; and others could make wheels declaring yours as a dependency. The only reason they wouldn't connect up - that I can think of, anyway - is because their own Python wrappers currently don't hard-code the right relative path, and current build systems wouldn't make it easy to fix that. (Although, I guess SWIG-style wrappers would have to somehow link against the installed dependency at their own install time, and this would be a problem when using build isolation.)
> The only reason they wouldn't connect up - that I can think of, anyway - is because their own Python wrappers currently don't hard-code the right relative path
It's not just that, it's that you can't specify them as dependencies in a coordinated way as you can with Python libs. You can dump a DLL somewhere but if it's the wrong version for some other library, it will break, and there's no way for packages to tell each other what versions of those shared libraries they need. With conda you can directly specify the version constraints on non-Python packages. Now, yeah, they still need to be built in a consistent manner to work, but that's what conda-forge handles.
Ah, right, I forgot about those issues (I'm thankful I don't write that sort of code myself - I can't say I ever enjoyed C even if I used to use it regularly many years ago). I guess PEP 725 is meant to address this sort of thing, too (as well as build-time requirements like compilers)... ?
I guess one possible workaround is to automate making a wheel for each version of the compiled library, and have the wheel version move in lockstep. Then you just specify the exact wheel versions in your dependencies, and infer the paths according to the wheel package names... it certainly doesn't sound pleasant, though. And, C being what it is, I'm sure that still overlooks something.
> I still haven't found anything that requires packages to contain any Python code (aside from any build system configuration). In principle you can make a wheel today that just...
Ah, I forgot the best illustration of this: uv itself is available this way - and you can trivially install it with Pipx as a result. (I actually did this a while back, and forgot about it until I wanted to test venv creation for another comment...)
> But it's weird to me to see these articles that are so excited about being able to install Python easily when that's been doable with conda for ages. (And conda has additional features not present in uv or other tools.)
I used conda for a while around 2018. My environment became borked multiple times and I eventually gave up on it. After that, I never had issues with my environment becoming corrupted. I knew several other people who had the same issues, and it stopped after they switched away from conda.
I've heard it's better now, but that experience burned me so I haven't kept up with it.
Having the features is not nearly as much use if the whole thing's too slow to use. I frequently get mamba taking multiple minutes to figure out how to install a package. I use and like Anaconda and miniforge, but their speed for package management is really frustrating.
Thanks for bringing up conda. We're definitely trying to paint this vision as well with `pixi` (https://pixi.sh) - which is a modern package manager, written in Rust, but using the Conda ecosystem under the hood.
It follows more of a project based approach, comes with lockfiles and a lightweight task system. But we're building it up for much bigger tasks as well (`pixi build` will be a bit like Bazel for cross-platform, cross-language software building tasks).
While I agree that conda has many short-comings, the fundamental packages are alright and there is a huge community keeping the fully open source (conda-forge) distribution running nicely.
I just want to give a hearty thank you for pixi. It's been an absolute godsend for us. I can't express how much of a headache it was to deal with conda environments with student coursework and research projects in ML, especially when they leave and another student builds upon their work. There was no telling if the environment.yml in a student's repo was actually up to date or not, and most often didn't include actual version constraints for dependencies. We also provide an HPC cluster for students, which brings along its own set of headaches.
Now, I just give students a pixi.toml and pixi.lock, and a few commands in the README to get them started. It'll even prevent students from running their projects, adding packages, or installing environments when working on our cluster unless they're on a node with GPUs. My inbox used to be flooded with questions from students asking why packages weren't installing or why their code was failing with errors about CUDA, and more often than not, it was because they didn't allocate any GPUs to their HPC job.
And, as an added bonus, it lets me install tools that I use often with the global install command without needing to inundate our HPC IT group with requests.
So, once again, thank you
It's worth noting that uv does not use pip, and it is entirely possible (as noted by uv's existence) to write a new installer that uses PyPI. The conflicts between pip, conda, and any other installers is all about one (or more) installers not having a complete view of the system, typically by having different packaging formats/indices/metadata.
> The main thing conda doesn't seem to have which uv has is all the "project management" stuff.
Pixi[1] is an alternative conda package manager (as in it still uses conda repositories; conda-forge by default) that bridges this gap. It even uses uv for PyPI packages if you can't find what you need in conda repositories.
1: https://pixi.sh
Pixi goes a step further, they host mirrors of a lot of the important conda channels!
The killer feature of uv for me is much faster uv pip install -r requirements.txt.
I come from the viewpoint that I don't want my build tool to install my Python for me.
In the same vein, I don't want Gradle or Maven to install my JVM for me.
In JVM land I use SDKMAN! (Yes, that's what the amazingly awesome (in the original sense of awe: "an emotion variously combining dread, veneration, and wonder") concretion of Bash scripts is called).
In Python land I use pyenv.
And I expect my build tool to respect the JVM/Python versions I've set (looking at you Poetry...) and fail if they can't find them (You know what you did, Poetry. But you're still so much better than Pipenv)
> I expect my build tool to respect the JVM/Python versions I've set (looking at you Poetry...) and fail if they can't find them
For what it's worth, uv does this if you tell it to. It's just not the default behaviour. Uv and pyenv can easily be used together.
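A sketch of the relevant knobs, assuming uv's documented `python-preference`/`python-downloads` settings (adjust to taste):
[project]
requires-python = ">=3.12"        # resolution fails if no matching interpreter is available

[tool.uv]
python-downloads = "never"        # never fetch a managed Python
python-preference = "only-system" # only use interpreters already installed (e.g. via pyenv)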
You didn’t mention an important point: speed.
Suppose conda had projects. Still, it is somewhat incredible to see uv resolve + install in 2 seconds what takes conda 10 minutes. It immediately made me want to replace conda with uv whenever possible.
(I have actively used conda for years, and don’t see myself stopping entirely because of non Python support, but I do see myself switching primarily to uv.)
It's true conda used to be slow, but that was mostly at a time when pip had no real dependency resolver at all. Since I started using mamba, I haven't noticed meaningful speed problems. I confess I'm always a bit puzzled at how much people seem to care about speed for things like install. Like, yes, 10 minutes is a problem, but these days mamba often takes like 15 seconds or so. Okay, that could be faster, but installing isn't something I do super often so I don't see it as a huge problem.
The near-instant install speed is just such a productivity boost. It's not the time you save, it's how it enables you to stay in flow. In my previous job we had a massive internal library hosted on Azure that took like 5 minutes to install with pip or conda. Those on my team not using uv either resorted to using a single global environment for everything, which they dreaded experimenting with, or made a new environment once in the project's history and avoided installing new dependencies like the plague. uv took less than 30 seconds to install the packages, so it freed up a way better workflow of having disposable envs that I could just nuke and start over if they went bad.
Agreed. When I tried uv first I was immediately taken aback by the sheer speed. The functionality seemed OK, but the speed - woah. Inspiring. So I kept using it. Got used to it now.
In my mind, conda has always been a purpose-specific environment manager for people who live in data science / matplotlib / numpy / jupyter all day every day.
Ruby-the-language is now inseparable from Rails because the venn diagram of the “Ruby” community and the “rails” community is nearly a circle. It can be hard to find help with plain Ruby because 99% of people will assume you have the rails stdlib monkeypatches.
In a similar way, conda and data science seem to be conjoined, and I don’t really see anybody using conda as a general-purpose Python environment manager.
UV doesn’t require an entire duplicated PyPI repository, so there’s that.
Also, conda does not have a free licence; every organisation with more than 200 employees should pay for it (https://www.datacamp.com/blog/navigating-anaconda-licensing). uv is, to date, Apache 2.0.
That is the license for the anaconda package channel, not conda. The page you linked explains that conda and conda-forge are not subject to those licensing issues.
So it's like conda but without getting stuck on "Solving environment: \" for hours?
https://docs.conda.io/projects/conda/user-guide/getting-star...
> To bootstrap a conda installation, use a minimal installer such as Miniconda or Miniforge.
> Conda is also included in the Anaconda Distribution.
Bam, you’ve already lost me there. Good luck getting this approved on our locked down laptops.
No pip compatibility? No venv compatibility? Into the trash it goes, it’s not standard. The beauty of uv is that it mostly looks like glue (even though it is more) for standard tooling.
Well, that's sort of the point though. By leaving behind the "standards" it can add benefits that aren't possible with the standards.
True, and uv will probably never bring non-Python deps to the table.
But anaconda doesn't do inline deps, isn't a consistent experience (the typical conda project doesn't exist), is its own island incompatible with most of the Python ecosystem, is super slow, has a very quirky YAML config, and is badly documented with poor ergonomics.
In short, anaconda solves many of those problems but brings others to the table.
I think at this point the only question that remains is how Astral will make money. But if they can package some sort of enterprise package index with security bells and whistles, it seems an easy sell into a ton of orgs.
Charlie Marsh said in our interview they plan to compete with anaconda on b2b:
https://www.bitecode.dev/p/charlie-marsh-on-astral-uv-and-th...
Makes sense; the market is wide open for it.
A scenario for "don't use uv" I hope none of you encounter: many nvidia libraries not packaged up in something better like torch.
Here's just one example, nemo2riva, the first in several steps to taking a trained NeMo model and making it deployable: https://github.com/nvidia-riva/nemo2riva?tab=readme-ov-file#...
before you can install the package, you first have to install some other package whose only purpose is to break pip so it uses nvidia's package registry. This does not work with uv, even with the `uv pip` interface, because uv rightly doesn't put up with that shit.
This is of course not Astral's fault, I don't expect them to handle this, but uv has spoiled me so much it makes anything else even more painful than it was before uv.
> whose only purpose is to break pip so it uses nvidia's package registry. This does not work with uv, even with the `uv pip` interface, because uv rightly doesn't put up with that shit.
I guess you're really talking about `nvidia-pyindex`. This works by leveraging the legacy Setuptools build system to "build from source" on the user's machine, but really just running arbitrary code. From what I can tell, it could be made to work just as well with any build system that supports actually orchestrating the build (i.e., not Flit, which is designed for pure Python projects), and with the modern `pyproject.toml` based standards. It's not that it "doesn't work with uv"; it works specifically with Pip, by trying to run the current (i.e.: target for installation) Python environment's copy of Pip, calling undocumented internal APIs (`from pip._internal.configuration import get_configuration_files`) to locate Pip's config, and then parsing and editing those files. If it doesn't work with `uv pip`, I'm assuming that's because uv is using a vendored Pip that isn't in that environment and thus can't be run that way.
Nothing prevents you, incidentally, from setting up a global Pip that's separate from all your venvs, and manually creating venvs that don't contain Pip (which makes that creation much faster): https://zahlman.github.io/posts/2025/01/07/python-packaging-... But it does, presumably, interfere with hacks like this one. Pip doesn't expose a programmatic API, and there's no reason why it should be in the environment if you haven't explicitly declared it as a dependency - people just assume it will be there, because "the user installed my code and presumably that was done using Pip, so of course it's in the environment".
Instead of installing nvidia-pyindex, use https://docs.astral.sh/uv/configuration/indexes/ to configure the index nvidia-pyindex points to.
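Something like this in pyproject.toml should do it (the URL here is a placeholder assumption; use whatever index nvidia-pyindex actually configures):
[[tool.uv.index]]
name = "nvidia"
url = "https://pypi.nvidia.com"   # placeholder: substitute the index nvidia-pyindex points to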
Surely you can just manually add their index, right?
Yes, but if you’re not in their carefully directed “nemo environment” the nemo2riva command fails complaining about some hydra dependency. And on it goes…
Yes, but the point is that they automate the process for you, because it's finicky.
I think the biggest praise I can give uv is that as a non Python dev, it makes Python a lot more accessible. The ecosystem can be really confusing to approach as an outsider. There’s like 5 different ways to create virtual environments. With uv, you don’t have to care about any of that. The venv and your Python install are just handled for you by ‘uv run’, which is magic.
Can someone explain a non-project based workflow/configuration for uv? I get creating a bespoke folder, repo, and uv venv for certain long-lived projects (like creating different apps?).
But most of my work, since I adopted conda 7ish years ago, involves using the same ML environment across any number of folders or even throw-away notebooks on the desktop, for instance. I’ll create the environment and sometimes add new packages, but rarely update it, unless I feel like a spring cleaning. And I like knowing that I have the same environment across all my machines, so I don’t have to think about if I’m running the same script or notebook on a different machine today.
The idea of a new environment for each of my related “projects” just doesn’t make sense to me. But, I’m open to learning a new workflow.
Addition: I don’t run others’ code, like pretrained models built with specific package requirements.
`uv` isn't great for that, I've been specifying and rebuilding my environments for each "project".
My one off notebook I'm going to set up to be similar to the scripts, will require some mods.
It does take up a lot more space, but it is quite a bit faster.
However, I believe you could use the workspace concept for this: describe the dependencies for all the projects in one root folder, and then all sub-folders will use that environment.
But I mean, our use case is very different from yours; it's not necessary to use uv.
Gotcha. Thank you.
FYI, for anyone else that stumbles upon this: I decided to do a quick check on PyTorch (the most problem-prone dependency I've had), and noticed that they now specifically recommend no longer using conda, and have since last November.
I personally have a "sandbox" directory that I put one-off and prototype projects in. My rule is that git repos never go in any dir there. I can (and do) go in almost any time and rm anything older than 12 months.
In your case, I guess one thing you could do is have one git repo containing your most commonly-used dependencies and put your sub-projects as directories beneath that? Or even keep a branch for each sub-project?
One thing about `uv` is that dependency resolution is very fast, so updating your venv to switch between "projects" is probably no big deal.
> The idea of a new environment for each of my related “projects” just doesn’t make sense to me. But, I’m open to learning a new workflow.
First, let me try to make sense of it for you -
One of uv's big ideas is that it has a much better approach to caching downloaded packages, which lets it create those environments much more quickly. (I guess things like "written in Rust", parallelism etc. help, but as far as I can tell most of the work is stuff like hard-linking files, so it's still limited by system calls.) It also hard-links duplicates, so that you aren't wasting tons of space by having multiple environments with common dependencies.
A big part of the point of making separate environments is that you can track what each project is dependent on separately. In combination with Python ecosystem standards (like `pyproject.toml`, the inline script metadata described by https://peps.python.org/pep-0723/, the upcoming lock file standard in https://peps.python.org/pep-0751/, etc.) you become able to reproduce a minimal environment, automate that reproduction, and create an installable sharable package for the code (a "wheel", generally) which you can publish on PyPI - allowing others to install the code into an environment which is automatically updated to have the needed dependencies. Of course, none of this is new with `uv`, nor depends on it.
The installer and venv management tool I'm developing (https://github.com/zahlman/paper) is intended to address use cases like yours more directly. It isn't a workflow tool, but it's intended to make it easier to set up new venvs, install packages into venvs (and say which venv to install it into) and then you can just activate the venv you want normally.
(I'm thinking of having it maintain a mapping of symbolic names for the venvs it creates, and a command to look them up - so you could do things like "source `paper env-path foo`/bin/activate", or maybe put a thin wrapper around that. But I want to try very hard to avoid creating the impression of implementing any kind of integrated development tool - it's an integrated user tool, for setting up applications and libraries.)
That's my main use case not-yet-supported by uv. It should not be too difficult to add a feature or wrapper to uv so that it works like pew/virtualenvwrapper.
E.g. calling that wrapper uvv, something like
You could imagine additional features, such as keeping a log of the installed packages inside the venv so that you could revert to an arbitrary state, etc., as goodies given how much faster uv is.
I've worked like you described for years and it mostly works. Although I've recently started to experiment with a new uv-based workflow that looks like this:
To open a notebook I run (via an alias)
and then in the first cell of each notebook I have
This takes care of all the venv management stuff and makes sure that I always have the dependencies I need for each notebook. Only been doing this for a few weeks, but so far so good.
Why not just copy your last env into the next dir? If you need to change any of the package versions, or add something specific, you can do that without risking any breakages in your last project(s). From what I understand uv has a global package cache so the disk usage shouldn't be crazy.
Just symlink the virtualenv folder and pyproject.toml it makes to whatever other project you want it to use.
Yeah, this is how I feel too. A lot of the movement in Python packaging seems to be more in managing projects than managing packages or even environments. I tend to not want to think about a "project" until very late in the game, after I've already written a bunch of code. I don't want "make a project" to be something I'm required or even encouraged to do at the outset.
I have the opposite feeling, and that's why I like uv. I don't want to deal with "environments". When I run a Python project I want its PYTHONPATH to have whatever libraries its config file says it should have, and I don't want to have to worry about how they get there.
I set up a "sandbox" project as an early step of setting up a new PC.
Sadly for certain types of projects like GIS, ML, scientific computing, the dependencies tend to be mutually incompatible and I've learned the hard way to set up new projects for each separate task when using those packages. `uv init; uv add <dependencies>` is a small amount of work to avoid the headaches of Torch etc.
Since this seems to be a love fest let me offer a contrarian view. I use conda for environment management and pip for package management. This neatly separates the concerns into two tools that are good at what they do. I'm afraid that uv is another round of "Let's fix everything" just to create another soon to be dead set of patterns. I find nothing innovative or pleasing in its design, nor do I feel that it is particularly intuitive or usable.
You don't have to love uv, and there are plenty of reasons not to.
> soon to be dead set of patterns.
Dozens of threads of people praising how performant and easy uv is, how it builds on standards and current tooling instead of inventing new incompatible set of crap, and every time one comment pops up with “akshually my mix of conda, pyenv, pipx, poetry can already do that in record time of 5 minutes, why do you need uv? Its going to be dead soon”.
To be fair here: conda was praised as the solution to everything by many when it was new. It did have its own standards of course. Now most people hate it.
Every packaging PEP is also hailed as the solution to everything, only to be superseded by a new and incompatible PEP within two years.
So what?
If someone doesn’t want to use it, or doesn’t like it, or is, quite reasonably skeptical that “this time it’ll be different!” … let them be.
If it’s good, it’ll stand on its own despite the criticism.
If it can’t survive with some people disliking and criticising it is, it deserves to die.
Right? Live and let live. We don’t have to all agree all the time about everything.
uv is great. So use it if you want to.
And if you don’t, that’s okay too.
Naive take. https://gwern.net/holy-war counsels that, in fact, becoming the One True Package Manager for a very popular programming language is an extremely valuable thing to aim towards. This is even outside of the fact that `uv` is backed by a profit-seeking company (cf https://astral.sh/about). I'm all for people choosing what works best for them, but I'm also staunchly pro-arguing over it.
> becoming the One True Package Manager for a very popular programming language is an extremely valuable thing
For companies. Which is why when random people start acting like it’s important, you have to wonder why it’s important to them.
For example, being a corporate shill. Or so deep in the Kool-Aid you can’t allow alternative opinions? Hm?
It’s called an echo chamber.
>Which is why when random people start acting like it’s important, you have to wonder why it’s important to them.
I don't use uv because I don't currently trust that it will be maintained on the timescales I care about. I stick with pip and venv, because I expect they will still be around 10 years from now, because they have much deeper pools of interested people to draw contributors and maintainers from, because - wait for it - they are really popular. Your theory about random people being corporate shills for anything they keep an eye on the popularity of can be explained much more parsimoniously like that.
Why on earth would it be only important for companies?
> I find nothing innovative or pleasing in its design, nor do I feel that it is particularly intuitive or usable.
TFA offers a myriad innovative and pleasing examples. It would have been nice if you actually commented on any of those, or otherwise explained why you think otherwise.
conda user for 10 years and uv skeptic for 18 months.
I get it! I loved my long-lived curated conda envs.
I finally tried uv to manage an environment and it’s got me hooked. That a project's dependencies can be so declarative and separated from the venv really sings for me! No more meticulous tracking of an env.yml or requirements.txt; just `uv add` and `uv sync` and that’s it! I just don’t think about it anymore.
I'm also a long-time conda user and have recently switched to pixi (https://pixi.sh/), which gives a very similar experience for conda packages (and uses uv under the hood if you want to mix in dependencies from PyPI). It's been great, and it also has a `pixi global`, similar to `pipx` etc., that makes it easy to grab general tools like ripgrep and ruff and make them widely available, but still managed.
whoa! TIL thanks will check it out
Pip installs packages, but it provides rather limited functionality for actually managing what it installed. It won't directly spit out a dependency graph, won't help you figure out which of the packages installed in the current environment are actually needed for the current project, leaves any kind of locking up to you...
I agree that uv is the N+1th competing standard for a workflow tool, and I don't like workflow tools anyway, preferring to do my own integration. But the installer it provides does solve a lot of real problems that Pip has.
Yeah, I don't want my build tool to manage my Python, and I don't want my tool that installs Pythons to manage my builds.
In JVM land I used Sdkman to manage JVMs, and I used Maven or Gradle to manage builds.
I don't want them both tied to one tool, because that's inflexible.
What does the additional flexibility get you? It's just one less thing to worry about when coordinating with a team, and it's easy to shift between different versions for a project if need be.
I'm very much looking forward to their upcoming static type checker. Hopefully it will lead to some interesting new opportunities in the python world!
uv is so much better than everything else, I'm just afraid they can't keep the team going. Time will tell, but I just use uv and ruff in every project now tbh.
really need them to keep going
The amount of people who switch to R because Python is too hard to set up is crazy high.
Especially among the life scientists and statisticians
A familiar tale: Joe is hesitant about switching to UV and isn't particularly excited about it. Eventually, he gives it a try and becomes a fan. Soon, Joe is recommending UV to everyone he knows.
Joe has found the One True God, Joe must proselytise, the true God demands it.
I know good naming is hard, and there are an awful lot of project names that clash, but naming a project uv is unfortunate due to the ubiquitous nature of libuv
https://libuv.org/
I don't think it's particularly problematic, uv the concurrency library and uv the Python tool cover such non-overlapping domains that opportunities for confusion are minimal.
(The principle is recognized in trademark law -- some may remember Apple the record label and Apple the computer company. They eventually clashed, but I don't see either of the uv's encroaching on the other's territory.)
Sure, there are so few backend Node.js engineers. Let alone game engine developers and Blender users with their UV mapping tools. None of these people will ever encounter Python in their daily lives.
That's unnecessarily rude.
> I don't think it's particularly problematic, uv the concurrency library and uv the Python tool cover such non-overlapping domains that opportunities for confusion are minimal.
Google returns mixed results. You may assert it's not problematic, but this is a source of noise that projects with distinct names don't have.
I'm not sure that's true. uvloop, built on libuv, is a pretty popular alternative event loop for async Python, much faster than the built-in. It certainly confused me at first to see a tool called "uv" that had nothing to do with that, because I'd been using libuv with Python for years before it came out.
I think of uv in the gfx programming sense too.
uv is overposted as hell, but oh my god I love it so much I understand every single one of these posts. Simple things just work so effectively.
I've been mostly out of the Python game for quite a while, but I never had that much issue with pip install -r requirements.txt.
Seems like a lot of people have tried their hand at various tooling, so there must be more to it than I am aware of.
A big thing that trips people up, once they try to use a public project (from source) or an older project, is the distinction between a dependencies file and a lock file.
The dependency file (what requirements.txt is supposed to be) just documents the things you depend on directly, and possibly known version constraints. A lock file captures the exact versions of your direct and indirect dependencies at the moment in time it's generated. When you go to use the project, it will read the lock file, if it exists, and match those versions for anything listed directly or indirectly in the dependency file. It's like keeping a snapshot of the exact last-working dependency configuration. You can always tell it to update the lock file and it will try to recalculate everything from the latest versions that meet the constraints in your dependency file, but if something doesn't work with that you'll presumably have your old lock file to fall back on _that will still work_.
It's a standard issue/pattern in all dependency managers, but it's only been getting attention for a handful of years with the focus on reproducibility for supply-chain verification/security. It has the side effect of helping old projects keep working much longer, though. Python has had multiple competing options for solutions, and only in the last couple of years did they pick a winning format.
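As a concrete sketch of the split (names, versions and constraints here are purely illustrative):
# pyproject.toml: direct dependencies, loosely constrained
[project]
name = "example-app"
version = "0.1.0"
requires-python = ">=3.9"
dependencies = [
  "requests>=2.31",
  "beautifulsoup4",
]
# uv.lock (generated by `uv lock`) then pins exact versions of these *and*
# their transitive dependencies, with hashes, so `uv sync` can rebuild the
# same environment later.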
Then why even have the dependency file?
If the dependency file is wrong, and describes versions that are incompatible with the project, it should be fixed. Duplicating that information elsewhere is wrong.
Lockfiles have a very obvious use case: Replicable builds across machines in CI. You want to ensure that all the builds in the farm are testing the same thing across multiple runs, and that new behaviors aren't introduced because numpy got revved in the middle of the process. When that collective testing process is over, the lockfile is discarded.
You should not use lockfiles as a "backup" to pyproject.toml. The version constraints in pyproject.toml should be correct. If you need to restrict to a single specific version, do so, "== 2.2.9" works fine.
>Then why even have the dependency file?
Horses for courses.
Dependency files - whether the project's requirements (or optional requirements, or in the future, other arbitrary dependency groups) in `pyproject.toml`, or a list in a `requirements.txt` file (the filename here is actually arbitrary) don't describe versions at all, in general. Their purpose is to describe what's needed to support the current code: its direct dependencies, with only as much restriction on versions as is required. The base assumption is that if a new version of a dependency comes out, it's still expected to work (unless a cap is set explicitly), and has a good chance of improving things in general (better UI, more performant, whatever). This is suitable for library development: when others will cite your code as a dependency, you avoid placing unnecessary restrictions on their environment.
Lockfiles are meant to describe the exact version of everything that should be in the environment to have exact reproducible behaviour (not just "working"), including transitive dependencies. The base assumption is that any change to anything in the environment introduces an unacceptable risk; this is the tested configuration. This is suitable for application development: your project is necessarily the end of the line, so you expect others to be maximally conservative in meeting your specific needs.
You could also take this as an application of Postel's Law.
>Lockfiles have a very obvious use case: Replicable builds across machines in CI.
There are others who'd like to replicate their builds: application developers who don't want to risk getting bug reports for problems that turn out to be caused by upstream updates.
> You should not use lockfiles as a "backup" to pyproject.toml. The version constraints in pyproject.toml should be correct. If you need to restrict to a single specific version, do so, "== 2.2.9" works fine.
In principle, if you need a lockfile, you aren't distributing a library package anyway. But the Python ecosystem is still geared around the idea that "applications" would be distributed the same way as libraries - as wheels on PyPI, which get set up in an environment, using the entry points specified in `pyproject.toml` to create executable wrappers. Pipx implements this (and rejects installation when no entry points are defined); but the installation will still ignore any `requirements.txt` file (again, the filename is arbitrary; but also, Pipx is delegating to Pip's ordinary library installation process, not passing `-r`).
You can pin every version in `pyproject.toml`. Your transitive dependencies still won't be pinned that way. You can explicitly pin those, if you've done the resolution. You still won't have hashes or any other supply-chain info in `pyproject.toml`, because there's nowhere to put it. (Previous suggestions of including actual lockfile data in `pyproject.toml` have been strongly rejected - IIRC, Hatch developer Ofek Lev was especially opposed to this.)
Perhaps in the post-PEP 751 future, this could change. PEP 751 specifies both a standard lockfile format (with all the sorts of metadata that various tools might want) and a standard filename (or at least filename pattern). A future version of Pipx could treat `pylock.toml` as the "compiled" version of the "source" dependencies in `pyproject.toml`, much like Pip (and other installers) treat `PKG-INFO` (in an sdist, or `METADATA` in a wheel) as the "compiled" version (dependency resolution notwithstanding!) of other metadata.
Just two reasons (there are more): 1) uv is vastly faster than pip. Just using uv pip install -r requirements.txt and nothing else is a win. 2) uv can handle things like downloading the correct Python version, creating a venv (or activating an existing one), and essentially all the other cognitive load, in a way that's completely transparent to the user. It means you can give someone a Python project and a single command to run it, and you can have confidence it will work regardless of the platform or a dozen other little variables that trip people up.
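A rough sketch of both points, using only subcommands uv actually ships (file names are placeholders):
uv venv --python 3.12                 # fetches CPython 3.12 if needed, creates .venv
uv pip install -r requirements.txt    # drop-in, much faster pip replacement
uv run python script.py               # runs against that venv, no manual activation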
I'll give it a whirl soon
This works if you only have one python project on your system, but most python developers need virtual environments to deal with various projects.
I create a .venv directory for each project(even for those test projects named pytest, djangotest). And each project has its own requirements file. Personally, Python packaging has never been a problem.
What do you do when you accidentally run pip install -r requirements.txt with the wrong .venv activated?
If your answer is "delete the venv and recreate it", what do you do when your code now has a bunch of errors it didn't have before?
If your answer is "ignore it", what do you do when you try to run the project on a new system and find half the imports are missing?
None of these problems are insurmountable of course. But they're niggling irritations. And of course they become a lot harder when you try to work with someone else's project, or come back to a project from a couple of years ago and find it doesn't work.
>What do you do when you accidentally run pip install -r requirements.txt with the wrong .venv activated?
As someone with a similar approach (not using requirements.txt, but using all the basic tools and not using any kind of workflow tool or sophisticated package manager), I don't understand the question. I just have a workflow where this isn't feasible.
Why would the wrong venv be activated?
I activate a venv according to the project I'm currently working on. If the venv for my current code isn't active, it's because nothing is active. And I use my one global Pip through a wrapper, which (politely and tersely) bonks me if I don't have a virtual environment active. (Other users could rely on the distro bonking them, assuming Python>=3.11. But my global Pip is actually the Pipx-vendored one, so I protect myself from installing into its environment.)
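The actual wrapper is personal to that setup, but a minimal sketch of the "bonk if no venv" idea in Bash might look like this (the message and behaviour are illustrative, not the commenter's script):
# in ~/.bashrc: refuse to run pip unless a virtual environment is active
pip() {
    if [ -z "$VIRTUAL_ENV" ]; then
        echo "bonk: activate a virtual environment first" >&2
        return 1
    fi
    command pip "$@"
}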
You might as well be asking Poetry or uv users: "what do you do when you 'accidentally' manually copy another project's pyproject.toml over the current one and then try to update?" I'm pretty sure they won't be able to protect you from that.
>If your answer is "delete the venv and recreate it", what do you do when your code now has a bunch of errors it didn't have before?
If it did somehow happen, that would be the approach - but the code simply wouldn't have those errors. Because that venv has its own up-to-date listing of requirements; so when I recreated the venv, it would naturally just contain what it needs to. If the listing were somehow out of date, I would have to fix that anyway, and this would be a prompt to do so. Do tools like Poetry and uv scan my source code and somehow figure out what dependencies (and versions) I need? If not, I'm not any further behind here.
>And of course they become a lot harder when you try to work with someone else's project, or come back to a project from a couple of years ago and find it doesn't work.
I spent this morning exploring ways to install Pip 0.2 in a Python 2.7 virtual environment, "cleanly" (i.e. without directly editing/moving/copying stuff) starting from scratch with system Python 3.12. (It can't be done directly, for a variety of reasons; the simplest approach is to let a specific version of `virtualenv` make the environment with an "up-to-date" 20.3.4 Pip bootstrap, and then have that Pip downgrade itself.)
I can deal with someone else's (or past me's) requirements.txt being a little wonky.
uv basically does that + python version handling + conveniences like auto-activating venv and installing dependencies
it was a massive problem at our company's hackathon. just so many hours wasted
Yeah, this is where I've been for a while. Maybe it helps that I don't do any ML work with lots of C or Fortran libraries that depend on exact versions of Python or whatever. But for just writing an application in Python, venv and pip are fine. I'll probably still try uv eventually if everyone really decides they're adopting it, but I won't rush.
sure, even so I think I had like one or two bash aliases to create/switch virtualenvs
if you come back to a project you haven't worked on for a year or two, you'll end up with new versions of dependencies that don't work with your code or environment any more.
you can solve this with constraints, pip-tools etc., but the argument is uv does this better
I write internal tools using Python at work. These tools are often used by non-Python devs. I am so, so tired of adding a blurb on creating a venv.
(Of course, the alternative—"install this software you've never heard of"—isn't fantastic either. But once they do have it, it'd be pretty neat to be able to tell them to just "uvx <whatever>".)
Is it not the same blurb every time that you could copy and paste?
Or you can make sure you have an entry point - probably a better UX for your coworkers anyway - and run them through a `pipx` install.
Or you could supply your own Bash script or whatever.
Or you could use a simple packager like pex (https://docs.pex-tool.org/). (That one even allows you to embed a Python executable, if you need to and if you don't have to worry about different platforms.) Maybe even the standard library `zipapp` works for your needs.
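For the `zipapp` route, a minimal sketch of the usual pattern (directory and entry-point names are made up):
# app/ is a staging directory containing your package, e.g. app/myapp/cli.py
pip install requests --target app/                       # vendor dependencies next to it
python -m zipapp app -m "myapp.cli:main" -p "/usr/bin/env python3" -o mytool.pyz
./mytool.pyz                                             # a single file your coworkers can run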
The problem I have with uv is that it is not opinionated enough, or complete enough. It still needs a backend for building the package, and you still have a choice of backends.
In other words, it is a nice frontend to hide the mess that is the Python packaging ecosystem, but the mess of an ecosystem is still there, and you still have to deal with it. You'll still have to go through hatchling's docs to figure out how to do x/y/z. You'll still have to switch from hatchling to flit/pdm/setuptools/... if you run into a limitation of hatchling. As a package author, you're never using uv, you're using uv+hatchling (or uv+something), and a big part of your pyproject.toml is not uv configuration, it is hatchling configuration.
I'm sticking with Poetry for now, which has a more streamlined workflow. Things work together. Every Poetry project uses the same configuration syntax (there are no Poetry+X and Poetry+Y projects). Issues in Poetry can be fixed by Poetry rather than having to work with the backend.
I understand that uv is still young and I am sure this will improve. Maybe they'll even pick a specific backend and put a halt to this. But of course Poetry might catch up before then.
They are currently in the process of creating their own build system: https://github.com/astral-sh/uv/issues/3957#issuecomment-265...
Hatchling maintainer here :)
What limitations have you personally experienced?
uv has been a complete game changer for me in Python, everyone who develops with me knows I won't shut up about it.
Astral is a great team, they built the ruff linter and are currently working on a static type checker called red-knot: https://x.com/charliermarsh/status/1884651482009477368
> I had a friend who decided to not use uv, because the first time he used it, it was on a 15 years old codebase that had just been migrated to Python 3. It was standing on a pile of never cleaned up pip freeze exports, and uv could not make it work.
This is my only gripe with uv, despite how the author decided to depict it, this really turns into a headache fast as soon as you have ~4-5 in-house packages.
I don't think it's that bad that uv is so unforgiving in those cases, because it leads to better overall project quality/cohesion, but I wish there was a way to onboard more progressively and downgrade minor version mismatches to warnings.
Did they run into a hard blocker, or was it just that using version overrides was possible but painful? I started looking seriously at uv/pdm once poetry made it entirely clear they didn't intend to support version overrides [1]. uv's support for overrides seems serviceable if unsophisticated [2][3].
[1] https://github.com/python-poetry/poetry/issues/697
[2] https://docs.astral.sh/uv/concepts/resolution/#dependency-ov...
[3] https://docs.astral.sh/uv/reference/settings/#override-depen...
The linked poetry Issue is pretty understandable why they aren't going to support it. I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built in specification for an indirect dependency. Point blank, that's a packaging failure and the solution is, and always has been, to immediately yank the offending package. That said, the Python case has pretty limited ability to specify dependency package versions, which makes it nigh impossible to handle it downstream by blocklisting specific versions from an otherwise contiguous range.
Take for example the werkzeug package that released a breaking API regression in a patch release version. It didn't affect everyone, but notably did affect certain Flask(?) use cases that used werkzeug as a dependency. In a sane system, either werkzeug immediately removes the last released version as buggy (and optionally re-releases it as a non-backwards-compatible SemVer change), or everyone starts looking for an alternative to the non-compliant werkzeug. Pragmatically though, Python dependency specification syntax should have a way for Flask to specify, in a patch release of its own, that werkzeug up to the next minor version, _but excluding a specific range of patch versions_, is a dependency, allowing them to monkey-patch the problem in the short term.
It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though, which is what was requested from the poetry tool.
> I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built in specification for an indirect dependency.
npm and yarn both let you do it. PDM and uv think about it differently, but both allow overrides.
> It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though, which is what was requested from the poetry tool.
I'm jealous of your upstreams. I just want to use Django package XYZ that says it's only compatible with Django 3.X on Django 4. Works just fine, but poetry won't let it happen. Upstream seems like they might literally be dead in some cases, with an unmerged unanswered PR open for years. In other cases a PR was merged but no new PyPI release was ever made because I allowed for more liberal requirements for a 0.7.X release last made in 2019 and they're on version 4.X or whatever these days.
On one decade old application I have a half dozen forks of old packages with only alterations to dependency specifications specifically to please poetry. It's really annoying as opposed to just being able to say "I know better than what this package says" like in npm and yarn.
This is exactly what half the comments in poetry's "please allow overrides" issue are saying.
Maven does it.
It's what I most miss about it.
Package com.foo.Something pins a dependency on crap.bollocks.SomethingElse v1.1.0?
But I want to use crap.bollocks.SomethingElse v1.1.5? And I know that they're compatible?
Then I can configure a dependency exclusion.
I really really miss this feature in every non-JVM build tool.
It's another one of those things that the JVM ecosystem did right that everyone else forgot to copy.
(The other massive one being packages having verifiable namespaces. Can't really typosquat Guava because its namespace is com.google and they can prove it.)
> I've honestly never heard of any dependency resolver that allows you to dynamically inject an override of a package's built in specification for an indirect dependency.
You can do it with [patch] in cargo (I think), or .exclude in SBT. In Maven you can use <dependencyManagement>. In fact I can't think of a package manager that doesn't support it, it's something I'd always expect to be possible.
> Point blank, that's a packaging failure and the solution is, and always has been, to immediately yank the offending package.
Be that as it may, PyPI won't.
> It should never be on the end user to be specifying overrides of indirect dependency specifications at the top level though
It "shouldn't", but sometimes the user will find themselves in that situation. The only choice is whether you give them the tools to work around it or you don't.
> this really turns into a headache fast as soon as you have ~4-5 in-house packages
What’s a typical or better way of handling in-house packages?
By default pip will give you a warning that package X requires package Y ==1.0 but you have 1.1 which is incompatible instead of just failing. That's the feature that I'd like to have, basically a "--just-warn" flag or a way to set a package as "version does not matter".
> What’s a typical or better way of handling in-house packages?
Fixing your dependencies properly, but on some older codebases that also pull in old dependencies, this can be a headache.
For example "Pillow" a Python image library is a dependency in just about everything that manipulates images. This means that one package might have >=9.6<9.7, some package will have ==9.8 and another will have >=10<11. In practice it never matters and any of those version would work but you have a "version deadlock" and now you need to bump the version in packages that you may not actually own. Having some override of "this project uses Pillow==10, if some package ask for something else, ignore it" is something that pip does that uv doesn't.
> uv's handling of pre-releases is... unusual. https://github.com/astral-sh/uv/issues/10138
> You don't even need to know there is a venv, or what activation means.
> All those commands update the lock file automatically and transparently.... It's all taken care of.
When is the python community going to realize that simple is the opposite of easy? I don't see how hiding these aspects is desirable at all; I want to know how my programming tools work!
With all due respect to the author, I don't like the assumption that all programmers want magic tools that hide everything under the rug. Some programmers still prefer simplicity, ie understanding exactly what every part of the system does.
Nothing against uv, it seems like a fine tool. And I'm sure one could make a case for it on other technical merits. But choosing it specifically to avoid critical thinking is self-defeating.
There is simplicity of interface and then of implementation. If you try uv you will find it is both convenient and easier to understand than competing solutions, because everything _just works_ and you find the proof of how it works waiting in your `git status`. You can be confident that it just works because installing it takes no time and no system setup. It is slick.
I think that when you, as most HN commenters are wont to do, are trying to achieve a certain level of mastery in your craft, you're going to want to dive deep, understand the abstractions, have full control, be able to start with nothing but a UDP socket and a steady hand, but that approach misses a huge number of users of these languages. You don't want scientists to have to worry about setting up a venv, you want them to analyze their data and move on with their lives. Sure, people like you or me will be able to setup and source a venv in no time, without expending much mental energy, but we're not who this product is for. It's for the rest of the users, the 99% who aren't even aware places like this exist.
All that said, I'm pretty skeptical of using uv until their monetization strategy is clear. The current setup is making me think we're in for a Docker-like license change.
uv may be an improvement, but the Python packaging hell is a cultural problem that will not be solved without changing culture. And the main cultural issue is: 1. Depending on small and huge packages for trivial things. 2. A culture of breaking API compatibility. The two things combined create the mess we see.
This is not packaging hell, this is you having a personal problem with how other developers work.
Python has come an immensely long way in the world of packaging, the modern era of PEP 517/518 and the tooling that has come along with it is a game changer. There are very few language communities as old as Python with packaging ecosystems this healthy.
I've had conversations with members of SG15, the C++ tooling subgroup, where Python's packaging ecosystem and interfaces are looked on enviously as systems to steal ideas from.
I am a casual python user, and for that I love uv. Something I haven't quite figured out yet is integration with the pyright lsp - when I edit random projects in neovim, any imports have red squiggles. Does anyone know of a good way to resolve imports for the lsp via uv?
I start a shell with "uv run bash" and start neovim from there. I'm sure there's other ways but it's a quick fix and doesn't involve mucking around with neovim config.
That's brilliant, thanks!
EDIT - 'uv run nvim' works also
You probably need to make sure you have the correct python version set. By default it is in the .venv directory with uv
I really like uv and I have successfully got rid of miniconda, but:
- I wish there was a global virtual environment which could be referenced and activated from the terminal. Not every new script needs its own .venv in its respective folder. uv takes the route of being project-centered and based on the file system; this works for me most of the time, but sometimes it doesn't.
- I wish we could avoid the .python-version file and bundle it in the pyproject.toml file.
I've been using Hermit to install uv, then pointing scripts at $REPO_ROOT/bin/uv. That gives you a repo where the scripts can be run directly after cloning (Hermit is smart enough to install itself if necessary).
Unfortunately, Hermit doesn't do Windows, although I'm pretty sure that's because the devs don't have Windows machines: PRs welcome.
https://github.com/cashapp/hermit
Is there a conda to uv migration tutorial written by anyone?
I have installed miniconda system-wide. For any Python package that I use a lot, I install them on base environment. And on other environments. Like ipython.
For every new project, I create a conda environment, and install everything in it. Upon finishing/writing my patch, I remove that environment and clean the caches. For my own projects, I create an environment.yaml and move on.
Everything works just fine. Now, the solving with mamba is fast. I can just hand someone the code and environment.yaml, and it runs on other platforms.
Can someone say why using uv is a good idea? Has anyone written a migration guide for such use cases?
I am mightily impressed by one line dependency declaration in a file. But I don't know (yet) where the caches are stored, how to get rid of them later, etc.
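I can't point to a full conda-to-uv guide, but for the cache question specifically, uv has dedicated subcommands (a quick sketch):
uv cache dir      # show where uv keeps its cache
uv cache prune    # drop only entries that are no longer referenced
uv cache clean    # wipe the whole cache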
It seems like uv doesn't target replacing pipenv...? No mention of it in their docs and there is an open Github issue about it.
I have yet to learn uv, but I intend to. Still, having to "source .venv/bin/activate" to activate the virtualenv is a lot less ergonomic than "pipenv shell".
It does replace pipenv. They might not mention it because it's not widely used these days; most pipenv users switched to poetry sometime over the last half-decade or so.
There is a request for `uv shell` or similar[0], but it's trickier than it looks, and even poetry gave up `poetry shell` in their recent 2.0 release.
[0]: https://github.com/astral-sh/uv/issues/1910
Use direnv for automatically creating, activating, and deactivating your virtual environments. I don't know how anyone lives any other way. Just put this in a .envrc file (substituting whichever python version you are using):
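Something along these lines, using direnv's built-in Python layout (the version here is just an example):
# .envrc -- direnv creates and activates a venv for this directory automatically
layout python python3.12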
Looks like direnv can be extended to use uv: https://github.com/direnv/direnv/wiki/Python#uv
I use autoenv, which runs a .env file when switching to a folder with it, and of course it has an activation command.
You don't need to activate anything with uv; all commands do it in the venv automatically, including uv run.
While working in Django projects, one would prefer to have an environment activated to perform all kinds of django-admin commands; I certainly wouldn't want to do that via `uv run`.
Also, `nvim` is started with an environment activated if you want all the LSP goodies.
`uv run` is good for some things, but I prefer to have my venv activated as well.
Soon uv will include a task runner that will take care of that use case, but I get your point.
Care to elaborate? I'm not watching uv development closely, something that has been announced?
It is common to have a task runner to abstract away commands.
doit, poethepoet, just... They are simpler than builders like make or Maven, and more convenient than aliases.
E.g.: I don't run ./manage.py runserver 0.0.0.0:7777, I run "just openserver".
Poetry, cargo and npm support this natively, and there is an open ticket for this in uv too.
So you would not do "uv run manage.py runserver" but "uv serve".
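For the `just` example above, the recipe is about as small as it gets (names and port copied from the comment, otherwise illustrative):
# justfile
openserver:
    ./manage.py runserver 0.0.0.0:7777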
Yeah, I use that with `rye` now.
But still, it's not good enough for Django, as there are too many management commands and I don't want to configure them in the pyproject.toml file, especially since some of them take additional arguments... There is no point in using anything but the django-admin command (I do have a wrapper around it, but the point remains), and that requires an activated venv.
How exactly does `uv` determine which is "the" venv? Is it simply based on the working directory like the `direnv`/`autoenv`/etc. workflows others are describing here?
It does seem like people have use cases for running code in a different environment vs. the one being actively used to develop the package.
Honest question: is uv more reproducible/portable than cramming your Python project into a Docker container? I've used pyenv, pip, venv, and a couple of other things, and they all work fine, at first, in simple scenarios.
Docker isn't a project manager, so I'm struggling to see the comparison. If you have an app (API/web etc.) you would use uv to manage dependencies, lock files and a local virtual environment for development, and then you could install the same dependencies and the project into a Docker image for deployment as well.
People certainly use Docker for this purpose. Need a new package? Add a pip install line and rebuild the image.
I agree it isn’t the best use of Docker, but with the hell that is conda (and I say this as someone who likes conda more than most other options) and what can feel like insanity managing python environments, Docker isn’t the worst solution.
All that said, I moved to uv last year and have been loving it.
Yeah, I also used Docker (actually, Podman) as an alternative Python package manager and it worked well enough. Most of all, it felt somewhat cleaner and more reproducible than using plain virtualenv.
Of course, I migrated from it after I learned uv.
> Honest question: is uv more reproducible/portable than cramming your Python project into a Docker container?
Yes (unless you use uv in your Dockerfile). I mean, a Docker container will freeze one set of dependencies, but as soon as you change one dependency you've got to run your Dockerfile again and will end up with completely different versions of all your transitive dependencies.
People used pip-tools for this prior to uv (uv also replaces pip-tools).
I've literally never heard of that until just now, and I've heard of a lot of Python packaging tools.
As an SRE with a ton of Python stuff, nothing has beaten Dev Containers with VSCode for not losing my mind with Python.
For running containers, pip is the best way to go, just to keep dependency requirements to a minimum.
It’s less reproducible than docker (assuming the pip usage is correct). Docker specifies a lot of OS properties that UV ignores.
That being said, UV is great.
I've seen a lot of docker images move to uv inside their docker file.
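The pattern usually looks something like this (a rough sketch; base image, flags and file names are illustrative, not a canonical recipe):
FROM python:3.12-slim
RUN pip install uv                      # uv is on PyPI, so this is one easy way to get it
WORKDIR /app
COPY pyproject.toml uv.lock ./          # copy metadata first so dependency layers cache
RUN uv sync --frozen --no-install-project --no-dev
COPY . .
RUN uv sync --frozen --no-dev           # now install the project itself
CMD ["uv", "run", "python", "main.py"]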
Use both.
even if you go through the hassle of using docker for your local dev environment, you still need something to install dependencies in a reproducible way when you rebuild the image.
I can't speak to uv from an app dev perspective (I use Go for that), but as someone who dips into the Python world for data science reasons only, uv is great and I'm thankful for it.
Just going to plug https://mise.jdx.dev as a perfect accompaniment to uv. It simplifies installing tooling across languages and projects. I even install uv via mise, and it uses uv under the hood for Python related things.
Adding to the article (which I agree with): Lack of `uv pip install --user` has made transitioning our existing python environment a bit more challenging than I'd like, but not a deal breaker.
Out of curiosity, how does `--user` fall in your use case? It got me confused because this flag makes it install to a central location within the user home directory and not to a virtual environment.
Now we should just figure out why to stop here: why not write everything in Rust? Recently I have moved all my projects from Python to Rust and never looked back. Of course we need projects like Torch and we are not there yet, but for simpler projects that do not require GPU libraries, Rust is great.
"It's written in Rust" is not responsible for most of the improvements on offer here. (TFA barely mentions Rust, to its credit.) Many of them come from algorithmic improvements, better design decisions, and simply just not having the tool reside in the same environment as the installation target. (It is perfectly possible to use Pip cross-environment like this, too. People just don't do it, because a) they don't know and b) the standard library `venv` and `ensurepip` tools are designed to bootstrap Pip into new virtual environments by default. My recent blog post https://zahlman.github.io/posts/2025/01/07/python-packaging-... offers relevant advice here, and upcoming posts are in the works and/or planned about design issues in Pip.)
If your purpose is to denigrate Python as a language, then uv isn't solving problems for you anyway. But I will say that the kind of evangelism you're doing here is counterproductive, and is the exact sort of thing I'd point to when trying to explain why the project of integrating Rust code into the Linux kernel has been so tumultuous.
Do you think that pip could be re-implemented in Python and it would result in this performance that we are observing with uv?
I don't suppose it could be as fast as uv is, but it could be much closer to that than where it is now.
One immediate speed-up that requires no code changes: when uv creates a venv, it doesn't have to install Pip in that venv. You can trivially pass `--without-pip` to the standard library venv to do the same thing manually, and on my system that makes venv creation noticeably faster.
For comparison, creating the venv with Pip included takes around twice as long as Pip actually takes to install itself; I plan to investigate this in more detail for a future blog post. To install in this environment, I use a globally installed pip (actually the one vendored by pipx), simply passing the `--python` argument to tell it which venv to install into. I have a few simple wrappers around this; see https://zahlman.github.io/posts/2025/01/07/python-packaging-... for details.
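Concretely, the pattern described is roughly this (paths are illustrative; pip's `--python` flag needs a reasonably recent pip):
python -m venv --without-pip .venv               # venv creation is much quicker without a pip copy
pip --python .venv/bin/python install requests   # one global pip installs into the target venv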
In my own project, Paper, I see the potential for many immediate wins. In particular, Pip's caching strategy is atrocious. It's only intended to avoid the cost of actually hitting the Internet, and basically simulates an Internet connection to its own file-database cache in order to reuse code paths. Every time it installs from this cache, it has to parse some saved HTTP-session artifacts to get the actual wheel file, unpack the wheel into the new environment, generate script wrappers etc. (It also eagerly pre-compiles everything to .pyc files in the install directory, which really isn't necessary a lot of the time.) Whereas it could just take an existing unpacked cache and hard-link everything into the new environment.
It's a lot harder to write rust than python.
If you look at the TCO, it probably pays off. I am not sure about how much harder it is. I do not use lifetimes and I clone a lot. Still, the performance and reliability a Rust project has vs. Python is insane.
I think it depends a lot on what the task is.
What’s a good flask (+jinja) equivalent for rust?
Actix + minijinja
Actix is just one of many web frameworks, minijinja is an implementation of jinja2, by the original author.
I use Axum and Askama for such usecases.
Because Rust isn't Python.
Feel like that's super-obvious but yet here we are.
Yeah, if Python sucks at managing its own packages and you need Rust, why use Python at all?
That's the thing: you don't actually need Rust. uv has simply chosen it as an implementation language, but there isn't anything about Python that inherently prevents it from being used to write tools better than Pip. The problems with Pip are problems with Pip, not problems with Python.
(And many of them are completely fake, anyway. You don't actually need to spend an extra 15MB of space, and however many seconds of creation time, on a separate copy of Pip in each venv just so that Pip can install into that venv. You just need the `--python` flag. Which is a hack, but an effective one.)
(Last I checked, the uv compiled binary is something like 35MB. So sticking with a properly maintained Pip cuts down on that. And Pip is horrendously bloated, as Python code goes, especially if you only need the common use cases.)
Maybe because there would be blood on the streets, because people would start killing each other over atrocious build times?
Interesting. So pip install times did not make them kill each other, even though the result only sometimes works, but waiting on cargo build somehow triggers them?
You can get pretty far without needing to run pip. Whereas you can't change anything in a rust codebase without compiling it.
As a dabbler (using python mostly for internal engineering tools and post processing of data) I am using uv at home but not in a professional setting.
The latest release is 0.6.1; what is missing (roadmap/timeline-wise) for uv to reach a 1.0 release?
I tried out uv a bit ago and dropped it. But about two weeks ago, I switched to it and migrated two projects with no issues.
Things like per-dependency PyPI sources are finally there.
I still find rough points (as many others pointed out, especially with non sandboxed installs), that are problematic, but on the whole it’s better than Mamba for my use.
Is uv better than micromamba? I tried using uv once and got some big ugly error I don't remember, and that was the end of that, whereas mm just worked (perhaps due to my familiarity). It was a project with the usual situation, i.e., torch, numpy, cuda support, nvcc, all had to play nicely together and satisfy requirements.txt.
How long ago did you try uv?
It just hit its 12 month birthday a few days ago and has evolved a LOT on those past 12 months. One of the problems I ran into with it was patched out within days of me first hitting it. https://simonwillison.net/2024/Nov/8/uv/
Just a few weeks ago. Sadly I can't remember the specifics but meta-management toolchains are always a hard sell if what you currently have works and is Good Enough. mm is quite fast compared to anaconda, though not perfect. Maybe I'd also benefit from uv having an environment import feature, since I have like 30 *conda environments by now.
I used uv for the first time a few months ago and it was a total revelation (previously I used venv and pip). I would be happy if pip itself and conda went away (although that won't happen).
I excited to try uv for my next project. It seems like a good future proof way to package python packages in Nix. https://github.com/pyproject-nix/uv2nix
Really like uv too, but surprised he doesn't mention the lack of conda compliance. Some scientific packages only being available on conda is the only reason I still use micromamba rather than uv for some projects.
conda compliance is nearly impossible to get because the typical anaconda project doesn't exist: it's a separate ecosystem with huge variability that is by design incompatible with everything else, and that no two teams use in the same way.
I didn't find any mention comparing it to rye. Anyone have any insight? I am pretty distant from the day to day Python ecosystem lately
Rye is softly being sunset in favor of `uv` (though still officially supported, and I haven't heard of any plans to change that). As it says on https://rye.astral.sh/,
> If you're getting started with Rye, consider uv, the successor project from the same maintainers.
> While Rye is actively maintained, uv offers a more stable and feature-complete experience, and is the recommended choice for new projects.
It also links to https://github.com/astral-sh/rye/discussions/1342.
The astral team, creators of uv, have also subsumed rye
So it’ll get rolled into UV
uv has been fantastic for most of my personal projects. It feels so much smoother than any other python tooling combo I tried in past. That said, it just does not work well behind corporate proxies. This is single most annoying thing that has stopped me from recommending it at work.
Astral is doing phenomenal work.
I'm extremely excited about the ruff type checker too.
Has anyone built PyPI packages with `uv`? Does doing so affect the end user at all?
What do you mean by building a PyPI package?
If you mean creating and publishing packages to PyPI end users can't tell if you used uv or poetry or something else.
Works fine. But for people on Github for now I recommend using uv only for building distributions and using the official PyPA GitHub action for publishing them to PyPI. This way you can take advantage of attestation, something not yet supported by uv.
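A sketch of that split (the action version tags are illustrative; the important bits are `uv build` plus the official PyPA publish action with trusted publishing):
# .github/workflows/publish.yml
name: publish
on:
  release:
    types: [published]
jobs:
  pypi:
    runs-on: ubuntu-latest
    permissions:
      id-token: write                      # required for trusted publishing / attestations
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv build                      # builds sdist + wheel into dist/
      - uses: pypa/gh-action-pypi-publish@release/v1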
How are they handling the returns VCs expect through this free software? If it's so easy to deploy, surely we should expect a Docker-like licensing model in the near future?
uv, venv, conda, brew, deb, nix, guix, chocolatey, containers, jails... One day we will finally find the ONE solution to this problem.
call it: one
uv is just so damn good.
All else equal, I prefer julia to python, but uv makes the python experience so much nicer. I'd love it if julia copied uv and replaced the Pkg system with it.
What are some of the issues you've had with the Pkg system?
Has someone migrated from poetry to uv? Any benefits?
Yes. The main benefits for me and my coworkers are speed and automatic fetching of the Python interpreter. It's so fast that it's a pleasure to use (the same could be said about ruff vs. black). And the fact that it downloads the right Python interpreter specified in the project's pyproject.toml means my coworkers don't have to care about installing and managing Python versions. They don't even need to have Python installed in the first place.
Otherwise, it works pretty much like Poetry. Unfortunately Poetry is not standards-compliant with the pyproject.toml, so you'll have to rewrite it. There are tools for this, never bothered with them though.
In addition to all the benefits others have mentioned, biggest benefit for me was that with uv you can easily bring your own build system (scikit-build-core in my case). Poetry comes with its own build system that didn't work for my needs and working around that was a massive pain and quite fragile. With uv I can use the build system that works best for me. On the whole uv is less opinionated than poetry. By default it will do sensible things, but if that thing doesn't work in your particular weird case, it is much easier to make uv work for you than poetry. Poetry gets very angry if you try to hold it wrong.
Mostly fewer failure modes, particularly at bootstrapping.
Great write-up! I have only dabbled in Python, but dabbled enough to understand and _feel_ each dot point under "what problems uv tries to solve".
The challenge I have with adopting uv is it feels like it doesn't have a great replacement for poetry's `[tool.poetry.scripts]` block.
For instance, from a personal project that uses a src layout, without being a package, I have this in my pyproject.toml:
[tool.poetry]
...
packages = [{ include = "*", from = "src", format = "sdist" }]
...

[tool.poetry.scripts]
botch = "launcher:run_bot('botch')"
beat = "launcher:run_bot('beat')"
I can't find any way to get that working in uv without some pretty major refactoring of my internal structure and import declarations. Maybe I've accidentally cornered myself in a terrible and ill-advised structure?
You can do this:
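Roughly, a pyproject.toml entry point along these lines (module and function names match the example that follows; treat it as a sketch rather than the exact snippet):
[project.scripts]
hello = "example:hello"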
Assuming you have src/example.py with a function called hello, then "uv run hello" will call that function. I think you also need to have an (empty) src/__init__.py file. It looks like uv will get script support eventually: https://github.com/astral-sh/uv/issues/5903
is this better than miniconda?
Ok, so apparently `uv` has nothing to do with `libuv`.
Author here. AMA.
Great overview thanks
I just reviewed uv for my team and there is one more reason against it, which isn't negligible for production-grade projects: GitHub Dependabot doesn't yet handle the uv lock file. Supply-chain management and vulnerability detection is such an important thing that it prevents the use of uv until this is resolved (the open GitHub issue mentions the first quarter of 2025!).
uv export?
Yes. Python tooling has been shit for decades and `uv` is actually good.
> There are a lot of different ways to install Python, all with different default settings, and gotchas.
With uv, there is now one more.
https://xkcd.com/927/
Arnim Ronacher, author of rye (later uv) has very clearly highlighted that exact xkcd when he started working on rye. But he still decided that it was worth a try and as it turns out, rye/uv has become something that has a realistic chance of becoming the way to use python for most use-cases.
Cool point, thank you. I think his name is Armin.
Thanks. You are right. Unfortunately I can't edit my post any longer.
I thought this was going to be about the UV rays from the sun... But it's another python package manager. We're running out of names.
I assumed it was going to be about libuv, the event loop library that Node uses.
Were you considering migrating to ultraviolet light? Vitamin D is indeed important, but that mutagenesis is no joke. I'd suggest against it.
I read it as "should you migrate to a more or less sunny region".
I think you mean running out of acronyms.
What is CS?
1. Computer Science
2. Customer Service
3. Clinical Services
4. Czech
5. Citrate synthase
6. Extension for C# files
It's Counter Strike of course
Common Sense
Can anyone explain to a non-Python developer why the Python infrastructure is so broken around version management?
It looks to me that every new minor python release is a separate additional install because realistically you cannot replace python 3.11 with python 3.12 and expect things to work. How did they put themselves in such a mess?
Python code rarely breaks between minor version releases in my experience, though there are exceptions: Python 3.5 introduced the `async` keyword (PEP 492), and Python 3.7 changed `StopIteration` handling in generators (PEP 479).
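The PEP 479 one is a good example of the kind of break that can bite running code: a StopIteration escaping a generator used to end iteration silently, and since 3.7 it surfaces as a RuntimeError. A small sketch:
def first_item(items):
    return next(iter(items))       # raises StopIteration when items is empty

def firsts(rows):
    for row in rows:
        yield first_item(row)      # pre-3.7: an empty row quietly ended the generator
                                   # 3.7+: "RuntimeError: generator raised StopIteration"

print(list(firsts([[1], [2]])))    # [1, 2] on any version
print(list(firsts([[1], []])))     # [1] before 3.7, RuntimeError on 3.7+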
I have seen references to using uv for Python package management before and been thoroughly confused. I never realized it was not the same thing as the very nice asynchronous cross-platform library libuv (https://libuv.org/) and I could never figure out what that library had to do with Python package management (answer: nothing).
Maybe we need a Geographic Names Board to deconflict open source project names, or at least the ones that are only two or three characters long.