Given how good Llama 3.1, 3.2 and 3.3 were I'm genuinely looking forward to news on Llama 4.
3.3 70B is the best model I've managed to run on my laptop, and 3.2 3B is my favourite model to run on my phone.
I'd love to hear more about how you're using models on a phone. Have you written anything about it?
Honestly the one on my phone is more for fun than anything else - I use it to show people that LLMs can run on personal devices but rarely do anything useful with it.
I use the MLC Chat app from the App Store: https://apps.apple.com/gb/app/mlc-chat/id6448482937
My favourite example prompt for demos is "Write an outline of a Netflix Christmas movie where a topical-profession falls in love with another topical-profession" - customized for the occasion.
e.g. "Write an outline for a Netflix Christmas movie set in San Gregorio California about a man who runs an unlicensed cemetery falling in love with a barrister at the general store" - result here: https://bsky.app/profile/simonwillison.net/post/3ldthrqb6c22...
fullmoon is a good FOSS iOS/macOS client for mlx
https://fullmoon.app/
Oh neat, that one has a feature that I am missing from MLC - it logs your conversations and lets you revisit them.
Still a bit too hard to copy and paste a transcript back out again though!
I wish they called it llamarama...
I would bet at least one person at Meta wanted to call it Llamapalooza.
In any case, I'm excited!
https://github.com/containers/ramalama
That way attendees could be called Llamarama Dingdongs.
Great! Maybe I can finally learn what I'm supposed to be using generative AI for to be more productive. I'll be tuning in and spinning up whatever models/tools they suggest, but the longer this tech wave goes on, the more confident I am that gen-AI is going to tally up to an at-most 3% lift on global productivity.
I was pretty cynical trying earlier models, but with Gemini Flash 2.0 I felt like there was a pretty significant boost in usability and capabilities.
In particular I've found that these tools make it a lot easier to explore or get started with unfamiliar domains. One of my big issues has often been decision paralysis, so having a tool to help me narrow down the list of resources and make it more approachable has been a huge win.
My general experience has been that getting AI tools to directly do stuff for you tends to produce pretty bad results, but you can use it as a force multiplier for your own capabilities. If I'm confused or uncertain about how to do something, AI tools are usually pretty good at clarifying what needs to be done.
Maybe there’s a particular cognitive profile that benefits most from LLM chat bots? I’ve tried multiple times to realize this force multiplier in my life for everything from day-to-day stuff to picking up new things, using the latest paid bots with the best models, and I’ve persistently found them to be awkward, inaccurate, difficult to pin down into giving useful, hallucination-free info rather than a bunch of non-committal pablum, etc etc etc
They’re useful for tasks that don’t require correctness: brainstorming, exploratory research, sketching out the vague shape of a solution to a problem.
They’re mediocre to awful at consistently following instructions, unless you have the fortune of having a task and domain that are well represented in the post-training data.
Yesterday I needed to generate ~500 filenames given a (human written) summary of each document’s contents. This seemed to be the perfect task to throw an LLM into a for loop. Yet it took three hours of prompt engineering to get a passable set of results - yes, I could’ve done it by hand in that time. Each iteration on the prompt revealed new blind spots, new ways in which the models would latch onto irrelevant details or gloss over the main point.
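For concreteness, here is a minimal sketch of the kind of "LLM in a for loop" setup described above, assuming the OpenAI Python SDK; the model name, prompt wording, and sample summaries are illustrative placeholders, not what the commenter actually used:

```python
# Minimal sketch of the "LLM in a for loop" idea above, assuming the OpenAI
# Python SDK. Model name, prompt wording, and summaries are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

summaries = [
    "Quarterly financial performance for Q1 2024, including revenue and expenses.",
    "Updated version of the company employee handbook with revised policies.",
    # ...roughly 500 of these in the scenario described
]

filenames = []
for summary in summaries:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "You generate concise, descriptive filenames. "
                    "Reply with a single snake_case filename and nothing else."
                ),
            },
            {"role": "user", "content": f"Document summary: {summary}"},
        ],
    )
    filenames.append(response.choices[0].message.content.strip())

print("\n".join(filenames))
```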
In my cognitive landscape, abstract reasoning is a huge spike for me while linear reasoning is closer to average. I just don’t think I need help with all of that exploratory stuff — I need something that’s going to root me in reality with hard facts, figures, and processes. I can see how folks that don’t exist primarily in “shower thoughts” land, like I do, and have better intuition with the concrete linear stuff might get more use out of it than me. Conversely, perhaps I get a bit more out of traditional search engine workflows than others?
That would definitely account for the difference in my perception of their utility vs other peoples’.
My experience has been that converting abstract ideas into a set of actionable steps is one of the areas where AI tooling is really useful. So for example, if I want to generate some music programmatically for a small demo, it gives me concrete suggestions of the existing tool landscape and it helps me generate a plan of action. Then I fill in the remaining gaps myself in order to execute.
Thanks— I’ll play around with that workflow.
Sometimes it helps to have it outline the task first with a prompt like "Suggest good guidelines for [task]" and then "Follow those guidelines for [task]" once you check to make sure the guidelines are good. LLMs aren't good at inventing processes on the fly, but they are good at writing plausible-sounding and often even correct processes, which they then follow.
Applied to ChatGPT's 4o:
> "Suggest good guidelines for generating ~500 filenames given a (human written) summary of each document’s contents"
> "Pretend you have this data and create 10 example file names."
> "Financial_Report_Q1_2024.xlsx (Summary: "Quarterly financial performance for Q1 2024, including revenue and expenses.")"
> "Marketing_Strategy_SocialMedia_2024.docx (Summary: "Comprehensive marketing strategy focusing on social media growth for 2024.")"
> "HR_Employee_Handbook_v2.1.pdf (Summary: "Updated version of the company employee handbook with revised policies.")"
And so on. In general, you have to either create the process or have it create a process and refine it if it isn't quite there. You can even use standard processes and guidelines when they're available. Then it usually does a great job.
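That two-step pattern can also be scripted rather than done in the chat UI. A rough sketch, again assuming the OpenAI Python SDK, with placeholder prompts and model name:

```python
# Sketch of the "ask for guidelines, vet them, then have the model follow them"
# pattern, assuming the OpenAI Python SDK. Prompts and model name are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumption

# Step 1: have the model draft guidelines for the task, then review them by hand.
guidelines = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": (
            "Suggest good guidelines for generating filenames from a short, "
            "human-written summary of a document's contents."
        ),
    }],
).choices[0].message.content
print(guidelines)  # check these look sane before reusing them

# Step 2: reuse the vetted guidelines as the system prompt for every item.
def filename_for(summary: str) -> str:
    return client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"Follow these guidelines exactly:\n{guidelines}"},
            {"role": "user", "content": f"Generate one filename for: {summary}"},
        ],
    ).choices[0].message.content.strip()
```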
Interesting strategy— I’ll check it out. I get a bit annoyed that I have to essentially trick a tool into giving me useful answers. What’s the chance that a nontechnical user who was just sold a subscription to a Magic answer-generating machine would even realize they needed to figure out how to do things like that? LLM chatbots are amazing, but they’re lagging behind what the marketing implies, and that’s a recipe for frustration that providers need to address. I think a lot of these problems need to be fixed with product design and communication.
It does seem like some of the messaging gets away from their actual documentation: https://platform.openai.com/docs/guides/reasoning-best-pract...
The reasoning models improve on this kind of thing, but that just means they do even better when provided with, or asked to create and follow, a process. o3-mini-high, since it shows its work, didn't just spit out a process. It showed the process of creating the process in its reasoning block: it considered the format above, added and removed things, and weighed different levels of detail before settling on a final version and producing anything.
Maybe you can give some specific examples
For what — things that AI wasn’t helpful for when I gave it a shot?
Here’s something scary I recently learned: Cedars-Sinai in LA (major hospital) used to have 15 lawyers on contracts. Now they have 1 and an AI app reviewing contracts.
That’s 14 lawyers gone. It’s far more than a 3% lift in “productivity”, but it’s also 14 people who lost their jobs. And that’s now, with the current state of things.
Lawyer here - there are fields, and law is definitely one of them, where labor is the major cost.
That labor is not often used sanely.
It is common to use lawyers costing hundreds per hour to do fairly basic document review and summarization. That is, to produce a fairly simple artifact.
Not legal research, not opinionated briefing.
But literal: Read these documents, produce a summary of what they say.
While I can't say this is the same as what you are talking about ("contracts review" means many things to many people), I'm not even the slightest bit surprised that AI is starting to replace remarkably inefficient uses of labor in law.
I will add: lots of funding is being thrown at AI legal startups building products that do document review and summarization, but that's not the big fish, and it will become a commodity very quickly.
So I expect there will be an ebb and flow of these sorts of products as the startups either move on to things that enable them to capture a meaningful market (document review ain't it), or die and leave these companies hanging :)
But how do you know the AI is able to pick out the salient bits for summarization? Like some nasty poison pill hidden in there? Wouldn’t you want an expert for such things?
Your last statement hints at a big consideration: accountability. One lawyer on a formerly 15-lawyer staff is accountable for 15 lawyers’ worth of potential mistakes, and we know that “but the AI did it!” doesn’t hold water in law.
There's a bunch of assumptions here.
The main one I think is probably wrong is that there were 15 lawyers' worth of work being done before (when measured by some average-lawyer standard).
For example, it's possible there was only really 1 lawyer's worth of work being split 15 ways, so each lawyer was really only responsible for 1/15th of an average lawyer's workload :)
In that scenario, they'd only be responsible for 1 average lawyer's worth of mistakes now.
Is that realistic? Who knows. I've definitely seen that level of "waste" (for lack of a better term) before in law firms :)
Even in the scenario you are positing, it's not obvious it matters as much as you seem to think it does.
If the per-lawyer mistake rate was low enough, it may be that 15x that rate simply does not matter.
These kinds of contracts are fairly standardized, and so they are mostly looking at the differences from last time. Those differences are often not legal as much as factual. IE the table of costs changed, not the legal responsibility.
So the main thing mistakes get you is maybe cost (if mistakes matter at all).
This isn't like they are seeing brand new from scratch contracts constantly that require brand new analysis.
Even if they were, like I said, the main issue with a mistake is cost.
For all we know, the AI company also agreed to indemnify them for a certain rate of mistakes or something (which wouldn't be hard to get insurance for).
I'm not actually a fan of AI taking necessary jobs, but I think the view here that this is sort of life or death is strange.
I'd be much more worried about AI handling criminal defense in some semi-autonomous fashion than this.
> There's a bunch of assumptions here.
Undoubtedly. Happy to be disabused of my misgivings.
> The main one I think is probably wrong is that there were 15 lawyers' worth of work being done before (when measured by some average-lawyer standard). For example, it's possible there was only really 1 lawyer's worth of work being split 15 ways, so each lawyer was really only responsible for 1/15th of an average lawyer's workload :)
> In that scenario, they'd only be responsible for 1 average lawyer's worth of mistakes now. Is that realistic? Who knows. I've definitely seen that level of "waste" (for lack of a better term) before in law firms :) Even in the scenario you are positing, it's not obvious it matters as much as you seem to think it does.
> If the per-lawyer mistake rate was low enough, it may be that 15x that rate simply does not matter.
Well having done quite a bit of work with attorneys no longer practicing law, I’m definitely familiar with the gripes about inefficiencies and running up hours— especially during litigation in the larger firms. Even not being as efficient as they could be, assuming 1400% inefficiency or whatever seems much less reasonable than assuming 0% inefficiency. It’s obviously not either of those extremes, but I have a hard time imagining it’s even close to the former.
> These kinds of contracts are fairly standardized, and so they are mostly looking at the differences from last time. Those differences are often not legal as much as factual. IE the table of costs changed, not the legal responsibility. So the main thing mistakes get you is maybe cost (if mistakes matter at all).
> This isn't like they are seeing brand new from scratch contracts constantly that require brand new analysis. Even if they were, like I said, the main issue with a mistake is cost.
I don’t actually know what kind of contracts they were working on, so I’ll have to take your word on that.
> For all we know, the AI company also agreed to indemnify them for a certain rate of mistakes or something (which wouldn't be hard to get insurance for).
I was involved with the AI legal tool scene indirectly for about a decade, but haven’t been for a couple of years, and am only getting info secondhand from people I know who still are. (Clicking through the top results on Google, I’m actually on a first-name basis with the first founder there was a picture of. I didn’t know he started a new company though, so I guess we’re not THAT close!) My knowledge could be out of date, but I’ve not seen one of these services offer indemnity for mistakes, and presumably for good reason — the latest data I’ve seen shows that attorney-targeted legal tools make more mistakes than people hoped. I also know nothing about legal insurance, but I don’t think it would be smart to insure an organization that just canned 94% of its counsel in favor of tools known to not be particularly reliable when its workload probably has not changed. Whether they did it because they care more about payroll than reliability, or because they had the poor judgement to maintain 1500% staffing levels until then, it still seems like a pretty poor bet.
> I'm not actually a fan of AI taking necessary jobs, but I think the view here that this is sort of life or death is strange.
I certainly don’t think it’s life or death, and of all the places in our society that could use a little more efficiency, legal services is right up there. That said, the fact that it’s not life or death also doesn’t mean that it’s totally fine either.
> I'd be much more worried about AI handling criminal defense in some semi-autonomous fashion than this.
Haha— frankly, I don’t give a damn if the hospital signs a terrible contract that costs them a bazillion dollars as long as they don’t pull a Steward and stop purchasing basic medical supplies.
SURELY public defenders are an attractive target for the outright person-replacing sort of efficiencies, but I have a hard time imagining that would pass muster. I can definitely see some supposedly adversarial plea-agreement system being implemented by more authoritarian jurisdictions as an incremental expansion of the NN sentence-recommendation type of tools. My gut says the bigger semi-automation risk there is overworked public defenders being lulled into false confidence in legal and general office LLM-type tools (message summaries, auto-scheduling appointments, etc.) without having the time to give them the scrutiny they need. I’d be shocked if that wasn’t already happening though. Hey, maybe with a bunch of attorneys having newfound time on their hands they can bone up on criminal law and provide some relief for the public defender staffing crisis.
I mean, it's usually not that adversarial, but lawyers miss this stuff too sometimes.
Like anything else, it's a question of performance - if the AI misses it at the same rate as or a lower rate than the lawyers, ...
If not, it's a question of whether the higher rate is acceptable. For these kinds of contracts, that's mostly about cost.
Probably the bigger factor is the consequence of what’s missed, not just the rate.
But zoom out: we see job loss that can be counted as productivity, yet what do those lawyers and law degrees do now that’s more productive for society? They’re already near the top of an information economy; we’d need to invent an entire next phase. That takes time and pain management, and yet those 14 jobs are gone now.
I used to think all of this was much further away and we had time. But now I’m seeing that we don’t actually need hallucinations fixed, actual AGI, or major quality boosts before displacement begins.
Going to be interesting to see the MTTL (mean time to lawsuit) on this. Sounds grossly negligent. I feel kinda sorry for the lonely lawyer.
As someone who spends a lot of time battling mis-use/over-use of PII constantly, I am adopting MTTL as a term of art. :D
MTTL really applies more to a startup portfolio: how long does the average startup from an incubator operate before it receives its first lawsuit?
For your situation, you really want to measure how many PII records you handle per lawsuit. That way you can accurately measure the lawsuit cost per record and compare it to revenue per record to see if you're profitable.
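Tongue firmly in cheek, but the metric itself is trivial to sketch; every number below is invented:

```python
# Toy version of the half-joking metric above; all figures are made up.
records_handled_per_year = 2_000_000   # PII records processed
lawsuits_per_year = 2
avg_lawsuit_cost = 500_000             # hypothetical settlement + legal fees
revenue_per_record = 0.80

lawsuit_cost_per_record = lawsuits_per_year * avg_lawsuit_cost / records_handled_per_year
print(f"lawsuit cost per record: ${lawsuit_cost_per_record:.4f}")
print(f"revenue per record:      ${revenue_per_record:.4f}")
print("profitable" if revenue_per_record > lawsuit_cost_per_record else "not profitable")
```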
Highly doubtful it is either negligence, or something they will get sued over. I don't even quite understand what you think they will get sued over.
Most of the policing would happen by courts and bar associations.
That's great. That means potentially slightly cheaper healthcare. A company shouldn't need so many lawyers unless it's a law firm.
Do you have an article that has more information about this? I'd really like to learn more about what happened.
Can you share a source/article?
> Maybe I can finally learn what I’m supposed to be using generative AI for to be more productive.
So many things. It's a general-purpose "thing doer" in many situations where you otherwise wouldn't have one. Let me give a super-simple example - not a high-value one, but an example of obvious value IMO.
Say for some reason, you have a screenshot of a bunch of text. Maybe you took a picture of a page from a book or something, idk. Now you want it in textual form. You can throw it in ChatGPT and ask it to give you the text, and a few seconds later you have the text.
I'm not saying there are no other solutions for this - there are. You can look for some software to do OCR or something. But that's what makes ChatGPT or others general-purpose - they're a one-stop shop for a lot of different things, including small one-off tasks like this. I can name a dozen other one-off tasks that it helps me with. Again, not the most high-value things it helps me with (that'd be programming help), but an undeniable example of value, IMO.
How would you do this without an LLM? (I personally would've just typed it up myself, probably.)
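For anyone who'd rather script this than use the chat UI, here is a minimal sketch assuming the OpenAI Python SDK and a vision-capable model; the filename and model choice are assumptions:

```python
# Minimal sketch of the screenshot-to-text trick via an API instead of the chat UI,
# assuming the OpenAI Python SDK and a vision-capable model. Filename is made up.
import base64
from openai import OpenAI

client = OpenAI()

with open("page_photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe all of the text in this image verbatim."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```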
> How would you do this without an LLM? (I personally would've just typed it up myself, probably.)
This has been a feature of Apple Preview (default image program) for years and years. You can just highlight text and copy it from a jpeg or png.
OK, first of all, really cool. I don't think I knew this! Thank you.
But how about a next step that I use LLMs for - taking this text from a document and reformatting it as, say, bullet points.
E.g. I literally had this in a Jupyter notebook: [some_var_name, some_other_name, ...] and a bunch of those, and I wanted them redone as bullet points. It was a bit more complicated than that, but I'm simplifying for the example. I took a screenshot, threw it into ChatGPT, and got back the correctly-formatted list.
There are other ways to solve something like this of course (normally I'd put it in VSCode and use multiple cursors or macros), but I don't think there's anything that can go from a screenshot and a one-sentence description, to having finished the task, all in a single tool (that can also solve 100 other problems that come up in my work similarly easily).
It mostly just uses those other solutions in the background. You can expand the "Analyzing" step and see it building a script with OpenCV for vision tasks. It's a handy front-end.
Cursor Composer is a game-changer for greenfield projects; it's definitely not a 3% change.
Folks are busy optimizing their own building, not telling you how to optimize yours...
Yeah, but finally everybody can now bullshit freely, aided by their personal LLM, not just the natural bullshitters. I foresee that bullshitting may spike up a bit now, then fall out of fashion for something even more fleeting as people’s attention spans get more and more fractured.
It would be beneficial to have a hardware-optimized Llama lineup with a clearer naming scheme and distinct performance tiers, for example:
- Llama 4.0 Phone (Lite / Standard / Max) – For mobile devices.
- Llama 4.0 Workstation (Lite / Standard / Max) – For PCs and laptops.
- Llama 4.0 Server (Lite / Standard / Max) – For high-performance computing.
This approach would enable developers to select the appropriate model based on both device type and performance needs.
What do you think? Right now, for example, 3.3 70B feels like it's aimed at laptops/PCs and the earlier 3.2 3B at phones, which is a bit confusing to me.
Odd timing - right in the middle of RSA in SF. Llama (and other US-trained open weight models) are key to national security and cybersecurity, and there's a built in audience 30 minutes away if this were two days later or two days earlier...
Maybe they should have used one of those newfangled AI automatic calendar manager things. Or maybe they were using one and shouldn’t have been.
It's sad how we are letting Meta get away with their abuse of the term "open-source" for their open weights models. :-(
Dates are awfully close to ICLR, but I suppose the audiences don’t really overlap.
The “con” being the claims of “open source” when Llama is at best “weights available”. It’s not even “open weights” since it has a proprietary license. But I’m sure that won’t stop Yann LeCun from repeating lies about how Meta is the leader in open source AI.
Three years ago, Meta Connect 2022 had a very different atmosphere. [0] Almost no one cared.
That was close to the bottom of Meta's stock price.
[0] https://news.ycombinator.com/item?id=33087535
Since then, the average ad price they report has risen for 12 quarters in a row; over the past 6 quarters it's jumped 15-30%. The MO is to prey on small businesses and people who want attention, worldwide, who don't know anything about advertising/marketing. Everything they know comes from what Google and Meta tell them.
Can you imagine how many LinkedIn thought leaders are going to be in attendance? Perhaps the greatest gathering of minds since the Manhattan Project.
Super thrilled about all the cross-functional synergies and ROI-optimized deliverables poised to disrupt the status quo and elevate the strategic framework.
Can't wait to delve into it!
So excited to engage the core on our future exciting product developments as a team!
LlamaCon: the greatest buzzword bingo convention opportunity in 2025
Double click with us.
Thanks! This made my day. ROFL.
"Super thrilled about all the cross-functional synergies ..."
Could you explain what that means - please?
It’s just fluff
... whoosh ...
"At LlamaCon, we’ll share the latest on our open source AI developments to help developers do what they do best: build amazing apps and products, whether as a start-up or at scale."
Strangely enough, I can work quite well without your help. I've been doing it professionally for 35 odd years. I'm "just" an engineer - no capital E - I simply studied Civil Engineering at college and ended up running an IT company and I'm quite good at IT.
What I would really like to see is really well-indexed documentation written by people, i.e. an old-school search engine. Google used to do that, and so did AltaVista, back in the day.
I do not need or want a plethora of trite simulacra web sites dripping with AI wankery at every search term.
> well-indexed documentation
Indices are, by definition, lossy representations of their underlying data. If you use stemming and lemmatization to preprocess both documentation and query text, you're already departing from a truly hand-optimized indexing system, and choosing to have imperfect algorithms do things in a more scalable way. Indexing by embedding vectors that use LLMs to determine context is a natural extension of this, in my view. And on top of that, when you have a massive amount of candidate text to display to the user... is displaying sentence fragments one on top of the other really the best UX? At a certain point, RAG becomes the answer to this question.
The problem, as you note, is that search engines and social media systems are incentivized to allow garbage content into the original set of things they index and surface, if that garbage content drives more attention to advertisements. But that's not a reason to reject the benefits that the underlying LLM technology can bring towards building good indexing on top of human-written documents. It just won't be done by the companies that used to do it.
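To make the "embeddings as a natural extension of indexing" point concrete, here is a rough sketch; embed() is a toy hashed bag-of-words stand-in for a real LLM-based embedding model (an assumption), and the rest is just cosine-similarity retrieval:

```python
# Rough sketch of embedding-based indexing as an extension of classic indexing.
# embed() is a toy hashed bag-of-words stand-in; in practice you'd call an
# LLM-based embedding model here.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

docs = [
    "How to configure the indexer",
    "Troubleshooting search relevance",
    "Release notes for version 2.1",
]

# Build the "index": one normalized vector per document.
doc_vectors = np.stack([embed(d) for d in docs])
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)

def search(query: str, top_k: int = 2) -> list[str]:
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = doc_vectors @ q          # cosine similarity against every document
    best = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in best]

print(search("my search results are not relevant"))
```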
You really don't use Copilot or ChatGPT? Have you tried them?
I have tried ChatGPT quite a lot. CoPilot has failed to rock up on my KDE desktop - but I live in hope.
I really tried and I was both impressed and horrified in equal measure. I advise people to use them but treat them like a calculator that snorts cocaine.
A calculator is a useful tool but when they start going off the rails, things can start to get nasty.
I should be more precise: after messing around with generic questions and answers, I went for VMware PowerCLI to test it out. PowerCLI is PowerShell for VMware boxes. PowerShell is very popular, so there's loads of training input; PowerCLI is VMware-specific, so lots of input too, but not quite as much as for PS itself.
I tried to get ChatGPT to generate a PS script to patch a VMware cluster. The result was horrific and not even close. Bear in mind that the entirety of the VMware docs for PowerCLI is public, and I wrote a script myself - it's not perfect but good enough.
Oh and I am dropping VMware for good in favour of Proxmox. I have been a VMware consultant for 20+ years. Oh well.
A cocaine-snorting calculator, great analogy :)
I do find that for big tasks I generally have to scaffold it a bit, lead it in the right direction.
But it's also impressive the way it can do tasks that I can't: like writing complex TypeScript. I could spend hours on certain TypeScript challenges and not actually find a workable solution. With ChatGPT I can usually either get a solution, or convince myself it's not going to happen, within 5-10 minutes.
I never have and I never will!
> I never have and I never will!
That’s an odd flex to roll with for someone who has long been a member of a community of technology enthusiasts, but you be you.
LLMs are tools, not religions. They don’t need to be elevated to a dogmatic level.
I agree with this. I have a normally inquisitive friend who, in response to his trade being threatened by AI, has decided that it's all a horror and he would rather avoid even reading or trying anything from the AI landscape. Bit like "I'd rather be unemployed and starving than have anything to do with all that."
All of my trades are likewise threatened, but I've found various AI options to be useful augments or interesting toys. Things like having ChatGPT check if a particular game already exists, or analysing the cost/effort of lining a shed with gyprock vs ply, or analysing a chicken orchard with steel or timber, or instantly culling lists of extraneous options, or summarising someone's public writings on a particular topic, or name ideas in various languages. If you have an experienced eye to review suggestions, it's fantastic. Or having CoPilot/similar quickly juice up a web page - something I could do manually, but would rather save time. Or learning how to build a game in a different language.
I guess I don’t see it as merely technology about which I might be reasonably enthused. Everything cool becomes bad or harnessed for evil in rather short order these days. The destructive potential of widespread deployment of LLMs seems so obvious that I don’t see why anyone would rush to employ it in their work, let alone book a trip to a Meta-hosted hypefest for it.
>I don’t see why anyone would rush to employ it in their work, let alone book a trip to a Meta-hosted hypefest for it.
These are very different propositions.
You probably already use various code automation tools in your work. At the lower end, Copilot is exactly that, just a bit smarter.
You write "const [getFoo, set" and it autocompletes "Foo] = useState(". Who wouldn't want that?
It's entirely within your control how far you stray into adopting code that you don't thoroughly understand or haven't thoroughly vetted.
I wish QtCreator didn’t require a commercial license for their LLM integration.
Do you have any suggestions for a linux-based, qt-tolerant, LLM-integrated IDE? I’d love to try one.
Some of us are still using punch cards and never changed /s
I was that way about a year or two ago. The stuff is moving fast, the water is warm. Unless your objections are rooted in privacy or some other perceived misdeed, I say give it a whirl.
If nothing else, use Dall-E to draw stupid pictures to make you and friends laugh. :)
Okay
Check out Zeal - zealdocs.org - you get indexed docs for stuff everyone uses.
Probably not going to be saying much though. The state of real-time LLM-based conversation aids just isn’t where it needs to be for those folks to function in public effectively. I could foresee there being a heck of a slam broetry event happening at an after party, though.
[flagged]
Ironically, that was a pretty Reddit-like thing to say.
> and 2025 is shaping up to be another banger
> banger
“Fellow kids” vibes from the dinosaurs at Facebook and Zuckerfuck.
let's not be afraid to bring masculinity to work
up next: farting on the earnings call "here's what I think of your question Chadwick at Vanguard..."
/s
For a moment I thought it said "LlamaCoin" lol
I wondered why so much support on HN all of a sudden.
Boycott.
Facebook cheering AI and those dinosaurs cheering that "meteor shower" - same vibe!