jmull a day ago

This is a fantastic approach.

BTW, it looks like the js engine is "QuickJS" [0]. (I'm not familiar with it myself.)

I like it because SQLite by itself lacks a host language (e.g., Oracle's PL/SQL, Postgres's PL/pgSQL, SQL Server's T-SQL, etc.) -- that is, code that runs on compute local to your storage.

That's a nice, flexible design -- you can choose whatever language you want. But you typically do have to bring one, and that comes with various complications.

It's quite powerful, BTW, to have the app-level code that acts on the app data live with the data. You can present a cohesive app-level abstraction to the client (some examples people will hopefully recognize: applyResetCode(theCode) or authenticateSessionToken(), or whatever), which can be refined/changed without affecting clients. (Of course you still have the full power and flexibility of SQL and relational data for the parts of your app that need it.)

[0] https://bellard.org/quickjs/
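
For illustration, a minimal sketch of storing such a function next to the data, using this extension from Node via better-sqlite3. Everything here is hypothetical: the database file, the loadExtension path, and especially the format of the function source string, which follows the js_create_scalar example quoted later in the thread but may differ from what the extension actually expects.

    const Database = require('better-sqlite3'); // npm install better-sqlite3
    const db = new Database('app.db');
    // db.loadExtension('./js'); // assumes a locally built sqlite-js extension

    // Store an app-level abstraction alongside the data it acts on.
    db.prepare("SELECT js_create_scalar('applyResetCode', ?)").get(
      "function(userId, code) { /* ...app-level reset logic here... */ return 0; }"
    );

    // Clients then call the named abstraction rather than touching the tables:
    // SELECT applyResetCode(?, ?)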

  • jmtulloss a day ago

    This doesn’t make sense to me. SQLite is an embedded database, all the app code is already local to the storage.

    • oefrha 15 hours ago

      Plus, if you want to use functions inside queries, SQLite UDFs can be used from any language. E.g., for Node the better-sqlite3 binding has supported UDFs for ages; the new experimental node:sqlite also appears to have support[1][2], though I haven't used it. For Python, UDFs are supported out of the box. Etc. I can't see myself loading a JS extension with a bundled engine exposing the same UDF capability when I can just use UDFs from the host language (which can be JS, with V8 already there), with access to whatever application state I want to expose and arbitrary library code, rather than just the bare-bones standard library shipped with QuickJS. What's the use case? Using rather awkward JS functions in the sqlite3 CLI?

      [1] https://nodejs.org/api/sqlite.html#databasefunctionname-opti...

      [2] https://nodejs.org/api/sqlite.html#databaseaggregatename-opt...
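
      For what it's worth, a minimal sketch of the host-language route with better-sqlite3 (the database file, products table, and with_tax function are made-up names for illustration):

          const Database = require('better-sqlite3');
          const db = new Database('app.db');

          // An ordinary JS closure becomes a scalar UDF; it can capture
          // application state and use any library code you like.
          const taxRate = 0.08;
          db.function('with_tax', (price) => price * (1 + taxRate));

          // Usable from SQL like any built-in function.
          const rows = db.prepare('SELECT name, with_tax(price) AS total FROM products').all();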

      • ncruces 9 hours ago

        This stores the UDFs in the database. Which means each database can have its own UDFs.

        This makes more sense if you're someone who hosts other people's databases, but it's still interesting otherwise.

        Like, if something like that was included in the CLI, one of the many browser apps, or the online playgrounds, your DBs could carry their own UDFs portably within themselves.

        It'd be even more interesting if it supported virtual tables as well.

    • jmull a day ago

      Well, you can still have an app server if you want -- having app logic that runs local to the storage doesn't preclude you from also having an "app server" (that is, a place app logic runs that isn't local to the user and isn't local to the storage, but somewhere in between.)

      Very typically, that's how it's done with traditional client/server databases.

      There's no built-in "wire protocol" for clients to connect, but there are reasonable options (it's a pretty common pattern, in fact, for systems to have a data service that provides an app-level HTTP interface to the data -- so there you go, it's something you might have implemented anyway).

      But I think this project would help in the creation of a full/rich application data service without a need for an intermediate app tier.

      There are a few reasons people end up with an intermediate app-level data service, but it's starting to seem like a service based on sqlite (running local to the storage, of course) may be able to provide a decent alternative answer in many cases.

      I'm imagining a service lightweight enough to run as a lambda or in another serverless environment (including fast cold start), which then opens up some interesting things like one DB per user, and maybe a user-controlled host, etc.

      • MobiusHorizons 18 hours ago

        You could of course do that, but the main reason to use SQLite in a server application is the performance characteristic of staying in-process for database access. This makes it cheap to issue queries in response to a previous query's results, which would somehow need to be batched for good latency with a client/server database. Re-adding the client/server model defeats almost all of the pros of SQLite, leaving you with only the cons.
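
        A tiny sketch of what "queries in response to a previous query" looks like in-process with better-sqlite3 (table and column names are made up):

            const Database = require('better-sqlite3');
            const db = new Database('app.db');

            // Each follow-up query is just another cheap in-process call --
            // there is no network round trip to batch around.
            const user = db.prepare('SELECT id, org_id FROM users WHERE email = ?').get('a@example.com');
            const projects = db.prepare('SELECT * FROM projects WHERE org_id = ?').all(user.org_id);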

    • mikepurvis a day ago

      I feel like this would be more about validation and constraint enforcement than full on application functions.

      Given that, though, JavaScript feels like a bit of an odd choice for language.

  • 90s_dev a day ago

    QuickJS came out a few years ago, and it was really exciting at the time: a Lua-style embeddable JS, in contrast to V8, which is a beast -- very fast, and much more comparable to LuaJIT. It basically meant you could stop using Lua for scripting and start using JS. But by the time it came out, TypeScript was already king of JS land, and QuickJS didn't have native TypeScript type-stripping support last time I checked (~2023), though Bellard did port the `tsc` compiler to it, I think, so you could build that functionality yourself. I wonder how QuickJS compares to JavaScriptCore, which Bun uses.

    • jmull a day ago

      You could use JSDoc, or run `tsc` on your .ts files on the way to loading them into the db.

      TypeScript has fairly limited utility here, though. It's a static type checker, and your types are mostly going to be SQL parameters and the results of running SQL, which by design/nature are resolved at runtime. You could build a bunch of external controls to help ensure the runtime data is contained to your static types, but you're really swimming upstream. Like you can use a screwdriver as a hammer, but there are better approaches. (I think TypeScript is much better used client-side, in the app-server code that sits above the data layer.)
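
      A rough sketch of the "tsc on the way in" idea, using the TypeScript compiler API from Node; the loadExtension path and the exact function-source format js_create_scalar expects are assumptions:

          const ts = require('typescript');           // npm install typescript
          const Database = require('better-sqlite3');

          const db = new Database('app.db');
          // db.loadExtension('./js'); // assumes a locally built sqlite-js extension

          // Strip the type annotations, keep the logic.
          const source = 'function discount(price: number, pct: number): number { return price * (1 - pct); }';
          const js = ts.transpileModule(source, { compilerOptions: { target: ts.ScriptTarget.ES2020 } }).outputText;

          // Hand the resulting plain-JS source to the extension.
          db.prepare("SELECT js_create_scalar('discount', ?)").get(js);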

      • 90s_dev a day ago

        I tried jsdoc for so many years before just giving up and going full TypeScript.

        But you're right, the TS layer would be static, and you would compile to JS and just use that... I guess.

        Until the types-proposal is inevitably implemented, of course.

  • crabbone a day ago

    When I needed to interface with SQLite from Ada, I discovered that the SQLite library provided with the language lacked regular expressions, and I had to add that functionality myself. During this exercise, I learned that you can relatively easily register Ada functions to be executed by SQLite. From a performance and debuggability standpoint, I'd prefer this to adding a whole language runtime to the database.

    In general, I'd prefer to minimize non-SQL code that runs in the database, because it's hard to reason about its implications for the already complicated planning and execution of SQL code -- especially if such code can observe or change the state of the variables involved in a transaction. To keep this feature from backfiring, I'd want a way to either disallow access to those variables, or a framework where the foreign code is made to comply with transaction rules and has to make promises that its own code won't violate them.

    • sgarland 16 hours ago

      The other reason to not do this is that most functions someone might want to add are very simple, enough so that even if you don’t know C, you can probably muddle your way through it.

      JS in an RDBMS… shudder

sgarland 20 hours ago

Why not use the native functions [0] of the DB? Presumably they're going to be faster. For example, computing the median of a table `nums` with columns `id` and `num` can be done like this:

    WITH ordered_nums AS (
      SELECT num, ROW_NUMBER() OVER (ORDER BY num) as rn,
             COUNT(*) OVER() as total
      FROM nums
    )
    SELECT AVG(num) as median
    FROM ordered_nums
    WHERE rn IN (
      (total + 1) / 2,
      (total + 2) / 2
    );

[0]: https://www.sqlite.org/lang_corefunc.html

Wheaties466 5 hours ago

Can someone explain to me why you would want to do something like the example of calculating age from a birthdate? Why wouldn't you do that in application code rather than in a database function?

rcarmo 5 hours ago

Nice. Would prefer Lua though.

hbcondo714 a day ago

> Every SQLite Cloud database comes with the sqlite-vec extension pre-installed. sqlite-vec is currently built and optimized for brute-force vector search. This means there is no approximate nearest neighbor search available at this time[1]

Darn, ANN would be awesome to have on the edge.

[1]: https://docs.sqlitecloud.io/docs/vector

  • marcobambini a day ago

    We'll soon announce a new extension: sqlite-vector.

    It is blazing fast, highly optimized, and even performs well on memory-constrained devices. Already tested with 5M 1500-dimensional vectors.

    The repo is currently private, and we'll make it public soon: https://github.com/sqliteai/sqlite-vector

    • mholt a day ago

      How does it compare to https://github.com/asg017/sqlite-vec ? Already using this in a big project.

      • marcobambini a day ago

        Our implementation is much faster and does not require storing vectors in a virtual table (which forces the user to write complex join statements).

    • intalentive a day ago

      Have you benchmarked against the usearch extension?

gwbas1c a day ago

Question: how easy or hard is it to replace a SQL query with a join by a query that returns a JSON object? (I.e., a foreign-key relationship is turned into a JSON array.)
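
To illustrate the kind of rewrite being asked about, here's a hedged sketch using SQLite's built-in json_group_array/json_object (the authors/books tables are made up):

    const Database = require('better-sqlite3');
    const db = new Database('library.db');

    // One row per author, with the many side of the foreign key folded
    // into a JSON array, instead of one row per (author, book) pair.
    const rows = db.prepare(`
      SELECT a.id, a.name,
             json_group_array(json_object('id', b.id, 'title', b.title)) AS books
      FROM authors a
      JOIN books b ON b.author_id = a.id
      GROUP BY a.id, a.name
    `).all();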

gorm 20 hours ago

Nice project, and cool to see JavaScript embedded with SQL this way -- I've never seen it before. Just wondering how it ended up like this syntax-wise, and what exactly is going on here?

    SELECT js_create_scalar('function_name', 'function_code');

Really cool project! Thanks for sharing.

cal85 a day ago

Looks interesting. Is there a performance benefit to pushing this kind of logic into SQLite, compared with doing similar logic as a series of steps from a Node process? Or are the motivations for this library more ergonomic/practical? (Or does it enable you to do things you actually couldn’t do from Node at all?)

  • sillystu04 a day ago

    There are performance benefits to using DB functions in situations where you'd otherwise have to pull lots of data out of the DB.

    For example, if you wanted to find which of your sessions were created from IPv6 addresses, you could select them all out and perform the logic in your application code, potentially at the cost of pulling millions of records out of your DB. Doing it in a DB function lets you skip this, as your app code never needs to do the calculations.
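
    A minimal sketch of the contrast with better-sqlite3 (the sessions table is made up, and the IPv6 check is deliberately naive):

        const Database = require('better-sqlite3');
        const db = new Database('app.db');

        // Naive check for illustration: IPv6 literals contain ':'.
        db.function('is_ipv6', (ip) => (ip && ip.includes(':') ? 1 : 0));

        // The filter runs inside the engine, so only the count crosses
        // into application code...
        const { n } = db.prepare('SELECT COUNT(*) AS n FROM sessions WHERE is_ipv6(ip)').get();

        // ...versus pulling every row out and filtering in JS:
        // const all = db.prepare('SELECT ip FROM sessions').all().filter(r => r.ip.includes(':'));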

    This kind of optimisation is generally more important when the DB is running on a separate machine from the application code, because of the overhead of getting large amounts of data out over the network, but even with a local SQLite DB there is likely some non-zero benefit to minimising the amount of data retrieved.

    I suppose DB functions could of course be implemented in SQL or similar, but that can be quite unfriendly for complex logic. So in a sense there is an ergonomic advantage as well.

    • cal85 a day ago

      > even with a local SQLite DB there is likely some non-zero benefit to minimising the amount of data retrieved.

      Why though? I get why it can be a big perf win to push that kind of logic into a remote DB - fewer and smaller payloads being sent over the network. But what makes you say there is likely a non-zero benefit even with local SQLite? (I don’t really have a clear mental model of how embedded SQLite works, so you may well be right, I just want to understand why.)

      • frollogaston a day ago

        First thing that comes to mind is you've got a complex query with some of these UDFs involved in the middle, rather than just transforming the end result. Doing the equivalent without UDFs would be an entirely different query with a different plan, or more likely you'd have to split into separate queries.

      • sgarland 16 hours ago

        Because I/O isn’t free? If you can write code that does fewer things, it will be faster.

      • intalentive a day ago

        If you have to pull the data into your application then it's all in memory at once. SQLite streams from disk, so the memory usage at any given time is less. Also, if the application language is slower than C, then you get a performance boost there as well.
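
        Roughly, with better-sqlite3 (table name made up): .all() materializes every row in JS memory, .iterate() streams one row at a time, and an in-engine aggregate keeps only the final value in the application.

            const Database = require('better-sqlite3');
            const db = new Database('app.db');

            // Everything in application memory at once:
            const rows = db.prepare('SELECT payload FROM events').all();

            // Streaming: one row at a time, roughly constant memory in the app.
            let total = 0;
            for (const row of db.prepare('SELECT payload FROM events').iterate()) {
              total += row.payload.length;
            }

            // Or push the work into the engine and get back a single value.
            const { s } = db.prepare('SELECT SUM(length(payload)) AS s FROM events').get();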

timz 12 hours ago

Fantastic. Would love to see the same JS support for Redis, as opposed to Lua scripts.

9dev a day ago

JS is a great choice for this. I wonder if one could stack a bytecode compiler on top, to optimise performance even further? Or add WASM support, and compile the JS to WASM when creating the function?

3cats-in-a-coat 5 hours ago

I wonder how many more decades we need until we realize that the database and the programming language belong together. We can still separate infrastructure into persistence, backend, and frontend; that's not the point. Every one of them should have a native local relational database, and these databases, in each layer, should be capable of basic interop out of the box.

datadrivenangel a day ago

Can't you already just register a JavaScript function as a custom user-defined SQLite function?

orliesaurus 19 hours ago

This is such an interesting concept, thanks for sharing!

rasz 8 hours ago

Hear me out guys - SQLite, but in a browser! We could call it WebSQL.

  • pwdisswordfishz 5 hours ago

    That's pretty much why it failed: https://www.w3.org/TR/webdatabase/

    > This document was on the W3C Recommendation track but specification work has stopped. The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path.

pdyc a day ago

Can this work with WASM too? That would open interesting doors for doing it in the browser.

mcflubbins a day ago

Cool, now someone rebuild the magic that was the OLD CouchDB (1.x) with couchapps!

porridgeraisin a day ago

> js_create_aggregate

Reminds me of awk, Nice.

90s_dev a day ago

Given that this links to sqlite.ai, did AI write some or all of this project? It's hard to tell anymore.

  • marcobambini a day ago

    All code written by me. I am a human.

    • giancarlostoro a day ago

      Is there some background on why the domain implies AI is involved? I'm sure I'm not the only one confused/curious about it.

neuroelectron 19 hours ago

    CVE-2024-0418 (and similar recent ones like CVE-2024-32593, CVE-2024-32592): These often relate to how QuickJS handles certain object properties or internal structures, potentially leading to crashes (Denial of Service) or, in more severe cases, memory corruption issues like heap-based buffer overflows or use-after-free vulnerabilities. These types of memory corruption can sometimes be escalated to arbitrary code execution, though it's not always straightforward.

    CVE-2021-40517: A use-after-free vulnerability when handling Array.prototype.concat with a specially crafted proxy object. This could lead to a crash or potentially code execution.

    CVE-2020-13951: An issue in JSON.parse that could lead to a stack overflow (Denial of Service) with deeply nested JSON structures.

It's not V8 or SpiderMonkey, which have dedicated, large security teams and decades of hardening due to their use in browsers handling actively malicious web content. QuickJS is primarily the work of one (albeit brilliant) developer.

This means that while it's well-written, the sheer volume of security research and fuzzing applied to browser engines is likely greater.

The responsibility for security falls on multiple layers:

    Fabrice Bellard for QuickJS itself.

    The sqlite-js developers (@marcobambini Marco Bambini, @Gioee Gioele Cantoni) for how they embed, configure, and update QuickJS, and what APIs they expose.

    The end-user/DBA for controlling who can define JavaScript UDFs and for keeping sqlite-js (and thus its QuickJS version) updated.