xnx 3 days ago

Very cool. The shellfs extension (https://github.com/rustyconover/duckdb-shellfs-extension) that allows shell commands to be used for input and output will make DuckDB even more useful as a command line analysis tool. I'm not sure how I'll use it yet, but I'm betting I can streamline some multi-step data processes.

  • rustyconover 2 days ago

    As the author I'm happy to answer any questions. I'm glad you like the idea of the extension.

netcraft 3 days ago

>DuckDB Labs and the DuckDB Foundation do not vet the code within community extensions and, therefore, cannot guarantee that DuckDB community extensions are safe to use. The loading of community extensions can be explicitly disabled with the following one-way configuration option:
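
For reference, a minimal Python sketch of applying that option. The setting name `allow_community_extensions` is how I recall it from DuckDB's documentation, so check it against your version. The helper name `disable_community_extensions` is made up for illustration, and it accepts any connection-like object with an `execute` method, such as the one returned by `duckdb.connect()`.

```python
def disable_community_extensions(con):
    """Turn off loading of community extensions on this connection.

    Assumption: `con` is a DuckDB connection (e.g. from duckdb.connect()).
    The quoted docs describe the option as one-way, so it is not meant
    to be switched back on for the running instance.
    """
    statement = "SET allow_community_extensions = false"
    con.execute(statement)
    # Return the statement so callers can log what was applied.
    return statement
```

With the `duckdb` package installed, usage would look like `disable_community_extensions(duckdb.connect())`.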

So we should think of this like NPM.

Still, very cool and very useful. I'd love a way to query the available community extensions directly from inside DuckDB.

  • nerdponx 3 days ago

    And like NPM or PyPI it's still at least marginally better than downloading compiled packages from opaque file servers. For example, we avoided using the H3 (https://h3geo.org) extension for that reason. It's safer (but slower) to use Python UDFs with the official H3 Python library than to fetch a file from an R2 instance, which is what the instructions currently state on GitHub (https://github.com/isaacbrodsky/h3-duckdb/blob/3c8a5358e42ab...).

9cb14c1ec0 3 days ago

> What happens behind the scenes is that DuckDB downloads an extension binary

The baser part of me wonders how hard it would be to compromise that supply chain.

  • 1egg0myegg0 3 days ago

    Extension downloads are validated using a signature check to prevent tampering!

    (I work for DuckDB Labs and MotherDuck)

    • immibis 3 days ago

      The backdoored version of xz was also signed.

  • metadat 3 days ago

    define: baser

    > 1. (of a person or a person's actions or feelings) without moral principles; ignoble.

    > 2. denoting or befitting a person of low social class.

    (New term, to me)

  • sitkack 3 days ago

    Same as PyPI. Maybe upload left-pad?

shubhamjain 3 days ago

Honest question: how feasible would it be for DuckDB to release a non-columnar version of their DB (or at least make DuckDB a decent choice for a typical web app)? I don't know any other DB that makes installing extensions this easy. The rate at which they're shipping awesome features makes me wonder if they could eventually become a great generic database.

I know, I know, this could just as easily be a double-edged sword. A database should prioritize stability above everything else, but there is no reason why we shouldn't expect them to get there.

  • wild_egg 3 days ago

    Are we certain that it's _not_ a decent choice for a typical web app? I'm tempted to swap it into one of mine and see how it behaves. Even if some operations are internally slower, that might be offset by having zero network latency to deal with.

    It would be nice though if other DBs made extensions this easy. There are a handful of package managers for Postgres but they're not generally supported on managed platforms like RDS.

    Anyone know if there are comparable options for SQLite? Seems like an obvious thing that should exist, but a quick search isn't showing me any.

  • 1egg0myegg0 2 days ago

    Hello! I would recommend trying out DuckDB's SQLite attach feature! You can read or write data, and even make schema changes, all with DuckDB's engine and syntax. The storage then uses SQLite, which is row oriented!

    https://duckdb.org/docs/extensions/sqlite
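
    A rough Python sketch of the attach step, for anyone who wants to try it. The `INSTALL`/`LOAD`/`ATTACH ... (TYPE sqlite)` statements follow the linked docs page as I understand it. The helper name `attach_sqlite` and the default alias `app` are hypothetical, and `con` is assumed to be a connection from `duckdb.connect()`.

```python
def attach_sqlite(con, path, alias="app"):
    """Attach an existing SQLite file to a DuckDB connection.

    Assumptions: `con` comes from duckdb.connect(), and the sqlite
    extension can be installed (the first install needs network access).
    """
    con.execute("INSTALL sqlite")
    con.execute("LOAD sqlite")
    # Escape single quotes so the path is a valid SQL string literal.
    escaped = path.replace("'", "''")
    # (TYPE sqlite) asks DuckDB to treat the file as a SQLite database.
    con.execute(f"ATTACH '{escaped}' AS {alias} (TYPE sqlite)")
    return alias
```

    After that, tables in the file should be reachable under the alias, e.g. `con.execute("SELECT count(*) FROM app.users")` for a hypothetical `users` table.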

    (I work at MotherDuck and DuckDB Labs)

    • wild_egg 2 days ago

      This is excellent. Do you have any content around the performance impact here compared to using SQLite directly? I could see DuckDB's engine being faster for some cases, but the SQLite storage format might hinder it. Curious if there's any analysis around this.

  • snidane 2 days ago

    What do you need non-columnar layout for? Do you expect thousands of concurrent single row writes at a time?

    If you use embedded duckdb on the client, unless the person goes crazy clicking their mouse at 60 clicks/s, duckdb should handle it fine.

    If you run it on the backend and expect concurrent writes, you can buffer the writes in concatenated Arrow tables, one per minibatch, and merge them into DuckDB every, say, 10 seconds. You'd just need to query both the historical DuckDB data and the realtime Arrow tables separately and combine the results afterwards.
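
    A minimal sketch of that buffering idea, under loud assumptions: a plain Python list stands in for the Arrow minibatches, and the stdlib `sqlite3` module stands in for DuckDB so the example runs without extra packages (a `duckdb.connect()` connection exposes the same `execute`/`executemany` calls used here). The class name `BufferedStore`, the `events` table, and its columns are all hypothetical.

```python
import sqlite3
import time


class BufferedStore:
    """Buffer row writes in memory and merge them into the database in batches."""

    def __init__(self, con, flush_interval=10.0, clock=time.monotonic):
        self.con = con
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer = []  # realtime rows not yet merged (stand-in for Arrow tables)
        self.last_flush = clock()
        con.execute(
            "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL)"
        )

    def write(self, user_id, amount):
        # Writes only touch the in-memory buffer; the database is hit on flush.
        self.buffer.append((user_id, amount))
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        # Merge all buffered rows in one batch, then reset the timer.
        if self.buffer:
            self.con.executemany("INSERT INTO events VALUES (?, ?)", self.buffer)
            self.buffer = []
        self.last_flush = self.clock()

    def total_amount(self):
        # Query the historical store and the realtime buffer separately,
        # then combine the two partial results.
        historical = self.con.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM events"
        ).fetchone()[0]
        realtime = sum(amount for _, amount in self.buffer)
        return historical + realtime


con = sqlite3.connect(":memory:")  # stand-in for a DuckDB connection
store = BufferedStore(con, flush_interval=10.0)
store.write(1, 2.5)
store.write(2, 4.0)
print(store.total_amount())  # 6.5, served entirely from the buffer
store.flush()
print(store.total_amount())  # 6.5, now served from the database
```

    The obvious trade-off is that anything still in the buffer is lost if the process dies before the next flush, so the interval is a durability knob as much as a performance one.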

    I agree that native support for this so-called Lambda architecture would be cool to have in DuckDB, especially when drinking fast-moving data from a firehose.

  • mgaunard 3 days ago

    Most of my web apps are built around tabular data.

gigatexal 3 days ago

This is the coolest thing! I’m very excited to see what we will have next. Hah, maybe an extension that embeds vim, and then I’ll never leave DuckDB lol