Dagger: Define software delivery workflows and dev environments
Posted by ahamez 6 days ago
Comments
Comment by kjuulh 15 hours ago
It fills the niche where a Dockerfile is not enough and CI workflows are too difficult to debug or reason about.
I do have some current problems with it though:
1. I don't care at all about the LLM agent workflows. I get that it's possible, but the same people who chose Dagger for what it was are not the same audience that runs agents like that. I can't choose Dagger currently, because I don't know if they align with my interests as an engineer solving specific problems where I work (delivering software, not running agents).
2. I advocated for modules before they were a thing, but I never implemented them. They are too much magic: I want to write code, not a DSL that looks like code. Dagger is already special in that regard, and modules take it a step too far. You can't find the code in their docs anymore, but Dagger can be driven from just a .go, .py or .rs file. Simply take in Dagger as a dependency and build your workflow (roughly like the sketch at the end of this comment).
3. Too complex to operate. Dagger doesn't have runners currently, and it is difficult to run a production CI setup yourself without running it inside the actions themselves, which can be disastrous for build times: Dagger often leads you into using quite a few images, so having a cache is a must.
Dagger needs to choose and execute; not having runners, even when we were willing to throw money at them, was a mistake IMO. Love the tool, the team, and the vision, but it is too distracted, magical and impatient to pick up at the moment.
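To illustrate what I mean in point 2 by "just a .go file": something roughly like this is all the workflow code I ever wanted (a sketch from memory, so treat the image tag, repo layout and test command as placeholders):

    package main

    import (
        "context"
        "fmt"
        "os"

        "dagger.io/dagger"
    )

    func main() {
        ctx := context.Background()

        // Connect to the Dagger engine; logs go to stderr so they show up in CI output.
        client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
        if err != nil {
            panic(err)
        }
        defer client.Close()

        // Mount the repository into a pinned Go image and run the test suite.
        src := client.Host().Directory(".")
        out, err := client.Container().
            From("golang:1.22").
            WithDirectory("/src", src).
            WithWorkdir("/src").
            WithExec([]string{"go", "test", "./..."}).
            Stdout(ctx)
        if err != nil {
            panic(err)
        }
        fmt.Println(out)
    }

No extra scaffolding, just a plain `go run` step wired into whatever CI happens to trigger it.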
Comment by shykes 13 hours ago
1. Yes we got over-excited with the agent runtime use case. We stand by the LLM implementation because we never compromised on the integrity of Dagger's modular design. But our marketing and product priorities were all over the place. We're going to refocus on the original use case: helping you ship software, and more particularly building & testing it.
2. Modules have warts but they are essential. We will continue to improve them, and remain committed to them. Without this feature, you have to write a complete standalone program every time you want to build or test your software. It's too much overhead.
3. Yes you are right. We really thought we could co-exist with CI runners, and get good performance without reinventing the wheel. But for various reasons that turned out to not be the case. So we're going to ship vertically integrated runners, with excellent scalability and performance. DM me if you want early access :)
TLDR: yes we needed to choose and execute. We have, and we are.
Thank you again for the feedback.
Comment by kjuulh 12 hours ago
Best of luck and thx for taking my harsh feedback in stride!
Comment by usrme 23 hours ago
I ended up opting for CUE and GitHub Actions, and I'm glad I did as it made everything much, much simpler.
Comment by tom1337 23 hours ago
Comment by digdugdirk 23 hours ago
Comment by themgt 23 hours ago
re: the cloud specifically see these GitHub issues:
https://github.com/dagger/dagger/issues/6486
https://github.com/dagger/dagger/issues/8004
Basically if you want consistently fast cached builds it's a PITA and/or not possible without the cloud product, depending on how you set things up. We do run it self-hosted though, YMMV.
Comment by pxc 21 hours ago
We used Dagger, and later Nix, mostly to implement various kinds of security scans on our codebases using a mix of open-source tools and clients for proprietary ones that my employer purchases. We've been using Nix for years now, and still haven't set up any binary cache of our own. But we still have mostly-cached builds thanks to the public NixOS binary cache, and we hit that relatively sparingly because we run those jobs on bare metal in self-hosted CI runners. Each scan job typically finishes in less than 15 seconds once the cache is warm, and takes up to 3 minutes when the local cache is cold (in the cases where we build a custom dependency).
Some time in the next quarter or two I'll finish our containerization effort for this so that all the jobs on a runner will share a /nix/store and Nix daemon socket bind-mounted from the host, so we can have relatively safe "multi-tenant" runners where all jobs run under different users in rootless Podman containers while still sharing a global cache for all Nix-provided dependencies. Then we get a bit more isolation and free cleanup for all our jobs but we can still keep our pipelines running fast.
We only have a few thousand codebases, so a few big CI boxes should be fine, but if we ever want to autoscale down, it should be possible to convert such EC2 boxes into Kubernetes nodes, which would be a fun learning project for me. Maybe we could get wider sharing that way and stand up fewer runner VMs.
Somewhere on my backlog is experimenting with Cachix, so we should get per-derivation caching as well, which is finer-grained than Docker's layers.
Comment by shykes 18 hours ago
If you have questions about Dagger, I encourage you to join our Discord server, we will be happy to answer them!
Comment by esafak 20 hours ago
Comment by pxc 20 hours ago
That is, of course, a self-fulfilling prophecy (or, perhaps, a self-inflicted wound). As soon as Dagger's "multi-language support" came out (actually a bit before), the CUE SDK was rendered abandonware. Development only happened on the new backend, and CUE support was never ported over to it.
Comment by shykes 17 hours ago
We shipped multi-language support because we had no choice. It was a major engineering effort that we hadn't originally planned for, but it was painfully obvious that remaining a CUE-only platform was suicide.
Comment by pxc 9 hours ago
I think multi-language support is a great feature, and I understand why you had to go for it. I'm sure some people switched away from CUE once they had the chance because they weren't interested in working with a novel and perhaps quirky DSL, but I'm also sure some stopped using the CUE SDK just because it was clear to them that it was being abandoned. I know that because I'm one of them: I stopped using the CUE SDK after multi-language support came out, and not because I preferred using one of those other languages. That's all I'm saying.
Comment by shykes 6 hours ago
For a while there was activity on the #cue channel about a community SDK (that's how we got PHP, Java, Rust, Elixir and dotnet), but it didn't materialize.
It looks like you were in the minority that would have preferred to continue using the original CUE SDK - I'm sorry that we didn't find a way to continue supporting it.
Comment by Kinrany 23 hours ago
Comment by sontek 18 hours ago
Comment by sontek 17 hours ago
Comment by Xiol 23 hours ago
Comment by usrme 23 hours ago
Here are a few links to whet your appetite:
- https://cue.dev/docs/getting-started-with-github-actions-cue...
- https://cue.dev/docs/drying-up-github-actions-workflows/
- https://cue.dev/docs/spotting-errors-earlier-github-actions-...
Definitely read through the CUE documentation (https://cuelang.org/docs/), watch their YouTube videos (https://www.youtube.com/@cuelang/videos), and join the community Slack channel (https://cuelang.org/community/). I've gotten a lot of help in the Slack from both enthusiastic community members and from the developers themselves whenever I've gotten stuck.
Comment by 9dev 23 hours ago
Comment by diarrhea 20 hours ago
To some extent, yes. If all you have is two GitHub Actions YAML files, you are not going to reap massive benefits.
I'm a big fan of CUE myself. The benefits compound as you need to output more and more artifacts (= YAML config). Think of several k8s manifests, several GitHub Actions files, e.g. for building across several combinations of OSes, settings, etc.
CUE strikes a really nice balance between being primarily data description rather than a Turing-complete language (e.g. cdk8s can get arbitrarily complex and abstract), reducing boilerplate (having you spell out the common bits only once, and each non-common bit only once), and being type-safe (validation at build/export time, with native import of Go types, JSON Schema and more).
They recently added an LSP which helps close the gap to other ecosystems. For example, cdk8s being TS means it naturally has fantastic IDE support, which CUE has been lacking in. CUE error messages can also be very verbose and unhelpful.
At work, we generate a couple thousand lines of k8s YAML from ~0.1x of that in CUE. The CUE is commented liberally, with validation imported from native k8s types and sprinkled in where needed otherwise (e.g. we know that for our application the FOO setting needs to be between 5 and 10). The generated YAML is clean, without any indentation and quoting worries. We also generate YAML-in-YAML, i.e. our application takes YAML config, which itself is embedded in an outer k8s YAML ConfigMap. YAML-in-YAML is normally an enormous pain and easy to get wrong. In CUE it's just `yaml.Marshal`.
You get a lot of benefit for a comparatively simple mental model: all your CUE files form just one large document, and for export to YAML it's merged. Any conflicting values and any missing values fail the export. That's it. The mental model of e.g. cdk8s is massively more complex and has unbounded potential for abstraction footguns (being TypeScript). Not to mention CUE is Go and shipped as a single binary; the CUE v0.15.0 you use today will still compile and work 10 years from now.
You can start very simple, with CUE looking not unlike JSON, and add CUE-specific bits from there. You can always rip out the CUE and just keep the generated YAML, or replace CUE with e.g. cdk8s. It's not a one-way door.
The cherry on top is CUE scripts/tasks. In our case we use a CUE script to split the one large document (tens of thousands of lines) into separate files, according to some criteria. This is all defined in CUE as well, meaning I can write ~40 lines of CUE (this has a bit of a learning curve) instead of ~200 lines of cursed, buggy bash.
Comment by flanked-evergl 21 hours ago
Comment by stephen 20 hours ago
I.e. declaratively set up a web of CI/deployment tasks, based on Docker, with a code-first DSL, instead of the morass of copy-pasted (and yes, orbs) CircleCI YAML files we have strewn about our internal repos.
But their DSL for defining your pipelines is ... golang? Like who would pick golang as "a friendly language for setting up configs".
The underlying tech is technically language-agnostic, just as aws-cdk's is (you can share cdk constructs across TypeScript/Python), but it's rooted in golang as the originating/first-class language, so imo it will never hit aws-cdk levels of ergonomics.
That technical nit aside, I love the idea; I ran a few examples of it a year or so ago and was really impressed with the speed; I just couldn't wrap my head around "how can I make this look like cdk".
Comment by esafak 20 hours ago
Comment by stephen 13 hours ago
https://docs.dagger.io/cookbook/services?sdk=typescript
Still looks like "a circa-2000s Java builder API" and doesn't look like pleasant / declarative / idiomatic TypeScript, which is what aws-cdk pulled off.
Genuinely impressively (imo), aws-cdk intermixes "it's declarative" (you're setting up your desired state) but also "it's code" (you can use all the usual abstractions) in a way that is pretty great & unique.
Comment by shykes 1 hour ago
Comment by moltar 19 hours ago
Comment by LeBit 22 hours ago
What else could be used to abstract away your CICD from the launcher (Jenkins, Argo Workflows, GitHub Actions, etc.)?
Comment by shykes 19 hours ago
Comment by moltar 19 hours ago
But one flaw (IMO) is that it can't export artifacts and import them into other steps without breaking the cache.
E.g. if you provide the monorepo as input and then in some step narrow your build to one specific dir, caching is still invalidated even when files change outside of that dir.
Which makes it extremely verbose and a maintenance nightmare to keep multiple narrow inputs and keep all those paths up to date.
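For reference, the narrowing looks roughly like this with the Go SDK (a sketch from memory; the service paths and the buildAPI helper are made up, and it assumes an already-connected client), and you end up repeating it for every step:

    // buildAPI builds one service out of the monorepo. Only the listed paths
    // are sent to the engine, so changes elsewhere don't bust this step's cache,
    // but every new path has to be added to the Include list by hand.
    func buildAPI(ctx context.Context, client *dagger.Client) error {
        apiSrc := client.Host().Directory(".", dagger.HostDirectoryOpts{
            Include: []string{"services/api/**", "go.mod", "go.sum"},
        })

        _, err := client.Container().
            From("golang:1.22").
            WithDirectory("/src", apiSrc).
            WithWorkdir("/src/services/api").
            WithExec([]string{"go", "build", "./..."}).
            Sync(ctx)
        return err
    }

Multiply that Include list across a dozen services and that's the path-maintenance problem I mean.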
Comment by shykes 19 hours ago
Comment by Havoc 1 day ago
Comment by pxc 21 hours ago
Then they completely abandoned not just the CUE frontend, but CUE altogether (while strenuously denying that they were doing so) for a GraphQL-based rewrite that focused on letting people use popular general-purpose languages to construct their workflows. The initial rollout of this was not feature complete and only supported imperative languages (Python and TypeScript, IIRC), which I didn't like.
Instead of porting everything over to all their new interfaces, I hopped off the train and rewrote all of our portable pipeline scripts in Nix, via Devenv. At the time I'd never used Devenv, but getting the work done still took maybe a tenth of the time or less. More than anything else, this was due to not having to fuck around with the additional overhead Docker entails (fussing with mount points, passing files from one stage to another, rebuilding images, setting up VMs... all of it). I got the reproducibility without the extra bullshit, and got to work with interfaces that have proven much more stable.
I still think there's a place for something like Dagger, focused just on CI, perhaps even still using Docker as a deployment/distribution strategy. But I no longer trust Dagger to execute on that. I think a proper external DSL (probably special-purpose but still Turing-complete, e.g., Nickel) is the right fit for this domain, and that it should support multiple means of achieving repeatability rather than just Docker (e.g., Nix on bare metal and Podman, to start). An option to work on bare metal via reproducible environment management tools like Nix, Guix, or Spack is a valuable alternative to burdensome approaches like containers.
I haven't looked at Dagger in several months, but the other big piece that is missing for portable CI workflows is a library that abstracts over popular CI platforms so you can easily configure pull/merge request pipelines without worrying about the implementation details like what environment variables each platform exposes to indicate source and target branch.
Idk anything about all the AI horseshit; I was off the Dagger bandwagon before they took that turn. I don't know if it's serious or a nominal play to court investors. But that kind of pivot is another reason not to build core infra on top of the work of startups imo. If the product is 70% of what you want, you have no way of knowing whether filling that 30% gap is something the maintainers will suddenly pivot away from, even if their current direction looks aligned with yours.
I'd recommend considering tools in this space only if (a) they're already close to 100% of what you need and (b) they're open-source. Maybe you can relax (a) if it's really easy to extend the codebase (I find this to be true for Devenv's Nix modules, for example.)
Comment by digdugdirk 21 hours ago
I currently manage my development environments via NixOS and Devenv, so if I could just keep that and achieve the same functionality, that sounds good to me.
Comment by junon 1 day ago
Then... it wasn't. The more I read the less I ever want to see this again. The LLM train has got to end at some point.
Comment by sgammon 20 hours ago
Comment by nozzlegear 18 hours ago
/s I've never heard of Dagger the DI framework but I have heard of this Dagger. Names will overlap sometimes and it's not a big deal.
Comment by sgammon 13 hours ago
Android is the most popular operating system on earth. Names can overlap but you shouldn’t choose one where you can’t live up to owning it.
Comment by nozzlegear 10 hours ago
> can’t live up to owning it.
?
Comment by isuckatcoding 19 hours ago
Comment by techn00 18 hours ago
Comment by leetrout 22 hours ago
They don't seem to have jumped on the AI hype (yet?)...
Comment by someguy101010 22 hours ago
Comment by oulipo2 21 hours ago
Comment by oulipo2 1 day ago
Comment by jiehong 1 day ago
Comment by bflesch 22 hours ago
Comment by jiehong 1 day ago
Without the LLM bits, this is basically like Bazel or buck2, right?
Comment by dilyevsky 19 hours ago
Comment by kylegalbraith 19 hours ago
Comment by esafak 23 hours ago
Comment by ndrpnt 23 hours ago
Comment by esafak 23 hours ago
Comment by mxey 21 hours ago
Comment by Too 20 hours ago
Out of curiosity, would it be feasible to take a big CMake project, generate thousands of compile rules into Dagger, and use it as a substitute for make with sandboxing? I've never seen BuildKit used with that many nodes; how would it fare?
Comment by esafak 21 hours ago
Comment by justincormack 23 hours ago
Comment by elryry 23 hours ago
Comment by lowmagnet 23 hours ago
Comment by Too 20 hours ago
If I already have a Dockerfile that doesn't need composition, how does this help me beyond being a small cosmetic improvement over the "docker build" command line?
Comment by tajd 1 day ago
Comment by jiehong 1 day ago
But the marketing heavily focuses on LLM stuff, to the point of confusing everyone.
Comment by mayhemducks 22 hours ago
Comment by lowmagnet 23 hours ago
Comment by esafak 20 hours ago
Comment by michaelbuckbee 22 hours ago
I think a more interesting point of comparison is the Claude Code GitHub Action, Copilot code reviews, etc.
Comment by vivzkestrel 19 hours ago
Comment by oulipo2 21 hours ago
Although I'm not sure that's so much of a value-add? It's not so hard to just create a container and launch an agent in it.
The whole interesting thing was to use actual programming languages for Docker builds, which I think is what they initially tried to do, but now it's a bit incomprehensible... I guess conceptually Dagger relates to Dockerfiles a bit like Pulumi relates to Terraform?