Self-hosted dev sandboxes with preview URLs (Docker, Go, no K8s)

Posted by tastyeffectco 6 days ago

Comments

Comment by rsyring 5 days ago

I'd like something like this but using firecracker VMs. Basically, a self hosted exe.dev.

Anyone building or using a project like this?

Comment by babhishek21 5 days ago

Any particular reason why you want this with microVMs? Security (kernel separation) or snapshot support perhaps?

A friend already made something similar for personal use, but using docker containers hardened with gVisor.

Comment by p2hari 5 days ago

+1 for this. I'm looking for something like exe.dev, but self-hosted. I tried an IONOS cloud VPS; the 4 GB one could not handle even 3 basic web servers.

Comment by benldrmn 5 days ago

I am working on https://github.com/isola-run/isola, which uses gVisor (not Firecracker) on k8s (or something like kind, locally). It includes snapshotting, network controls, and everything. Hope you find this useful.

Comment by p2hari 4 days ago

This looks interesting. With auth and certs we might have something equivalent!

Comment by CGamesPlay 5 days ago

I'm using https://coder.com for all my development containers. I've got mine hooked up to a k8s cluster, but anything that you can provision with Terraform can be used (e.g. docker containers).

Comment by bureado 5 days ago

hth: https://github.com/bureado/awesome-agent-runtime-security

Comment by cultofmetatron 5 days ago

Firecracker is optimized for Lambda, i.e. fire and forget, not so much for live systems that maintain long-running state. Also, I don't think you can run it on top of a hypervisor.

Comment by umuttalha0 5 days ago

[dead]

Comment by Bnjoroge 5 days ago

[flagged]

Comment by sebmaynard 5 days ago

Any suggestions?

Comment by tastyeffectco 5 days ago

This project takes the Docker route instead of Firecracker: each container drops all capabilities and runs with no-new-privileges, a read-only rootfs, per-sandbox memory/PID limits, and isolated networks. But it is not kernel-level separation like a microVM.

It depends on the use case, but it's enough for most and way simpler to operate and maintain.

If you need stronger isolation, the other replies in this thread mention alternatives (gVisor on k8s). It depends on your threat model and how much infra complexity you want to manage.

Comment by dang 5 days ago

Can you please not post AI-generated or AI-edited comments to HN? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.

Of course, it's impossible to know for sure what was LLM processed or not, but some (not all!) of your posts are getting classified that way.

You obviously have good points to make and are certainly welcome here! but if you'd please write text by hand which you plan to post to HN itself, we'd appreciate it. The community feels strongly about this right now.

Comment by Bnjoroge 5 days ago

This is heavily vibecoded, and probably the 100th iteration of sandboxes. In any case, docker isn’t a serious isolation boundary for agents.

Comment by fulafel 5 days ago

The thing about Docker is that it's pretty nebulous: it might mean the namespace-based container thing, or the virtual machine app (Docker Desktop), or the agent sandboxing thing (Docker Sandboxes).

Comment by mellosouls 5 days ago

Thanks for this, it looks interesting - how does it relate to your commercial service (upilote)? I ask so I can better understand what it offers.

I interpreted it as follows:

upilote is maybe a competitor to, say, Lovable et al., and as part of your marketing/community outreach you supply an open-source, self-hostable (LLM endpoint excepted) version of your service?

Or is this a subset of that service?

PS. Might get more traction as a Show HN:

Comment by tastyeffectco 5 days ago

upilote (mellosouls): your second reading is correct.

This is not a Lovable competitor, and it's not all of upilote.

upilote is the product: chat → agent builds → live preview.

This repo is just the infrastructure layer underneath it that we extracted and open-sourced under MIT. It handles one container per project, preview URLs, running agents, sleeping when idle, waking on request, persistence, and recovery after reboots.

For us, it simplified a lot of things. Instead of managing all that logic ourselves, it became: submit a task and stream events back.

Comment by mellosouls 5 days ago

Thank you (and thanks for contributing to OS) - best of luck with your product.

Comment by sublimefire 5 days ago

Kinda cool, but the comments here also reveal associated alternatives. If isolation is important then you probably need to switch to at least gVisor and think about Firecracker (needs KVM support). Otherwise, Caddy and shell scripts can achieve it.

I am building a desktop application and use Docker for now as an isolation layer to exec the stuff the agent needs; this is just a PoC.

Comment by cedws 5 days ago

So this is capable of exposing vibe coded projects contained only by Docker to the internet? Quite dangerous.

Comment by cadamsdotcom 6 days ago

Thanks for posting your project & congrats on shipping.

I have to confess, I’m struggling to see how this beats having my agent write 100 lines of shell script in a couple of seconds to do just the subset of this I need..

Would be neat to be able to read about that on its landing page!

Comment by digitaltrees 6 days ago

This feels like the notorious Dropbox objections when it launched. Just because it's easy for you doesn't mean it's easy for everyone. I can see this being really useful for my product, which is Cursor on my phone, computer, and browser. I built an IDE with a Linux container so I could have a real dev environment and file system on my iPhone. It lets me code at the beach with my kids (plan an epic, have the AI do a massive pull request while I am having fun for 30 minutes with the family, then spend 5 minutes giving the code a look, which often finds some large concerns that warrant a new prompt, etc.). The containers were actually a huge pain to set up and I am still not satisfied with my implementation.

I'll definitely check this out. This project is actually perfect for several projects I am working on.

Comment by tastyeffectco 5 days ago

[dead]

Comment by danudey 6 days ago

I think the idea here is to provide this as a way to host/manage multiple different test apps, or apps for a team, or maybe you're just building one of those 'our AI will build you a web app' services. Definitely overkill for one-off projects.

As much as the 'no kubernetes needed' thing is nice, it would be nice if it had a 'yes kubernetes' option for those of us who have a k8s cluster available and want to yeet things into it or do more meaningful network restrictions/sandboxing/etc.

Comment by marcammann 6 days ago

I started using the Sandbox CRD: https://github.com/kubernetes-sigs/agent-sandbox

And if I really need to kick off a box very quickly, OpenRouter Spawn seems like it'll do 95% of what I need it to do: https://github.com/OpenRouterLabs/spawn

Still missing better restrictions though.

Comment by kxxx 6 days ago

[dead]

Comment by theptip 6 days ago

If you’ve got a k8s cluster, review apps are completely solved, right? You only need this if you’re in the niche of “no k8s, no PaaS”.

Comment by kxxx 6 days ago

[dead]

Comment by tastyeffectco 6 days ago

[flagged]

Comment by zackify 6 days ago

Haha that's what I do personally.

Vibe coded in 30 mins a Textualize TUI that shows LXD containers.

I just hit "p" on a container to forward that container to host.

I only use ports for one instance at a time so it works perfect.

Hitting enter auto joins the lxc container instance with tmux.

Works perfect for me for tasks that can stay long running

Comment by ambicapter 6 days ago

That's funny, their README explicitly mentions this as a comment.

> "Why not just a shell script?"

Comment by no-name-here 6 days ago

It looks like the OP repo was created 6 hours ago and has multiple commits since the grandparent comment, including a commit message subject relevant to the GP comment.

Comment by tastyeffectco 6 days ago

[flagged]

Comment by indigodaddy 6 days ago

My project is kinda sorta similar but uses Incus (LXC) and Caddy: https://github.com/jgbrwn/vibebin

Comment by tim-projects 5 days ago

Get a cheap IPv6-only VPS for like $2. Hook it up to Cloudflare to get IPv4. Run the agent under a user without sudo. Tell the agent to install Caddy.

There's no need for all this complexity.

Comment by mrasong 5 days ago

Nice project!

If I want to run this on a VPS for a few sandboxes, what’s the minimum spec that won’t make everything melt? CPU/mem/storage?

Any hidden gotchas not in the README?

Comment by tastyeffectco 5 days ago

On an 8-CPU, 12 GB RAM machine you can run up to 10 concurrent sandboxes that do heavy builds. The project itself is not RAM/CPU heavy at all; it depends on what you run inside. That was one of the main reasons I didn't go for real KVM isolation.

I would just say: if it's for an early-stage product, go for it! At scale, reconsider the security and isolation layers.
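
As a rough back-of-the-envelope check on those numbers, here is a short Go sketch that splits a host evenly across sandboxes. The even split, the `PerSandboxBudget` function, and the 2048 MiB reserved for the host are assumptions for illustration, not figures from the project.

```go
package main

import "fmt"

// Budget is the share of host resources one sandbox would get under an
// even split (hypothetical type, for illustration only).
type Budget struct {
	MemoryMiB int
	CPUs      float64
}

// PerSandboxBudget divides host memory (minus a reserve kept for the host
// itself) and CPUs evenly across the given number of sandboxes.
func PerSandboxBudget(hostMemGiB, hostCPUs, sandboxes, reservedMemMiB int) (Budget, error) {
	if sandboxes <= 0 {
		return Budget{}, fmt.Errorf("sandboxes must be positive, got %d", sandboxes)
	}
	usable := hostMemGiB*1024 - reservedMemMiB
	if usable <= 0 {
		return Budget{}, fmt.Errorf("reserve of %d MiB leaves no memory for sandboxes", reservedMemMiB)
	}
	return Budget{
		MemoryMiB: usable / sandboxes,
		CPUs:      float64(hostCPUs) / float64(sandboxes),
	}, nil
}

func main() {
	// 12 GB RAM, 8 CPUs, 10 sandboxes, with an assumed 2048 MiB held back for the host.
	b, err := PerSandboxBudget(12, 8, 10, 2048)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("per sandbox: %d MiB, %.1f CPUs\n", b.MemoryMiB, b.CPUs)
}
```

Under these assumptions each sandbox gets about 1 GiB of memory and less than one full CPU, so simultaneous heavy builds would be competing for cores.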

Comment by hk1337 6 days ago

I did a similar thing but with devcontainer and codespaces.

Comment by utibeumanah 6 days ago

Interesting project. Does this have some form of isolation?

Comment by tastyeffectco 5 days ago

Yes, but not fully. Each sandbox drops all Linux capabilities and runs with no-new-privileges, a read-only rootfs, capped limits on PIDs and memory, and per-sandbox network isolation by design. All that said, this is not VM-level isolation like Firecracker, for example, but it is quite enough for most use cases in early-stage products or internal enterprise products.

Comment by danelliot 5 days ago

[flagged]

Comment by priyadarshy 5 days ago

I think we're all under-indexing on how important it is going to be to be able to play-test an agent's changes before shipping them as a hedge against slop. I don't think people will do it unless it feels effortless, like clicking a preview link from a GitHub PR. If there's any friction, people will just click merge PR and move on with their life, and your product-level slop will accumulate.

I think high-taste products require you to actually use the thing that was built to feel the gaps in your spec.

I routinely find I am doing something dead simple that an agent will one-shot, e.g. adding a new sort-by option in the panel that lists a user's Linear tasks. If I looked at the PR diff I'd immediately think it was perfect and ship it. It's only when I actually get in there and play around with a dynamic UI that I realize, "you know, this option really belongs at the top" or "hm, there are enough options here now that this dropdown feels cramped and we might need to consider another option". Simple examples, obviously, but the principle is that landing the last 10% of a fix or a feature is when I need to interact and play-test. At the speed agents can generate fixes to new bugs or customer requests, I am bottlenecked just doing that last 10% of steering, and even the basic git mechanics to try something locally are enough to get me to not want to do it.

Right now, it seems like the state of the art is to review PR diffs and just merge them in or if you are more sophisticated have your agent generate screenshots or screen recordings. Screenshots and images are moving in the right direction but if you are building something interactive, you've really got to interact with it to know if it feels great.

My dream was to start my day with an agent having handled every bug and small enhancement request that came my way that day, worked on, and ready for my review, so I could spend an hour each day just steering them to the finish line.

I could do this linearly, picking off a branch an agent worked on, testing it and iterating, but most of these small fixes each day aren't big-brain stuff. I can effectively multitask them, but when I've tried doing it on my own machine it's either worktree hell, git gymnastics, or agents deadlocking: one agent wants to self-test with Chrome MCP but can't because another is editing code and causing hot reloads.

I ended up building a Desktop app version of what OP posted to do this for myself and my team at Sunsama, it's called Macro:
- Website: https://macro.land
- Demo: https://www.loom.com/share/89c273e3a92d45cfb6860790d7b78bf6

I had a couple of other specific needs that these repos don't cover:
- I want complete control of what MCP tools my team uses and the ability to control their input/output, etc. E.g. I don't want someone using the Intercom MCP to accidentally reply to a customer with an agent.
- I don't want my engineers spending time configuring all the boring stuff (preview URLs, MCPs, Chrome MCP, etc.).
- I wanted the ability to steer agents to the finish line together. In Macro, any user can join the chat and steer the agent, e.g. a colleague with UI taste jumps in to tweak the last bits.
- I want to see common agentic failure cases across my team so I can improve our Claude.md and agentic practices, and having all work flow through one interface allows that.