CapROS: Capability-Based Reliable Operating System

Posted by gjvc 13 hours ago

Comments

Comment by mfedderly 12 hours ago

I had the privilege of taking two classes with Dr Shapiro while I was in undergrad. The second class revolved around a related operating system named Coyotos. One of the most memorable classes was a 3 hour session where we worked through the boot sequence step by step [1]. The single lecture helped us all appreciate the delicate dance to bring up an x86 processor, a history lesson in the various features that had been bolted onto x86 over time, and a bunch of helpful debugging tips when your options are limited (it prints "Co" "yo" and "tos" in different stages!).

This was easily one of my most memorable lectures from undergrad, and it really helped to show me that even your operating system is just more software that you can read and understand.

1. https://github.com/vsrinivas/coyotos/blob/c68719b851e253aa11...

Comment by ryanjshaw 9 hours ago

I was a nerdy kid living in the middle of nowhere in Africa. I think we’d had dialup for about 2 years at that point, and I emailed him with some questions about how to understand the mathematical notations used in his EROS work. He was very kind and helpful in his response, even though my questions were probably very naive.

Comment by kragen 11 hours ago

Coyotos and CapROS are two continuations of EROS.

Comment by kragen 12 hours ago

Seems like Charlie hasn't been merging pull requests in three years: https://github.com/capros-os/capros

And the list has been idle since then: https://sourceforge.net/p/capros/mailman/capros-devel/

I wonder if something has happened to him? I hope he's okay.

Comment by btilly 12 hours ago

The fact that we went with access control lists instead of true capabilities has long been a disappointment to me.

For people who understand OO, capabilities are the simplest model in the world. You hand out objects. You can call methods on the object. What that method call has access to depends on the permissions on the object, not your permissions. Entire classes of security mistakes (most notably the "confused deputy") become impossible.
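A minimal sketch of that model in Python, assuming hypothetical names (`AppendCap`, `compile_job`). Python does not enforce encapsulation, so this only illustrates the shape of the idea: the service writes through the objects it was handed, never through a name the caller supplies.

```python
class AppendCap:
    """Hypothetical capability: the holder may append to one specific log."""

    def __init__(self, store, name):
        self._store = store
        self._name = name

    def append(self, text):
        self._store.setdefault(self._name, []).append(text)


def compile_job(source, out_cap, billing_cap):
    """A 'deputy' service.

    The caller passes a capability for its output instead of a file name,
    so it cannot point the service at the billing log: it holds no
    capability for it.
    """
    out_cap.append(f"compiled:{source}")
    billing_cap.append("1 job")


store = {}
billing = AppendCap(store, "billing")    # held by the service only
user_out = AppendCap(store, "user_out")  # handed in by the caller

compile_job("main.c", user_out, billing)
```

In the classic confused-deputy story the caller passes the string "billing" as the output path and the service, using its own authority, overwrites its billing file. Here there is no string to pass.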

The only commercial success that was a true capability system was the AS/400. Not coincidentally, single standalone machines averaged 99.99%-99.999% uptime. And it never had a significant security compromise. (Individual systems did, of course, have problems due to weak passwords and poor configuration. But they were still remarkably resistant.)

Capability systems work so well that when people wanted to improve security on Linux, they called it capabilities. Even though it wasn't.

Unfortunately, the world went with ACLs. That's baked in to the design of things like Windows and POSIX. Which means that all of the consumer software out there expects ACLs. In order to get them to run on a pure capability system, you have to do things like create a POSIX subsystem. At which point, you've just thrown away the whole reason to use capabilities in the first place.

Comment by Findecanor 8 hours ago

The big problem is that you'd need to be able to change permissions over time. With ACLs that is simple and direct: if you have the access right, you just change the ACL. Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.

Some systems have revocation as a core feature, but a cascading revocation (every delegation as a branch in a tree, and revoke a whole subtree of delegated capabilities) is often complex and takes time, especially if they are on disk. There have also been protocols (for EROS-like OSes) for setting up systems with additional capabilities to revoke individual capabilities, but they are even more complex IMHO. So, in most capability systems the only way to revoke capabilities to a resource is to remove the resource itself.
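To make the cascading case concrete, here is a rough in-memory sketch, assuming a hypothetical `RevocableCap` forwarder that records each delegation as a child node. It is not how any particular OS does it, and it sidesteps the hard parts mentioned above (capabilities on disk, cost of the walk).

```python
class Revoked(Exception):
    """Raised when a revoked capability is used."""


class RevocableCap:
    """Hypothetical forwarder: wraps a target and can be switched off later."""

    def __init__(self, target, parent=None):
        self._target = target
        self._alive = True
        self._children = []
        if parent is not None:
            parent._children.append(self)

    def delegate(self):
        """Hand out a child capability, recorded as a branch of this one."""
        if not self._alive:
            raise Revoked()
        return RevocableCap(self._target, parent=self)

    def invoke(self, method, *args):
        """Forward a method call to the target while still valid."""
        if not self._alive:
            raise Revoked()
        return getattr(self._target, method)(*args)

    def revoke(self):
        """Revoke this capability and the whole subtree delegated from it."""
        stack = [self]
        while stack:
            cap = stack.pop()
            cap._alive = False
            stack.extend(cap._children)


root = RevocableCap([1, 2, 3])
alice = root.delegate()
bob = alice.delegate()    # delegated from alice
carol = root.delegate()   # sibling branch, unaffected by alice.revoke()
alice.revoke()
```

After `alice.revoke()`, both `alice` and `bob` raise `Revoked` on use, while `root` and `carol` keep working.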

In CHERI, where every pointer is a capability, revocation of capabilities into a memory object relies on what is effectively a parallel garbage collector process that finds all pointers to revoked objects and overwrites them with an invalid pointer that traps on use. [0]

In the fantasy OS of my mind, ACLs have instead been promoted to "access-control trees" that include a "grant option", allowing a user to grant the permission she has to someone else. But once the first user's permissions are revoked, the sub-tree of re-granted permissions gets revoked as well. I think that could be achieved with existing file system ACLs, with added topology info and enforcement by the OS. Then actual capabilities would be created only when a file is opened, as file handles, but unlike Unix file handles they could be revoked, be revoked in a cascading manner, and revoked automatically if the underlying ACT gets changed.
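A toy version of that "access-control tree" idea, with every name (`AccessControlTree`, `FileHandle`) invented for illustration. The ACT remembers who granted access to whom, and a handle rechecks the ACT on every use, so it dies as soon as the grant it came from is revoked.

```python
class AccessControlTree:
    """Hypothetical ACT for one file: user -> the user who granted access."""

    def __init__(self, owner):
        self._grantor = {owner: None}  # the owner is the root of the tree

    def allowed(self, user):
        return user in self._grantor

    def grant(self, giver, receiver):
        """The 'grant option': anyone with access may pass it on once."""
        if giver not in self._grantor:
            raise PermissionError(f"{giver} has nothing to grant")
        if receiver in self._grantor:
            raise ValueError(f"{receiver} already has access")
        self._grantor[receiver] = giver

    def revoke(self, user):
        """Remove user and everyone whose grant chain passes through user."""
        doomed = {user}
        changed = True
        while changed:
            changed = False
            for u, g in self._grantor.items():
                if g in doomed and u not in doomed:
                    doomed.add(u)
                    changed = True
        for u in doomed:
            self._grantor.pop(u, None)


class FileHandle:
    """Capability created at open time; rechecks the ACT on every read."""

    def __init__(self, act, user, data):
        if not act.allowed(user):
            raise PermissionError(user)
        self._act, self._user, self._data = act, user, data

    def read(self):
        if not self._act.allowed(self._user):
            raise PermissionError(f"handle for {self._user} was revoked")
        return self._data


act = AccessControlTree("owner")
act.grant("owner", "alice")
act.grant("alice", "bob")
handle = FileHandle(act, "bob", "hello")
first_read = handle.read()
act.revoke("alice")  # also cuts off bob, and with him the open handle
```

Checking on every use trades the tree walk at revocation time for a lookup on each access; whether that is acceptable depends on how hot the path is.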

Authorization Certificates (as in X.509) are a type of distributed cryptographic capabilities, but require complex distribution of "revocation lists". In recent years, new types of distributed "authorization tokens" have been introduced, such as "Biscuits" [1].

[0] https://www.semanticscholar.org/paper/Cornucopia-Reloaded%3A...

[1] https://www.biscuitsec.org

Comment by LoganDark 4 hours ago

> Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.

Capabilities don't have to hold the actual permission to access the object. Capabilities can simply hold a provenance that can be used to verify the source of the access. If that access is then revoked from that source, the capability doesn't need to change at all. This is similar to how generational arenas work in some game engines, IMO.
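For anyone who hasn't seen a generational arena, a bare-bones sketch follows (the `Arena` class and its methods are made up for this example). The handle is just an index plus a generation number; revoking bumps the generation stored in the slot, so old handles stop validating without ever being touched.

```python
class Arena:
    """Hypothetical generational arena: handles are (index, generation)."""

    def __init__(self):
        self._slots = []  # each slot is [current_generation, value]

    def insert(self, value):
        self._slots.append([0, value])
        return (len(self._slots) - 1, 0)

    def revoke(self, index):
        """Invalidate every outstanding handle to this slot."""
        slot = self._slots[index]
        slot[0] += 1
        slot[1] = None

    def get(self, handle):
        index, generation = handle
        slot = self._slots[index]
        if slot[0] != generation:
            raise LookupError("stale handle")
        return slot[1]


arena = Arena()
handle = arena.insert("photo.jpg")
copy_of_handle = handle            # freely copied, like a capability
before = arena.get(handle)
arena.revoke(handle[0])
```

A plain tuple is forgeable, so a real system would still need the handle itself to be unguessable or protected by the kernel; the sketch only shows the validation step.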

AFAIK Android performs something similar to this with the storage URLs that are provided to apps, which will be different depending on which picker provided the file/media, etc. Apple probably also does something similar, but I'd imagine with objects rather than strings.

Comment by bheadmaster 2 hours ago

> Capabilities don't have to hold the actual permission to access the object. Capabilities can simply hold a provenance that can be used to verify the source of the access. If that access is then revoked from that source, the capability doesn't need to change at all.

Which complicates the initial premise that

> capabilities are the simplest model in the world. You hand out objects. You can call methods on the object. What that method call has access to depends on the permissions on the object, not your permissions.

Which is exactly what the parent said. Capabilities sound simple at first, but require complex machinery to work.

Comment by ryanjshaw 9 hours ago

It’s bizarre to me that not one megawealthy tech nerd has thrown 8 figures at some smart people in an attempt to solve the capabilities-based OS UX problem. The payoff would be remarkable.

Comment by csrse 6 hours ago

I guess Fuchsia is an attempt at a capability-based OS with wider appeal. The architecture seems interesting, wish there was quicker progress on it.

Comment by Jyaif 1 hour ago

The tricky part is not doing the capability-based OS, it's getting adoption.

Linux is good enough, so a slightly better OS is not going to cut it.

Comment by EGreg 11 hours ago

I guess you must really love Capnproto then: https://github.com/iguazio/go-capnproto2

Comment by btilly 11 hours ago

Not sure why you think that my opinions about operating systems would predict my opinions about an RPC system.

Comment by ocdtrekkie 10 hours ago

You may want to change this link, this is an extremely old fork of the Go capnp implementation. It's neither official nor current!

I'd recommend just pointing to capnproto.org

Comment by ahlCVA 6 hours ago

There is also a relatively modern capability-based kernel in the L4 family of microkernels, called Fiasco.OC: https://os.inf.tu-dresden.de/fiasco/overview.html

There are also a bunch of components for building a functional userspace (such as L4Re or Genode).

Comment by NooneAtAll3 5 hours ago

what does L4 mean here?

Comment by sirwhinesalot 4 hours ago

L4 was a microkernel design by Jochen Liedtke (RIP). It was notable for proving that microkernels can perform much better than was thought at the time (L4 performed 20x better than the Mach microkernel).

The work was so influential it got the ACM SIGOPS Hall of Fame Award in 2015. A whole family of microkernels based on that original design have since been developed, hence the "L4 microkernel family".

Comment by unwind 4 hours ago

It's a family of microkernels.

https://en.wikipedia.org/wiki/L4_microkernel_family

Comment by pyrolistical 11 hours ago

https://en.wikipedia.org/wiki/Capability-based_security

It’s like sharing a Google Doc link. You configure the link to be read-only or read/write.

Now imagine you can create as many links as you want with all possible permission combinations. Then you have a capability-based system.
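The analogy can be sketched in a few lines, assuming a hypothetical `Docs` service where each link is an unguessable token mapped to a document and a set of permissions. A new link can only be derived from one you already hold, and only with the same or fewer permissions.

```python
import secrets


class Docs:
    """Hypothetical sharing service: possession of a token is the permission."""

    def __init__(self):
        self._texts = {}  # doc id -> text
        self._links = {}  # token -> (doc id, frozenset of permissions)

    def _new_link(self, doc_id, perms):
        token = secrets.token_urlsafe(16)
        self._links[token] = (doc_id, frozenset(perms))
        return token

    def create(self, text):
        doc_id = len(self._texts)
        self._texts[doc_id] = text
        return self._new_link(doc_id, {"read", "write"})

    def derive(self, token, perms):
        """Make a new link carrying a subset of an existing link's rights."""
        doc_id, held = self._links[token]
        if not set(perms) <= held:
            raise PermissionError("cannot add rights the link does not have")
        return self._new_link(doc_id, perms)

    def read(self, token):
        doc_id, held = self._links[token]
        if "read" not in held:
            raise PermissionError("link is not readable")
        return self._texts[doc_id]

    def write(self, token, text):
        doc_id, held = self._links[token]
        if "write" not in held:
            raise PermissionError("link is read-only")
        self._texts[doc_id] = text


docs = Docs()
rw_link = docs.create("draft")
ro_link = docs.derive(rw_link, {"read"})
docs.write(rw_link, "final")
seen_by_reader = docs.read(ro_link)
```

Note there is no user database anywhere: nothing asks who you are, only which link you present.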

Comment by iberator 9 hours ago

Intel did this in 1981 with the iAPX 432. Super interesting and SUPER complex (just check out the documentation of the CPU architecture), which is why it failed hard.

A flat memory model always wins against a Star Trek-like architecture that no one understands.

Comment by gnufx 36 minutes ago

1970s-ish capability systems with support in hardware/firmware include CAP, Flex, System/38, Plessey System 250 (which a former colleague worked on) -- the last two commercial; see https://en.wikipedia.org/wiki/Capability-based_security.

I'd like to think their time has come, given vulnerabilities I see.

Comment by silasdavis 8 hours ago

Most of the links seem to be broken on https://www.capros.org/overview.html

Comment by retrac 8 hours ago

I've written a little bit before about KeyKOS/GNOSIS, which is the capability operating system used by Tymshare to host their timesharing language services on IBM mainframes, in the 70s and 80s. From a comment 3 years ago I'll just repost the relevant part:

> KeyKOS (developed by Tymshare for their commercial computing services in the 1970s) - A capability operating system. If everything in UNIX was a file, then everything in KeyKOS was a memory page and capabilities (keys) to access those pages. The kernel has no state that isn't calculated from values in the virtual memory storage. The system snapshots the virtual memory state regularly. There are subtle consequences from this. Executing processes are effectively memory-mapped files that constantly rewrite themselves, with only the snapshots being written out. Snapshotting the virtual memory state of the system snapshots everything -- including the state of running processes. There's no need for a file system, just a means to map names to sets of pages, which is done by an ordinary process. After a crash, processes and their state are internally consistent, and continue running from their last snapshot. For those who are intrigued, there's a good introduction, written in 1979, by the system's designers available here: http://cap-lore.com/CapTheory/upenn/Gnosis/Gnosis.html (It was GNOSIS before being renamed KeyKOS.) And a later document written in the 90s aimed at UNIX users making the case: http://cap-lore.com/CapTheory/upenn/NanoKernel/NanoKernel.ht... Some work on capability systems continues, but it seems the lessons learned have largely been forgotten.

The core abstraction is simpler than the Unix process model or that of many other operating systems. Processes have keys which access virtual memory pages. All of storage including persistent secondary storage is just one big pool of virtual memory pages. These can be shared between processes. That's all that's necessary to implement things like filesystems and networking which are often thought to require special handling. A filesystem is just names and addresses of pages in storage. Give a process a capability to do shared memory with a process that maintains such a structure. I find the emphasis on minimizing process and kernel state, such that processes can be snapshot and frozen at any time and are inherently persistent, handled as the set of the relevant pages, to be genius. Though the architecture does have the classic microkernel/nanokernel performance penalties, as have been long debated.
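A deliberately crude illustration of the snapshot idea from the quote, with all names (`System`, `counter_step`) invented here. It only shows the consequence described above: if all state lives in pages and the pages are checkpointed together, a process resumes from its last snapshot after a crash. It says nothing about how the real system does this efficiently.

```python
import copy


class System:
    """Toy model: every bit of state lives in `pages`."""

    def __init__(self):
        self.pages = {}      # page number -> contents
        self._snapshot = {}  # last checkpoint of all pages

    def checkpoint(self):
        """Snapshot everything at once, running processes included."""
        self._snapshot = copy.deepcopy(self.pages)

    def crash_and_restart(self):
        """Lose everything since the last checkpoint, keep the rest."""
        self.pages = copy.deepcopy(self._snapshot)


def counter_step(pages, page_no):
    """A 'process' whose entire state is one page holding a counter."""
    pages[page_no] = pages.get(page_no, 0) + 1


system = System()
for _ in range(3):
    counter_step(system.pages, 7)
system.checkpoint()
for _ in range(2):
    counter_step(system.pages, 7)   # work done after the checkpoint
value_before_crash = system.pages[7]
system.crash_and_restart()
value_after_restart = system.pages[7]
counter_step(system.pages, 7)       # the process simply carries on
```

The counter never had to save or reload anything itself; persistence falls out of the page model.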

Comment by contrarian1234 11 hours ago

Most of my "wtf is going on" moments on Linux have to do with permissions. I loathe the industry move to even more security. I want a more Emacs-like experience. Multiuser systems have become the exception and most people have a personal computer with one user. Dealing with evil apps is a losing battle b/c the attack surface is too large.

I think the counter argument to more security is Distro Repos. When was the last time you apt-get'ed some software and had your documents stolen?

If you add blocks then you need to somehow communicate to the user when it's failing and that's hard... You see the shitshow that is Android security where apps have mysterious access to some directories and not others and it's impossible to understand what's going on. Maybe capabilities will work better, it's unclear to me.

Comment by iberator 9 hours ago

Just link statically compiled emacs into /sbin/init and you are done

Comment by krautburglar 11 hours ago

Absolutely! Most of it is there to protect their moats from us, not us from “hackers”.

Comment by mikewarot 12 hours ago

Why is it that every Capability based system seems to be a toolkit for running a single program instead of an OS ready for daily use? Is it just me?

Comment by kragen 12 hours ago

It's just you. seL4, CheriBSD, etc., do not fit your description. Neither did KeyKOS itself. You're presumably looking at research prototypes.

Comment by ratmice 12 hours ago

I'd also note capros doesn't fit that description either. I don't know that there were examples that ran more than a single process.

That's probably not true for anything relying on drivers, since user mode drivers are basically processes there... but not in the way that people might think of a process.

Comment by kragen 11 hours ago

I mean, there isn't exactly a thriving ecosystem of existing software built for CapROS. Right now I don't think anybody even has CapROS itself building.

The problem has gotten a lot easier since the EROS days, thanks to Xen, QEMU, UEFI (?), and the explosion of cheap hardware, but it looks like maybe Charlie got sick or lost interest or something?

Comment by ratmice 11 hours ago

Yeah, I did see an email on a capabilities list from him about him no longer working on it because of lack of feedback & wanting to just enjoy his retirement. That was the impression I got.

When he had resumed his work on it, I personally had been going through a back injury. I still feel bad that I didn't get a chance to contribute any of the hardware ports and software I wrote for it.

Comment by kragen 11 hours ago

Hmm, do you know when?

Comment by ratmice 11 hours ago

I wasn't able to google it, or find a public link to the email (but it was posted on a public list), so here are some relevant snippets from it.

Nov 20 2022 titled CapROS status

"When I retired a year ago I hoped to correct some of those issues, but I want to enjoy retirement and not just have a full-time unpaid job.", ...

"I am considering just abandoning CapROS. I believe there are some useful ideas in the system, but so far no one seems to have known or cared about them."

Comment by ryukafalz 11 hours ago

Since it is a public list, here's the link: https://groups.google.com/g/cap-talk/c/Box4XXhSevw/m/18pUqAQ...

He posted on the list recently too if folks were worried: https://groups.google.com/g/cap-talk/c/XCBwf-zpJWA/m/6CWsNA-...

Comment by wmf 12 hours ago

A lot of OS projects develop the kernel then run out of steam. It's especially hard for capabilities because there's no established standard like Unix/Posix to copy. Capability OSes are still a research topic.

Comment by spencerflem 12 hours ago

Check out Genode Sculpt for a vision of a workable desktop!

It’s capable of dynamic flows, adding and removing programs, has ports of Chromium and VirtualBox. The devs daily drive it :)