Military standard on software control levels
Posted by ibobev 12 hours ago
Comments
Comment by AlotOfReading 11 hours ago
I've found that a quality process that starts with "you need to comprehensively understand what you're engineering" is almost universally a non-starter for anyone not already using these things. Putting together an exhaustive list of all the ways code interacts with the outside world is hard, and even if a few engineers actually manage it, they're rarely empowered to decide whether the consequences of failures are acceptable, or to fix things if they're not.
Comment by kqr 11 hours ago
Much better to do as you say and think about the software and its role in the system. There are more and less formal ways to do this, but it's definitely better than taking a component view.
Comment by aidenn0 2 hours ago
And the article you intended to link is just wrong. E.g. the Therac-25 was not designed to output high power when an operator typed quickly; it was merely built in such a way that it did so. This would be analogous to describing an airplane failure caused by bolts that were too weak as: "the bolt didn't fail; it broke under exactly the forces you would expect a bolt of its size to break under; if they wanted it not to break, they should have used a larger bolt!" Just as in the Therac example, the failure would be consistently reproducible.
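A toy sketch of that class of bug (my own construction in Python, nothing like the real Therac-25 code): setup snapshots the treatment parameters when it starts, so an edit that lands while setup is still running is silently ignored. Given the same operator timing, it misbehaves identically on every run.

```python
import threading
import time

# Toy model of the bug *class* only -- not the actual Therac-25 code.
params = {"mode": "xray", "intensity": "high"}

def run_setup_and_fire():
    snapshot = dict(params)         # parameters captured at setup start
    time.sleep(0.5)                 # stands in for the slow magnet setup
    print("firing with", snapshot)  # fires with the stale snapshot

t = threading.Thread(target=run_setup_and_fire)
t.start()
time.sleep(0.1)                     # a fast operator edits *during* setup...
params["mode"] = "electron"         # ...but the edit is never seen
t.join()                            # prints: firing with {'mode': 'xray', ...}
```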
Comment by teddyh 11 hours ago
“The reason is that, in other fields [than software], people have to deal with the perversity of matter. [When] you are designing circuits or cars or chemicals, you have to face the fact that these physical substances will do what they do, not what they are supposed to do. We in software don't have that problem, and that makes it tremendously easier. We are designing a collection of idealized mathematical parts which have definitions. They do exactly what they are defined to do.
And so there are many problems we [programmers] don't have. For instance, if we put an ‘if’ statement inside of a ‘while’ statement, we don't have to worry about whether the ‘if’ statement can get enough power to run at the speed it's going to run. We don't have to worry about whether it will run at a speed that generates radio frequency interference and induces wrong values in some other parts of the data. We don't have to worry about whether it will loop at a speed that causes a resonance and eventually the ‘if’ statement will vibrate against the ‘while’ statement and one of them will crack. We don't have to worry that chemicals in the environment will get into the boundary between the if statement and the while statement and corrode them, and cause a bad connection. We don't have to worry that other chemicals will get on them and cause a short-circuit. We don't have to worry about whether the heat can be dissipated from this ‘if’ statement through the surrounding ‘while’ statement. We don't have to worry about whether the ‘while’ statement would cause so much voltage drop that the ‘if’ statement won't function correctly. When you look at the value of a variable you don't have to worry about whether you've referenced that variable so many times that you exceed the fan-out limit. You don't have to worry about how much capacitance there is in a certain variable and how much time it will take to store the value in it.
All these things are defined a way, the system is defined to function in a certain way, and it always does. The physical computer might malfunction, but that's not the program's fault. So, because of all these problems we don't have to deal with, our field is tremendously easier.”
— Richard Stallman, 2001: <https://www.gnu.org/philosophy/stallman-mec-india.html#conf9>
Comment by kqr 40 minutes ago
The software may need to handle hardware failures, but software that doesn't do that also doesn't fail -- it's inadequately designed.
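A minimal sketch of the distinction, assuming a hypothetical raw_read() interface that returns a value plus a CRC flag: the sensor may fail, but software that passes a garbage reading through unchecked didn't "fail", it was inadequately designed.

```python
def read_temperature(raw_read, retries=3):
    """raw_read() -> (value, crc_ok); a hypothetical sensor interface."""
    for _ in range(retries):
        value, crc_ok = raw_read()
        if crc_ok and -40.0 <= value <= 125.0:  # plausible physical range
            return value
    # Degrade loudly instead of propagating garbage into control logic.
    raise RuntimeError("sensor fault: no valid reading after retries")
```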
Comment by BoppreH 8 hours ago
And if you're designing a Hardware Security Module, as another example, I hope you've at least taken rowhammer into consideration.
Comment by lo_zamoyski 9 hours ago
However, that "as long as" is doing quite a bit of work. In practice, we rarely have a perfect grasp of a real-world program: there is divergence between what we think a program does and what it actually does, gaps in our knowledge, and so on. Naturally, this problem also afflicts mathematical approximations of physical systems.
[0] And even this is not entirely true. Think of a concurrent program: race conditions can produce all sorts of weird, unpredictable results. Perfect knowledge of the program text will not tell you what the result will be.
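A concrete instance of footnote [0], in Python: you can know every line of this program and still not know what it prints.

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1   # read-modify-write; not atomic across threads

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # sometimes 400000, sometimes less -- that's the point
```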
Comment by gmueckl 9 hours ago
In light of this, even software development has to focus on failures when you apply this standard. And that does include considerations like failures occurring within the computer itself (faulty RAM or a faulty CPU core).
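One traditional mitigation for exactly that (a sketch of the pattern only; in real systems it's done in C on fixed-width words, often alongside ECC and lockstep cores): store a safety-critical value together with its bitwise complement, so a single bit flip is caught on every read.

```python
class GuardedInt:
    """Detects (not corrects) corruption of a stored value."""

    def __init__(self, value: int):
        self._v = value
        self._inv = ~value            # redundant complement copy

    def get(self) -> int:
        if self._v != ~self._inv:     # any single bit flip breaks this
            raise RuntimeError("memory corruption detected")
        return self._v
```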
Comment by lo_zamoyski 10 hours ago
Of course, the notion of "failure" itself presupposes a purpose. It is a normative notion, and there is no normativity without an aim or a goal.
So, sure, where human artifacts are concerned, we cannot talk about a part failing per se, because unlike natural kinds (like us, where the norm is intrinsic to us, hence why heart failure is an objective failure), the "should" or "ought" of an artifact is a matter of external human intention and expectation.
And as it turns out, a "role in a system" is precisely a teleological view. The system has an overall purpose (one we assign to it), and the role or function of any part is defined in terms of - and in service to - the overall goal. If the system goes from `a->d`, and one part goes from `a->b`, another `b->c`, and still another `c->d`, then the composition of these gives us the system. The meaning of the part comes from the meaning of the whole.
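The same point in code (a toy sketch; the stages are placeholders standing in for `a->b`, `b->c`, and `c->d`): the "system" `a->d` exists only as the composition of its parts, and each part's role is its place in that composition.

```python
a_to_b = lambda x: x + 1      # placeholder stages
b_to_c = lambda x: x * 2
c_to_d = lambda x: x - 3

def compose(*stages):
    def system(x):
        for stage in stages:
            x = stage(x)
        return x
    return system

a_to_d = compose(a_to_b, b_to_c, c_to_d)  # the whole confers the roles
assert a_to_d(5) == ((5 + 1) * 2) - 3
```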
Comment by mubbicles 11 hours ago
Has some fun anecdotes in it. My favorite is the nuclear-certified supersonic aircraft with a latent defect discovered during integration of a new subsystem. It turned out that all of the onboard flight computers crashed at the transition from subsonic to supersonic flight; thankfully the aircraft had enough inertia to "ride through" all of its flight computers going down simultaneously at the transonic boundary.
The moral of that story is that your software people need the vocabulary to understand the physical properties of the system they're working on.
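The anecdote doesn't say what the actual defect was, but purely as an illustration of why that vocabulary matters: the Prandtl-Glauert correction 1/sqrt(1 - M^2) used in subsonic aerodynamics is singular at Mach 1, so naive code dies exactly at the transonic boundary.

```python
import math

def pg_factor_naive(mach: float) -> float:
    # Blows up (ZeroDivisionError at M = 1, ValueError above) near Mach 1.
    return 1.0 / math.sqrt(1.0 - mach**2)

def pg_factor_guarded(mach: float) -> float:
    if mach >= 0.95:  # the correction is invalid near and above Mach 1
        raise ValueError("Prandtl-Glauert correction not valid near Mach 1")
    return 1.0 / math.sqrt(1.0 - mach**2)
```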
Comment by trklausss 8 hours ago
Most of it happens, as always, at the interface. So these methodologies help you manage the interfaces between people, machines, and the product.
Comment by jcims 10 hours ago
Maintaining it over time is even harder.
Comment by exe34 10 hours ago
They want the benefits, and are willing to do everything except the things that actually help.
Comment by ldx1024 1 hour ago
If you have ever read the software control category definitions in MIL-STD-882E, you know that the definitions this blog author gives are very much his own interpretation. The actual definitions in 882E are a god-awful mess: multiple contradictory definitions for the same category, plus parenthetical statements intended to clarify that just muddy the picture further. Yikes...
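For reference, the five categories from 882E's Table IV, paraphrased from memory (so treat the one-line glosses as approximations, not the standard's wording):

```python
from enum import Enum

class SoftwareControlCategory(Enum):
    AT  = 1  # Autonomous: software alone exercises control over the hazard
    SAT = 2  # Semi-Autonomous: software controls, but time/other mitigations exist
    RFT = 3  # Redundant Fault Tolerant: independent fault-tolerant backup exists
    INF = 4  # Influential: informs the operator, doesn't control the hazard
    NSI = 5  # No Safety Impact
```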
Comment by svilen_dobrev 8 hours ago
""" A second important dimension is criticality, the potential damage caused by an undetected defect: loss of comfort (C), loss of discretionary moneys (D), loss of essential moneys (E), and loss of life (L). """
(my rephrasing): he points out that the further one moves down that list, the more hardened/disciplined the way of making should be: from "anything goes" at the beginning to "no exceptions whatsoever" at the end.
[1] https://www.researchgate.net/publication/234820806_Crystal_c...
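To make the ladder concrete, my own illustrative mapping (not from Cockburn's paper) of criticality to the kind of discipline he means:

```python
RIGOR_BY_CRITICALITY = {
    "C": "anything goes: ad-hoc testing, informal review",            # comfort
    "D": "code review, regression tests on the money paths",          # discretionary moneys
    "E": "mandatory review, traceable requirements, audited releases",# essential moneys
    "L": "no exceptions whatsoever: formal process, independent V&V", # life
}
```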