I really like the development of all the sanitizers we got in the last decade or so. I only wish that more could be done at compile time as opposed to these runtime checks.
That's the killer feature that made me give F# a chance over my usual C#: it 'does more for me' at compile time vs. run time.
Yes, these are both much higher level than C, but if what I'm reading among other comments is correct, Rust accomplishes an equivalent improvement in compile time checks while still being a systems language.
As someone who has personally written Rust code that deadlocked under specific circumstances, I think I can say that Rust does not prevent deadlocks. Not sure about specific types of race conditions, but I don't think it's possible for the compiler to detect all race conditions at compile time.
It is a heck of a lot easier to not make them in Rust than in something like C or C++. But Rust is not a panacea for asynchronous code.
You can still introduce them via circular strong reference-counted pointers. The solution of course is to have only one strong reference and many weak ones, but the compiler in principle will let you create a cycle that will never get deallocated.
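A minimal sketch of that one-strong/many-weak pattern (the `Node` struct and `demo_counts` helper here are made up for illustration):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// One strong direction (parent -> child), one weak back-edge.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

// Returns (parent strong count, child strong count, parent weak count).
fn demo_counts() -> (usize, usize, usize) {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));
    // The weak back-edge doesn't keep `parent` alive, so when both
    // locals drop at the end of this function, everything is freed.
    (
        Rc::strong_count(&parent), // 1: only the local binding
        Rc::strong_count(&child),  // 2: the local + parent.children
        Rc::weak_count(&parent),   // 1: the child's back-edge
    )
}

fn main() {
    assert_eq!(demo_counts(), (1, 2, 1));
    // Had the back-edge been a strong Rc, both nodes would outlive
    // this function: a leak the compiler happily accepts.
}
```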
That's the common case, but you can still make a reference counting cycle, or have a thread hang while holding onto something, or even just call `mem::forget` on an allocation.
This is different from Rust's guarantees about memory and thread safety, which you can't break without `unsafe`.
They are, and in a typical program you don't need to worry about leaking memory.
However, it is possible to leak, e.g. if you use a reference-counted type and create a cycle. There's also Box::leak() that does what it says (it's useful for singleton-like things).
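For example, a tiny sketch of `Box::leak` for a singleton-ish value (the `leaked_config` helper is hypothetical):

```rust
// Box::leak turns an owned Box into a &'static mut reference by
// promising never to free the allocation: useful for values that
// should live for the whole process.
fn leaked_config() -> &'static str {
    let s: &'static mut String = Box::leak(Box::new(String::from("singleton")));
    s.push_str("-config"); // still mutable before we hand it out
    s.as_str()
}

fn main() {
    assert_eq!(leaked_config(), "singleton-config");
}
```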
Rust would have tradeoffs here, though. For example, in C# you'd be able to do (non-leaking) graphs and other structures in 100% safe code, while Rust would need unsafe.
You need to use reference counting or a scoped memory pool. It's totally fine in almost all cases.
Refcounting in Rust is still faster than refcounting in ObjC or Swift (because Rust can avoid using atomics or increasing counts in many cases), but people who use Rust tend to insist on zero overhead, and a truly flexible zero-overhead solution can't be verified statically.
You don't even need "not the most optimum implementation" (assuming you're talking about index-based graphs or something).
You can make non-leaking pointer-based graphs in safe Rust using reference counting (basically what C# does, if you squint) or arenas (if your nodes' lifetimes fit that pattern).
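A sketch of the arena flavor in 100% safe Rust, using a std-only, index-backed arena rather than a real arena crate (all names here are made up for illustration):

```rust
// The graph owns an arena (a Vec) of nodes; edges are indices into
// it, so even cyclic graphs are freed when the graph drops.
// (Crates like typed_arena give a pointer-based variant.)
struct Graph {
    nodes: Vec<Node>,
}

struct Node {
    label: char,
    edges: Vec<usize>, // indices of neighbors in `nodes`
}

impl Graph {
    fn add_node(&mut self, label: char) -> usize {
        self.nodes.push(Node { label, edges: Vec::new() });
        self.nodes.len() - 1
    }
    fn add_edge(&mut self, from: usize, to: usize) {
        self.nodes[from].edges.push(to);
    }
}

// Build a two-node cycle and read node a's first neighbor's label.
fn first_neighbor_of_a() -> char {
    let mut g = Graph { nodes: Vec::new() };
    let a = g.add_node('a');
    let b = g.add_node('b');
    g.add_edge(a, b);
    g.add_edge(b, a); // a cycle, yet dropping `g` frees everything
    g.nodes[g.nodes[a].edges[0]].label
}

fn main() {
    assert_eq!(first_neighbor_of_a(), 'b');
}
```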
There are sound static analyzers, like Trust-in-Soft, that can statically guarantee the absence of undefined behaviors in C programs, but require interactive work to get rid of the false positives.
But the distinction between "runtime" and "compile-time" is not entirely binary. What you call "compile time" checks can be viewed as an abstract interpretation [1] of the program, or as running the program in some abstract domain: e.g., where a concrete value could be 3, its abstract value could be called `int`, and the + operation could be interpreted as `int + int = int`; type inference can be viewed as abstract interpretation. The problem is that sound abstract interpretation (i.e., one without false negatives: if it tells you your program is "safe" then it will definitely be safe, for some appropriate definition of safe) is limited, and often comes with big tradeoffs, e.g. the false positives given by sound static analyzers, or the pain something like Rust's borrow checker sometimes causes; these are both instances of the same underlying difficulty with abstract interpretation.
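To make that concrete, here's a toy abstract interpreter over a sign domain (entirely illustrative; a real analyzer's abstract domain would be far richer):

```rust
// Instead of concrete integers, "run" addition on abstract values
// {Neg, Zero, Pos, Top}, mirroring the `int + int = int` idea above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Sign { Neg, Zero, Pos, Top }

// Abstraction: map a concrete value into the abstract domain.
fn abs_sign(n: i64) -> Sign {
    match n {
        0 => Sign::Zero,
        n if n > 0 => Sign::Pos,
        _ => Sign::Neg,
    }
}

// Abstract "+": sound (it always covers the concrete result) but
// imprecise: Pos + Neg could be anything, so we must answer Top.
fn add_sign(a: Sign, b: Sign) -> Sign {
    use Sign::*;
    match (a, b) {
        (Zero, x) | (x, Zero) => x,
        (Pos, Pos) => Pos,
        (Neg, Neg) => Neg,
        _ => Top,
    }
}

fn main() {
    // Precise when the abstraction suffices...
    assert_eq!(add_sign(abs_sign(3), abs_sign(4)), Sign::Pos);
    // ...but 3 + (-4) is Neg concretely, Top abstractly: this loss
    // of precision is exactly where false positives come from.
    assert_eq!(add_sign(abs_sign(3), abs_sign(-4)), Sign::Top);
}
```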
One of the most promising directions in formal methods research is "concolic testing" [2], a combination of concrete (i.e. non-abstract, or "runtime") execution and symbolic execution (an instance of abstract interpretation). It is not sound, but it can be made quite safe, and at the same time it can be very powerful and flexible, checking properties that would be either impossible or extremely tedious (to the point of infeasibility) to check with abstract methods like types. It might prove to be a good sweet spot between the power, expressivity and cost of concrete (AKA dynamic, AKA runtime) methods and the soundness of abstract (AKA static, AKA compile-time) methods.
Even dynamic methods are often "stronger" (more sound) than just testing assertions during testing. For example, fuzzers can sample the program's state space, offering more coverage, either by analyzing the program (whitebox) or just randomly (blackbox), and various sanitizers sample the scheduling space by fuzzing thread scheduling.
The field evidence would support their position, too. Your claim about C is true just enough to encourage people to not use it if they want to get more out of program analysis. Yet, current tools make up for it enough to find most or all bugs in benchmarks with Mayhem fixing them, too.
The only problem is that almost nobody who cares about FOSS security is working on the tools that automate it. The companies doing it end up black-boxing, patenting, etc. the tools, which they sell for exorbitant prices. Something like RV-Match, well-integrated with repos, either FOSS or priced to favor mass adoption over high profit, would be a game changer. Especially if 3rd parties could use it on repos.
Aren't those runtime tools like I just suggested? The other ones in the 1st link are theorem provers (at least the one that I recognized, Frama-C) that require a lot more than off-the-shelf C code as their inputs. You can't just run `CC=my-compiler-frontend make` to suddenly get better diagnostics.
Companies are mixing static and runtime analysis more than they used to. Most static analyzers look at just the code itself, build models of it, check their properties, and report the presence/absence of errors. The first links were more about effectiveness on C. I just dug around in search to find you something that explains it, shows its effectiveness, and threw in a case study you might enjoy:
The provers do something similar, but with more manual effort and a logic-based approach. Frama-C and SPARK Ada are interesting in that they use programmer annotations plus supporting functions (e.g. ghost code). These are converted into a logical representation (i.e. facts about the program), fed into logic solvers along with the properties intended to be proved, and success/failure tells you something. As with static analyzers, you get the results without running the program. A recent, great addition is that, if a proof is too hard, the condition you didn't prove can be turned into a runtime check. You could even do that by default, proving only the checks that drag performance down too much. Flexible.
The best methods are a mix. I've always advocated mixing static analyzers, dynamic analysis, fuzzing etc since each technique might spot something the other will miss. They also cover for implementation bugs each might have. That adding runtime analysis can improve effectiveness does not negate my original claim that static, non-runtime analysis of C code can tell you a lot about it and/or find a ton of bugs. I also agreed with one point you made about it not conveying enough information: these clever tools pull it off despite C lacking design attributes that would make that easier. SPARK Ada is a great example of a language designed for low-level, efficient coding and easy verification. Although, gotta give credit to Frama-C people for improving things on C side.
This is really interesting stuff, thanks. I've assumed that without source annotation, the only things that you can verify statically are violations of the runtime model (e.g., stack overflow, heap corruption). Mostly through inferences on possible value sets of function arguments and return values.
I'm going to have a look at your links more. I guess the question I don't have answered quite yet is how to tell the tool what to look for beyond the language semantics.
What you've assumed is mostly correct. Almost all the annotation-free tools look for violations like you described since that's what most people want. The correctness-oriented work starts with annotated code. The only other thing I know that works without annotations, but sort of similar, is creating the specs from example inputs and outputs. Then, the code can be analyzed against them.
What pro's often do, though, is prove the algorithm itself in a higher-level form that captures its intrinsic details, create an equivalent form in lower-level code, and prove the equivalence. That lets you deal with the fundamentals first. Then address whatever added complexity comes in from lower-level details and language semantics.
That's a good comparison given asserts are specs. If you like asserts, the easiest, high-ROI method of getting into verification is Design-by-Contract. Here's both a way to sell it to your boss and some articles highlighting it:
Many of us independently converged on a consensus about how to do it quickly, too. You put the contracts into the code, do property-based testing to test them, and have them turn into runtime or trace checks combined with fuzzers. Eclipser is one that runs on binaries finding problems way faster than others. This combo lets you iterate fast knocking problems out early. If you want more assurance, you can feed those contracts (aka formal specifications) into tools such as Frama-C and SPARK Ada to get proofs they hold for all inputs. Obviously, run lots of analyzers and testing tools on it overnight, too, so you get their benefits without just staring at the screen.
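A minimal sketch of the contracts-as-asserts part of that workflow, in Rust with plain `debug_assert!` (a property-based framework like proptest would normally generate the inputs; the `isqrt` example and its contract are made up for illustration):

```rust
// Contracts as executable specs: the postconditions below are the
// same predicates you'd hand to a prover, but here they become
// runtime checks (debug builds) exercised by a brute-force
// "property test" over many inputs.
fn isqrt(n: u64) -> u64 {
    let r = (n as f64).sqrt() as u64;
    // Postcondition: r is the floor of the square root of n.
    debug_assert!(r * r <= n, "postcondition violated: r too big");
    debug_assert!((r + 1) * (r + 1) > n, "postcondition violated: r too small");
    r
}

fn main() {
    // Poor man's property-based test: check the contract holds
    // across a large sample of the input space.
    for n in 0..10_000u64 {
        let _ = isqrt(n);
    }
    assert_eq!(isqrt(10), 3);
}
```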
Another lightweight-ish formal method you might like is Cleanroom Software Engineering. It was one of the first to make writing software more like engineering it. Stavely's introduction is good. His book was, too, with the chapter on semi-formal verification (easier to use, high ROI) having excellent arguments. Fortunately, the automated and semi-automated tools doing formal analysis look good enough to replace some or all of the manual analysis Cleanroom developers do. Languages such as Haskell with advanced type systems might let it go even further. I advise coding in a simplified, hierarchical style like in Cleanroom to ease analysis of programs even if you do nothing else from the method. If you're curious, that technique was invented by Dijkstra in the 1960s to build his "THE Multiprogramming System."
Are you Santa? Isn't it a bit early for Christmas?
Thank you so much! I've been looking into building a foundation with regards to formal methods and safety-critical systems, and besides getting familiar with the regulations and Standards around that type of software, I've also been looking to build up some familiarity with the tools to achieve those higher levels of provability!
This all looks like it fits the bill perfectly! Thank you! Thank you! Thank you!
Interestingly enough, the safety-critical field doesn't actually use formal methods much. It's more about lots of documentation, review, and testing. Mostly just human effort. They do buy static analyzers and test generators more than some other segments. There's slowly (very slowly) increasing adoption of formal methods in that space via SPARK Ada and the recently-certified CompCert. The main incentive so far is that they want to get bugs out early to avoid costly re-certifications.
DO-178C and the SILs are the main regulations driving it, if you want to look into them. DO-178B, the prior version, is also a good argument for regulating software for safety/security. It worked at improving quality. We even saw our first graphics driver done robustly. :)
I have an alternate opinion, which is that compile times should be fast because compilation is done often, whereas sanitization is something you only need to do in later release QA stages. So it can be slow, and that's okay.
Rust prevents data races at compile time. Note that data races, while a common form of race condition, are not the only kind of race condition, and Rust won't prevent the others.
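For instance, a sketch of how that works in practice: shared mutable state has to go behind something like `Arc<Mutex<_>>` (the `parallel_count` helper here is illustrative):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Trying to hand these threads a plain &mut usize would be a
// compile error; the Send/Sync traits force synchronization, which
// is how Rust rules out data races (but not, say, deadlocks).
fn parallel_count(threads: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1; // exclusive access
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = *counter.lock().unwrap();
    n
}

fn main() {
    assert_eq!(parallel_count(8), 8);
}
```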
You're right to an extent. I think that because of the halting problem, a number of useful properties a computer could tell you about a program are limited, because they boil down to solving the halting problem.
However, one way to avoid such a problem is to settle for less accurate but still strict subproblems. I.e., Rust doesn't stop all race conditions, but it does solve the subset that are data races.
In other words, you reject some valid programs (false positives) in order to make sure all accepted programs are really valid (no false negatives).
I do not program in ATS because I am not doing anything that needs to be that efficient, so I have the luxury of enjoying programming and learning Haskell. I hope that by the time I need to write some super secure and performant program that has to run without interruption and is used by more users than myself, Haskell will have evolved enough to let me write it. Otherwise I might end up using ATS, though honestly the syntax is so ugly.
The good news is that it's also the golden age of static analysis right now. There are open source and commercial tools that find most of the problems. Some are scaling up on codebases the size of the Linux kernel.
So, not only can you do that: it's already standard practice for some developers, both in and out of the commercial space. Mostly commercial developers doing it, from what I see, though.
It is dynamic analysis. I don't think that's the reason. Both Saturn and Facebook's tool scaled analysis to the size of the Linux kernel. I think most developers and companies aren't using or pushing them like they could for cultural reasons. Look at the number of people who used sanitizers vs. low-false-positive static analyzers, even free ones. The latter would've gotten them results sooner. It wasn't empirical evidence or effectiveness that led to that decision. Human factors trump technical factors in adoption decisions most of the time.
Note: Linking to Saturn since it's old work by a small team. I theorized big investment could achieve even more. Facebook's acquisition and continued investment, with excellent results, proved it.
If tech and money are the limit, tell me why Google hasn't straight-up bought the companies behind RV-Match and Mayhem to turn their tech on all their code plus open-source dependencies. Even if over-priced, it might actually turn out cheaper than all the bug bounties adding up. Maybe re-sell it cheap as a service, Amazon-style. The precedent is that Facebook bought one that built a scalable tool they're applying to their codebase. Then they were nice enough to open-source it. Google, Microsoft, Apple, etc. could get one, too.
What's Google's excuse? They sure aren't broke or dumb. Gotta be cultural. Likewise, Microsoft built lots of internal tools. They do use at least two static analyzers: one for drivers, one for software. I heard they even ship the 2nd one in VS. They don't use most of the MS Research tools, though. The reason is cultural: the operations side knows they'll keep making money regardless of crap security.
As far as FOSS developers go, most don't care about security. Those that do tend to follow both what takes little effort (naturally) and what's trending. The trending topics are sanitizers, fuzzers, and reproducible builds. Hence, you see them constantly, instead of people learning how to build and scale program analyzers, provers, etc. with a much better track record. Note that I'm not against the others being built to complement the better methods. I just think, if this were rational vs. emotional, you'd see most volunteer and corporate effort going to what has had the highest payoff with the most automation so far. For a positive example, a slice of academics are going all-in on adaptive fuzzers for that reason, too. They're getting great results, like Eclipser.
I know a bunch of the people at Google who are responsible for program analysis. The idea that they haven't bought some company doing symbolic execution because of some internal vendetta against static analysis is just ridiculous.
I also find it weird that you reference Saturn, since Alex Aiken hasn't been driving heavy SAT stuff for a while. Clark Barrett is at Stanford now and is doing more things related to what you want. And... oh wait, he spent a few years at Google a few years ago.
Fuzzing plus sanitizers work like mad. They aren't magic, and interprocedural static analysis provides value too. But your claim that Google is ignoring these techniques, and doing so out of cultural idiocy, just isn't correct.
I'm saying they didn't care enough to do the kind of investment others were doing which would've solved lots of their problems.
The article also indicates they couldn't get developers to do their job of dealing with the alerts despite false positives. If Coverity's numbers are right, there's over a thousand organizations whose managers did it better.
Since they didn't address it, Google would be about the best company to acquire expensive tech like Mayhem that finds and fixes bugs so their developers can keep ignoring them. Alternatively, start double-teaming the problem by investing in FB's tool, too, moving in the best advances from SV-COMP winners.
I mean, their size, talent, and finances don't go together with results delivered (or not) in static analysis. They could do a lot more with better payoff.
EDIT: Forgot to mention I referenced Saturn because parent said the methods couldn't scale to the Linux kernel. And Saturn was used on the Linux kernel over a decade ago. A few scale big now.
You keep missing my point. I keep giving examples of large-scale, catch-about-everything systems. I'm not talking about merely having a few teams on something. This is Google, not a mid-sized business. You win if they already have their own version of Mayhem that fixes their bugs for them. Otherwise, they're behind Facebook in terms of the financial effort they'll put in to get great capabilities. I'm also going to assume they're playing catch-up to Infer unless they've released their analyzers at least for peer review.
The Infer team is only like 15 people, last I checked. That's certainly not more than "a few teams on something". Also, separation logic and symbolic/concolic execution are such wildly different approaches that it seems odd to pivot to Infer here.
I obviously won't be able to convince you since they aren't publishing at the same rate as facebook. So you'll just have to take my word that Google doesn't have a vendetta against static analysis.
"The infer team is only like 15 people, last I checked."
The Infer team is claiming to get results on C/C++ with both sequential and concurrency errors via a tool they open sourced. I value scalable results over theories, team counts, etc. Does Google have a tool like that which we can verify by using it ourselves? Even with restrictions on non-commercial use? Anything other than their word they're great?
"that it seems odd to pivot to Infer here."
"So you'll just have to take my word that Google doesn't have a vendetta against static analysis. "
You really messed up on the 2nd sentence, since I linked an article on Google's static analysis work. I told people they're doing it. I mentioned Infer was Facebook pouring a lot of money into a top-notch, independent team to get lots of results. I mentioned Google could do that for companies like the one behind Mayhem, which built the exact kind of stuff they seem to want in their published paper. They could do it many times over. If they did it, I haven't seen it posted even once. They don't care that much.
Your claims about team size and a "vendetta against static analysis" are misdirection used to defend Google instead of explaining why they don't buy and expand tech like Infer and Mayhem. And heck, they can use Infer for free. My theory is something along the lines of company culture.