On Thursday June 17 I went to (on zoom- does that need to be added anymore?)

*A Debate on Program Correctness*

There was no subtitle but it could have been:

Have the points made in *Social Processes and Proofs of Theorems and Programs* by DeMillo, Lipton, Perlis, survived the test of time ? (Spoiler Alert: Yes.)

I found out about it from the Lipton-Regan blog here

The debaters were Richard DeMillo and Richard Lipton and the moderator was Harry Lewis (Alan Perlis passed away in 1990). Calling it a debate is not correct since DeMillo and Lipton (and Lewis) all agree. (DeMillo and Lipton even have the same first name!) The DLP paper is in Harry Lewis's collection* Ideas that created the future.* The event should have been advertised as a discussion. However, it was a good discussion so this is not a complaint.

Here are some things that came out of the discussion.

1) The main topic was the 1979 DeMillo-Lipton-Perlis paper (see here) that gave arguments why Proofs of Program correctness could not work.

An all-to-brief summary of the DLP paper: Some researchers are trying to set up frameworks for doing proofs that programs are correct, analogous to the certainty we get with a proof of a Theorem in Mathematics. But proofs in Mathematics are, in reality, NOT that rigorous. Often details are left out or left to the reader. This is fine for mathematics (more on that later) but unacceptable for programs which need rather precise and rigorous proofs.

How do theorems in mathematics really get verified? By having enough people look at them and make sure they match intuitions, what DLP call *A Social Process*. (NOTE FROM BILL: Papers that are not important do not get looked at so there may well be errors.)

2) The notion of proving-programs-correct was very seductive; however, the people who were trying to do this had a blind spot about how the analogy of proving-programs-correct and proving-theorem-correct differ. In particular, a program is rather complicated and even stating carefully what you want to prove is difficult. By contrast, for most math statements, what you want to prove is clear. Note also that a program has lots of code (far more now than when DLP was written) and so much can happen that you cannot account for.

3) The DLP paper had a large effect on the field of program verification. Funding for it was reduced and students were discouraged from going into it.

4) When DLP appeared DeMillo and Lipton were pre-tenure. Hence it took lots of courage to publish it. Alan Perlis had tenure and had already won a Turing award. This did give DeMillo and Lipton some cover; however, it still took courage.

*Mechanizing Proof: Computing, Risk, and Trust by David McKenzie*see here.

*Ideas that Created the Future*).

*Ideas that created the future*collects up, edits, and comments on 46 important papers in Computer Science (I reviewed it in the issue of SIGACT News that is in your mailbox---I will blog about it at a later time.) There are several papers in it about program verification, including DLP, Hoare's paper, and three papers by Dijkstra.

*The*

*Mythical Man Month*(see my post on Brook's work here ) and DLP.

As for point 9), bad software isn't the same as incorrect software. A program can do what it is supposed to do, but still be considered bad software because it gives a horrible user experience, is hard to install, consumes an awful lot of CPU or memory, or a dozen other reasons than not doing what it supposed to do.

ReplyDeleteOn the other hand, there is software which is loved by millions of users, but which isn't 100% bug-free.

By analogy DLP (1979) might have had a similar effect on program verification as Minsky & Papert (1969) had on neural networks. Both stated convincingly that it will not work. I am still waiting for verification rising like a phoenix from the ashes similarly as neural network learning lately accomplished. I would bet that the computer assistance of formal verification would make the difference (unlike to proofs in mathematics these are very long and very tedious, but mostly trivial so a computer should handle them). Let's wait another decade.

ReplyDelete> But proofs in Mathematics are, in reality, NOT that rigorous. Often

ReplyDelete> details are left out or left to the reader.

There is some confusion here. Proofs in mathematics certainly are rigorous. That is part of the definition of the word "proof". However, it is important to realize that what is written in books and articles is merely a way for the author to communicate the proof to the reader. The reader must construct the rigorous proof for themselves. This is why the steps that you need to include in a written proof depend on the audience that you are writing for. (I was surprised to learn that when non-mathematicians read a math book, they don't check the statements for themselves. I have no idea what they think they are doing.)

Formal proofs are different. They have all steps written out, and so should be checkable by a computer. However, they are probably impossible for humans to check because humans are not computers.

When I write software (and I've written a lot of software), after I've written it, I print it out, then proofread it the same way that I would proofread a mathematical proof. Of course, it helps a lot if the code is written in a way that facilitates this. Most code is poorly written. For the best advice on how to write code so that you can understand it and check that it is correct, see the book "Professional Pascal: Essays in the Practice of Programming" by Henry Ledgard. I read this book when it came out, and it completely changed the way I write code. "The Mythical Man-Month" is a great book, but it is about project management, not what the code should look like.

I think very few programmers print out their code an proofread it. Various software methodologies try to encourage having more than one person check the code, e.g., via pair programming or code reviews. But, I hope that any mathematician feels confident they can check a proof themselves without asking a colleague to check it for them.

Math proofs as written down in papers and textbooks are often not completely rigourous- and this is not a criticism. The missing parts can be supplied by the reader or are obvious. DLM argues that this approach will not work for proving programming correct.

DeleteIt works the same way for programs: If you want a human to check your proof, write it so a human can understand it. If you want a computer to check your proof, write it so a computer can check it.

DeleteI did not attend the debate, but I find it interesting that it occurred less than two weeks after Peter Scholze, a Fields Medalist, blogged about the progress toward formally verifying one of his most important theorems.

ReplyDeletehttps://xenaproject.wordpress.com/2021/06/05/half-a-year-of-the-liquid-tensor-experiment-amazing-developments/

While I agree with most of what the DLP paper says, I think that some of what it says about mathematics is, at best, dated. There is no doubt that socialization will always play a crucial role in the acceptance of mathematical theorems; in the case of the "Liquid Tensor Experiment," this role is clearly explained in the comments to Scholze's blog post. That said, it does not follow that formal verification has no role to play when it comes to increasing our confidence in the correctness of mathematical proofs. The Flyspeck project is perhaps the clearest example of this.

https://arxiv.org/abs/1501.02155

The intermediate lemmas in the proof are certainly messy, ugly, idiosyncratic, and require more than a roll of paper towels to sketch, but they are indispensable to the proof. And their formal verification has tangibly increased the community's confidence that the Kepler conjecture has been rigorously proved. I find it amusing and slightly ironic that perhaps the biggest change since the DLP paper was first written is that mathematics has become more like program verification. Here's another article that you may find interesting, written by the late Fields Medalist Vladimir Voevodsky:

https://www.ias.edu/ideas/2014/voevodsky-origins

That said, I still agree with DLP that the role of formal verification in the case of software is likely to be limited. But it is not because formal verification is irrelevant to mathematics (it isn't), or because all nontrivial formal mathematical proofs are infeasibly long (they aren't), or because important mathematics is never so ridiculously computational that a computer is needed (it sometimes is). Rather, it is because in program verification, exactly specifying what the program is supposed to do is usually an infeasibly difficult process (with moving goalposts) that by its nature cannot be fully automated. By contrast, even in the case of Flyspeck, the end product was the Kepler conjecture itself, and checking that what the computer formally verified was indeed the Kepler conjecture was a humanly feasible task (with fixed goalposts).

In the few cases where writing down the spec (and checking that it really says what we mean) is a humanly feasible task, I think that formal software verification could potentially play an important role.

Perhaps it would be more useful to have a debate involving participants who actually have some notion of how advances in verification and program logics are actually used in industrial software development, see, e.g.,

ReplyDeletehttps://en.wikipedia.org/wiki/Infer_Static_Analyzer

Why in the world would we expect two debaters and a moderator who are totally removed from the field to have anything useful to say on the matter?

Agree that it would have been good to have alternative viewpoint at the discussion.

DeleteI thought static analyzers FOUND BUGS which is GREAT, but didn't (nor did they claim to) VERIFY programs? Am I making to fine a distinction between the two?

I would say that these analyzers can verify certain properties of program units, but not entire programs or systems. On the the hand, it is not even clear what the latter would mean.

Delete"The program is correct; the only issue is that it doesn't do what you want"

ReplyDelete"In the world there are only 'Programmers' and 'Sh*t Programmers'; the only difference is that a 'Programmer' has not yet coded enough to realize that he is a 'Sh*t Programmer'" (this one is by me :-)

DeleteTesting is fine, but the emphasis should be on writing good code. If the code is poorly written, then no amount of testing will fix that. I do comparatively little testing for my own code. I focus my testing on catching errors that are hard for me to catch by proofreading. Using a language and compiler that make it hard to have such errors is extremely useful.

ReplyDeleteI see your point, and, as a programmer, I feel the same about my own code. Yet, for software I did not write, I've more faith in well tested code, than code whose programmer claimed it was well written.

DeleteYou can read the code and see if it is well written. How do you know code is "well tested"? Is it tested by people in the same organization that wrote it? I recommend the essay "It Worked Right the Third Time" in the book "Professional Pascal".

DeleteI inadvertently just produced an example of why testing is not enough to produce good software.

DeleteA user reported a bug in my table tennis tournament software (https://www.davidmarcus.com/Zermelo.htm). The bug had been there for many years, maybe from the first release. The bugged procedure was only a couple dozen lines, but when I looked at it, I thought it was not written well: the order of statements seemed odd.

I rewrote the procedure to fix the bug, and the result was much more logical. I should have spent more time originally to write it better.

The bug would have been very hard to discover by testing. To reproduce it, I had to add a player to a singles event, then add the player to two doubles events with different partners where the player's tournament ID was less than the tournament IDs of the player's doubles partners.

Bill, great post and in line of thought with some

ReplyDeleteprevious ideas we both entertained!

(An aside thought.)

Question how does the Turing-Completeness property

of programming languages impact (i.e. limit[?]) proof verification?

(Assumption is that programs under investigation

[i.e.,to be checked, verified] are written in a

Turing complete language.)

Re: David Marcus comment:

ReplyDeleteYes. Exactly. Guy Steele once said "A program that you spend 9 hours thinking about and 1 hour writing will be better and more interesting than a program that you spend 1 hour writing and 9 hours debugging". (Or something to that effect.)

I think machine learning sidesteps "specifying what to compute" by saying "compute something sort of like these examples" (with variable success). Possibly that's partly why it's popular?

ReplyDelete