Logic Programming with core.logic ================================= Logic programming is an interesting approach to programming that was pioneered by Prolog_ in the 1970s and 1980s. While it never became as popular as its supporters hoped it would, the ideas behind it are very interesting. Constraint Problems ------------------- Logic programming is an approach to programming that is often useful for solving **constraint problems** where searching is required. For instance, consider this constraint problem: Find all integer values of x and y such that:: x + y = 10 x and y are constrained as follows:: 0 <= x <= 7 2 <= y <= 11 One solution to this problem is x = 4 and y = 6, but there are others. You could find all solutions to this problem in Go_ like this:: package main import "fmt" func main() { for x := 0; x <= 7; x++ { for y := 2; y <= 11; y++ { if x+y == 10 { fmt.Printf("x=%v y=%v\n", x, y) } } } } This is a **brute force** solution. The two loops generate all pairs of ``x`` and ``y`` values that satisfy the variable constraints, and then tests if they sum to 10. Since there are 8 possible ``x`` values and 10 possible ``y`` values, there are exactly :math:`8 \cdot 10 = 80` (x, y) pairs to test. The great thing about brute force is that it works with any constraint problem like this. Of course, the bad thing is that search spaces for even small problems can be gigantic, thus making brute force it glacially slow. Here's another constraint problem: Find all integer values of x and y such that:: x*x - 2*y*y = 1 x and y are constrained as follows:: 0 <= x <= 10000 0 <= y <= 10000 This is an instance of the `Pell equation `_. This Go_ program prints all the answers:: package main import "fmt" func main() { for x := 0; x <= 10000; x++ { for y := 0; y <= 10000; y++ { if x*x-2*y*y == 1 { fmt.Printf("x=%v y=%v\n", x, y) } } } } Here they are:: x=1 y=0 x=3 y=2 x=17 y=12 x=99 y=70 x=577 y=408 x=3363 y=2378 Since ``x`` and ``y`` both range from 0 to 10000, exactly :math:`10001 \cdot 10001 = 100020001` (x, y) pairs are tested. That's over 100 million pairs of numbers. On a fast modern computer there's not much delay in generate the answer, but as the ranges for ``x`` and ``y`` increase, the program gets slower and slower. You need to come up with more clever methods to reduce the space of possible answers. Here's a famous constraint problem with more variables: Assign a digit to each of the following letters that makes this equation true:: SEND + MORE = MONEY The letters are constrained as follows: - each letter stands for a digit from 0 to 9 - different letters are different digits - a number can't start with 0 (so S and M can't be 0) Hold onto your mouse --- here's a brute-force Go_ solution:: package main import "fmt" func main() { method1() } // main func method1() { for S := 0; S <= 9; S++ { for E := 0; E <= 9; E++ { for N := 0; N <= 9; N++ { for D := 0; D <= 9; D++ { for M := 0; M <= 9; M++ { for O := 0; O <= 9; O++ { for R := 0; R <= 9; R++ { for Y := 0; Y <= 9; Y++ { if allDiff([]int{S, E, N, D, M, O, R, Y}) && S != 0 && M != 0 { SEND := 1000*S + 100*E + 10*N + D MORE := 1000*M + 100*O + 10*R + E MONEY := 10000*M + 1000*O + 100*N + 10*E + Y if SEND+MORE == MONEY { fmt.Printf(" %v\n + %v\n-------\n %v\n\n", SEND, MORE, MONEY) } } } // Y } // R } // O } // M } // D } // N } // E } // S } // method1 func allDiff(nums []int) bool { for i := 0; i < len(nums); i++ { for j := i + 1; j < len(nums); j++ { if nums[i] == nums[j] { return false } } } return true } As before, the idea is to use nested loops, one loop per variable, to generate all possible assignments of the variables ``S``, ``E``, ``N``, ``D``, ``M``, ``O``, ``R``, and ``Y``. Since there are 8 variables, there are :math:`10^{8}`, i.e. a hundred million, possible assignments. It finds the one and only solution:: 9567 + 1085 ------- 10652 How fast is this program? On my computer:: $ time go run sendMoreMoney.go 9567 + 1085 ------- 10652 5.50user 0.06system 0:05.37elapsed 103%CPU (0avgtext+0avgdata 26300maxresident)k 0inputs+2816outputs (0major+10863minor)pagefaults 0swaps So this takes about 5.5 seconds to compile and run, which isn't bad. But we can do a little better. An obvious improvement is to never assign 0 to ``S`` or ``M`` (because the problem says a number can't start with 0). So we can change the ``S`` and ``M`` loops to start at 1 instead of 0:: for S := 1; S <= 9; S++ { // ... for M := 1; M <= 9; M++ { // ... The ``S == 0`` and ``M == 0`` tests in the ``allDiff`` if-statement are no longer needed, giving this code:: func method2() { for S := 1; S <= 9; S++ { // 0 changed to 1 for E := 0; E <= 9; E++ { for N := 0; N <= 9; N++ { for D := 0; D <= 9; D++ { for M := 1; M <= 9; M++ { // 0 changed to 1 for O := 0; O <= 9; O++ { for R := 0; R <= 9; R++ { for Y := 0; Y <= 9; Y++ { if allDiff([]int{S, E, N, D, M, O, R, Y}) { SEND := 1000*S + 100*E + 10*N + D MORE := 1000*M + 100*O + 10*R + E MONEY := 10000*M + 1000*O + 100*N + 10*E + Y if SEND+MORE == MONEY { fmt.Printf(" %v\n + %v\n-------\n %v\n\n", SEND, MORE, MONEY) } } } // Y } // R } // O } // M } // D } // N } // E } // S } // method2 This runs a little faster than ``method1``:: $ time go run sendMoreMoney.go 9567 + 1085 ------- 10652 4.55user 0.08system 0:04.46elapsed 103%CPU (0avgtext+0avgdata 26176maxresident)k 8inputs+2824outputs (0major+12215minor)pagefaults 0swaps With some thought, you can infer that the only possible value for ``M`` is 1. That's because ``MONEY`` has five digits, but ``SEND`` and ``MORE`` only have four, and so the carry to get the ``M`` in ``MONEY`` can only be 0 or 1. Since 0 is disallowed by the problem statement, ``M`` must be 1. Continuing reasoning in this fashion is how you could solve the problem by hand. It's a fun puzzle that you should try. As you figure out more numbers, you can cut down on the possible values for other variables, eventually zero-ing in on the solution. Inferring values like this is a good approach, but its hard to implement. So lets think if there are relatively simple changes we could make to the program that will speed it up. One idea is to spread out the testing for different variables. For example, consider the first two loops:: for S := 1; S <= 9; S++ { // 0 changed to 1 for E := 0; E <= 9; E++ { // ... Suppose ``S`` is 1 and ``E`` is 1. Then we know immediately that these can't be in the solution, because two letters can't have the same value. So there's no need to calculate any further values. Instead, we can immediately jump to the next value of ``E`` (which is 2), and try that. This idea can applied to all the loops: after a variable is assigned, check to make sure it is not equal to any of the previously assigned variables. If it is, then immediately jump to the next value. Here's an implementation of that idea:: func method3() { for S := 1; S <= 9; S++ { for E := 0; E <= 9; E++ { if E == S { continue } for N := 0; N <= 9; N++ { if N == S || N == E { continue } for D := 0; D <= 9; D++ { if D == S || D == E || D == N { continue } for M := 1; M <= 9; M++ { if M == S || M == E || M == N || M == D { continue } for O := 0; O <= 9; O++ { if O == S || O == E || O == N || O == D || O == M { continue } for R := 0; R <= 9; R++ { if R == S || R == E || R == N || R == D || R == M || R == O { continue } for Y := 0; Y <= 9; Y++ { if Y == S || Y == E || Y == D || Y == M || Y == O || Y == R || Y == N { continue } SEND := 1000*S + 100*E + 10*N + D MORE := 1000*M + 100*O + 10*R + E MONEY := 10000*M + 1000*O + 100*N + 10*E + Y if SEND+MORE == MONEY { fmt.Printf(" %v\n + %v\n-------\n %v\n\n", SEND, MORE, MONEY) } } // Y } // R } // O } // M } // D } // N } // E } // S } // method3 The results in a significant performance increase:: $ time ./sendMoreMoney 9567 + 1085 ------- 10652 0.61user 0.06system 0:00.49elapsed 138%CPU (0avgtext+0avgdata 26032maxresident)k 0inputs+2824outputs (0major+12074minor)pagefaults 0swaps It takes 0.61 seconds, which is only a couple tenths of a second, or so, more than the the time needed to compile the program! Despite it's good performance, this program is big and ugly. It's also tedious to write, and hard to change. Is there a better way? Logic Programming ----------------- `Logic programming `_ is a style of programming based on formal logic. Prolog_ is the most famous example of a logic programming language, and while it can be used for general-purpose programming, it is probably most useful for a handful of applications, such as grammar processing (e.g. creating parse trees), and solving constraint problems. Prolog_ programs also have a reputation for begin very compact, which is can be good and bad depending upon the situation. For the kinds of constraint problems introduced above, Prolog_ programs that solve them tend to be short and readable than the what we had to write for Go_. Here's a sample Prolog_ program (copied from https://en.wikipedia.org/wiki/Prolog):: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Facts. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% mother_child(trude, sally). % trude is the mother of sally father_child(tom, sally). % tom is the father of sally father_child(tom, erica). % tom is the father of erica father_child(mike, tom). % mike is the father of tom %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Rules. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % X is a sibling of Y if they share a common parent Z sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y). % X is the parent of Y if either X is the father of Y, % or X is the mother of Y parent_child(X, Y) :- father_child(X, Y). parent_child(X, Y) :- mother_child(X, Y). This program is divided into two basic parts: a small database of facts, and a number of rules that can be applied to those facts. You can see that ``trude`` is the mother of ``sally``, and ``mike`` is the father of ``tom``. Prolog_ rules, in their pure form, are `horn clauses `_ with this general form:: H :- B_1, B_2, ..., B_n You can read this as saying "to prove ``H``, prove ``B_1``, and prove ``B_2``, ... and prove ``B_n``". The program's sibling rule has this form:: sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y). This says to prove ``sibling(X, Y)``, you must prove both ``parent_child(Z, X)`` and ``parent_child(Z, Y)``. With these facts and rules in place, you can now write *queries* at the interpreter like this:: ?- sibling(sally, erica). Yes Given these facts and rules, Prolog_ can prove that ``sally`` is a sibling of ``erica``. Using an elegant combination of backtracking and pattern matching, Prolog_ is able to automatically infer this fact. Thus, Prolog_ is a popular choice for mimicking symbolic reasoning. One way to think about Prolog_ is as a database system that contains both facts and rules, and when you query this database Prolog_ can apply the rules to prove new true facts. The power of this approach can be surprising. For example, consider this simple Prolog_ program consisting of just these two facts:: bit(0). bit(1). This says that 0 and 1 are bits. Now we can generate, say, all 3-bit bit strings with this query:: ?- bit(A), bit(B), bit(C), Bits=[A, B, C]. A = B, B = C, C = 0, Bits = [0, 0, 0] ; A = B, B = 0, C = 1, Bits = [0, 0, 1] ; A = C, C = 0, B = 1, Bits = [0, 1, 0] ; A = 0, B = C, C = 1, Bits = [0, 1, 1] ; A = 1, B = C, C = 0, Bits = [1, 0, 0] ; A = C, C = 1, B = 0, Bits = [1, 0, 1] ; A = B, B = 1, C = 0, Bits = [1, 1, 0] ; A = B, B = C, C = 1, Bits = [1, 1, 1]. This query finds all values of ``A``, ``B``, and ``C`` that are bits, which is the same as asking for all 8 bit strings of length 3. Despite its charms, Prolog_ is no longer a very popular language. While its program can be compact, *writing* those programs can be quite tricky compared to more mainstream languages. The syntax is quite unusual, and you must learn a number of tricks specific to Prolog_ to solve many basic programming problems that are trivial in procedural, or even functional, languages. Plus, Prolog_ is usually not as fast at solving any particular constraint problem as compared to a customized Java_ or C++ program doing the same thing. Logic Programming in Clojure ---------------------------- So while Prolog_ never really caught on in the mainstream, the ideas behind it are still useful and intriguing, and have been copied in different forms in other languages. For example, Clojure_ has a library called `core.logic `_ that implements a number of logic programming ideas within Clojure_. Lets see how to solve a few constraint problems in `core.logic `_. First: Find all integer values of x and y such that:: x + y = 10 x and y are constrained as follows:: 0 <= x <= 7 2 <= y <= 11 This `core.logic `_ will solve it:: (require '[clojure.core.logic :as logic] '[clojure.core.logic.fd :as fd]) (def csp1 (logic/run* [q] (logic/fresh [x y] (fd/in x (fd/interval 0 7)) (fd/in y (fd/interval 2 11)) (fd/eq (= (+ x y) 10) ) (logic/== q [x y]))) ) => csp1 ([0 10] [1 9] [2 8] [3 7] [4 6] [5 5] [6 4] [7 3]) Second:: Find all integer values of x and y such that:: x*x - 2*y*y = 1 x and y are constrained as follows:: 0 <= x <= 10000 0 <= y <= 10000 Here is a `core.logic `_ solution that doesn't work:: (def csp2 (logic/run* [q] (logic/fresh [x y] (fd/in x y (fd/interval 0 10000)) ;; 10000 is too big! (fd/eq (= 1 (- (* x x) (* 2 y y))) ) (logic/== q [x y]))) ) When you evaluate ``csp2``, you get this:: ; => csp2 ;; when x, y in interval 0 1000 ; StackOverflowError clojure.lang.Symbol.hashCode (Symbol.java:84) Presumably the implementation of `core.logic `_ makes recursive calls, and a ``StackOverflowError`` implies that too many function calls have been made. So for this solution to work, you must decrease the range of ``x`` and ``y``, e.g.:: (def csp2 (logic/run* [q] (logic/fresh [x y] (fd/in x y (fd/interval 0 600)) ;; 600 is ok (on my computer) (fd/eq (= 1 (- (* x x) (* 2 y y))) ) (logic/== q [x y]))) ) => csp2 ([1 0] [3 2] [17 12] [99 70] [577 408]) It takes a few seconds on my computer to find this solution. And finally: Assign a digit to each of the following letters that makes this equation true:: SEND + MORE = MONEY The letters are constrained as follows: - each letter stands for a digit from 0 to 9 - different letters are different digits - a number can't start with 0 (so S and M can't be 0) Here is a `core.logic `_ solution that works:: (def csp3 (logic/run* [q] (logic/fresh [S M E N D O R Y] (fd/in S M (fd/interval 1 9)) (fd/in E N D O R Y (fd/interval 0 9)) (fd/distinct [S M E N D O R Y]) (fd/eq (= (+ (* 10000 M) (* 1000 O) (* 100 N) (* 10 E) Y) (+ (+ (* 1000 S) (* 100 E) (* 10 N) D) (+ (* 1000 M) (* 100 O) (* 10 R) E) ) ) ) (logic/== q [[S E N D] [M O R E] [M O N E Y]]))) ) => csp3 ([[9 5 6 7] [1 0 8 5] [1 0 6 5 2]]) This solution is calculated in the blink of an eye. In practice, systems like `core.logic `_ are quite flexible and easy to use (once you get used to the syntax!), but tend to perform more slowly than custom-made solvers (like the ones we wrote above in Go_). Plus, different ways of writing the same constraints can result in wildly different run-times, and so it can be hard to predict ahead of time how fast --- or slow --- a `core.logic `_ might run.