Modeling and simulation of planetary motion

I believe that the maximum exponent of curiosity is trying to understand how our own world works; thus, as a curious minded person, I’ve always been very fond of physics and applied sciences.

One of my favourite problems in classical mechanics is the planetary motion, and one of my first attempts to solve it with computer algorithms dates from my early teens. My first approach was written in BASIC, and the results were completely wrong, as I didn’t know enough about the underlying mathematics. Subsequent attempts yielded good results once I knew some basics of differential calculus during my high-school years.

In this post I’ll rescue one of those implementations, written in C, which uses the laws of motion to compute astronomical body paths.

Physics background

Let’s suppose that we have two bodies B_1 and B_2, at positions \vec{p}_1 and \vec{p}_2, and with masses m_1 and m_2, respectively. The Newton’s law of universal gravitation states that the force between the two bodies is proportional to the product of the two masses, and inversely proportional to the squared distance

  F = G\,\dfrac{m_1\,m_2}{r^2}.

Where r = \left|\vec{p}_1-\vec{p}_2\right|. For instance, the force with which B_1 is attracted towards B_2, including magnitude and direction, and being \hat{r} the unitary vector from \vec{p}_1 to \vec{p}_2, is

  \vec{F}_{12} = G\,\dfrac{m_1\,m_2}{\left|\vec{p}_1-\vec{p}_2\right|^2}\hat{r} = G\,\dfrac{m_1\,m_2}{\left|\vec{p}_1-\vec{p}_2\right|^3}(\vec{p}_2-\vec{p}_1)

Solution example when two bodies interact.

If we also use the Newton’s second law of motion, which relates force and acceleration

  \vec{F} = m\,\vec{a},

and knowing that the acceleration \vec{a}(t) is the second time derivative of position \vec{p}(t)

\vec{a}(t) = \dfrac{d^2}{dt^2}\vec{p}(t),

we can express the problem with the following set of coupled non-linear differential equations:

  \begin{array}{c}  \dfrac{d^2}{dt^2}\vec{p}_1(t) = G\,\dfrac{m_2}{\left|\vec{p}_1(t)-\vec{p}_2(t)\right|^3}(\vec{p}_2(t)-\vec{p}_1(t))\\  \dfrac{d^2}{dt^2}\vec{p}_2(t) = G\,\dfrac{m_1}{\left|\vec{p}_1(t)-\vec{p}_2(t)\right|^3}(\vec{p}_1(t)-\vec{p}_2(t)).  \end{array}

Same solution as before, but showing the relative motion. The result is an ellipse.

If we provide the initial conditions, i.e. the initial positions \vec{p}_1(t_0) and \vec{p}_2(t_0), and the initial velocities \vec{v}_1(t_0) and \vec{v}_2(t_0), we can obtain, from the above equations, the paths \vec{p}_1(t) and \vec{p}_2(t), regardless of the number of dimensions. Particularly, for a two-dimensional problem, where

  \begin{array}{c}  \vec{p}_1(t) = (x_1(t), y_1(t))\\  \vec{p}_2(t) = (x_2(t), y_2(t)),  \end{array}

the set of equations becomes

  \begin{array}{c}  \dfrac{d^2}{dt^2}x_1(t) = G\,\dfrac{m_2}{((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2)^\frac{3}{2}}(x_2(t)-x_1(t))\\  \dfrac{d^2}{dt^2}y_1(t) = G\,\dfrac{m_2}{((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2)^\frac{3}{2}}(y_2(t)-y_1(t))\\  \dfrac{d^2}{dt^2}x_2(t) = G\,\dfrac{m_1}{((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2)^\frac{3}{2}}(x_1(t)-x_2(t))\\  \dfrac{d^2}{dt^2}y_2(t) = G\,\dfrac{m_1}{((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2)^\frac{3}{2}}(y_1(t)-y_2(t)),  \end{array}

where we have to find x_1(t), y_1(t), x_2(t) and y_2(t). The problem can be easily spread to N dimensions and M bodies.

Designing an algorithm to solve the equations

If we discretize the time domain

  t_n = t_0 + n\,\Delta_t,\qquad n \in \mathbb{Z}^+

and we apply the following first order finite differences approximations of the differential operators (this is the easiest way, and it’s equivalent to Euler method):

  \begin{array}{c}  \vec{v}(t) = \dfrac{d}{dt}\vec{p}(t) \simeq \dfrac{\vec{p}(t+\Delta_t)-\vec{p}(t)}{\Delta_t}\\  \vec{a}(t) = \dfrac{d}{dt}\vec{v}(t) \simeq \dfrac{\vec{v}(t+\Delta_t)-\vec{v}(t)}{\Delta_t}  \end{array}

3D solution example

We can compute both the velocity and position of a body B_i, given the force exerted over it, by finding \vec{p}_i(t+\Delta_t) from the above equations

  \begin{array}{l}  \vec{F}_i = \displaystyle\sum_{j \neq i}G\,\dfrac{m_i\,m_j}{\left|\vec{p}_i(t_n)-\vec{p}_j(t_n)\right|^3}(\vec{p}_j(t_n)-\vec{p}_i(t_n))\\  \vec{a}_i = \dfrac{\vec{F}_i}{m_i}\\  \vec{v}_i(t_n+\Delta_t) = \vec{v}_i(t_n) + \Delta_t\,\vec{a}_i\\  \vec{p}_i(t_n+\Delta_t) = \vec{p}_i(t_n) + \Delta_t\,\vec{v}_i(t_n)  \end{array}

As simple as that, we compute the forces for each body (the forces due to all the other bodies), hence we have the acceleration, and we integrate it twice to obtain both the velocity and the position.

The code

In the link below you have an implementation in C for the two-dimensional case. The code is platform-dependent, as it uses some basic graphic routines. Originally this code was written for MS-DOS, using the protected mode and the mode 13h, compiling with the DJGPP programming platform under RHIDE. As this platform is from the past, you can either run it under an emulator like DOSBox, or compile it using the SDL wrapper that I’ve written for rescuing this and other codes (only tested under GNU/Linux). The wrapper is coded in files mode13h.h and mode13h.c (the only platform-dependent code), and to switch between a native MS-DOS platform and SDL you must use the DOS symbol (when present, it assumes a native MS-DOS platform, otherwise it will use SDL).

The main algorithm is coded in file newton.c, and, as I described before, it just computes the force exerted over each body, and then velocity and position.

Example of execution when we show the relative motion following one of the bodies. The result is an ellipse.

A body is modelled with the struct body_t, which considers the position and velocity. Before starting the algorithm, you must create the bodies with the createBody() function, providing initial position, initial velocity and mass. Also, you can specify whether or not the body is fixed at its position, to simulate fictitious situations, suitable to being validated against Keppler’s laws (you can visually check how the two first laws hold when only two bodies interact). Another way to do this with real cases is by using the bodyref variable: when this variable is not null, the camera will follow the selected body.

Other useful parameters are scale_factor, to fit the required space in the screen, and calculations_per_plot, that allows you to enable frame dropping, computing with enough accuracy without having to plot all the solutions.

The source code includes an example of an Earth-Moon system, where you can check that, given realistic data for the initial values and masses, some other parameters like the orbital period match the known values.

In the video below you can watch an execution example of the three-body problem, under DOSBox emulator, where you can appreciate the chaotic behaviour.

Links

Source code

A circle algorithm for ZX Spectrum

As like many people of my generation, my first steps in computer programming were with a ZX Spectrum. Following a natural order, Sinclair BASIC was the first language I dealt with, but quite often the hardware limitations forced me to move to the Z80 assembler language. A common practice was coding some time critical routines in assembler to avoid many of the bottlenecks, like some graphic routines.

In this post, I’ll explain, as an example, how to outperform the built-in circle drawing routine.

Algorithm explanation

The original built-in algorithm (command CIRCLE in BASIC) drew the circumference with an angular sweep, from 0 to 360 degrees, and although it was implemented in machine code, it looked quite slow.

It’s possible to make a more efficient implementation without using trigonometric functions nor square roots, and even without using multiplications. The following algorithm is equivalent to the now known mid-point algorithm (but with different error formulation), and it’s based on the implicit equation of the circumference (let’s assume the circumference is centered at (0,0))

  x^2 + y^2 - r^2 = 0

To draw the contour, we establish (x,y) = (r,0) as the starting point, and draw the first octant, with angles between 0 and 45 degrees. As below 45 degrees the y component goes faster than the x one, we’ll increment the y coordinate inside a loop, and correct the path decreasing the x component whenever we consider we are “far away” from the contour. To evaluate how far we are, let’s compute the squared error

  e(x,y) = x^2 + y^2 - r^2

Let \vec{p}_{k} = (x_{k},y_{k}) be the coordinates of a pixel at a given step k; at each step we have to choose whether to move upwards or towards the upper-left pixel:

  \begin{array}{l}  \vec{p}_{1,k+1} = \vec{p}_{1,k} + (0,1)\\  \vec{p}_{2,k+1} = \vec{p}_{2,k} + (-1,1)  \end{array}

Our decision will be based on a minimum squared error criterion; for the two options above we have the following expressions for the error (in our discrete space, we measure at the middle of the pixels, so the mathematical center is at the middle of the pixel (0,0))

  \begin{array}{ll}  e_{1,k+1} = & e(\vec{p}_{1,k+1}) = x_{k}^2 + (y_{k}+1)^2 - r^2 = e_{k} + 1 + 2\,y_{k}\\  \\  e_{2,k+1} = & e(\vec{p}_{2,k+1}) = (x_{k}-1)^2 + (y_{k}+1)^2 - r^2\\  &= e_{1,k+1} + 1 - 2\,x_{k} = e_{k} + 1 + 2\,y_{k} + 1 - 2\,x_{k}  \end{array}

Notice that we can know the error at a given step if we know the error previous to it, with only a few additions. Whenever |e_{1,k+1}| < |e_{2,k+1}|, we choose \vec{p}_{1,k+1} as the next pixel, otherwise we take \vec{p}_{2,k+1}. We can simplify the comparison in the following way, knowing that e_{2,k+1} is always negative and x_{k} and y_{k} are integers

  \begin{array}{l}  |e_{1,k+1}| < |e_{2,k+1}| \,\,\,\, \Rightarrow \,\,\,\, e_{1,k+1} < -e_{2,k+1}  \,\,\,\, \Rightarrow  \,\,\,\,  e_{1,k+1} < -e_{1,k+1} - 1 + 2\,x_{k}  \\  \,\,\,\, \Rightarrow \,\,\,\, 2\,e_{1,k+1} <  2\,x_{k} - 1 \,\,\,\, \Rightarrow  \,\,\,\, e_{1,k+1} <  x_{k} - \frac{1}{2}  \,\,\,\, \Rightarrow  \,\,\,\,  e_{1,k+1} <  x_{k}  \end{array}

The initial value for the error is e_{0} = 0, as the first pixel \vec{p}_{0} = (r,0) satisfies the circle equation. Observe the error evolution in the figure beside; the corrections produced by the x component decrements tend to keep the error around 0.

The algorithm can be coded in C as follows:

int x = r;
int y = 0;
int e = 0;

for (;;) {

  drawPixel(xc + x, yc + y);

  if (x <= y) {
    break;
  }

  e += 2*y + 1;
  y++;

  if (e > x) {
    e += 1 - 2*x;
    x--;
  }
}

Once we know how to draw the first octant, the rest of the circle can be drawn applying symmetry. Instead of drawing one single pixel, we draw 8 of them:

...
drawPixel(xc + x, yc + y);
drawPixel(xc + x, yc - y);
drawPixel(xc - x, yc - y);
drawPixel(xc - x, yc + y);
drawPixel(xc + y, yc + x);
drawPixel(xc + y, yc - x);
drawPixel(xc - y, yc - x);
drawPixel(xc - y, yc + x);
...

The code

The source code is a .asm text file that you can compile with a z80 assembler like pasmo (you can use other assemblers, but some of them ignore the ORG directives, so be careful with the relocations). With pasmo you can generate directly a .tap file ready to be loaded in an spectrum emulator.

The code contains two main functions: one for drawing pixels and another one for drawing circles. Also, the file includes an execution example placed at the address 53000, that draws a set of concentric circles growing in size.

For drawing pixels, two lookup tables are used: tabpow2, with powers of 2, and tablinidx, with the order of the 192 screen lines (remember that the ZX spectrum used an interlaced access).

You can invoke the routine by placing the point coordinates at the addresses 50998 and 50999, and jumping to the address 51000

POKE 50998, 128
POKE 50999, 88
RANDOMIZE USR 51000

To invoke the circle routine, you must place the center coordinates at 51997 and 51998, and the radius at 51999, and then jump to the address 52000.

POKE 51997, 128
POKE 51998, 88
POKE 51999, 80
RANDOMIZE USR 52000

In the video below you can see the execution of the following code under Spectemu emulator, comparing the performance of the two algorithms.

10 FOR i=1 TO 20
20 CIRCLE 128, 80, i
30 NEXT i
40 RANDOMIZE USR 53000

Links

Source code (includes a .tap tape file generated with pasmo v0.5.3, and a .z80 snapshot file generated with spectemu v0.94/c).

FFT-based multiplication algorithm

We have presented in a previous post a library for operating with arbitrarily big numbers, where we had implemented some basic arithmetic operations, like addition, multiplication and division.

In this post we’ll focus on improving the multiplication algorithm efficiency through Fast Fourier Transforms.

Algorithm explanation

Let’s take two random numbers, say

  \begin{array}{c}  a = 143672,\\  b = 669381  \end{array}

The multiplication result is

  a.b = 96171307032

If we see each number as a digit sequence (digits between 0 and 9), we can think of each of those digits as the coefficients of a polynomial. The multiplication of two polynomials and the ordinary multiplication are closely related

  \begin{array}{c}  p_a(x) = x^5+4\,x^4+3\,x^3+6\,x^2+7\,x+2\\  p_b(x)  = 6\,x^5+6\,x^4+9\,x^3+3\,x^2+8\,x+1  \end{array}

  \begin{array}{c}  p_a(x) . p_b(x) = 6\,x^{10} + 30\,x^9 +  91\,x^8 + 53\,x^7 + 125\,x^6 + 150\,x^5 + \\  121\,x^4 + 90\,x^3 + 68\,x^2 + 23\,x +  2  \end{array}

Thus, we obtain the sequence [6 30 91 53 125 150 121 90 68 23 2]. As some of these digits are greater than 9, we need to apply a carry propagation correction, obtaining the sequence [9 6 1 7 1 3 0 7 0 3 2], which is the result we were looking for. The carry correction algorithm has an order of n time complexity.

It’s well known that a polynomial multiplication algorithm is equivalent to a discrete convolution over the coefficients, so we conceive now our digit sequences as discrete signals.

The discrete convolution can be computed as

  (a * b)[n] = \displaystyle{\sum_{m=-\infty}^{\infty}{a[m]\,b[n-m]}}

The classical algorithm for the convolution has an order of n2 time complexity, so there is no point at the moment. To reduce this complexity, we’ll use the convolution theorem, which tells us that the Fourier transform of a convolution is the pointwise product of the Fourier transforms.

  \mathcal{F}\{a * b\} = \mathcal{F}\{a\}.\mathcal{F}\{b\}

That is, we can compute the DFTs of the original signals, multiply them pointwise, compute the inverse DFT and apply the decimal correction. The key point is that we can use efficient FFT algorithms to compute both the DFT and the inverse DFT.

From a O(n2) order algorithm, we have moved to the computation of 2 FFTs, with order O(n.log2(n)) each one, a pointwise product, with O(n), an inverse FFT, with O(n.log2(n)), and a carry propagation correction, with O(n), so putting all together, our final algorithm will have an order of O(n.log2(n)), much better than the original algorithm.

As a final optimization, it’s possible to avoid computing one of the two FFTs if we take in account that the original signals are both real, so if we build a complex signal which carries one of the signals in its real part and the other signal in the imaginary part

  c[n] = a[n] + i\,b[n]

We can extract the original signals applying the real part and imaginary part operators

  \begin{array}{c}  a[n] = \Re\{y[n]\} = \dfrac{c[n] + c^*[n]}{2}\\  \\  b[n] = \Im\{y[n]\} =  \dfrac{c[n] - c^*[n]}{2}  \end{array}

Using the following property of the Fourier transform

  \begin{array}{c}  a^*[n] \longrightarrow \mathcal{F}\{a\}^*[-n]\\  \end{array}

We can compute now the DFT of the composite signal and extract the DFTs of the original signals

  \begin{array}{c}  \mathcal{F}\{a\}[n] = \dfrac{\mathcal{F}\{c\}[n] + \mathcal{F}\{c\}^*[-n]}{2}\\  \\  \mathcal{F}\{b\}[n] = \dfrac{\mathcal{F}\{c\}[n] - \mathcal{F}\{c\}^*[-n]}{2}  \end{array}

Where the new operations have all order O(n)

The code

Starting from the original code, some functions have been added: FFT and inverse FFT computation functions (in the files fft.h and fft.c), and basic complex arithmetic functions in the class Complex (file complex.h). Also, the memory allocation is now dynamic in order to deal with really huge numbers; furthermore, the show() function has been improved in such a way that it only prints the most and less significant digits with very big numbers.

Due to implementation limitations, remember that the total amount of considered digits must be a power of 2 (N_DIGITS property in the file config.h).

To test the new algorithm, the program evaluates the largest known prime number, the Mersenne prime, 243,112,609-1, with 12,978,189 digits (N_DIGITS has been set to 224, that is, 16,777,216). The algorithm used to obtain the Mersenne number is detailed in the source code. You will need about 20 minutes and 1,3 GB of RAM memory to run the program with the current configuration in a normal computer (2 GHz CPU). If that’s too much for you, you can compute a cheaper Mersenne prime, but remember to adjust N_DIGITS to a lower value (e.g., you can compute the number 23,021,377-1, with 909,526 digits, with N_DIGITS = 1048576, in about 35 seconds).

It’s possible to do some improvements with parallelization and a more dynamic memory allocation, but that’s beyond the range of this post. In future posts, we’ll present some algorithms to perform more complicated mathematical operations, like square roots.

Links

Source code

Discrete event simulator

This is a project that I did some years ago for performing computer simulations based on discrete events. The program pretends to be versatile and very configurable in order to allow the user to simulate anything. By the way, using Python as its extension language, the user can define very complex scenarios with event triggers and processing functions. The result is a powerful simulation engine focused in the richness of scene definition rather speed. Therefore, it is a nice tool to try experimental simulation models before building a more specific simulator program.

First at all, I will explain how the simulator works. On the one hand, there is an event scheduler that processes the existing events by chronological order. On the other hand, there is an interface between the simulator core and the extension language interpreter, so it is possible to run some code associated to an event when processed. Following that explanation come a sample script and some concluding remarks.

How it works

When we talk about computer simulators, we can distinguish between continuous or discrete models. Regarding the last ones, the operation of the modelled system is represented as a chronological sequence of events.

Every event in that sequence has a timestamp, this is, the time instant when the event occours. All the events, when generated, are pushed into a special queue type called priority queue, waiting to be processed.

The program core

The core of the program is a class called Scheduler, and its mission is to run the event actions by the order imposed by the timestamp. This class uses the STL’s template for the priority queues, called priority_queue; and defines a comparison function in order to determine, given two events e1 and e2, which of these must be processed first. Thus, the events are stored in the priority queue as they are generated and the scheduler gets them, one by one, ordered by timestamp.

In order to model the events to be processed by the scheduler, there is a C++ class called Event. That is an abstract class, with no more functionality than to serve as basis for other classes derived from Event. Therefore, the simulator can handle many types of events; this is done by extending the base class and defining new event subclases with new behaviors.

Types of events

At this time, there are two types of events defined with their correspondent classes: CallbackEvent and CREvent. The first one is just handles a callback function that is called by the scheduler in the right instant. The last one is used to schedule pieces of code from a coroutine: a special function definition that stops its execution several times along its code to simulate pauses or time lapses of any kind.

The callback-type events

A callback is a function, defined in the extension language (Python), that models the behavior of an event. The user can use an event of this type to run arbitrary code in a given simulation time instant. When the simulator clock reaches that time, the scheduler will call that function.

The coroutine-type events

A coroutine is also a function, but there is a big difference between a callback event and a coroutine event: the first one will call a function and all is done, while the last one schedules an event for each pause -i.e. each yield statement- the coroutine does.

The CREvent class models an event of this type. Each time a coroutine executes a yield statement, a new event is scheduled just to awake again the coroutine function at the end of the time lapse. Then, the coroutine yields the execution control to the simulator core. Thus, the scheduler can now continue dispatching events.

The extension language

This simulator embeds a Python interpreter in order to serve the modelling needs by the user. The Python language became very popular in the last years and now it is being used in a lot of disciplines, from scientific applications to web development. This reason and the ease of the programming in this language qualify Python as a preferent choice for the extension language of this simulator. Additionally, the API for extending and embedding the language is clear and easy to understand.

Defining the primitive functions

There are four functions defined in this version; these are C++ static functions that use an special type called PyObject for their arguments and returning value. These functions must be registered in a table by the Py_InitModule function. They will appear into the Python interpreter as a module named simul.

The four functions defined are:

  • start(): Starts the scheduler; the simulation begins.
  • schedule_process(g): Schedules the coroutine g.
  • schedule_callback(t, fn, data): Schedules the callback function fn to be called at time t and passing data as argument to it.
  • clock(): Returns the current simulation time value.

Ending words

I have developed a tool quite simple. It is a versatile discrete event simulator very suitable for building preliminary simulation scenarios with no care of optimization nor speed issues. By the way, it is also a technical demonstration of the architecture of these type of simulation tools, so the simplicity is a desired feature.

There are several software packages for doing discrete event simulations: a good example is the SimPy package. People often develop their own simulation tools; these are very efficient and specialized programs. Finally, the added value of some of the most popular open source simulation tools is debt to the contributions of a community; these projects started as a simple but extensible tool and a lot of people began using it for their work by adding custom modules.

In future posts, I will extend this simulator with some useful features and I will expose some typical uses of this kind of tools.

Resources

Source code

Arbitrary-precision fixed-point arithmetic library

I decided to write this library in 2000, after watching “The Simpsons” episode “Treehouse of Horror VI”.

In the last story of this episode, Homer Simpson jumps into a three-dimensional world. The full story, and specially this parallel world, are full of mathematical tributes: geometrical shapes, the Euler’s identity, a reference to the P versus NP problem, and so on. Among these mathematical references, a disturbing numerical formula attracted our attention:  178212 + 184112 = 192212

Homer Simpson in a three-dimensional world

Homer Simpson in a three-dimensional world

The previous affirmation is obviously a joke, since it must be false according to Fermat’s Last Theorem. But if we do the operation with a calculator with less than 10 significant digits, one can believe in its truthfulness, since both sides of the equality match up in the 9 first of their 40 digits.

The equation falseness can only be proved with a calculator capable to handle the enough amount of significant digits, and even though there are numerous tools able to perform calculus with arbitrary precision (like the phyton language, or bc, into the GNU project), I found interesting to design a single and generic library to deal with arbitrary-sized numbers, which could solve this kind of problems and another similar challenges.

In this post I will present the library code and its basic structure, and in future publications we will try to improve the algorithms efficiency and we will show some practical examples.

Library specification

A natural binary-coded decimal (NBCD) system has been chosen, with sign and module agreement, so an arbitrarily big integer can be coded with a long enough BCD digit sequence.

BCD sequence.

In order to handle fractional numbers, the library uses fixed-point arithmetic, therefore the format (the BCD sequence) is divided into integer and fractional parts. Thus, fractional numbers can be approximated with arbitrary precision by simply allocating enough memory in the fractional part. This scheme can deal with arbitrary-sized numbers, but in a static way, i.e., having previously allocated the needed memory.

Multiplication with fixed-point arithmetic.

Once the format has been chosen, let us describe some simple arithmetic operations:

  • Addition and subtraction: These operations are made by a point-wise addition or subtraction of both BCD sequences, applying a carry propagation from each digit to the next one.
  • Multiplication: The operation is performed by a typical long multiplication algorithm.
  • Division: The algorithm used is the long division algorithm.

The code

The library is coded in C++, but almost none of the language features are used in this limited version.

A number is coded in BigNumber class, which consists basically in a byte array and a sign flag. For simplicity and clarity, only one decimal digit is coded in each byte. The file bignum.h contains the declaration and bignum.cc contains some basic functions implementation.

The file config.h is a configuration point, where we can set the number of digits assigned to the integer and fractional parts in the format.

The file test.cc contains a main function with an example application: evaluates both sides of the former equation 178212 + 184112 = 192212, and verifies its falseness. At least 40 digits are needed for doing the calculus (the default configuration reserves 50 digits for both the integer and fractional parts).

The rest of the files codify several operations.

  • addsub.cc: addition and subtraction.
  • mul.cc: multiplication.
  • div.cc: division.
  • convert.cc: conversion between bignumbers and floating point numbers.

Links

Source code

Legal Notice: The Simpsons TM and (C) Fox and its related companies. All rights reserved. Any reproduction, duplication, or distribution in any form is expressly prohibited.
Disclaimer: This web site, its operators, and any content contained on this site relating to The Simpsons are not specifically authorized by Fox.

Sudoku solver program: A challenge just for fun (2).

Related to the original post, I’m going to propose another way to solve a sudoku puzzle.

This program will try to find all the puzzle solutions in a more efficient way using an algorithm based on uncertainty and uniqueness constraints.

The basic idea and the source code date from the year 2005, when the puzzle came into fashion.

The algorithm basis

In a sudoku puzzle, we have 9×9 squares, each of them belonging to 3 uniqueness constraint groups: a row, a column and a 3×3 sub-grid. There are 3×9 restriction groups in total: 9 rows, 9 columns and 9 sub-grids.

Each square has a state: a set of digits that can be fitted without violating the 3 uniqueness constraints, so the total table state will be the set of all the 81 individual states. Without any information, the initial state is the one with the maximum uncertainty: all the digits are possible in each square.

Furthermore, each square state gives us an uncertainty metric: the number of possible digits, so considering the uncertainty of all the squares, we can obtain a global uncertainty metric. Our strategy will be to reduce this uncertainty until reaching the minimum state: the solution.

We have two ways to perform this reduction:

  1. Using direct rules: given a correct state, we move to other correct states that have progressively less uncertainty. No errors are made in this one-way type of movements.
  2. Using hypothesis: given a correct state, we make an assumption which takes us to another less uncertainty state that we suppose as correct . If we reach to an impossible state, we rule out the last hypothesis. These are the kind of rules considered in the original post.

Using the second kind rules in a standalone way involves a typical brute-force algorithm. But in this algorithm, the combination of the two rule types will allow us to reduce the space into which the hypothesis are made: the less the uncertainty is, the less assumptions we’ll need.

So the basic idea is to give priority to type 1 rules before making any hypothesis, but, how are those rules?.

The basic rule is: given a restriction group C, i.e., a set of 9 squares, let A be the state of one of them (the set of possible digits of that square) and let B be the set of squares of C with their state contained in A. If the cardinal of A is the same as the one of B, then the squares of C which aren’t contained in B (C\B) can’t contain in their states any element of A, so those elements can be removed from their states, and, consequently, the uncertainty is reduced.

Simple uncertainty reduction in a 3x3 sub-grid when we place the first known numbers. As this sub-grid is a restriction group, the numbers 4, 7, 3 and 9 can only be placed at the given squares, and therefore can be removed from the others.

In other words, if we want to distribute a set of N digits (A) among the squares of a restriction group (C), and there are exactly N squares (B) that accepts only digits from that set, my N digits necessarily have to be distributed among those N squares, and the other 9 – N (C\B) squares can’t receive any of them. A simple example: if a square of a row has its state fixed to one digit (no uncertainty), that digit can’t be placed on any of the 8 remaining squares of the same row.


A more complex reduction. The four numbers 2, 3, 5 and 6 can only be placed at four squares, so those four squares cannot accept any other number.

An interesting option is that when we reduce the uncertainty state of a square, we can trigger recursively a new reduction in their related squares, i.e., the other squares belonging to its restriction groups. This results in a notable global uncertainty reduction, without the necessity of making assumptions (only with the initial information).

The algorithm is as follows:

  1. We start from the maximum uncertainty state: a void table where all the digits (1..9) are possible in all the squares.
  2. Use the initial table information to make type 1 reductions (recursively). If the puzzle is solved, we exit (many cases can be solved using only this technique).
  3. Make a possible assumption over the next square with uncertainty (which will yield new type 1 reductions recursively). We have three possible results:
    • If the puzzle is solved, we exit.
    • If we arrive at an inconsistent state, the last hypothesis was incorrect and we try the next possible one over the same square.
    • Otherwise (the puzzle is not solved yet), we jump to the point 3.

The code

The program is written in C and has been tested under GNU Linux. More details about its usage and table format can be found in the README file.

The code is quite clear, so I’ll just give some guidelines to help its understanding:

  • About the data model, the structures square_t and square_group_t model, respectively, a square and a restriction group. No more complex data structures are needed.
  • As the square state is coded as a bit field (an unsigned short where the first 9 bits represent the possible digits), the following bitwise operations can be done:
    • An uncertainty reduction operation consists in removing a bit subset from a bit set: a = a and not(b) (the set b is removed from a).
    • A set inclusion can be detected checking the equality:  a or b == b? (is a contained in b?)
    • The number of considered digits (the number of set bits) can be computed with the built-in function __builtin_popcount.
    • The digit coded by a set of bits with only one set bit can be obtained with the built-in function __builtin_ffs, wich returns the first set bit index.
  • The method uncertainty_reduction performs a type 1 rule over a square (the recursion enables to trigger other reductions in the related squares), and the method make_assumption performs a type 2 rule over a square.

Results

Whereas the original post presented us a simple algorithm where some basic AI techniques and general concepts were used, this algorithm uses a more specific and optimized approach, and therefore the performance is higher for common sudokus.

Links

First Ordered List Element.

  • Unordered List element.
  • And another.

Sudoku solver program: A challenge just for fun.

Some years ago, newspapers and magazines worldwide introduced us a novelty puzzle: the sudoku. The game gained popularity from its first publishing, in 1979, becoming an international hit in 2005.

A sudoku puzzle.

Then, we decided to design a computer program to solve the puzzle automatically, using AI techniques — just a programming challenge. In this post we show you how the program works; explaining first the theory and then how was it coded.

The algorithm basis

The method used to solve a sudoku is inspired on a classic AI technique: a heuristic search procedure.

Consider the whole space of sudoku puzzles, this is, 9×9 matrixes, satisfying or not the yet well-known game rules — let’s call ‘state’ to any of these matrixes. Consider also transitions among states representing the possible settings of a given digit for a given empty cell in a sudoku table. Therefore, there are some sequences of states —differing only a digit between a given state and the next one— that lead to the final state: the puzzle solved.

But think about it: there are millions of possibilities, and only a few ones are valid. How can we find the good ones avoiding to run thru the whole tree of states? Using an heuristic is the answer.

In order to let the algorithm to find a good sequence of states (to solve the puzzle), we must define an utility function. That functions gives us an estimation of the goodness of any possible movement. Then we can use it applied on any empty cell to know how good is to try filling that cell with any valid digit. Choosing the right cell to fill is very important and filling it should reduce the most uncertainty as possible.

So we need an heuristic with an utility function that depends on the number of possibilities going from a given state to another valid one. The function we have choosen is the one that gives the number of candidates for a given empty cell in the puzzle, so if a cell can fit any of N digits without violate the puzzle rules the function will return N.

Then we operate recursively on the puzzle matrix, choosing the empty cell with less candidates and trying to fill it with each one of those digits. If we reach a state with empty cells but no candidates we must undo some actions and keep on trying with other candidates and other cells. At the end, we get a state with no empty cell; that means the puzzle is solved.

The code

The program’s code is simple and easily understandable. There are three major functions, working together, to perform the heuristic algorithm. We defined a procedure to look for empty cells in a sudoku table, another procedure that tries to fill those empty cells with the candidates given by the third function: the heuristic itself. These functions are called, respectively, do_sudoku, try_a_number and find_candidates.

The do_sudoku function looks for blanks in a given sudoku, calculates the candidate digits for those blanks and tries to solve the puzzle filling the empty cell that has less candidates. All the work is done applying this rule recursively.

The try_a_number function iterates recursively the algorithm, trying to fit a candidate digit in a given empty cell. If it reaches a ‘dead-end’ state, returns undoing that try.

The find_candidates function looks for all fittable digits for a given cell. It checks if a digit is not being used in the same row, the same column and the same 3×3 block; in this case, it’s a valid candidate.

Results

After trying the program with several sudokus, we have observed that:

  • This is a very simple algorithm with no optimizations. With the most of the cases it finds a solution quickly. Nothing is done in order to shortcut some trivial or repetitive cases.
  • Memory consumption: Tables are backuped at each recursion. Therefore, the memory allocated is proportional to the maximum level of recursion achieved; this is equivalent to the number of empty cells at the initial state.

Links