Continuity, instantaneous change, infinitesimals
How we faced the ghosts of departed quantities
The subject of calculus, developed independently by Newton and Leibniz — accompanied by a century of raging dispute over the proportion of credit due to each of them — is concerned fundamentally with the idea of instantaneous rates of change, particularly for functions on the real numbers. The class of continuous functions becomes centrally important. But what does it mean, precisely, for a function to be continuous?
Please enjoy this free extended excerpt from Lectures on the Philosophy of Mathematics, published by MIT Press in 2021, an introduction to the philosophy of mathematics with an approach often grounded in mathematics and motivated organically by mathematical inquiry and practice. This book was used as the basis of my lecture series on the philosophy of mathematics at Oxford University.
Continuity
In natural language, one distinguishes between a continuous process, which is one that proceeds in an unbroken manner, without interruption, and a process performed continually, which means without ending. For example, you might hope that your salary payments arrive continually in the coming decades, but it is not necessary that they do so continuously, since it will be fine to receive a separate payment each month.
Informal account of continuity
In mathematics, a continuous function is one whose graph is unbroken in a sense. What is this sense? Perhaps an informal continuity concept suffices at first. In my junior high school days, my teachers would say:
A function is continuous if you can draw it without lifting your pencil.
This statement conveys the idea that a jump discontinuity, as occurs in the middle of the red function, should disqualify a function from being continuous, because you would have to lift your pencil to jump across the gap. But surely it is inadequate to support precise mathematical argument; and it is inaccurate in the fine detail, if one considers that the lead of a pencil has a certain nonzero width, and furthermore, the material coming off the pencil consists of discrete atoms of carbon. So let us take it as a suggestive metaphor rather than as a mathematical definition.
In introductory calculus classes, one often hears a slightly better statement:
A function f is continuous at c if the closer and closer x gets to c, the closer and closer f(x) gets to f(c).
This is an improvement, by suggesting that one can obtain increasingly good approximations to the value of a continuous function at a point by applying the function to increasingly good approximations to the input; we view f(x) as an approximation of f(c) when x is an approximation of c.
But the definition is still much too vague. Worse, it is not quite right. Suppose you were to walk through Central Park in New York, proceeding uptown from Central Park South. As you walk north, you would be getting closer and closer (if only slightly) to the North Pole. But you would not be getting close to the North Pole, since you would remain thousands of miles away from it. The problem with the definition above is that it does not distinguish between the idea of getting closer and closer to a quantity and the idea of getting close to it. How close does it get? How close is close enough? The definition does not tell us.
To make the same point differently, consider the elevation function of a hiker as she descends a gently sloped plateau toward its edge, where a dangerous cliff abruptly drops. As she approaches the cliff's edge, she is getting closer and closer to the edge, and her elevation gets closer and closer to the elevation of the valley floor (since she is descending, even if only slightly), but the elevation function is not continuous, since there is an abrupt vertical drop at the cliff's edge, a jump discontinuity, if she were to proceed that far.
The definition of continuity
A more correct definition should therefore not speak of “closer and closer,” but should rather concern itself with exactly how close x is to c and how close f(x) is to f(c), and how these degrees of closeness are related. This is precisely what the epsilon-delta definition of continuity achieves.
Definition. A function f on the real numbers is continuous at the point c if for every ε > 0, there is δ > 0 such that whenever x is within δ of c, then f(x) is within ε of f(c). The function overall is said to be continuous if it is continuous at every point.
In the figure, the y values within ε of f(c) are precisely those within the horizontal green band, while the x values within δ of c are those within the vertical red band. The diagram therefore illustrates a successful choice of δ, a situation (and you will explain precisely why in the exercise questions) where every x within δ of c has f(x) within ε of f(c).
We may express the continuity of f at c succinctly in symbols as follows:
\(∀ε>0\ ∃δ>0\ ∀x\ [x \text{ within }δ\text{ of }c\ \implies\ f(x)\text{ within }ε\text{ of }f(c)].\)
The quantifier symbol ∀ is to be read as “for all” and the symbol ∃ as “there exists.” So a function f is continuous at a point c, according to what this says, if for any desired degree of accuracy ε, there is a degree of closeness δ, such that any x that is that close to c will have f(x) within the desired accuracy of f(c). In short, you can ensure that f(x) is as close to f(c) as you want by insisting that x is sufficiently close to c.
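As a quick illustration of a winning play in this quantifier exchange, take the function f(x) = 3x at any point c: whatever ε > 0 is demanded, the choice δ = ε/3 works, since
\(|x - c| < \frac{ε}{3}\ \implies\ |f(x) - f(c)| = |3x - 3c| = 3\,|x - c| < ε.\)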
The continuity game
Consider the continuity game. In this game, your role is to defend the continuity of the function f. The challenger presents you with a value c and an ε > 0, and you must reply with a δ > 0. The challenger can then pick any x within δ of c, and you win, provided that f(x) is indeed within ε of f(c). In the exercise questions, you will show that the function f is continuous if and only if you have a winning strategy in this game.
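Here is a minimal Python sketch of the game; the sample function f(x) = 2x + 1 and the defender's strategy δ = ε/2 are merely illustrative choices, but with them the defender wins every round.

```python
import random

# A minimal sketch of the continuity game for the sample function
# f(x) = 2x + 1. The defender's strategy delta = eps/2 is a winning one,
# since |f(x) - f(c)| = 2|x - c| < 2*(eps/2) = eps whenever |x - c| < eps/2.

def f(x):
    return 2 * x + 1

def defender_delta(eps):
    return eps / 2  # winning reply for this particular f

def play_round(c, eps):
    delta = defender_delta(eps)
    x = c + random.uniform(-delta, delta)  # any challenger move within delta
    return abs(f(x) - f(c)) < eps          # the defender wins if within eps

# The defender never loses, no matter which c, eps, and x are challenged.
print(all(play_round(random.uniform(-100, 100), random.uniform(1e-6, 10.0))
          for _ in range(10_000)))  # True
```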
Many assertions in mathematics have such alternating ∀∃ quantifiers, and these can always be given the strategic reading for the game, in which the challenger plays instances of the universal ∀ quantifier and the defender answers with witnesses for ∃. Mathematically complex assertions often have many alternations of quantifiers, and these correspond to longer games. Perhaps because human evolution took place in a challenging environment of essentially game-theoretic human choices, with consequences for strategic failures, we seem to have an innate capacity for the strategic reasoning underlying these complex, alternating-quantifier mathematical assertions. I find it remarkable how we can leverage our human experience in this way for mathematical insight.
Estimation in analysis
Let us illustrate the epsilon-delta definition in application by proving that the sum of two continuous functions is continuous. Suppose that f and g are both continuous at a point c, and consider the function f + g, whose value at c is f(c) + g(c). To see that this function is continuous at c, we shall make what is known as an ε/2 argument. Consider any ε > 0. Thus also, ε/2 > 0. Since f is continuous, there is δ1 > 0 such that any x within δ1 of c has f(x) within ε/2 of f(c). Similarly, since g is continuous, there is δ2 > 0 such that any x within δ2 of c has g(x) within ε/2 of g(c). Let δ be the smaller of δ1 and δ2. If x is within δ of c, therefore, then it is both within δ1 of c and within δ2 of c. Consequently, f(x) is within ε/2 of f(c) and g(x) is within ε/2 of g(c). It follows that f(x) + g(x) is within ε of f(c) + g(c), since each term has error less than ε/2, and thus we have won this instance of the continuity game. So f + g is continuous, as desired.
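The key estimate here is an instance of the triangle inequality, splitting the total error into the two pieces that were controlled separately:
\(|(f(x)+g(x)) - (f(c)+g(c))|\ \le\ |f(x)-f(c)| + |g(x)-g(c)|\ <\ \frac{ε}{2} + \frac{ε}{2}\ =\ ε.\)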
This argument illustrates the method of “estimation,” so central to the subject of real analysis, by which one delimits the total error of a quantity by breaking it into pieces that are analyzed separately. One finds not only ε/2 arguments, but also ε/3 arguments, breaking the quantity into three pieces, and \(ε/2^n\) arguments, splitting into infinitely many pieces, with the error in the nth piece at most \(ε/2^n\). The point is that because
\(\frac{ε}{2} + \frac{ε}{4} + \frac{ε}{8} + \cdots = ε,\)
one thereby bounds the total error by ε, as desired. Let me emphasize that this use of the word estimate does not mean that one is somehow guessing how much the difference can be, but rather one is proving absolute bounds on how large the error could possibly be.
The analyst's attitude can be expressed by the slogan:
In algebra, it is equal, equal, equal.
But in analysis, it is less-than-or-equal, less-than-or-equal, less-than-or-equal.
In algebra, one often proceeds in a sequence of equations, aiming to solve them exactly, while in analysis, one proceeds in a sequence of inequalities, mounting an error estimate by showing that the error is less than one thing, which is less than another, and so on, until ultimately, it is shown to be less than the target ε, as desired. The exact value of the error is irrelevant; the point, rather, is that it can be made as small as desired.
Limits
The epsilon-delta idea enables a general formalization of the limit concept. Namely, one defines that the limit of f(x) as x approaches c is the quantity L, written like this:
\(\lim_{x\to c} f(x) = L,\)
if for any ε > 0, there is δ > 0 such that any x within δ of c (but ignoring x = c) has f(x) within ε of L.
But why all the fuss? Do limits and continuity require such an overly precise and detailed treatment? Why can't we get by with a more natural, intuitive account? Indeed, mathematicians proceeded with an informal, intuitive account for a century and a half after Newton and Leibniz. The epsilon-delta conception of limits and continuity was a long time coming, achieving its modern form with Weierstrass, after earlier use by Cauchy and Bolzano and still earlier informal notions involving infinitesimals, which are infinitely small quantities. Let us compare that usage with our modern method.
Instantaneous change
In calculus, we seek to understand the idea of an instantaneous rate of change. Drop a steel ball from a great tower; the ball begins to fall, with increasing rapidity as gravity pulls it downward, until it strikes the pavement — watch out! If the height is great, then the ball might reach terminal velocity, which occurs when the force of gravity is balanced by the force of air friction. But until that time, the ball was accelerating, with its velocity constantly increasing. The situation is fundamentally different from the case of a train traveling along a track at a constant speed, a speed we can calculate by solving the equation:
\(\text{distance} = \text{rate} \times \text{time}.\)
For the steel ball, however, if we measure the total elapsed time of the fall and the total distance, the resulting rate will be merely an average velocity. The average rate over an interval, even a very small one, does not quite seem fully to capture the idea of an instantaneous rate of change.
Infinitesimals
Early practitioners of calculus solved this issue with infinitesimals. Consider the function \(f(x) = x^2\). What is the instantaneous rate of change of f at a point x? To find out, we consider how things change on an infinitesimally small interval — the interval from x to x + δ for some infinitesimal quantity δ. The function accordingly changes from f(x) to f(x + δ), and so the average rate of change over this tiny interval is
\(\frac{f(x+δ) - f(x)}{δ} = \frac{(x+δ)^2 - x^2}{δ} = \frac{2xδ + δ^2}{δ} = 2x + δ.\)
Since δ is infinitesimal, this result 2x + δ is infinitely close to 2x, and so we conclude that the instantaneous change in the function is 2x. In other words, the derivative of \(x^2\) is 2x.
Do you see what we did there? Like Newton and Leibniz, we introduced the infinitesimal quantity δ, and it appeared in the final result 2x + δ, but in that final step, just like them, we said that δ did not matter anymore and could be treated as zero. But we could not have treated it as zero initially, since then our rate calculation would have been 0/0, which makes no sense.
What exactly is an infinitesimal number? If an infinitesimal number is just a very tiny but nonzero number, then we would be wrong to cast it out of the calculation at the end, and also we would not be getting the instantaneous rate of change in f, but rather only the average rate of change over an interval, even if it was a very tiny interval. If, in contrast, an infinitesimal number is not just a very tiny number, but rather infinitely tiny, then this would be a totally new kind of mathematical quantity, and we would seem to need a much more thorough account of its mathematical properties and how the infinitesimals interact with the real numbers in calculation. In the previous calculation, for example, we were multiplying these infinitesimal numbers by real numbers, and in other contexts, we would be applying exponential and trigonometric functions to such expressions. To have a coherent theory, we would seem to need an account of why this is sensible.
Bishop Berkeley (1734) makes a withering criticism of the foundations of calculus.
And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?
Berkeley's mocking point is that essentially similar-seeming reasoning can be used to establish nonsensical mathematical assertions, which we know are wrong. For example, if δ is vanishingly small, then 2δ and 3δ differ by a vanishingly small quantity. If we now treat that difference as zero, then 2δ = 3δ, from which we may conclude 2 = 3, which is absurd. Why should we consider the earlier treatment of infinitesimals as valid if we are not also willing to accept this conclusion? It seems not to be clear enough when we may legitimately treat an infinitesimal quantity as zero and when we may not, and the early foundations of calculus begin to seem problematic, even if practitioners were able to avoid erroneous conclusions in practice. The foundations of calculus become lawless.
Modern definition of the derivative
The epsilon-delta limit conception addresses these objections and establishes a new, sound foundation for calculus, paving the way for the mature theory of real analysis. The modern definition of the derivative of a function f is given by
\(f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h},\)
provided that this limit exists, using the epsilon-delta notion of limit we mentioned earlier. Thus, one does not use just a single infinitesimal quantity δ, but rather one in effect uses many various increments h and takes a limit as h goes to zero. This precise manner of treating limits avoids all the paradoxical issues with infinitesimals, while retaining the essential intuition underlying them — that the continuous functions are those for which small changes in input cause only small changes in the output, and the derivative of a function at a point is obtained from the average rate of change of the function over increasingly tiny intervals surrounding that point.
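As a numerical sketch of this definition (purely illustrative, reusing the earlier example \(f(x) = x^2\)), one can watch the average rates over shrinking increments h approach the derivative value 2x, with no infinitesimal quantity ever appearing:

```python
# Numerical sketch of the modern definition: the average rate of change
# of f(x) = x**2 over [x, x+h] is ((x+h)**2 - x**2)/h = 2x + h, which
# approaches the derivative 2x as the increment h shrinks toward zero.

def average_rate(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2
x = 3.0  # the derivative at x = 3 should be 2*x = 6

for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    print(f"h = {h:<7} average rate = {average_rate(f, x, h):.6f}")
# prints rates 7.0, 6.1, 6.01, 6.001, 6.0001 -- approaching 6
```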
An enlarged vocabulary of concepts
The enlarged mathematical vocabulary provided by the epsilon-delta approach to limits expands our capacity to express new, subtler mathematical concepts, greatly enriching the subject. Let us get a taste of these further refined possibilities.
Strengthening the continuity concept, for example, a function f on the real numbers is said to be uniformly continuous if for every ε > 0, there is δ > 0 such that whenever x and y are within δ, then f(x) and f(y) are within ε. But wait a minute — how does this differ from ordinary continuity? The difference is that ordinary continuity is a separate assertion made at each point c, with separate ε and δ for each number c. In particular, with ordinary continuity, the value δ chosen for continuity at c can depend not only on ε, but also on c. With uniform continuity, in contrast, the quantity δ may depend only on ε. The same δ must work uniformly with every x and y (the number y in effect plays the role of c here).
Consider the function \(f(x) = x^2\), a simple parabola, on the domain of all real numbers. This function is continuous, to be sure, but it is not uniformly continuous on this domain, because it becomes as steep as one likes as one moves to large values of x. Namely, for any δ > 0, if one moves far enough away from the origin, then the parabola becomes sufficiently steep that one may find numbers x and y very close together, differing by less than δ, while \(x^2\) and \(y^2\) differ by a huge amount. For this reason, there can be no single δ that works for all x and y, even when one takes a very coarse value of ε. Meanwhile, using what is known as the compactness property of closed intervals in the real number line (expressed by the Heine-Borel theorem), one can prove that every continuous function f defined on a closed interval [a,b] in the real numbers is uniformly continuous on that interval.
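A small numerical sketch makes the failure vivid (the value δ = 0.001 is an arbitrary illustrative choice): the points x and y = x + δ/2 always lie within δ of each other, yet \(|x^2 - y^2| = xδ + δ^2/4\) grows without bound as x increases.

```python
# Sketch: f(x) = x**2 is not uniformly continuous on the real line.
# For a fixed delta, take y = x + delta/2, so x and y are within delta;
# then |x**2 - y**2| = x*delta + delta**2/4, which grows without bound.

delta = 0.001
for x in [10.0, 1e3, 1e5, 1e7]:
    y = x + delta / 2
    print(f"x = {x:>12.0f}   |x^2 - y^2| = {abs(x*x - y*y):.4f}")
# gaps: 0.0100, 1.0000, 100.0000, 10000.0000 -- so for, say, eps = 1,
# no single delta can work for all pairs x, y at once.
```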
The uniform continuity concept arises from a simple change in quantifier order in the continuity statement, which one can see by comparing:
A function f on the real numbers is continuous when
\(∀ y\ ∀ε>0\ ∃δ>0\ ∀ x\ [x,y \text{ within }δ\ \implies\ f(x),f(y)\text{ within }ε],\)
whereas f is uniformly continuous when
\(∀ε>0\ ∃δ>0\ ∀ x,y\ [x,y\text{ within }δ\ \implies\ f(x),f(y)\text{ within }ε].\)
Let us explore a few other such variations — which concepts result this way? The reader is asked to provide the meaning of these three statements in the exercise questions and to identify in each case exactly which functions exhibit the property:
Also requiring x ≠ y in the last example makes for an interesting, subtle property.
Suppose we have a sequence of continuous functions \(f_0, f_1, f_2, \ldots\), and they happen to converge pointwise to a limit function f, meaning that \(f_n(x) \to f(x)\) for every x. Must the limit function also be continuous? Cauchy made a mistake about this, claiming that a convergent series of continuous functions is continuous. But this turns out to be incorrect.
For a counterexample, consider the functions \(x, x^2, x^3, \ldots\) on the unit interval, as pictured here in blue. These functions are each continuous, individually, but as the exponent grows, they become increasingly flat on most of the interval, spiking to 1 at the right. The limit function, shown in red, accordingly has constant value 0, except at x = 1, where it has a jump discontinuity up to 1. So the convergent limit of continuous functions is not necessarily continuous. In Cauchy's defense, he had a convergent series \(\sum_n f_n(x)\) rather than a pointwise convergent limit \(\lim_n f_n(x)\), which obscures the counterexamples, although one may translate between sequences and series via successive differences, making the two formulations equally wrong. Meanwhile, Imre Lakatos (1976) advances a more forgiving view of Cauchy's argument in its historical context.
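A short numerical sketch of this counterexample (the grid resolution is an arbitrary choice): at any fixed x < 1 the values \(x^n\) vanish rapidly, yet the supremum of \(x^n\) over [0, 1) is exactly 1 for every n, so the functions never enter a small ε tube around the limit function.

```python
import numpy as np

# Sketch of the counterexample x**n on the unit interval: the convergence
# to the limit function (0 on [0,1), 1 at x = 1) is pointwise, not uniform.
xs = np.linspace(0.0, 1.0, 10_001)[:-1]  # grid on [0, 1), x = 1 excluded

for n in [1, 10, 100, 1000]:
    print(f"n = {n:>4}   x=0.5: {0.5**n:.2e}   max gap on grid: {np.max(xs**n):.4f}")
# At the fixed point x = 0.5, the values 0.5**n shrink rapidly toward 0,
# but the maximum of x**n on the grid stays near 1 (the true supremum over
# [0,1) is exactly 1 for every n), so no eps tube with eps < 1 contains x**n.
```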
One finds a correct version of the implication by strengthening pointwise convergence to uniform convergence \(f_n ⇉ f\), which means that for every ε > 0, there is N such that every function \(f_n\) for n ⩾ N is contained within an ε tube about f, meaning that \(f_n(x)\) is within ε of f(x) for every x. The uniform limit of continuous functions is indeed continuous by an ε/3 argument: for x close to c, the value f(x) is close to \(f_n(x)\) when n is large, which is close to \(f_n(c)\) by the continuity of \(f_n\), which is close to f(c). More generally, if a merely pointwise convergent sequence of functions forms an equicontinuous family, which means that at every point c and for every ε > 0, there is a δ > 0 that works for every \(f_n\) at c, then the limit function is continuous.
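Spelled out, the ε/3 estimate runs as follows: choose n large enough that \(f_n\) lies within the ε/3 tube about f, and then, by the continuity of \(f_n\) at c, choose δ so that x within δ of c implies \(f_n(x)\) is within ε/3 of \(f_n(c)\); then
\(|f(x) - f(c)|\ \le\ |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|\ <\ \frac{ε}{3} + \frac{ε}{3} + \frac{ε}{3}\ =\ ε.\)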
What I am arguing is that the epsilon-delta methods do not serve merely to repair a broken foundation, leaving the rest of the structure intact. We do not merely carry out the same old modes of reasoning on a shiny new (and more secure) foundation. Rather, the new methods introduce new modes of reasoning, opening doors to new concepts and subtle distinctions. With the new methods, we can make fine gradations in our previous understanding, now seen as coarse; we can distinguish between continuity and uniform continuity or among pointwise convergence, uniform convergence, and convergence for an equicontinuous family. This has been enormously clarifying, and our mathematical understanding of the subject is vastly improved.
Continue reading more about this topic in the book:
Lectures on the Philosophy of Mathematics, MIT Press 2021
The epsilon-delta criterion aims to formalize our intuitive understanding of what it takes for (a part of) some function to be continuous, i.e., replacing the “idealized pencil” by requiring that for some sufficiently small variation of the argument, the changes in value become arbitrarily small. As often is the case with formalisms, this gives rise to wild examples. For example, https://en.m.wikipedia.org/wiki/Thomae%27s_function is continuous almost everywhere, yet the points of discontinuity lie dense. One might reject the idea of continuity at a single point as nonsensical. Playing the devil's advocate, one should only be allowed to assign this property to some section of the function graph (if my pencil just marks a point, then what is the point). I am not very literate in the history of analysis. Were there alternative proposals put forward that would judge Thomae's function differently?
What do you think of the definition of continuity in terms of open balls? I find that version somewhat easier to think about than the epsilon-delta version. I learned of it in D. J. Bernstein’s 1997 essay “Calculus for Mathematicians”:
> Definition 2.1. Let 𝑓 be a function defined at 𝑐. Then 𝑓 is continuous at 𝑐 if, for any open ball 𝐹 around 𝑓(𝑐), there is an open ball 𝐵 around 𝑐 such that 𝑓(𝐵) ⊆ 𝐹.