The Variational Calculus: Part 2

The Euler-Lagrange Equation

In the previous post, we introduced the variational calculus and took the first steps towards deriving conditions for local extrema of functionals. In this post, we will build on our previous work and derive the Euler-Lagrange equation, and then apply it to a classic minimization problem.

Recall, we left off with the following equation for the first variation of a functional: $$\delta J(\eta, y) = \int_{a}^{b} \left[ \frac{\partial f}{\partial y} \eta + \frac{\partial f}{\partial y'} \eta' \right] dx.$$ One annoyance here is the dependence of $\delta J$ on $\eta'$. We can eliminate this dependence using integration by parts. As a refresher, the formula for integration by parts is $$\int u\ dv = uv - \int v\ du.$$ Setting $dv = \eta'\ dx$ and $u = \frac{\partial f}{\partial y'}$, we get $$\int_a^b \frac{\partial f}{\partial y'}\eta'\ dx = \eta \frac{\partial f}{\partial y'}\bigg|_a^b - \int_a^b \eta \frac{d}{dx}\frac{\partial f}{\partial y'}\ dx = - \int_a^b \eta \frac{d}{dx}\frac{\partial f}{\partial y'}\ dx.$$ Note that the $uv$ term evaluates to 0 because $\eta(a) = \eta(b) = 0$. This is the boundary condition on $\eta$ (being a perturbation) that we imposed in the previous post.

Substituting back into our equation for $\delta J$, we get $$\delta J(\eta, y) = \int_a^b \eta\left[ \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} \right]\ dx.$$
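As a sanity check, we can verify this identity symbolically for a concrete choice of integrand, path, and perturbation. The following SymPy sketch is my own illustration, not part of the derivation; the integrand $f = y'^2 + y^2$, the trial path $y = x$, and the perturbation $\eta = \sin(\pi x)$ on $[0, 1]$ are arbitrary choices made just so everything integrates in closed form.

```python
import sympy as sp

x, eps = sp.symbols('x epsilon')

# Illustrative choices (not from the post): f(y, y') = y'^2 + y^2 on [0, 1],
# trial path y(x) = x, perturbation eta(x) = sin(pi x) with eta(0) = eta(1) = 0.
f = lambda u, up: up**2 + u**2
y = x
eta = sp.sin(sp.pi * x)

# First variation via the definition: d/d(eps) of J(y + eps*eta), at eps = 0.
y_pert = y + eps * eta
J_pert = sp.integrate(f(y_pert, sp.diff(y_pert, x)), (x, 0, 1))
delta_J = sp.diff(J_pert, eps).subs(eps, 0)

# Integrated-by-parts form: <eta, E> with E = df/dy - d/dx (df/dy').
# For this f: df/dy = 2y and d/dx (df/dy') = 2y'', so E = 2y - 2y'' = 2x here.
E = 2 * y - 2 * sp.diff(y, x, 2)
inner = sp.integrate(eta * E, (x, 0, 1))

print(sp.simplify(delta_J - inner))  # prints 0: the two forms agree
```

Both expressions come out to $2/\pi$, so the boundary term really does drop out as claimed.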

Recall, in the last post we showed that the first variation of a functional must vanish for all $\eta$ if $y$ is an extremum of $J$. We can use this fact to draw an analogy with the gradient in vector calculus. Recall from vector calculus that if $\mathbf{x}$ is an extremum of a function $f$, then $\nabla f(\mathbf{x}) = 0$. Equivalently, $\nabla f(\mathbf{x}) = 0$ if and only if $\langle \mathbf{v}, \nabla f(\mathbf{x}) \rangle = 0$ for all $\mathbf{v}$, i.e. the directional derivative of $f$ in every direction is zero¹.

In the same way, we require that $\delta J(\eta, y) = 0$ for all $\eta$ in order for an extremum to exist at $y$. Here, $\eta$ takes the place of $\mathbf{v}$ in the vector calculus setting and $$E(x) = \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}$$ takes the place of $\nabla f$. In fact, we can regard $E(x)$ and $\eta$ as elements of the Hilbert space $H = L^2[a, b]$ with the inner product given by $$\langle \eta, E \rangle = \int_a^b \eta(x) E(x)\ dx.$$ Thus, the condition that $\delta J(\eta, y) = 0$ for all $\eta \in H$ is equivalent to the condition that $\langle \eta, E \rangle = 0$ for all $\eta \in H$. As in the vector case, we are requiring that the derivative of $J$ in the “direction” of every function (i.e. perturbation) $\eta$ is zero; this is essentially a directional derivative in function space. We will now show that this condition is equivalent to $E(x) = 0$ for all $x \in [a, b]$, a condition we call the Euler-Lagrange equation.

Lemma. Suppose $\langle \eta, g \rangle = 0$ for all $\eta \in H$. If $g: [a, b] \to \mathbb{R}$ is a continuous function, then $g(x) = 0$ on $[a, b]$.

Proof: Suppose there’s some $c \in [a, b]$ for which $g(c) \neq 0$. Assume without loss of generality that $g(c) > 0$ (otherwise consider $-g$) and that $c \in (a, b)$ (if $g$ is nonzero only at an endpoint, continuity gives a nearby interior point where it is also nonzero). Then, by continuity, there exist $\alpha, \beta$ such that $a < \alpha < c < \beta < b$ and $g(x) > 0$ for all $x \in (\alpha, \beta)$. It’s possible to construct a function $\nu \in C^2[a, b]$ such that $\nu(x) > 0$ for $x \in (\alpha, \beta)$ and $\nu(x) = 0$ for $x \in [a, b] \setminus (\alpha, \beta)$ (it’s basically a bump function; one concrete choice is given below). Therefore, $\nu \in H$ and $$\langle \nu, g \rangle = \int_a^b \nu(x)g(x)\ dx = \int_{\alpha}^{\beta} \nu(x)g(x)\ dx > 0,$$ which contradicts the assumption that $\langle \eta, g \rangle = 0$ for all $\eta \in H$. Thus, $g = 0$ on $(a, b)$ and so too on $[a, b]$ by continuity. $\blacksquare$
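For concreteness, one standard choice of bump function (a detail I’m filling in; any $C^2$ bump works) is $$\nu(x) = \begin{cases} (x - \alpha)^3 (\beta - x)^3 & x \in (\alpha, \beta) \\ 0 & \text{otherwise.} \end{cases}$$ The cubed factors guarantee that $\nu$, $\nu'$, and $\nu''$ all vanish at $\alpha$ and $\beta$, so $\nu \in C^2[a, b]$, and $\nu$ is strictly positive on $(\alpha, \beta)$ as required.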

This proves that if $y$ is an extremal of $J$, then $E = 0$ on $[a, b]$. We call the condition $E = \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0$ on $[a, b]$ the Euler-Lagrange equation. It is a second-order ordinary differential equation satisfied by any smooth extremal. To solve this ODE, we use the boundary conditions $y(a) = y_a$ and $y(b) = y_b$.
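Since the Euler-Lagrange expression is just a combination of partial and total derivatives, it’s easy to compute symbolically. Here’s a minimal sketch (a helper of my own, not a standard API) that builds $E$ in SymPy from an integrand written in terms of $y(x)$ and $y'(x)$:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

def euler_lagrange(f):
    """Euler-Lagrange expression E = df/dy - d/dx (df/dy') for an
    integrand f expressed in terms of y(x) and y'(x)."""
    yp = y(x).diff(x)
    return (sp.diff(f, y(x)) - sp.diff(f, yp).diff(x)).doit()

# Sanity check: for f = y'^2 / 2 we should get E = -y'',
# so extremals satisfy y'' = 0, i.e. straight lines.
print(euler_lagrange(y(x).diff(x)**2 / 2))  # -Derivative(y(x), (x, 2))
```

(SymPy also ships `sympy.calculus.euler.euler_equations`, which handles the more general multi-variable case.)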

Example: Minimizing Arc Length

OK, that was a whole bunch of machinery to set up. Let’s finally put it to good use! Here, we solve a simple problem: that of minimizing the arc length of a curve in the plane. More specifically, we restrict to paths connecting the point $(0, 0)$ to the point $(1, 1)$, so $a = 0$ and $b = 1$. The formula for arc length, as you may recall, is given by the functional $$J(y) = \int_0^1 \sqrt{1 + y'^2}\ dx.$$

Computing the Euler-Lagrange expression, we get $$E = \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0 - \frac{d}{dx} \left( \frac{y'}{\sqrt{1 + y'^2}} \right) = 0.$$ In other words, we require that $$\frac{y'}{\sqrt{1 + y'^2}} = C$$ for some constant $C$. Rearranging gives $$y'^2(1 - C^2) = C^2 \implies y' = \pm \frac{C}{\sqrt{1 - C^2}}.$$ We can substitute $c_1 = \pm \frac{C}{\sqrt{1 - C^2}}$ to get $y' = c_1$. Thus, an extremal of $J$ must have the form $$y(x) = c_1x + c_2,$$ where $c_2$ is a constant of integration. We can use the boundary condition $y(0) = 0$ to find that $c_2 = 0$. Similarly, $y(1) = 1$ gives $c_1 = 1$. Thus, the shortest path (the one with minimal arc length) connecting $(0, 0)$ to $(1, 1)$ is the straight line $y = x$.
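We can double-check this symbolically, with the Euler-Lagrange computation from the earlier sketch inlined so the snippet runs on its own:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
yp = y(x).diff(x)

# Arc-length integrand and its Euler-Lagrange expression
f = sp.sqrt(1 + yp**2)
E = sp.simplify(sp.diff(f, y(x)) - sp.diff(f, yp).diff(x))
print(E)  # should simplify to -y'' / (1 + y'^2)**(3/2), so E = 0 iff y'' = 0

# The straight line y = x satisfies the equation exactly
print(sp.simplify(E.subs(y(x), x).doit()))  # 0
```

The simplified expression vanishes exactly when $y'' = 0$, confirming that the extremals of arc length are precisely the straight lines.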

This is a simple example, but it illustrates the power of the variational calculus.


1. In case you need a reminder, the directional derivative of $f$ in the direction of $\mathbf{v}$ is given by $\mathbf{v} \cdot \nabla f = \langle \mathbf{v}, \nabla f \rangle$.
