Lecture 1: Multidimensional signals and systems

48 minute read

Prof. Alex Bronstein

Notation

In our course, we will deal almost exclusively with real functions. A scalar function will be denoted as $f : \mathbb{R}^d \to \mathbb{R}$, $f(x)$, or simply $f$, with $d$ denoting the domain dimension. $d$-dimensional multi-indices will be denoted by $n = (n_1,\dots,n_d) \in \mathbb{Z}^d$. An operator acting on a function will be denoted by $\mathcal{H}(f)$ or simply $\mathcal{H}f$.

Multidimensional signals and systems

A $d$-dimensional signal is a real- (or sometimes complex-) valued function $f : \mathbb{R}^d \to \mathbb{R}$. In the case of $d=2$, we will refer to such signals as ‘‘images’’ and interpret the argument of the function, $x = (x_1, x_2)$, as the spatial location of a point in an image; for $d=3$, we will refer to such signals as video, interpreting the additional third dimension as ‘‘time’’. In the one-dimensional case, the scalar argument will be denoted by $t$ and interpreted as ‘‘time’’.

Denoting the space of real-valued signals as $\mathcal{F}(\mathbb{R}^d, \mathbb{R}) = \{ f : f : \mathbb{R}^d \to \mathbb{R} \}$, a system is an operator $\mathcal{H} : \mathcal{F}(\mathbb{R}^d, \mathbb{R}) \to \mathcal{F}(\mathbb{R}^d, \mathbb{R})$. That is, a system $\mathcal{H}$ accepts a signal as the input and produces another signal as the output. For example, the system receiving $f(x)$ and producing $f(x-p)$ will be called translation (or shift), and the corresponding operator will be denoted as $\tau_p$. With this notation we can write $(\tau_p f)(x) = f(x-p)$.

Linearity

A system $\mathcal{H}$ is said to be a linear system if it is additive and homogeneous, i.e., if for every $f, g \in \mathcal{F}(\mathbb{R}^d, \mathbb{R})$ and $a, b \in \mathbb{R}$,

$$\mathcal{H}(af + bg) = a\,\mathcal{H}f + b\,\mathcal{H}g.$$

The output of (almost) every linear system can be represented as a linear functional of the form

$$(\mathcal{H}f)(x) = \int_{\mathbb{R}^d} h(x, x')\, f(x')\, dx',$$

where the function (more generally, distribution – see in the sequel) $h(x, x')$ is called the impulse response of the system. Informally, $h(x, x')$ tells what the output of the system will be at point $x$ when the input is an impulse at $x'$.
An alternative way to view the above description is by defining the standard inner product on $L_2(\mathbb{R}^d) \subset \mathcal{F}(\mathbb{R}^d, \mathbb{R})$:

$$\langle f, g \rangle = \int_{\mathbb{R}^d} f(x)\, \overline{g(x)}\, dx;$$

since most of the time we will be interested in real-valued functions, we will often omit the complex conjugate from the second argument.
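
To make the linearity definition concrete, here is a minimal numerical sketch (not part of the lecture): on a discrete grid, the integral operator above becomes a matrix acting on signal samples, and additivity and homogeneity can be checked directly. All names, sizes, and step values below are illustrative assumptions.

```python
import numpy as np

# A discretized linear system: (Hf)[i] = sum_j h[i, j] * f[j] * dx,
# i.e. the kernel h(x, x') sampled on a grid and applied as a matrix.
rng = np.random.default_rng(0)
n, dx = 64, 0.1
h = rng.standard_normal((n, n))        # an arbitrary (not shift-invariant) kernel
H = lambda f: (h @ f) * dx             # discrete counterpart of the integral operator

f, g = rng.standard_normal(n), rng.standard_normal(n)
a, b = 2.0, -0.5
# Linearity: H(a f + b g) == a Hf + b Hg, up to floating-point error.
print(np.allclose(H(a * f + b * g), a * H(f) + b * H(g)))   # True
```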

Shift invariance

A linear system $\mathcal{H}$ is called shift invariant (LSI for short) if it commutes with the translation operator, i.e., for every $p \in \mathbb{R}^d$,

$$\tau_p \mathcal{H} = \mathcal{H} \tau_p.$$

In other words, the output of the system given a shifted input is the same as the output of the system shifted by the same amount. Substituting a specific input $f$ and shift $p$, we have the identity

$$\int_{\mathbb{R}^d} h(x-p, x')\, f(x')\, dx' = (\mathcal{H}f)(x-p) = (\mathcal{H}(\tau_p f))(x) = \int_{\mathbb{R}^d} h(x, x')\, f(x'-p)\, dx' = \int_{\mathbb{R}^d} h(x, x'+p)\, f(x')\, dx'.$$

Since the latter holds for every $f$ and $p$, we have $h(x-p, x') = h(x, x'+p)$ for every $x$ and $x'$. In other words, $h(x, x')$ is effectively only a function of $x - x'$, which we will continue denoting as ‘‘$h$’’ with some abuse of notation. This particular structure is called Toeplitz and is the multidimensional, infinite-support equivalent of a circulant matrix. The response of an LSI system can therefore be represented as

$$(\mathcal{H}f)(x) = \int_{\mathbb{R}^d} h(x - x')\, f(x')\, dx'.$$

The latter linear operation is called convolution and is denoted as $h * f$. It is straightforward to show that $h * f = f * h$. For real-valued functions, the convolution of $h$ with $f$ can also be seen as $(h * f)(x) = \langle \tau_x \bar{h}, f \rangle$, where $\bar{h}(x) = h(-x)$.
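
Below is a small discrete sanity check of these two facts (commutativity, and the inner-product view with the flipped kernel); it is a sketch in NumPy under the usual zero-padding convention, not part of the lecture.

```python
import numpy as np

# Discrete, zero-padded convolution: (h * f)[n] = sum_m h[n - m] f[m].
rng = np.random.default_rng(1)
h, f = rng.standard_normal(7), rng.standard_normal(50)

hf = np.convolve(h, f)                       # h * f
print(np.allclose(hf, np.convolve(f, h)))    # True: convolution commutes

# Inner-product view: (h * f)(x) = <tau_x h_bar, f> with h_bar(x) = h(-x).
x = 20                                       # an arbitrary output location
ip = sum(h[x - m] * f[m] for m in range(len(f)) if 0 <= x - m < len(h))
print(np.isclose(hf[x], ip))                 # True
```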

Fourier transform

Harmonics

A multidimensional harmonic is the function $\phi_\xi(x) = e^{2\pi i\, \xi^\top x}$ with the $d$-dimensional real parameter $\xi$. Note that the function has constant values along hyperplanes perpendicular to the direction of $\xi$, and makes a full period every $1/|\xi|$ units of space in the direction of $\xi$. In other words, $\phi_\xi$ looks like a periodic wave in the direction of $\xi$ with the wavelength inversely proportional to $|\xi|$ (or frequency proportional to $|\xi|$). For this reason, $\xi$ is usually referred to as the spatial frequency, measured in inverse length units.
Note that $\phi_\xi(p) = \phi_p(\xi)$, meaning that we can freely exchange the roles of the spatial location $x$ and the spatial frequency $\xi$.

Fourier transform

The orthogonal projection of $f$ onto the set of harmonics, viewed as a function of the argument $\xi$,

$$F(\xi) = \langle f, \phi_\xi \rangle = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i\, \xi^\top x}\, dx,$$

is called the (forward) Fourier transform of $f$. We will denote it as the action of the operator $\mathcal{F}$ on $f$, or simply by the capital $F$. The argument of $F$ is conventionally referred to as frequency, while the argument of $f$ is called space.

The inverse Fourier transform is defined as

$$f(x) = (\mathcal{F}^{-1} F)(x) = \int_{\mathbb{R}^d} F(\xi)\, e^{2\pi i\, \xi^\top x}\, d\xi.$$
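
As a quick numerical illustration (not from the lecture), the pair of integrals above can be approximated by Riemann sums on a truncated grid; for a well-localized signal the round trip recovers the signal almost exactly. The grid sizes and the test signal below are arbitrary choices.

```python
import numpy as np

# Forward and inverse transforms approximated by Riemann sums on [-X, X] and
# [-XI, XI]; f is a smooth, rapidly decaying test signal.
dx, X, dxi, XI = 0.01, 8.0, 0.01, 8.0
x = np.arange(-X, X, dx)
xi = np.arange(-XI, XI, dxi)

f = np.exp(-np.pi * x**2)
F = (f[None, :] * np.exp(-2j * np.pi * np.outer(xi, x))).sum(axis=1) * dx
f_rec = (F[None, :] * np.exp(2j * np.pi * np.outer(x, xi))).sum(axis=1) * dxi

print(np.max(np.abs(f_rec.real - f)))   # tiny reconstruction error
```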

In what follows, we prove several very useful properties of the Fourier transform.

Tensor products

Let $f_1, \dots, f_d : \mathbb{R} \to \mathbb{R}$ be functions of a single variable, and let us define the tensor product $(f_1 \otimes \cdots \otimes f_d)(x) = f_1(x_1) \cdots f_d(x_d)$. Applying the Fourier transform yields

$$(\mathcal{F}(f_1 \otimes \cdots \otimes f_d))(\xi) = \int_{\mathbb{R}^d} f_1(x_1) \cdots f_d(x_d)\, e^{-2\pi i (x_1 \xi_1 + \cdots + x_d \xi_d)}\, dx_1 \cdots dx_d = \int_{\mathbb{R}} f_1(x_1)\, e^{-2\pi i x_1 \xi_1}\, dx_1 \cdots \int_{\mathbb{R}} f_d(x_d)\, e^{-2\pi i x_d \xi_d}\, dx_d = (\mathcal{F}f_1)(\xi_1) \cdots (\mathcal{F}f_d)(\xi_d).$$

We can write this result in the following convenient notation:

$$f_1 \otimes \cdots \otimes f_d\ \xrightarrow{\ \mathcal{F}\ }\ F_1 \otimes \cdots \otimes F_d.$$

A useful example is the $d$-dimensional box function that can be defined as a tensor product of one-dimensional rectangular functions: $\mathrm{box} = \mathrm{rect} \otimes \cdots \otimes \mathrm{rect}$, where

$$\mathrm{rect}(x) = \begin{cases} 0 &: |x| > \tfrac{1}{2} \\ 1 &: |x| < \tfrac{1}{2}. \end{cases}$$

Since $(\mathcal{F}\mathrm{rect})(\xi) = \mathrm{sinc}(\xi) = \frac{\sin(\pi\xi)}{\pi\xi}$, we have $(\mathcal{F}\mathrm{box})(\xi) = \mathrm{sinc}(\xi_1) \cdots \mathrm{sinc}(\xi_d)$.
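
The discrete analogue of the tensor-product property is easy to verify: the 2-D DFT of an outer product of two vectors is the outer product of their 1-D DFTs. A minimal NumPy check (an illustration, not part of the lecture):

```python
import numpy as np

# The 2-D DFT of a separable signal f1 ⊗ f2 factors into 1-D DFTs.
rng = np.random.default_rng(2)
f1, f2 = rng.standard_normal(32), rng.standard_normal(48)
F2d = np.fft.fft2(np.outer(f1, f2))
print(np.allclose(F2d, np.outer(np.fft.fft(f1), np.fft.fft(f2))))   # True
```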

Translation and modulation

Let $p \in \mathbb{R}^d$ and $f : \mathbb{R}^d \to \mathbb{R}$. Then,

$$(\mathcal{F}(\tau_p f))(\xi) = \int_{\mathbb{R}^d} f(x-p)\, e^{-2\pi i\, \xi^\top x}\, dx = \int_{\mathbb{R}^d} f(y)\, e^{-2\pi i\, \xi^\top (y+p)}\, dy = e^{-2\pi i\, \xi^\top p} \int_{\mathbb{R}^d} f(y)\, e^{-2\pi i\, \xi^\top y}\, dy = e^{-2\pi i\, \xi^\top p}\, (\mathcal{F}f)(\xi).$$

We can write this result as the duality between translation and modulation (i.e., multiplication by a harmonic)

$$\tau_p f\ \xrightarrow{\ \mathcal{F}\ }\ \phi_{-p}\, F.$$

Obviously, the same relation holds when swapping the space and frequency domains: $\phi_p f\ \xrightarrow{\ \mathcal{F}\ }\ \tau_p F$. In other words, a shift in the space domain results in modulation (the addition of linear phase) in the frequency domain, and modulation in the space domain results in a shift in the frequency domain.
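
The discrete counterpart of this duality (circular shift and DFT) is a one-line check; the sketch below is illustrative and uses arbitrary sizes.

```python
import numpy as np

# Shift theorem for the DFT: a circular shift by p multiplies the transform by a
# linear-phase factor exp(-2*pi*i*k*p/N).
rng = np.random.default_rng(3)
N, p = 128, 17
f = rng.standard_normal(N)
k = np.arange(N)
lhs = np.fft.fft(np.roll(f, p))                          # transform of the shifted signal
rhs = np.exp(-2j * np.pi * k * p / N) * np.fft.fft(f)    # modulated transform
print(np.allclose(lhs, rhs))                             # True
```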

Convolution

Let $f, h : \mathbb{R}^d \to \mathbb{R}$. Then,

$$(\mathcal{F}(h * f))(\xi) = \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} h(x-y)\, f(y)\, dy \right) e^{-2\pi i\, \xi^\top x}\, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} h(x)\, f(y)\, e^{-2\pi i\, \xi^\top (x+y)}\, dx\, dy = \int_{\mathbb{R}^d} h(x)\, e^{-2\pi i\, \xi^\top x}\, dx \int_{\mathbb{R}^d} f(y)\, e^{-2\pi i\, \xi^\top y}\, dy = (\mathcal{F}h)(\xi)\, (\mathcal{F}f)(\xi).$$

In other words, convolution is dual to multiplication

$$h * f\ \xrightarrow{\ \mathcal{F}\ }\ H\, F.$$

Obviously, the relation $h f\ \xrightarrow{\ \mathcal{F}\ }\ H * F$ also holds.
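
A quick discrete verification of the convolution theorem (circular convolution and the DFT), as a sketch with arbitrary sizes:

```python
import numpy as np

# Circular convolution in the signal domain equals pointwise multiplication of DFTs.
rng = np.random.default_rng(4)
N = 64
h, f = rng.standard_normal(N), rng.standard_normal(N)

direct = np.array([sum(h[m] * f[(n - m) % N] for m in range(N)) for n in range(N)])
via_fft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(f)))
print(np.allclose(direct, via_fft))   # True
```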

The fact that convolution (a Toeplitz operator) becomes a pointwise product (a diagonal operator) under the Fourier transform is a manifestation of the fact that the harmonics form an eigenbasis for every linear shift invariant system (or, equivalently, the Fourier transform diagonalizes every linear shift invariant operator). To see this, observe that when a harmonic $\phi_\xi$ is given as the input to a system described by the impulse response $h$, the output is

$$(h * \phi_\xi)(x) = (\phi_\xi * h)(x) = \int_{\mathbb{R}^d} \phi_\xi(x - x')\, h(x')\, dx' = \int_{\mathbb{R}^d} h(x')\, e^{2\pi i\, \xi^\top (x - x')}\, dx' = \left( \int_{\mathbb{R}^d} h(x')\, e^{-2\pi i\, \xi^\top x'}\, dx' \right) e^{2\pi i\, \xi^\top x} = H(\xi)\, \phi_\xi(x),$$

meaning that $\phi_\xi$ is an eigenfunction of the system corresponding to the eigenvalue $H(\xi)$. Note that the spectrum is continuous and depends on the specific $h$, while the eigenfunctions are independent of $h$. (It is a general fact from linear algebra that operators can be jointly diagonalized if and only if they commute with each other.)
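
The discrete version of this eigenfunction property can also be checked directly: a sampled harmonic passed through a circular convolution comes out unchanged up to a complex scale factor given by the DFT of the kernel. A small sketch (illustrative names and sizes):

```python
import numpy as np

# A discrete harmonic of frequency k is an eigenvector of circular convolution;
# the eigenvalue is the DFT of the kernel at that frequency.
rng = np.random.default_rng(5)
N, k = 64, 5
h = rng.standard_normal(N)
phi = np.exp(2j * np.pi * k * np.arange(N) / N)

out = np.array([np.sum(h * phi[(n - np.arange(N)) % N]) for n in range(N)])
eigval = np.fft.fft(h)[k]
print(np.allclose(out, eigval * phi))   # True
```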

Stretching

Let $A \in \mathbb{R}^{d \times d}$ be a regular (i.e., non-singular) matrix, and let us define the stretching operator $S_A : f(x) \mapsto f(Ax)$. Then,

$$(\mathcal{F}(S_A f))(\xi) = \int_{\mathbb{R}^d} f(Ax)\, e^{-2\pi i\, \xi^\top x}\, dx = \int_{\mathbb{R}^d} f(y)\, e^{-2\pi i\, \xi^\top A^{-1} y}\, \frac{dy}{|\det A|} = \frac{(\mathcal{F}f)(A^{-\top}\xi)}{|\det A|}.$$

In other words,

$$S_A f\ \xrightarrow{\ \mathcal{F}\ }\ \frac{S_{A^{-\top}} F}{|\det A|}.$$

This identity will reappear when talking about dual lattices in relation to sampling.

A useful example of the stretching property is the multidimensional Gaussian. Let us first remind ourselves (or accept without a proof) that in the one-dimensional case $e^{-\pi x^2}\ \xrightarrow{\ \mathcal{F}\ }\ e^{-\pi \xi^2}$. The tensor product property immediately yields $e^{-\pi x^\top x}\ \xrightarrow{\ \mathcal{F}\ }\ e^{-\pi \xi^\top \xi}$ in the case of the unit-covariance (a.k.a. normal) multidimensional Gaussian. A general Gaussian $e^{-\pi x^\top C^{-1} x}$ with the covariance matrix $C$ can be obtained by stretching the normal Gaussian with the symmetric inverse square root matrix $A = C^{-1/2}$, since $(Ax)^\top (Ax) = x^\top A^\top A x = x^\top C^{-1} x$. The stretching property of the Fourier transform tells us that the corresponding stretch in frequency is by $A^{-\top} = C^{1/2}$, with further scaling of the transform by the determinant of $C^{1/2}$:

$$e^{-\pi x^\top C^{-1} x}\ \xrightarrow{\ \mathcal{F}\ }\ \sqrt{\det C}\; e^{-\pi \xi^\top C \xi}.$$

The covariance matrix $C$ tells us how much the signal is spread in various spatial directions; its determinant can be used to quantify this spread: when $\det C$ is small, the signal is localized in space. In the frequency domain, the signal remains Gaussian with the inverse covariance. Since both $C$ and $C^{-1}$ have the same eigenbasis but reciprocal eigenvalues, the more the signal is concentrated in space in a certain direction, the more it is spread in frequency in the corresponding direction, and vice versa. The quantity $\pi \sqrt{\det C^{-1}}$ measures the spread of the signal in frequency. Since $\sqrt{\det C} \cdot \pi \sqrt{\det C^{-1}} = \pi$, we conclude that the Gaussian cannot be simultaneously localized both in space and in frequency. This conclusion holds for every signal (actually, the Gaussian achieves the best possible simultaneous localization in both domains) – a result known as the uncertainty principle, with profound implications on why the Universe looks the way it does.
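
As a numerical aside (not from the lecture), the one-dimensional pair $e^{-\pi x^2} \leftrightarrow e^{-\pi \xi^2}$ quoted above can be verified by a Riemann-sum approximation of the Fourier integral:

```python
import numpy as np

# Riemann-sum Fourier transform of exp(-pi x^2), compared against exp(-pi xi^2).
dx = 0.01
x = np.arange(-8, 8, dx)
xi = np.linspace(-4, 4, 201)

g = np.exp(-np.pi * x**2)
G = (g[None, :] * np.exp(-2j * np.pi * np.outer(xi, x))).sum(axis=1) * dx
print(np.max(np.abs(G - np.exp(-np.pi * xi**2))))   # essentially zero
```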

Rotation

An important particular case of the stretching property is when $A$ is an orthonormal matrix representing rotations (and, possibly, reflections). Let us define the rotation operator as $R_R : f(x) \mapsto f(Rx)$, where $R$ is a rotation matrix. Since orthonormal matrices satisfy $R^{-1} = R^\top$ and $\det R = \pm 1$, the stretching property reduces to $R_R f\ \xrightarrow{\ \mathcal{F}\ }\ R_R F$. In other words, the Fourier transform commutes with rotation:

$$\mathcal{F} R_R = R_R \mathcal{F}.$$

Projection

Let us define the projection operator along the last spatial coordinate as

$$P : f(x) \mapsto \int_{\mathbb{R}} f(x_1, \dots, x_{d-1}, x_d)\, dx_d.$$

Note that projection maps functions from $\mathcal{F}(\mathbb{R}^d, \mathbb{R})$ to $\mathcal{F}(\mathbb{R}^{d-1}, \mathbb{R})$. Interpreting $f$ as a $d$-dimensional probability density function, the latter operation can be interpreted as marginalization with respect to $x_d$. Invoking the $(d-1)$-dimensional Fourier transform on $Pf$ yields

$$(\mathcal{F}(Pf))(\xi) = \int_{\mathbb{R}^{d-1}} \left( \int_{\mathbb{R}} f(x_1, \dots, x_{d-1}, x_d)\, dx_d \right) e^{-2\pi i (x_1 \xi_1 + \cdots + x_{d-1} \xi_{d-1})}\, dx_1 \cdots dx_{d-1} = \left. \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i\, \xi^\top x}\, dx \right|_{\xi_d = 0} = (\mathcal{F}f)(\xi_1, \dots, \xi_{d-1}, 0).$$

Defining the slice operator $Q : f(x) \mapsto f(x_1, \dots, x_{d-1}, 0)$ from $\mathcal{F}(\mathbb{R}^d, \mathbb{R})$ to $\mathcal{F}(\mathbb{R}^{d-1}, \mathbb{R})$, we can express the latter result more compactly as $\mathcal{F} P = Q \mathcal{F}$. Note that while on the right-hand side $\mathcal{F}$ denotes a $d$-dimensional Fourier transform, on the left-hand side it stands for the $(d-1)$-dimensional counterpart.
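
Before rotating, the axis-aligned identity $\mathcal{F} P = Q \mathcal{F}$ already has an immediate discrete analogue that can be checked in a couple of lines (a sketch with arbitrary sizes, not part of the lecture):

```python
import numpy as np

# Summing a 2-D signal over its last axis and taking a 1-D DFT gives the
# zero-frequency slice (along that axis) of the 2-D DFT.
rng = np.random.default_rng(6)
f = rng.standard_normal((32, 32))
lhs = np.fft.fft(f.sum(axis=1))     # project along x2, then 1-D transform
rhs = np.fft.fft2(f)[:, 0]          # 2-D transform, then slice at xi2 = 0
print(np.allclose(lhs, rhs))        # True
```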

Applying a rotation operator on the right to both sides of the identity yields

$$\mathcal{F} P R_R = Q \mathcal{F} R_R = Q R_R \mathcal{F},$$

where in the last passage we used the commutativity of the Fourier transform with rotation. We interpret the composition $P R_R$ as a general projection operator, $P_R$, that first rotates the function and then projects along the last axis. This essentially allows us to project the function along any direction. In the same manner, we interpret $Q R_R$ as a general slice operator, $Q_R$, slicing the function along an arbitrary direction. This general result is known as the slice-projection theorem, which in our notation can be expressed as

$$\mathcal{F} P_R = Q_R \mathcal{F}.$$

An extremely important example where this result is used is computerized tomography (CT). In an idealized scenario, let us assume a $d=2$ dimensional world, in which we are interested in measuring the density of a slice of the human body, denoted by the function $f(x, y)$. Being unable to actually slice it (without going to jail), the next best thing we can do is to irradiate it with penetrating radiation (x-rays) from the side. Let us assume that an x-ray source sends parallel rays from one side of the body to the other side along the vertical ($y$) direction, where a linear detector measures the intensity profile $I(x)$. According to the Beer-Lambert law of light attenuation, the measured intensity is given by

$$I(x) = I_0 \exp\left( -\int_{\mathbb{R}} f(x, y)\, dy \right),$$

where $I_0$ is the emitted intensity. Taking the logarithm yields

$$p(x) = -\log\frac{I(x)}{I_0} = \int_{\mathbb{R}} f(x, y)\, dy = (Pf)(x).$$

Rotating the emitter-detector setup around the body yields a collection of projections $P_\theta f$ (note that in two dimensions, the rotation matrix is parameterized by a single angle $\theta$). The function $(x, \theta) \mapsto (P_\theta f)(x)$ is often referred to as the Radon transform or the sinogram of $f$. The slice-projection theorem tells us that the one-dimensional Fourier transform of each such projection $P_\theta f$ yields a correspondingly directed slice $Q_\theta F$ of the two-dimensional Fourier transform $F$ of the unknown function $f$. Collecting enough projections, it is ‘‘just’’ a matter of numerics to estimate the said transform and invert it, yielding what we see on the screen as a slice of a CT scan.
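
To make the construction concrete, here is a toy parallel-beam sinogram computed by brute force: rotate a test image and sum along one axis for every angle. This is only an illustration of $P_\theta f$ (using an assumed rectangular ‘‘phantom’’ and SciPy's image rotation), not a reconstruction algorithm.

```python
import numpy as np
from scipy.ndimage import rotate   # any image-rotation routine would do

# A simple rectangular "phantom" playing the role of the unknown density f(x, y).
f = np.zeros((128, 128))
f[40:80, 50:90] = 1.0

angles = np.arange(0, 180, 1.0)    # projection angles in degrees
sinogram = np.stack(
    [rotate(f, theta, reshape=False, order=1).sum(axis=0) for theta in angles]
)
print(sinogram.shape)               # (180, 128): one projection per angle
```

By the slice-projection theorem, taking a one-dimensional FFT of each row of this sinogram samples (up to discretization effects) the two-dimensional Fourier transform of the phantom along the corresponding direction.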