Image and Vision Computing Revision Notes

Published on 31 May 2024

# UoE Notes
# IVC

Images and Image Formation

Images

When we capture an image we sample light intensity in space (and time), and quantize the result. Fidelity depends on the sampling density and the number of quantization levels (spatial resolution and radiometric resolution)

Shading

Bidirectional Reflection Distribution Function (BRDF)

The proportion of the incoming light (arriving from direction $(\theta_{i}, \phi_{i})$) that is reflected along the outgoing ray (direction $(\theta_{o}, \phi_{o})$): $$ \rho(\theta_{i}, \phi_{i}, \theta_{o}, \phi_{o}) $$

Diffuse (Lambertian) Reflection

  • Light is reflected equally in all directions
    • Dull, matte surfaces like chalk or latex paint
    • Microfacets scatter incoming light randomly
    • Effect is that light is reflected equally in all directions
    • Common assumption in computer vision
  • Brightness of the surface depends on the incidence of illumination

Lambert’s law

Radiosity of a Lambertian surface: $$ B = \rho (N \cdot S) = \rho \|S\| \cos\theta $$

  • B: radiosity (total power leaving the surface per unit area)
  • ρ: albedo (fraction of incident irradiance reflected by the surface)
  • N: unit normal
  • S: source vector (magnitude proportional to intensity of the source)
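
A minimal NumPy sketch of Lambert's law for a single surface point (the function name and example values are illustrative, not from the slides):

```python
import numpy as np

def lambertian_radiosity(albedo, normal, source):
    """B = rho * (N . S), clamped at zero for surfaces facing away from the light."""
    n = normal / np.linalg.norm(normal)          # unit normal N
    return albedo * max(np.dot(n, source), 0.0)  # rho * ||S|| * cos(theta)

# Example: light from straight above, surface tilted 45 degrees
B = lambertian_radiosity(albedo=0.7,
                         normal=np.array([0.0, 1.0, 1.0]),
                         source=np.array([0.0, 0.0, 2.0]))
print(B)  # 0.7 * 2 * cos(45 deg), roughly 0.99
```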

Specular Reflection

  • Radiation arriving along a source direction leaves along the specular direction (source direction reflected about normal) in a lobe of directions
  • Specular lobe: reflected energy falls off as the outgoing direction moves away from the specular direction

Phong Model

Ambient + Diffuse + Specular
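
A hedged sketch of the three terms for one point; the coefficients $k_a, k_d, k_s$ and the shininess exponent are illustrative choices, not course values:

```python
import numpy as np

def phong(ka, kd, ks, shininess, N, L, V, light=1.0, ambient=1.0):
    """Ambient + diffuse + specular shading at a single point.
    N: unit surface normal, L: unit direction to light, V: unit direction to viewer."""
    diffuse = max(np.dot(N, L), 0.0)
    R = 2 * np.dot(N, L) * N - L              # L reflected about the normal
    specular = max(np.dot(R, V), 0.0) ** shininess
    return ka * ambient + kd * light * diffuse + ks * light * specular

# Example: head-on viewing of a surface lit from 45 degrees
N = np.array([0.0, 0.0, 1.0])
L = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
print(phong(ka=0.1, kd=0.6, ks=0.3, shininess=10, N=N, L=L, V=np.array([0.0, 0.0, 1.0])))
```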

Color

Color Perception

Rods and cones act as filters on the spectrum

  • To get the output of a filter, multiply its response curve by the spectrum, integrate over all wavelengths
    • Each cone yields one number
    • All colors are reduced to three real numbers
  • How can we represent an entire spectrum with 3 numbers?
    • We can’t! Most of the information is lost
    • As a result, two different spectra may appear indistinguishable
      • such spectra are known as metamers
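
The "multiply by the response curve and integrate" step can be written out directly; a sketch with made-up Gaussian response curves (real L/M/S cone curves have different shapes):

```python
import numpy as np

wavelengths = np.linspace(400, 700, 301)            # visible range, nm
d_lambda = wavelengths[1] - wavelengths[0]

# Toy cone response curves (Gaussians; purely illustrative)
def response_curve(mu, sigma):
    return np.exp(-(wavelengths - mu) ** 2 / (2 * sigma ** 2))

cones = np.stack([response_curve(560, 50), response_curve(530, 50), response_curve(420, 40)])

def cone_responses(spectrum):
    """Multiply each response curve by the spectrum and integrate over wavelength."""
    return (cones * spectrum).sum(axis=1) * d_lambda

flat_spectrum = np.ones_like(wavelengths)
print(cone_responses(flat_spectrum))                # the entire spectrum reduced to three numbers
# Two different spectra giving (nearly) the same three numbers would be metamers.
```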

Color Spaces: RGB

Any color: $$ C = r \times R + g \times G + b \times B $$

  • Easy for devices
  • Non-perceptual
    • Strongly correlated channels
    • Where is hue and saturation?

Normalised RGB

How to cope with changing illumination?

  • E.g., (r,g,b) with doubled illumination to (2r,2g,2b)
  • To make many vision tasks easier, a common trick is to normalise for illumination $$ R = \frac{r}{r+g+b} \quad G = \frac{g}{r+g+b} \quad B = \frac{b}{r+g+b} $$ $$ R = \frac{2r}{2r+2g+2b} \quad G = \frac{2g}{2r+2g+2b} \quad B = \frac{2b}{2r+2g+2b} $$ For normalised RGB, it is common to use only two channels, e.g., (R, G), because R + G + B = 1.
  • Shadowed and unshadowed RGB areas nearly the same
  • Normalised white and grey all the same
  • Normalized black is unpredictable due to division by small number
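
A small sketch of the normalisation; the epsilon guard reflects the division-by-a-small-number caveat above and is my addition:

```python
import numpy as np

def normalised_rgb(img, eps=1e-6):
    """img: H x W x 3 float array. Returns chromaticity (R, G, B) with R+G+B = 1 per pixel."""
    total = img.sum(axis=2, keepdims=True)
    return img / (total + eps)      # eps avoids blow-up on near-black pixels

img = np.random.default_rng(0).random((4, 4, 3)) + 0.1   # keep pixels away from black
print(np.allclose(normalised_rgb(img), normalised_rgb(2 * img)))  # True: doubled illumination, same chromaticity
```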

Color Space: HSV

Hue, Saturation, Value (Intensity)

Cameras

Pinhole Model

  • Captures pencil of rays – all rays through a single point
  • The point is called Center of Projection (COP)
  • The image is formed on the Image Plane
  • Effective focal length f is distance from COP to Image Plane

3D to 2D Projection

  • Points -> Points
    • But projection of points on the focal plane is undefined
  • Lines -> Lines
    • But lines through the focal point (visual rays) project to a point
  • Planes -> Planes
    • But planes through the focal point project to lines
  • Many-to-one
    • All points along the same visual ray map to the same point in an image
    • An infinite number of 3D world scenes correspond to any single image
      • We (our brains and computer vision algorithms) need to use prior knowledge about likely scenes to disambiguate

Ideal Camera Models

Modelling Projection

Given a point P=(x,y,z) in the world, where does it appear in the image plane?

  • z: z distance from pinhole to object
  • f: z distance from pinhole to plane
  • y: y distance from pinhole to object
  • y': y distance from pinhole to image plane By similar triangles: $$ \frac{f}{-y'} = \frac{z}{y} \quad \Rightarrow \quad y' = \frac{-fy}{z} $$ hence: $$ (x,y,z) \rightarrow \left(-f \frac{x}{z}, -f \frac{y}{z}\right) $$ Projection from world to imaging plane
  • All points along one ray project to the same point on the plane
  • The projection of the point at O is undefined
  • The image is inverted vertically and horizontally (Kepler)
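
A minimal sketch of the projection formula above (axis conventions assumed, not specified in the notes):

```python
import numpy as np

def project(points, f):
    """Pinhole projection: (x, y, z) -> (-f*x/z, -f*y/z). points: N x 3 array with z != 0."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([-f * x / z, -f * y / z], axis=1)

pts = np.array([[1.0, 2.0, 10.0],
                [2.0, 4.0, 20.0]])     # two points on the same visual ray
print(project(pts, f=1.0))             # both map to (-0.1, -0.2): many-to-one
```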

Projective Geometry: Some Linear Algebra

!!important!! see slides

Filtering and Edges

Low-Level Image Processing Overview

Point Operators

  • Point operators are the simplest kind of image processing transform
  • Each pixel's output value depends only on the corresponding pixel's input value
    • Examples: Brightness adjustment by adding a constant to each pixel's value
    • Examples: Histogram equalization, contrast, color correction, etc.
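
Point operators in a few lines of NumPy (gain/bias values are illustrative):

```python
import numpy as np

def adjust(img, gain=1.0, bias=0.0):
    """Per-pixel point operator: each output value depends only on the corresponding input pixel."""
    return np.clip(gain * img.astype(np.float32) + bias, 0, 255).astype(np.uint8)

img = np.arange(256, dtype=np.uint8).reshape(16, 16)
brighter = adjust(img, bias=50)        # brightness: add a constant to every pixel
contrasty = adjust(img, gain=1.5)      # simple contrast stretch
```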

Image Filtering

  • Goal
    • Remove unwanted sources of variation
    • Keep the information relevant for whatever task we need to solve
  • Approach
    • Modify the pixels in an image based on a function of the local neighbourhood around each pixel

Fourier Transforms

Fourier analysis decomposes any 2D luminance image into the sum of a set of sinusoidal gratings that differ in spatial frequency, orientation, amplitude and phase. (Modify an image holistically by manipulating its power spectra.)

Image Filtering

Images as Functions

  • Interpret an image as a function f from $\mathbb{R}^2$ to $\mathbb{R}$ or $\mathbb{R}^3$
  • Consider f(x,y) gives the intensity at position (x,y)
  • Function is defined over a rectangle, with a finite range:$$ f: [a,b] \times [c,d] \rightarrow [0,255] $$
  • Color image: $$ f(x,y) = [r(x,y), g(x,y), b(x,y)] $$

Images as Discrete Functions
  • Images are usually digital (discrete):
    • Sample the 2D space on a regular grid

Filtering as Functions of Functions

$$ f[n,m] \rightarrow \text{System S} \rightarrow g[n,m] $$ or $$ g[n,m] = S\{f[n,m]\} $$

Image Filtering

Compute function of local neighbourhood at each position $$ h[m,n] = \sum_{k,l} f[k,l] I[m+k,n+l] $$

  • h: output image
  • f: filter
  • I: input image
  • k, l: coordinates within the filter
  • m, n: coordinates in the output image

Linear Filtering
  • Different Choices of filter f achieve different effects
  • Let’s replace each pixel with a weighted average of its neighborhood
    • The weights are called the filter kernel

Box filter (Mean Filter)

$$ \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$

  • Replaces each pixel with an average of its neighborhood
  • Achieves a smoothing effect (removes sharp features); a small sketch follows below
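
A sketch of the correlation formula above applied with a 3x3 box kernel, in pure NumPy with zero padding (library routines such as MATLAB's imfilter do the same job):

```python
import numpy as np

def cross_correlation(I, f):
    """h[m, n] = sum_{k,l} f[k, l] * I[m+k, n+l], with zero padding at the borders."""
    kh, kw = f.shape
    Ip = np.pad(I.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    h = np.zeros(I.shape, dtype=float)
    for m in range(I.shape[0]):
        for n in range(I.shape[1]):
            h[m, n] = np.sum(f * Ip[m:m + kh, n:n + kw])
    return h

box = np.ones((3, 3)) / 9.0                  # mean filter: replaces each pixel with its neighbourhood average
I = np.zeros((5, 5)); I[2, 2] = 9.0
print(cross_correlation(I, box))             # the impulse is smoothed over a 3x3 neighbourhood
```
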
Non-linear Filtering

Median filter

  • Replace each pixel by the median of its neighbours
  • A 'rank' filter based on ordering of gray levels
  • Outlier robustness. Better at removing salt and pepper noise
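
A minimal median-filter sketch (border pixels left unchanged for brevity):

```python
import numpy as np

def median_filter(I, size=3):
    """Replace each pixel by the median of its size x size neighbourhood (borders left unchanged)."""
    r = size // 2
    out = I.astype(float).copy()
    for m in range(r, I.shape[0] - r):
        for n in range(r, I.shape[1] - r):
            out[m, n] = np.median(I[m - r:m + r + 1, n - r:n + r + 1])
    return out

I = np.full((5, 5), 10.0); I[2, 2] = 255.0   # single salt-noise pixel
print(median_filter(I)[2, 2])                 # 10.0: the outlier is removed
```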

Properties of Convolution and Correlation

2D Correlation

h=filter2(f,I) or h=imfilter(I,f) $$ h[m,n] = \sum_{k,l} f[k,l] I[m+k,n+l] $$

  • Key properties:
    • Linearity: imfilter(I, f1 + f2) = imfilter(I,f1) + imfilter(I,f2)
    • Shift invariance: same behaviour regardless of pixel location imfilter(I,shift(f)) = shift(imfilter(I,f))
    • Any linear, shift-invariant operator can be represented as a convolution
2D Convolution

h=conv2(f,I) $$ h[m,n] = \sum_{k,l} f[k,l] I[m-k,n-l] $$

  • conv2(I,f) is the same as filter2(rot90(f,2),I)
  • Correlation and convolution are identical when the filter is symmetric
  • Key properties:
    • Commutative: a * b = b * a
      • Conceptually no difference between filter and signal
      • But particular filtering implementations might break this equality, e.g., image edges
    • Associative: a * (b * c) = (a * b) * c
      • Often apply several filters one after another: (((a * b1) * b2) * b3)
      • This is equivalent to applying one filter: a * (b1 * b2 * b3)
      • Correlation is not associative (rotation effect)
    • Distributes over addition: a * (b + c) = (a * b) + (a * c)
    • Scalars factor out: ka * b = a * kb = k (a * b)
    • Identity: unit impulse e = [0, 0, 1, 0, 0], a * e = a
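
A quick numerical check of the convolution/correlation relationship, using SciPy as a stand-in for MATLAB's conv2/filter2:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

I = np.random.default_rng(0).random((6, 6))
f = np.array([[1.0, 2.0, 1.0],
              [0.0, 0.0, 0.0],
              [-1.0, -2.0, -1.0]])

conv = convolve2d(I, f, mode='same')
corr = correlate2d(I, np.rot90(f, 2), mode='same')   # convolution = correlation with the 180-degree-rotated filter
print(np.allclose(conv, corr))                       # True
```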

Gaussian Kernel Filtering

$$ G_{\sigma} = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} $$ Standard deviation $\sigma$: determines extent of smoothing

  • Remove high-frequency components from the image (low-pass filter)
  • Convolution with self is another Gaussian
    • So can smooth with a small-$\sigma$ kernel, repeat, and get the same result as a larger-$\sigma$ kernel would have
    • Convolving two times with Gaussian kernel with std. dev. $\sigma$ is same as convolving once with kernel with std. dev. $\sigma \sqrt{2}$
  • Separable kernel
    • Factors into product of two 1D Gaussians

Separability

The 2D Gaussian can be expressed as the product of two functions, one a function of x and the other a function of y $$ G_{\sigma}(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right) \right) \left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right) \right) $$

  • Separability means that a 2D convolution can be reduced to two 1D convolutions (one along the rows and one along the columns)
    • (Recall associative property of convolution)
  • What is the complexity of filtering an n*n image with an m*m kernel? $O(n^2m^2)$
  • What if the kernel is separable? $O(n^2m)$
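
A sketch of the separability trick: two 1D passes reproduce the full 2D convolution (SciPy assumed available):

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

sigma = 2.0
x = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
g1 = np.exp(-x ** 2 / (2 * sigma ** 2))
g1 /= g1.sum()                            # 1D Gaussian, sums to one
g2 = np.outer(g1, g1)                     # separable: the 2D kernel is an outer product

I = np.random.default_rng(0).random((64, 64))
direct = convolve(I, g2)                                          # O(n^2 m^2)
separable = convolve1d(convolve1d(I, g1, axis=0), g1, axis=1)     # two 1D passes: O(n^2 m)
print(np.allclose(direct, separable))                             # True
```
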
Dealing with Edges

What about missing pixel values?

  • the filter window falls off the edge of the image
  • need to extrapolate
  • methods (MATLAB):
    • clip filter (black): imfilter(f, g, 0)
    • wrap around: imfilter(f, g, ‘circular’)
    • copy edge: imfilter(f, g, ‘replicate’)
    • reflect across edge: imfilter(f, g, ‘symmetric’)

Edges Detection

Origin of Edges

  • surface normal discontinuity
  • depth discontinuity
  • surface color discontinuity
  • illumination discontinuity

Edge Detection

An edge is a place of rapid change in the image intensity function

Derivatives with convolution

For a 2D function f(x,y) and discrete data, we can approximate derivatives using finite differences: $$ \frac{\partial f(x, y)}{\partial x} \approx \frac{f(x + 1, y) - f(x, y)}{1} $$ This responds to changes along x, so it highlights vertical edges in the image; the corresponding derivative with respect to y, $\frac{\partial f(x, y)}{\partial y}$, highlights horizontal edges.

Image Gradient

The gradient of an image: $$ \nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] $$ The gradient points in the direction of most rapid increase in intensity

  • $\nabla f = \left[ \frac{\partial f}{\partial x}, 0 \right]$: left to right, black to white
  • $\nabla f = \left[ 0, \frac{\partial f}{\partial y} \right]$: top to bottom, black to white
  • The gradient direction is given by: $$ \theta = \tan^{-1} \left( \frac{\partial f}{\partial y} / \frac{\partial f}{\partial x} \right) $$
  • The edge strength is given by the gradient magnitude: $$ |\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2} $$
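
A sketch putting the finite differences and the gradient formulas together (image convention assumed: axis 0 is y, axis 1 is x):

```python
import numpy as np

def image_gradient(f):
    """Forward-difference approximation of the image gradient."""
    fx = np.zeros_like(f, dtype=float); fy = np.zeros_like(f, dtype=float)
    fx[:, :-1] = f[:, 1:] - f[:, :-1]          # df/dx: responds to vertical edges
    fy[:-1, :] = f[1:, :] - f[:-1, :]          # df/dy: responds to horizontal edges
    magnitude = np.hypot(fx, fy)               # edge strength
    theta = np.arctan2(fy, fx)                 # gradient direction
    return magnitude, theta

# Vertical step edge: black on the left, white on the right
f = np.zeros((5, 8)); f[:, 4:] = 255.0
mag, theta = image_gradient(f)
print(mag[2, 3], np.degrees(theta[2, 3]))      # 255.0, 0 degrees (gradient points left-to-right)
```
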
With Noise

Due to noise it is hard to find edges from raw derivatives, so we smooth first, e.g. with a kernel g. To find edges, look for peaks in $\frac{d}{dx}(f * g)$. Differentiation is convolution, and convolution is associative: $$ \frac{d}{dx} (f * g) = f * \frac{d}{dx} g $$ This saves one operation: precompute the differentiated filter.

Smoothing vs Derivative Filters
  • Smoothing filters
    • Gaussian: remove “high-frequency” components; “low-pass” filter
    • Can the values of a smoothing filter be negative? No
    • What should the values sum to?
      • One: constant regions are not affected by the filter
  • Derivative filters
    • Derivatives of Gaussian
    • Can the values of a derivative filter be negative? Yes
    • What should the values sum to?
      • Zero: no response in constant regions

Canny Edge Detector

Filtering, followed by some post processing:

  1. Compute x and y gradient images
  2. Find magnitude and orientation of gradient
  3. Non-maximum suppression:
    • Thin wide “ridges” down to single pixel width
  4. Linking and thresholding (hysteresis):
    • Use a high threshold to start edge curves and a low threshold to continue them
    • Two thresholds - high and low
      • Grad. mag. > high threshold: strong edge
      • Grad. mag. < low threshold: noise
      • low threshold < Grad. mag. < high threshold: weak edge
      • 'Follow' edges starting from strong edge pixels
      • Continue them into weak edges
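
The full pipeline is bundled in libraries; a hedged OpenCV sketch (file name and threshold values are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input file

# Steps 1-2: smoothed x/y gradients, then magnitude and orientation
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)
gx = cv2.Sobel(blur, cv2.CV_64F, 1, 0)
gy = cv2.Sobel(blur, cv2.CV_64F, 0, 1)
magnitude, orientation = cv2.cartToPolar(gx, gy)

# Steps 3-4 (non-maximum suppression + hysteresis) are bundled into cv2.Canny;
# the two arguments are the low and high hysteresis thresholds
edges = cv2.Canny(img, 100, 200)
```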

Line Fitting

Difficulty
  • Extra edge points (clutter)
    • Which points go with which line, if any?
  • Only some parts of each line are detected, some are missing
    • How to fill the gaps?
  • Noise in measured edge position, orientation
    • How to fit the true underlying edge?

Least Squares Line Fitting

Data: $(x_1,y_1), \ldots, (x_n,y_n)$ Line equation: $y_i = mx_i + b$ Find $(m, b)$ to minimize $$ E = \sum_{i=1}^{n} (y_i - mx_i - b)^2 $$ (a small sketch follows the list below)

  • Good
    • Clearly specified objective
    • Optimization is easy
  • Bad
    • Sensitive to outliers
      • Bad matches, extra points
    • Doesn’t allow you to get multiple good fits
      • Detecting multiple objects, lines, etc
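
A small least-squares sketch of the objective above (data values are made up):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = m*x + b, minimising E = sum (y_i - m*x_i - b)^2."""
    A = np.stack([x, np.ones_like(x)], axis=1)        # design matrix rows [x_i, 1]
    (m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0 + np.array([0.05, -0.02, 0.03, -0.04])   # noisy samples of y = 2x + 1
print(fit_line(x, y))                                       # approximately (2.0, 1.0)
```
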
Random Sample Consensus (RANSAC)
  • Outline:
    • Choose a small subset of points ($s$) uniformly at random
    • Fit a model to the subset
    • Find all points that are “close” to the model (threshold $t$), and reject the rest as outliers
    • Do this many times ($N$) and choose the best model
  • Algorithm:
    1. Sample (randomly) the number of points $s$ required to fit the model
    2. Solve for model parameters using samples
    3. Score by the fraction of inliers within a preset threshold of the model
    4. Repeat 1-3 until the best model is found with high confidence
  • Key parameters:
    • More complex models need more $s$ to fit
    • Choose $t$ to separate “noise” and true “outliers”
    • A higher outlier ratio, or a more complex model requiring more samples $s$
      • makes it less likely to randomly choose an all-inlier point set, so more iterations $N$ are required to find a good answer
  • Good:
    • Robust to outliers
    • Applicable to a large number of objective function parameters
  • Bad:
    • Computational time grows quickly with fraction of outliers and number of parameters
    • Not good for getting multiple fits
    • Various parameters to tune
  • Common applications:
    • Line fitting
      • (Use least squares fit as a subroutine)
    • Computing a homography (e.g., image stitching)
    • Estimating fundamental matrix (relating two views)
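
A hedged sketch of RANSAC line fitting with $s = 2$, using a least-squares refit on the final inlier set; iteration count, threshold $t$, and the data are illustrative:

```python
import numpy as np

def ransac_line(x, y, n_iters=100, t=0.1, rng=np.random.default_rng(0)):
    """RANSAC line fit: repeatedly fit to a minimal sample (s = 2) and score by inlier count."""
    best_inliers = np.zeros_like(x, dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(x), size=2, replace=False)        # 1. minimal random sample
        if x[i] == x[j]:
            continue
        m = (y[j] - y[i]) / (x[j] - x[i])                       # 2. fit model to the sample
        b = y[i] - m * x[i]
        inliers = np.abs(y - (m * x + b)) < t                   # 3. score by inliers within threshold t
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit with least squares on all inliers of the best model
    A = np.stack([x[best_inliers], np.ones(best_inliers.sum())], axis=1)
    (m, b), *_ = np.linalg.lstsq(A, y[best_inliers], rcond=None)
    return m, b

x = np.linspace(0, 1, 50)
y = 2 * x + 1
y[::10] += 5.0                                                  # a few gross outliers
print(ransac_line(x, y))                                        # close to (2.0, 1.0) despite the outliers
```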

Interest Points Descriptors Matching

Interest Points

Pipeline for Matching

  1. Detection: Find a set of distinctive key points
  2. Description: Extract feature descriptor around each interest point as vector
  3. Matching: Compute distance between feature vectors to find correspondence

Characteristics of Good Keypoints

  • Repeatability: The same keypoint can be found in several images despite geometric and photometric transformations
  • Saliency: Each keypoint is distinctive
  • Compactness and efficiency: Many fewer keypoints than image pixels
  • Locality: A keypoint occupies a relatively small area of the image; robust to clutter and occlusion

Interest Point Detection - Harris

Corners

We should easily recognize the point by looking through a small window. Shifting a window in any direction should give a large change in intensity

Harris Detector

Eigenvalues Corner Detector

$$ H = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2 \end{bmatrix} $$ 2 x 2 matrix of image derivatives smoothed by Gaussian weights

  • First compute $I_x^2$, $I_y^2$, and $I_xI_y$ as 3 images; then apply Gaussian smoothing to each
  • OR, first apply the Gaussian and then compute the derivatives. To compute the eigenvalues: see slides. Interpreting the eigenvalues:
  • Flat: $\lambda_1$ and $\lambda_2$ are small; $E$ is almost constant in all directions
  • Edge: $\lambda_1 >> \lambda_2$ or $\lambda_2 >> \lambda_1$
  • Corner: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$; $E$ increases in all directions

Corner Response Function Detector

$$ R = \lambda_1\lambda_2 - \alpha(\lambda_1 + \lambda_2)^2 = \det(M) - \alpha \cdot \text{trace}(M)^2 $$ Where $$ \det \left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = ad - bc $$ $$ \text{trace}(A) = \sum_{i=1}^{n}a_{ii} $$

Harris Detector: Steps
  1. Compute derivatives at each pixel and smooth with Gaussian (or smooth first then derivatives)
  2. Compute Harris matrix H in a window around each pixel
  3. Compute corner response function R
  4. Threshold R
  5. Find local maxima of response function (non-maximum suppression)
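
A compact sketch of steps 1-4 ($\alpha$, $\sigma$, and the threshold are illustrative; non-maximum suppression is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(I, sigma=1.5, alpha=0.05):
    """Corner response R = det(M) - alpha * trace(M)^2 at every pixel."""
    Iy, Ix = np.gradient(I.astype(float))            # 1. image derivatives
    # 2. entries of the Harris matrix, smoothed by Gaussian weights w(x, y)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2                       # 3. corner response function
    trace = Sxx + Syy
    return det - alpha * trace ** 2

I = np.zeros((32, 32)); I[8:, 8:] = 1.0              # a bright square with one interior corner at (8, 8)
R = harris_response(I)
corners = np.argwhere(R > 0.1 * R.max())             # 4. threshold R
print(corners.mean(axis=0))                          # near (8, 8)
```
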
Properties of the Harris Corner Detector
  • Intensity Invariant: Partial
  • Translation invariant: Yes
  • Rotation invariant: Yes
  • Scale invariant: No

Interest Point Detection - SIFT

Convolve the image with a 'blob filter' at multiple scales and look for extrema of filter response in the resulting scale space

  • Ideal Filter for Blob Detection:
    • Laplacian (second derivative) of Gaussian; the scale ($\sigma$) of the Gaussian controls the scale of blob the filter responds to
  • In practice:
    • Approximate Laplacian-of-Gaussian (LoG) by Difference-of-Gaussian (DoG)
      1. Blur image with σ Gaussian kernel
      2. Blur image with kσ Gaussian kernel
      3. Subtract 2. from 1.
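
A sketch of the DoG approximation at a few scales (SciPy assumed; $k = 1.6$ is a common choice but illustrative here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(I, sigma, k=1.6):
    """Approximate the Laplacian-of-Gaussian response by subtracting two blurred images."""
    blur_small = gaussian_filter(I.astype(float), sigma)       # 1. blur with sigma
    blur_large = gaussian_filter(I.astype(float), k * sigma)   # 2. blur with k*sigma
    return blur_small - blur_large                             # 3. subtract

# A bright blob responds most strongly when sigma is matched to the blob's size
I = np.zeros((64, 64)); I[28:36, 28:36] = 1.0
responses = {s: np.abs(difference_of_gaussians(I, s)).max() for s in (1, 2, 4, 8)}
print(responses)      # the strongest response sits at an intermediate sigma, not the largest
```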

Post Processing

  • Procedure so far:
    • Returns (lots) of keypoints that are:
      • Blobs: Local maxima of LoG/DoG in both space and scale.
    • But some may be low-contrast, others may lie on edges.
  • Post-processing:
    • Prune low-contrast points:
      • Remove points with DoG response values below a threshold.
    • Prune points along edges:
      • Filter based on the Harris ('cornerness') function at each detected blob

Descriptor - Patch

Patches with similar content should have similar descriptors. The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to all transformations: small shifts, rotations, illumination changes, etc.

From Feature Detection to Feature Description

  • Detection should be covariant:
    • features(transform(image)) = transform(features(image))
  • Description should be invariant
    • features(transform(image)) = features(image)

Descriptor - SIFT

  • Full version
    • Divide the 16x16 patch into a 4x4 grid of cells
    • Compute an orientation histogram for each cell
    • 16 cells * 8 orientations = 128 dimensional descriptor
  • Additional Processing
    • Gaussian weighted magnitudes
    • Trilinear interpolation so each gradient contributes to neighboring bins and orientations
    • Threshold gradients to avoid domination by outlying gradients
    • Normalize 128-dim vector to magnitude 1
  • Orientation Normalization
    • Compute orientation histogram
    • Select dominant orientation $\theta$
    • Normalize: rotate to fixed orientation

Properties of SIFT

  • Extraordinarily robust matching technique
    • Can handle changes in viewpoint
      • Up to about 30 degrees of out-of-plane rotation
    • Can handle significant changes in illumination
      • Sometimes even day vs. night
    • Fast and efficient - can run in real time
    • Various code available

Matching and Alignment

How to Align Two Images

  1. Detect and describe keypoints in each image
  2. Find matching keypoints
  3. Find the transformation that brings the keypoints into alignment

Basic Keypoint Matching

RANSAC Matching

  1. Randomly select a seed group of matches
  2. Compute transformation for seed group (least squares)
  3. Find inliers to this transformation: SSD(x,Hx’) < threshold.
    • Keep the transformation with the largest number of inliers
    • Re-compute least-squares estimate of transformation on all of the inliers
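
A hedged OpenCV sketch of the whole detect/describe/match/align pipeline; the ratio test and all file names and thresholds are my additions, not from the notes:

```python
import cv2
import numpy as np

img1 = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # hypothetical input images
img2 = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# 1. Detect and describe keypoints
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2. Match descriptors (Lowe's ratio test discards ambiguous matches)
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]

# 3. Estimate the aligning transformation with RANSAC
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```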

Recognition

Representations

Histogram of Oriented Gradients (HoG)

  • Merits:
    • Relatively simple and fast (convolution and binning)
    • Based on gradients: Invariant to global illumination change (unlike pixels!)
    • Due to quantization of gradients, robust to small changes in viewpoint/scale/pose
    • If background clutter has random orientations, it simply adds a constant to the histogram => Some robustness to clutter
  • Issues:
    • Number of cells proportional to size of image.
      • => Different sized images have different sized dense HoG descriptors
      • => Every image has to be resampled to fixed size before HoG to get fixed length vector for downstream machine learning classifier.
    • NOT robust to translation, major change of pose, viewpoint, etc.
      • (=> Not comparing like-with-like on a per-cell basis )
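
A hedged sketch using scikit-image's HOG implementation, resampling to a fixed size first as noted above (parameter values are typical, not from the course):

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(image, size=(128, 64)):
    """Resample to a fixed size first, so every image yields the same-length HoG vector."""
    image = resize(image, size)
    return hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

img = np.random.default_rng(0).random((200, 150))
print(hog_descriptor(img).shape)     # fixed-length vector, ready for a downstream classifier
```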
