Biometrics Note4

Gait Biometrics



We want variance within subject << variance between subjects.
1994: the idea of gait as a biometric. 1999: the first book on biometrics; research supported by DARPA. 2005: the first book on gait.

Biometrics and gait

Gait is non-contact and uses sequences.
Advantages: perceivable at distance and hard to disguise.
Potential applications: security/surveillance, immigration, forensics, medicine
Other applications: analysis of moving objects
Related fields: animation, tracking
As a biometric, gait is available at a distance when other biometrics are obscured or at too low resolution.

Gait and medicine

Many medical studies concern pathological gait.
Point markers are attached to a subject.
Can use video, optoelectronic moving light displays, or electrogoniometers.


Early UCSD Data

Subjects are extracted from the image frames.
6 subjects; 7 sequences each. Filmed with a Sony camera; subjects walked a circular track.

HiD (NIST) Database

Acquired by NIST under the DARPA Human ID at a Distance program. 122 subjects; 30 fps video.
Covariates include changes of surface, shoe and luggage, with different views and different surfaces.

Southampton Data

100 subjects.
Filmed indoors and outdoors.
Included covariate data for 12 subjects.

CASIA Database

124 subjects, 11 viewpoints.

Performance Evaluation of Vision-based Gait Recognition using a Very Large-scale Gait Database

From Osaka University, Japan. A large database of more than 1000 subjects, gathered at an exhibition.


Consistent with many other studies; the first gait biometrics paper reported a 90% CCR.
Typical evaluation plots: identification rate against rank;
false rejection rate against false acceptance rate.

Gait-based Age Estimation using a Whole-Generation Gait Database

Techniques for gait extraction and description

  1. Silhouette description (Many)
    • Established statistical analysis
    • Temporal symmetry
    • Velocity moments
      • Extension of spatial moments
      • Applied to silhouettes
      • Selected by ANOVA
      • A_{mn\mu\gamma}=\frac{m+1}{\pi}\underset{i=2}{\overset{images}{\Sigma}}\underset{x,y}{\Sigma}\, U(i,\mu,\gamma)\, S(m,n)\, P_{i_{x,y}}
      • 3 moments for visualisation; subjects are clusters of 4
    • Unwrapped silhouette
    • Average Silhouette
      • Most popular technique for gait representation
      • Simple and effective
      • Also called gait energy image
      • New form is gait entropy image
      • Background is subtracted from each frame and the pixels thresholded, resulting in a binary image
      • Normalise silhouettes by height to account for distance
      • Add all silhouettes together and divide by the number of frames
      • Resulting image is the signature
    • HiD Baseline Analysis
      • Form silhouette (background subtraction)
      • Detect gait periods
      • Estimate correlation of frame similarity between sequences
      • Similarity is median of max correlation between gallery and probe sequences
    • HMM Analysis
      • Form silhouette
      • Use contour width
      • Captures structure and dynamics
      • Capture gait information using Hidden Markov Model
    • Finding moving objects
      • Describe shape by Fourier descriptor
      • Include velocity in accumulation
      • Extract moving continuous shape
      • Include trajectory described by Fourier descriptor
      • Allow for arbitrary deformation
  2. Modelling Movement (few)
    1. Pendular thigh motion model

      Modeling the Thigh’s Motion: \phi(t)=a_{0}+\underset{k=1}{\overset{N}{\Sigma}}[b_{k}\cos(k\omega_{0}t+\Psi_{k})]
      Extended pendular thigh-model, based on angles
      Uses a forced oscillator, bilateral symmetry and phase coupling
    2. Translation to the real world
      1. Finding subjects in outdoor imagery: invariance to background etc.
      2. Analysing covariate structure: invariance to factors which affect gait
      3. Understanding feature space: invariance to recognition methodology
      4. Analysing other views: invariance to viewpoint
      5. Forensic analysis: can criminals be recognised?
    3. Anatomically-guided skeleton
    4. 3D Recognition – Marionette Based
    5. Analysing the Effects of Time
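The average-silhouette (gait energy image) steps listed above can be sketched in a few lines of numpy; the function name and the background-subtraction threshold are my own choices, and height normalisation is omitted for brevity:

```python
import numpy as np

def gait_energy_image(frames, background, thresh=30):
    """Average-silhouette (gait energy image) signature.

    frames: list of greyscale frames (H x W uint8 arrays)
    background: static background frame of the same size
    """
    silhouettes = []
    for f in frames:
        # Background subtraction + thresholding -> binary silhouette
        diff = np.abs(f.astype(int) - background.astype(int))
        silhouettes.append((diff > thresh).astype(float))
    # Add all silhouettes and divide by the number of frames:
    # the mean image is the gait signature
    return np.mean(silhouettes, axis=0)
```

A pixel covered by the subject in half the frames gets value 0.5 in the signature, so stable body regions appear bright and swinging limbs appear grey.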

Biometrics Note3

Face and Fingerprint Biometrics


Fingerprint data

  • Offline acquisition: Ink or latent
  • Live acquisition: optical sensing

Fingerprint pattern

A fingerprint is a set of ridges (sweat bands).

Pattern classes: whorl, loop, arch.

Minutiae are the terminations (end points) or bifurcations (splitting points) of ridges.

  • High: sweat pores
  • Medium: incipient ridges
  • Low: minutiae (and creases)

Enhancement and Image Processing

Basic Processing

Basic enhancement: histogram equalisation, to make the fingerprint easier for human viewing.
Basic filtering: median filtering, to denoise.
Basic feature extraction: thresholding, to binarise the image.
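The three basic operations can be sketched with plain numpy (function names are my own; a real system would use optimised library routines):

```python
import numpy as np

def hist_equalise(img):
    # Map grey levels through the normalised cumulative histogram
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size
    return (cdf[img] * 255).astype(np.uint8)

def median_filter3(img):
    # 3x3 median filter for denoising (border pixels left untouched)
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i-1:i+2, j-1:j+2])
    return out

def binarise(img, thresh=128):
    # Global thresholding -> binary ridge image
    return (img > thresh).astype(np.uint8)
```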

Enhancement examples:

Input image \rightarrow Normalisation \rightarrow Orientation Image Estimation \rightarrow Frequency Image Estimation \rightarrow Region Mask Generation \rightarrow Filtering \rightarrow Enhanced Image 

Recognition approaches

  • Minutiae: choose the maximum alignment of minutiae pairings (the dominant technology)
    • detection: after ridge detection there is ridge tracking
      • termination minutiae, bifurcation minutiae, false minutiae
    • Many minutiae-matching approaches have been proposed
  • Correlation: maximise match between fingerprint images
    • Avoid “core” and minutiae detection
    • use pattern
    • Holistic fingerprint recognition using Gabor filters
    • Gabor wavelets gw2D(x,y)=\frac{1}{\sigma\sqrt\pi}e^{-\frac{(x-x_0)^2+(y-y_0)^2}{2\sigma^2}}e^{-j2\pi f_0((x-x_0)cos(\theta)+(y-y_0)sin(\theta))}
  • Ridges: maximise match of selected ridges features, such as local orientation, frequency, shape, and texture
    • Texture based matching
    • Pressure correction
    • 3D fingerprint scanning
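As a sketch, the Gabor wavelet formula above can be evaluated directly on a sampling grid (the function name and default parameters are my own choices):

```python
import numpy as np

def gabor_2d(size, sigma, f0, theta, x0=0.0, y0=0.0):
    """Complex 2D Gabor wavelet, following the formula in the notes:
    a Gaussian envelope times a complex sinusoidal carrier at
    frequency f0 and orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) \
        / (sigma * np.sqrt(np.pi))
    carrier = np.exp(-2j * np.pi * f0 *
                     ((x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)))
    return envelope * carrier
```

Convolving the fingerprint with a bank of such filters at several orientations gives the holistic texture response used for matching.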


  • Latent fingerprints
  • Forensics
  • Gender determination is possible
  • Finger vein can be fused with fingerprint

Face Recognition


  • Holistic: image as a whole
  • Model based: Recognition by parts


  • Calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces
  • Determine if the image is a face at all by checking to see if the image is sufficiently close to “face space”
  • If it is a face, classify the weight pattern as either a known person or as unknown
  • (Optional) Update the eigenfaces and/or weight patterns
  • (Optional) If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate into the known faces
  • Using PCA or SVD to reduce dimensionality


  • Collect image dataset
  • Calculate the matrix L, find its eigenvectors and eigenvalues, and choose the M eigenvectors with the highest associated eigenvalues
  • Combine the normalized training set of images to produce the eigenfaces
  • For each known individual, calculate the class vector by averaging the eigenface pattern vectors per subject. Choose thresholds for face class, and face space
  • For new face image calculate pattern vector and distances to classes, to face space. If the minimum distances < threshold, face class identified
  • If the new image is known, this image may be added to the original set of familiar face images, and the eigenfaces may be recalculated
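The training and projection steps above can be sketched in numpy, using the small N×N matrix L = AAᵀ to avoid the eigen-decomposition of the huge image-space covariance (function names and the normalisation step are my own):

```python
import numpy as np

def train_eigenfaces(images, M):
    """images: (N, P) array, each row a flattened face; keep M eigenfaces."""
    mean_face = images.mean(axis=0)
    A = images - mean_face                # centred data, shape (N, P)
    L = A @ A.T                           # small N x N matrix
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:M]    # M highest eigenvalues
    eigfaces = (A.T @ vecs[:, order]).T   # back-project into image space
    eigfaces /= np.linalg.norm(eigfaces, axis=1, keepdims=True)
    return mean_face, eigfaces

def project(face, mean_face, eigfaces):
    # Weight vector: projection of the centred face onto each eigenface
    return eigfaces @ (face - mean_face)
```

Classification then compares the weight vector of a new face with each class vector, accepting the closest class if the distance is below the chosen threshold.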



Face Detection — Haar wavelets 

face image + template for eyes and nose bridge = best match of template to image

and then Adaboost = classification + feature set selection.
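The template matching above relies on Haar-like rectangle features computed from an integral image; a minimal sketch (helper names are my own; Viola–Jones combines many such features with AdaBoost):

```python
import numpy as np

def integral_image(img):
    # Summed-area table: any rectangle sum in O(1) after O(N) precomputation
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum over the inclusive rectangle [r0..r1] x [c0..c1]
    total = ii[r1, c1]
    if r0 > 0: total -= ii[r0 - 1, c1]
    if c0 > 0: total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect_vertical(ii, r, c, h, w):
    # Two-rectangle Haar feature: top half minus bottom half
    # (e.g. the eye region is darker than the cheeks)
    top = rect_sum(ii, r, c, r + h // 2 - 1, c + w - 1)
    bottom = rect_sum(ii, r + h // 2, c, r + h - 1, c + w - 1)
    return top - bottom
```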

Face Recognition in changing illumination

Enhanced local texture feature sets for face recognition under difficult lighting conditions.


Raw Images \rightarrow Illumination Normalization

\rightarrow Robust Feature Extraction \rightarrow Subspace Representation \rightarrow Output

Input \rightarrow Gamma Correction \rightarrow DoG filtering \rightarrow Masking \rightarrow Equalization of Variation \rightarrow Output

Challenges in face recognition

  • Lighting/Illumination
  • Viewpoint
  • Occlusion
  • Resolution
  • Facial expression
    • Mental States; Non-Verbal Communication; Physiological Activities; Verbal Communication
  • Ageing
    • Face Acquisition
    • Face Normalization / Face Segmentation
    • Face Feature Extraction: Deformation Extraction, Motion Extraction
    • Facial Feature Representation
    • Facial Expression Classification
  • Make-up/ cosmetics
  • 3D recognition


Biometrics Note1


Biometrics is about measuring unique and permanent personal identifiers.


An example, as a biometric, gait is available at a distance when other biometrics are obscured or at too low resolution.
A biometric is a unique personal identifier.

Biometric consideration

  • Vast range of biometrics now available: face, hand, finger, eye, thermal
    • Universality: we all have that trait
    • Acceptability: are we happy for it to be measured?
    • Uniqueness: is it unique to an individual
    • Repeatability: are the measurements the same at different times
  • Basic considerations: universality, acceptability, uniqueness, repeatability
  • Application potential: contact, performance, circumvention, interaction
  • Business sectors: immigration, security, airports, banking, forensics, ‘domestic’


  • Enrolment: Data \rightarrow Processing \rightarrow Features \rightarrow Database
  • Verification: Data \rightarrow Processing \rightarrow Features \rightarrow Using Database to Match \rightarrow Accept
  • Recognition/Identification: Data \rightarrow Processing \rightarrow Features \rightarrow Using Database to Match \rightarrow Identity

Biometric Measurements

  • Euclidean Distance often used
  • We want (Variance within subject) << (Variance between subjects)
  • Manhattan distance
  • Mahalanobis d_{MAH} =\sqrt{( p-\mu )^{T} \Sigma ^{-1}( p-\mu )}
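The three distances can be sketched directly from their definitions (function names are my own):

```python
import numpy as np

def euclidean(p, q):
    # Straight-line distance between feature vectors
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    # Sum of absolute coordinate differences
    return np.sum(np.abs(p - q))

def mahalanobis(p, mu, cov):
    # d = sqrt((p - mu)^T Sigma^{-1} (p - mu)):
    # distance to a class mean, weighted by the class covariance
    d = p - mu
    return np.sqrt(d @ np.linalg.inv(cov) @ d)
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance.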

Performance measures

  • Verification
    • Confirmation of identity
    • Is the subject in the database
    • inter- and intra-class variation
      • Distance between same subject is small – good
      • Distance between different subjects is large – good
    • True and False accept rate; True and False reject rate
    • True and False Positives
    • Need to choose a threshold
  • Recognition
    • Can the subject’s identity be determined
    • Correct Classification Rate (CCR)
      • CCR=\frac{\Sigma subjects\ correctly\ recognised}{\Sigma subjects}
    • Rank
      • ordered list of subject recognition: closest = rank1; furthest = rank N
      • Correct Recognition Rate (CRR) = \frac{\Sigma Correct\ subjects\ rank1}{\Sigma subjects}
  • Others
    • Daugman’s Decidability: d′
    • F_ratio = \frac{\mu_{genuine}-\mu_{imposter}}{\sigma_{genuine}+\sigma_{imposter}}
    • If the distributions are Gaussian, EER = \frac{1}{2}-\frac{1}{2}erf(\frac{F\_ratio}{\sqrt{2}}) where erf(x)=\frac{2}{\sqrt{\pi}}\int ^{x}_{0} e^{-t^{2}} dt
    • Detection Error Tradeoff (DET)
    • Precision
    • Recall
    • Failure to enrol
  • Terms Related
    • Detection
    • Liveness (is the sample from a live person)
    • Spoofing (impersonating another’s biometric)
    • Goats (subjects who are intrinsically hard to recognise)
    • Response time
    • Covariates
    • Doppelganger (a lookalike of another subject)
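The CCR and the Gaussian-assumption EER formula above can be sketched in a few lines (function names are my own):

```python
import math
import numpy as np

def ccr(predicted, actual):
    # Correct Classification Rate: fraction of subjects correctly recognised
    return np.mean(np.array(predicted) == np.array(actual))

def eer_from_f_ratio(mu_gen, mu_imp, sigma_gen, sigma_imp):
    # F-ratio of the genuine and imposter score distributions,
    # then EER = 1/2 - 1/2 * erf(F_ratio / sqrt(2)) if both are Gaussian
    f = (mu_gen - mu_imp) / (sigma_gen + sigma_imp)
    return 0.5 - 0.5 * math.erf(f / math.sqrt(2))
```

Fully overlapping distributions (F-ratio 0) give an EER of 0.5, i.e. chance performance; well-separated distributions drive the EER towards 0.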

A security system should favour a low False Accept Rate, even at the cost of a high False Reject Rate;
a banking application should favour a low False Reject Rate, even at the cost of a high False Accept Rate.

Human vision and computer vision

  • Conceptually similar to CV — Sensor, storage and processing
    • Differences in sensing
    • Differences in measurement
    • Differences in processing
    • Differences in training
  • Approaches
    • Model based measures
      • features
      • points
    • Image based measures
      • transforms
      • mappings



Advanced Machine Learning Note 2

 Support Vector Machines

Support vector machines, when used well, often have the best generalisation results. They are typically used on numerical data, but have been adapted to text, sequences, etc. They are often the method of choice for small datasets.
SVMs classify linearly separable data. To increase the likelihood of linear-separability we often use a high-dimensional mapping:

x=( x_{1} ,\ x_{2} ,\ ...,\ x_{p}) \rightarrow \underset{}{\overset{\rightarrow }{\phi }}( x) =( \phi _{1}( x) ,\ \phi _{2}( x) ,\ ...,\ \phi _{m}( x)) ,\ m\gg p
Finding the maximum margin hyper-plane is time consuming in “primal” form if m is large.
We can work in the “dual” space of patterns, where we only need to compute dot products, \underset{}{\overset{\rightarrow }{\phi }}( x_{i}) \centerdot \underset{}{\overset{\rightarrow }{\phi }}( x_{j}) =\underset{k=1}{\overset{m}{\Sigma }}\phi _{k}( x_{i}) \phi _{k}( x_{j}).

Kernel trick

If we choose a positive semi-definite kernel function K(x,y), then there exist functions \phi_{k}(x) such that, just like an eigenvalue decomposition of a matrix,
K(x_{i}, x_{j}) = \overset{\rightarrow}{\phi}(x_{i})\centerdot\overset{\rightarrow}{\phi}(x_{j}).
Never need to compute \overset{\rightarrow}{\phi}(x_{i}) explicitly as we only need the dot-product to compute maximum margin separating hyperplane.

Kernel function

Kernel functions are symmetric functions of two variables. Not all symmetric functions are positive semi-definite.
* Quadratic kernel: K(x,y) = (x^Ty)^2
* Gaussian (RBF) kernel: K(x,y)=e^{-\gamma||x-y||^2}

Non-linearly Separable Data

K(x_1, x_2)=\begin{pmatrix}\sqrt{2} x_{1} y_{1} & x_{1}^{2} & y_{1}^{2}\end{pmatrix}\begin{pmatrix}\sqrt{2} x_{2} y_{2}\\x^{2}_{2}\\y_2^2\end{pmatrix}=2x_1x_2y_1y_2+x_1^2x_2^2+y_1^2y_2^2=(x_1x_2+y_1y_2)^2 = (x_1^Tx_2)^2
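The identity above can be checked numerically: the explicit feature map φ(x, y) = (√2·xy, x², y²) gives the same value as the quadratic kernel applied in input space (variable names are my own):

```python
import numpy as np

def phi(v):
    # Explicit quadratic feature map for a 2D point v = (x, y)
    x, y = v
    return np.array([np.sqrt(2) * x * y, x ** 2, y ** 2])

def quadratic_kernel(u, v):
    # Kernel evaluated directly in the input space
    return float(np.dot(u, v)) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])
# Dot product in feature space equals the kernel in input space
assert abs(np.dot(phi(u), phi(v)) - quadratic_kernel(u, v)) < 1e-9
```

This is the kernel trick in miniature: the 3-dimensional map is never needed at classification time, only the 2-dimensional dot product.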


 Computing the Maximum-Margin Hyperplane

If we use the kernel trick, the time to compute the solution to the quadratic programming problem is of order pN^3, where N is the number of training examples and p is the number of features.
SVMs rely on distances between data points, so rescaling some features but not others gives different maximum-margin hyperplanes.

 Soft Margins

Relax the margin constraints by introducing *slack variables* \xi_k\geq 0:
y_k(x_k^T\omega -b)\geq 1-\xi_k.
Minimise \frac{||\omega||^2}{2} + C\underset{k=1}{\overset{n}{\Sigma }}\xi_{k} subject to the constraints; a large C punishes slack variables.
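The soft-margin objective can be sketched directly from the two formulas above (the function name is my own; a real solver minimises this subject to the constraints rather than just evaluating it):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """||w||^2 / 2 + C * sum of slacks, where the slack for point k is
    max(0, 1 - y_k (x_k . w - b)), i.e. how far it violates the margin."""
    margins = y * (X @ w - b)
    slacks = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * slacks.sum()
```

Points on the correct side with margin at least 1 contribute zero slack; increasing C makes margin violations more expensive relative to the width of the margin.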

Optimising C
The optimal C value can change by many orders of magnitude, from 2^{-5} to 2^{15}. It is typically optimised by a grid search.

 Choosing the Right Kernel Function

For numerical data people tend to use no kernel (a linear SVM), a radial basis function (Gaussian) kernel, or polynomial kernels.
Kernels often come with parameters; for the popular radial basis function kernel K(x,y) = e^{-\gamma||x-y||^2}, the optimal \gamma values range over 2^{-15} to 2^{3}.
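A sketch of the grid search over C and γ on the quoted ranges, using scikit-learn's SVC and GridSearchCV on toy data (the synthetic data and the grid spacing are my own choices):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Logarithmic grids spanning the ranges quoted in the notes
param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 4)],        # 2^-5 .. 2^15
    "gamma": [2.0 ** k for k in range(-15, 4, 4)],    # 2^-15 .. 2^3
}

# Toy two-class data; any labelled numerical dataset would do
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Cross-validated search over every (C, gamma) pair
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)
```

After fitting, `search.best_params_` holds the chosen (C, γ) pair and `search.best_score_` its cross-validated accuracy.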

Reading: On soft biometrics


The field of soft biometrics was originally aimed at augmenting the recognition process by the fusion of metrics that were sufficient to discriminate populations rather than individuals. A further branch of this new field concerns approaches to estimate soft biometrics, either using conventional biometrics approaches or just from images alone. These approaches lead to a new type of recognition, one similar to Bertillonage, which is one of the earliest approaches to human identification.


Unlike Bertillonage, soft biometrics is unlikely to be superseded as it can be used to reinforce biometric identification as well as be deployed alone in the analysis of data that conventional biometrics cannot handle or with invariant attributes that conventional biometrics cannot even approach.
The information needed to conduct a fingerprint-based background check includes sex/gender, race, height, weight, and eye and hair color. These are primary human factors that, as we shall see, can be identified from image data and are now termed soft biometrics.
The original formulation concerned measures that can be used to aid recognition rather than for identification. The measures were suited to the discrimination between classes rather than individuals, and could be used to buttress recognition performance. Given the pervading need for security in modern environments, there has been a concerted interest for the use of soft biometrics largely to handle the low quality of video images where traditional biometrics can be applied.
This paper reviews the approaches that have been made in this new field. It provides an updated definition of the term soft biometrics and describes its advantages in biometric recognition.


A variety of factors motivated a precise system for identification in 19th century France after the abandonment of branding criminals and a system of deportation. Bertillon’s system of anthropometrics, eponymously Bertillonage, outlined the tools and techniques for the careful measurement of:

  • Physical features including the length/width of the head, the lengths of certain fingers, the dimensions of the feet, arm and right ear, and standing height;
  • Descriptions of the dimensions of the nose, and of eye and hair color;
  • The description and location of notable scars, tattoos and other marks

The metrics of the system were chosen primarily to be simple so that they could be gathered accurately. The measurements were taken by a trained individual, though not necessarily a skilled one. Features were chosen to allow easy identification of the points at which to begin and end a measurement. The success of Bertillonage came from its ability to reduce the probability of type 1 errors: though two individuals may have very similar height, it is unlikely that the same two would have similar measurements for all the other features.
Difficulties in which Bertillonage appeared unable to distinguish people of similar appearance are often quoted as a reason for it being superseded by forms of identification such as fingerprint analysis. One study concluded that “The body is more variable than the face and should be used in identification”, also noting practical advantages.
These systems aim to reduce identity to a representative and measurable set of features, though not using descriptions of the human body as a whole. Measurements are taken in a controlled way, much the same as in modern biometrics, though lacking its sophisticated statistical and recognition techniques.

On the development of soft biometrics approaches

The earliest approach explicitly mentioning a form of soft biometrics appears to be by Wayman, who proposed the use of soft biometric traits, like gender and age, to filter a large biometric database. An early motivation of the first soft biometrics was to augment conventional biometric signatures; it was envisaged that soft biometrics would be obtained separately, perhaps not from the images originally used for recognition, and then used to enrich the biometric signature. Gender and ethnicity information of users was automatically extracted from their face images using established techniques. These soft biometric measures were shown to vastly improve on standard fingerprint matching: e.g. for identification, an increase from 86.4% to 90.2% at rank 1. Soft biometrics clearly improves the fingerprint recognition rate.
An extended version of the definition of soft biometrics later stated “Soft biometric traits are physical, behavioral or adhered human characteristics, classifiable in pre-defined human compliant categories”, later adding “In other words, the soft biometric trait instances are created in a natural way, used by humans to distinguish their peers”. As such we shall define soft biometrics as: the estimation or use of personal characteristics describable by humans that can be used to aid or effect person recognition.
Overall, the measures were demonstrated to have good recognition capability when used alone and proved an excellent addition to recognition capability, as consistent with the earliest form of soft biometrics. The main soft biometrics that can be estimated for all modalities are age and gender, whilst some other soft biometrics such as weight and height can only be estimated from a single modality.

Soft biometrics for face

Undeniably, a significant number of soft biometric traits can be extracted from face images and facial movements. This generally includes gender recognition, age categorization and ethnicity classification. These are often referred to as demographic traits and are very useful for more affective human-computer interaction and smart environments, in which systems should adapt to users whose behaviors and preferences are not only different at different ages but also specific to a given ethnicity and/or gender. Other soft biometric traits that can be extracted from face images include kinship information and skin color.

Identifying faces by soft biometrics

The first approach to face verification using soft biometrics was described as using attributes that included face, age and gender. This was motivated by the need to recognize people in scenarios where pose and expression are not constrained. These were formulated using binary classifiers trained to recognize the presence or absence of describable aspects of visual appearance, thereby expressing gender as whether a subject was male or not, rather than male or female. This was accompanied by simile classifiers, which removed any manual labeling for training attribute classifiers, instead using binary classifiers trained to recognize the similarity of faces, or regions of faces, to specific reference people. The study also showed how people could outperform the new approach, suggesting room for further improvement.
The attributes were derived by inviting people to label one subject compared with another. These attributes included more detailed consideration of face components, and 63 labelers derived the measures for 50 subjects.
A more elaborate approach was recently proposed, where human-describable face attributes are exploited to perform face identification in criminal investigations. The extracted attributes, such as eyebrow, chin and eye shape, are compared with the same attributes encoded in hand-drawn police sketches.
A recent study considered the performance of humans in determining these attributes for faces. The approach extracted previously proposed biologically inspired features from face images and selected soft biometric features using a boosting algorithm. The results show that humans can discriminate gender slightly better than the automated techniques, whereas automated techniques generally outperform humans at age estimation.
One new approach based on soft biometrics rather than on attributes found a modest performance advantage when soft biometrics were used to augment the traditional biometric approaches of Local Region PCA and Cohort Linear Discriminant Analysis.
Another novel approach, originally proposed to recognize a wide range of facial attributes, has been evaluated in the recognition of facial expressions, gender, race, disguise and beard.

Exploiting characteristic features in face images

Human faces always contain characteristic patterns which may alone provide support for classification. Bicego proposed to compute the most distinctive facial regions by comparing a face image with face images from other individuals. The algorithm is based on the computation of face differences to determine the level of distinctiveness of any given face image. Space-variant patterns are randomly sampled from the face image, obtaining a large number of scale-invariant local features. A feature was selected as distinctive if it was significantly different from any other feature in a given set.
In another approach, a statistical classification of image patterns into facial marks is performed. Facial marks are detected by means of a blob extractor and subsequently classified. The relevance of the texture and color of facial marks is also addressed.

Gender classification from face images

Significant progress has been made and several approaches have been reported in the literature. Fundamentally, the proposed techniques differ in

  • the choice of the facial representation, ranging from the use of simple raw pixels to more complex features such as Gabor responses
  • the design of the classifier, ranging from the use of nearest neighbor and Fisher linear discriminant classifiers to artificial neural networks, SVM and boosting schemes

Note that there is no public database specifically designed for gender recognition evaluation.

Age classification from facial images

Automatic age classification aims to assign a label to a face regarding the exact age or the age category it belongs to. Aging is a very complex process that is extremely difficult to model: a group of people of the same age may look very different depending on, for example, environment, lifestyle, genes etc. Thus, deriving a universal age classification model is troublesome.
Several approaches have been introduced.

Race and ethnicity classification from facial images

The automatic ethnicity classification problem has received far less attention despite its potential applications. This is perhaps due to the ambiguity and complexity in defining and describing different ethnic groups. The terms “race” and “ethnicity” are sometimes used interchangeably although they refer to biological and sociological factors respectively. Generally, race refers to a person’s physical appearance or characteristics, while ethnicity is viewed more as a cultural concept, relating to nationality, rituals and cultural heritages.

Soft biometrics for body

Describing the whole body for recognition

The first approach to soft biometrics based on human description obtained soft biometric labels for each subject in the Southampton gait database. These factors included:

  • Memory – the labelers were allowed to view images indefinitely so that memory could not impair the labels provided;
  • Defaulting – the labelers were not provided with default labels, but were explicitly required to state values for labels
  • Anchoring – which could occur from the phrasing of the label, was addressed by using “unsure” as opposed to “average”. Anchoring could also occur due to the order in which subjects were presented, and so this order was randomized
  • Categorization – labelers were provided with five distinct categories for each label
  • Owner variables – since labels are likely to be influenced by a labeler’s perception of themselves, their description of themselves was also collected

Semantic biometric traits and descriptions

The labelers watched video footage of subjects walking at a regular pace around a room and rated them using 23 traits identified from human descriptions of physique and motion. Statistical analysis of the 23 traits led to 13 that were the most significant. The traits chosen were suitable for deployment at a distance, allowing also for analysis of surveillance data without restricting the approach.
Traits were usually described using a five-point scale, though age appeared to need a much larger set of descriptions, particularly for younger subjects, reflecting the rate of change in appearance in youth. The face features collected were those available at a distance, rather than the fine-grained identifications used for facial image identification. These descriptions were accompanied by descriptions of age, ethnicity and sex.
The more informative correlations were observed between traits whose terms describe overall thickness and length of the body, as well as extremities. As such, Figure and Weight were highly correlated, and in turn both are correlated with the Arm Thickness, Leg Thickness and Chest annotations. Correlation was also noted between Height and Leg Length, each also portraying correlations with Arm Length. There were some inverse correlations, such as between arm and leg shape and shoulder shape, and many other measures. The Race and Sex features were found to be statistically the most significant of all the descriptions used.
In Samangooei’s study, subjects had been labeled by users in a categorical way, wherein the labels depended on the labelers and on their impression of scene geometry. The problem of confused labels was solved by changing the structure of the semantic terms to be relative rather than categorical: the labels were derived by comparing one subject with another using a new Web interface.

Estimating gender from images of the whole body

In contrast with the high volume of work aimed to estimate gender from face images, the estimation of gender from other data has received much less attention. There have been works aimed to estimate gender automatically from static images of the whole human body, and works aimed at deploying gait biometrics for gender estimation.

Estimating subjects’ height from whole body images

There has been a long history of determining suspects’ height in video, not just for identification but also because height is a strong indicator of a subject’s gender. A later approach used gait to direct height estimation, estimating the body frame size from the silhouette corresponding to the double-support gait pose, when a human is at maximum height (the human appears shortest at heel strike, which is when the leading foot first strikes the ground), achieving between 75% and 87% CCR (with a potential 10% error rate).
It is worth noting that the earlier semantic approaches were also interested in human perception of height, but this was not correlated with ground truth (where available) since the intention was to explore the ability of human descriptions for recognition, rather than human descriptions for estimating soft biometrics parameters.

Estimating subjects’ weight from whole body images

The study suggested a set of measures that comprehensively covered the whole body (upper and lower part) and were reasonably correlated to the weight. The measures included height, upper leg length, calf circumference, upper arm length, upper arm circumference, waist, upper leg circumference. The approach used manual estimates from a nutrition survey and showed that the circumference of the upper arm was most correlated to weight. The approach clearly suggests that human body metrology contains sufficient information to reliably predict gender and weight.

Estimating subjects’ age from images of the whole body

To our knowledge the only approach aimed to estimate a subject’s age from a single image of their body was a feature based approach that extracted the SIFT feature and then applied sparse coding to learn a dictionary for feature quantization. It will be interesting in future to note the relative advantages associated with sequence- or single-image-based techniques.

Exploiting correlation to predict body characteristics

As in any exercise in pattern recognition, it is possible to explore the nature of the feature descriptions to improve retrieval and recognition performance. It was observed that the traits most successfully predicted were skin color and ethnicity, and that this was likely due to strong correlation with other traits allowing accurate prediction of missing data.

Handbook of Medical Imaging Enhancement


Image enhancement techniques are mathematical techniques that are aimed at realizing improvement in the quality of a given image. The result is another image that demonstrates certain features in a manner that is better in some sense as compared to their appearance in the original image.
Simple image enhancement techniques are developed and applied in an ad hoc manner. Advanced techniques that are optimized with reference to certain specific requirements and objective criteria are also available.
Although most enhancement techniques are applied with the aim of generating improved images for use by a human observer, some techniques are used to derive images that are meant for use by a subsequent algorithm for computer processing.
If used inappropriately, enhancement techniques themselves may increase noise while improving contrast, they may eliminate small details and edge sharpness while removing noise, and they may produce artifacts in general.

First Chapter

The first chapter provides an introduction to basic techniques, including histogram manipulation, mean and median filtering, edge enhancement, and image averaging and subtraction, as well as the Butterworth filter. Applications illustrate contrast enhancement, noise suppression, edge enhancement, and mappings for image display systems. The histogram equalization technique is theoretically well founded with the criterion of maximal entropy, aiming for a uniform histogram or grey-level probability density function. However, this technique may have limited success on many medical images because they typically have details of a wide range of size and small grey-level differences between different tissue types.
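As a sketch of the Butterworth filter mentioned above, here is a frequency-domain low-pass version in numpy (the function name, cutoff convention and default order are my own choices):

```python
import numpy as np

def butterworth_lowpass(img, cutoff, order=2):
    """Frequency-domain Butterworth low-pass filter for smoothing.

    cutoff is in cycles/pixel; order controls the roll-off steepness.
    """
    rows, cols = img.shape
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    d = np.sqrt(u ** 2 + v ** 2)            # distance from DC
    h = 1.0 / (1.0 + (d / cutoff) ** (2 * order))
    f = np.fft.fft2(img)
    return np.real(np.fft.ifft2(f * h))     # filter and invert
```

The filter passes the mean grey level unchanged (gain 1 at DC) while smoothly attenuating high-frequency noise, without the ringing of an ideal cutoff.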
The limitation of the fundamental techniques motivated the development of adaptive and spatially variable processing techniques.

Second Chapter

The second chapter presents the design of the adaptive Wiener filter. The Wiener filter is an optimal filter derived with respect to a certain objective criterion. It can be designed to adapt to local and spatially variable details in images. The filter is cast as a combination of low-pass and high-pass filters, with factors that control their relative weights.
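A minimal sketch of such a locally adaptive filter, assuming the common local-statistics form (local mean plus a variance-dependent fraction of the high-pass residual); the window size and noise estimate are illustrative choices, not the chapter's exact design:

```python
import numpy as np

def box_filter(img, win):
    # local mean over a win x win window (reflect-padded)
    pad = win // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(win):
        for dx in range(win):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (win * win)

def adaptive_wiener(img, win=5, noise_var=None):
    # low-pass (local mean) plus a locally weighted high-pass residual:
    # where local variance barely exceeds the noise level the gain is ~0
    # (heavy smoothing); where it is large the gain is ~1 (detail kept)
    img = img.astype(np.float64)
    mean = box_filter(img, win)
    var = np.maximum(box_filter(img * img, win) - mean * mean, 0.0)
    if noise_var is None:
        noise_var = var.mean()   # crude global noise estimate
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return mean + gain * (img - mean)

rng = np.random.default_rng(0)
noisy = 100.0 + rng.normal(0.0, 5.0, size=(32, 32))
out = adaptive_wiener(noisy)     # flat region: noise is suppressed
```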

Third Chapter

The third chapter focuses on nonlinear contrast enhancement techniques for radiographic images, in particular mammographic images. A common problem in contrast or edge enhancement is the accompanying but undesired noise amplification. A wavelet-based framework is described to perform combined contrast enhancement and denoising, suppression of the noise present in the input image and/or control of noise amplification in the enhancement process. The basic unsharp masking and subtracting Laplacian techniques are included as special cases of a more general system for contrast enhancement.
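The unsharp-masking special case mentioned above can be sketched as follows (box blur, window size, and test image are illustrative assumptions, not the chapter's exact formulation):

```python
import numpy as np

def unsharp_mask(img, amount=1.0, win=3):
    # sharpened = original + amount * (original - blurred):
    # subtracting the blur leaves a high-pass residual that
    # boosts edges (at the cost of amplifying noise)
    img = img.astype(np.float64)
    pad = win // 2
    p = np.pad(img, pad, mode="reflect")
    blur = np.zeros(img.shape, dtype=np.float64)
    for dy in range(win):
        for dx in range(win):
            blur += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    blur /= win * win
    return img + amount * (img - blur)

# a vertical step edge: unsharp masking overshoots on both sides
step = np.zeros((8, 8))
step[:, 4:] = 100.0
sharp = unsharp_mask(step)
```

The overshoot on either side of the edge illustrates exactly the noise/artifact amplification that the wavelet framework in this chapter is designed to control.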

Fourth and Final Chapter

The fourth and final chapter describes a hybrid filter incorporating an adaptive multistage nonlinear filter and a multiresolution/multiorientation wavelet transform. The methods address image enhancement with noise suppression, as well as decomposition and selective reconstruction of wavelet-based subimages.

Together, the chapters in this section present an array of techniques for image enhancement: from linear to nonlinear, from fixed to adaptive, and from pixel-based to multiscale methods.

Advanced Machine Learning Note 1


  • Kernel Methods: Support Vector Machine
  • Ensemble Methods: Random Forest, Gradient Boosting
  • Deep Learning: CNNs, RNNs
  • Probabilistic Methods: Gaussian Processes, LDA, etc

What Makes a Good Learning Machine

For this we have to get conceptual and think about generalisation performance: how well do we do on unseen data, as opposed to the training data?

Machine Learning Difficulties

  • works in high dimensions (lots of features)
  • problems can be over-constrained (conflicting data to deal with)
  • problems can be under-constrained (many possible solutions consistent with the data)
  • we cannot visualise the data to see what is going on
  • the data will be over-constrained in some dimensions and under-constrained in others


Least Squared Errors

Suppose we want to learn some function f(x), we construct a learning machine that makes a prediction \hat{f}( x|\omega ), where \omega are the weights we want to learn.

We typically choose the weights to minimise a training error:

E_{T}( \omega ) =\underset{x\in D}{\Sigma }\left(\hat{f}( x|\omega ) \ -\ f( x)\right)^{2} =\underset{i=1}{\overset{N}{\Sigma }}\left(\hat{f}( x_{i} |\omega ) -y_{i}\right)^{2},

where D=\{( x_{i} ,\ y_{i} \ =\ f( x_{i}))\}^{N}_{i=1} is a set of size N, sampled from the set of all inputs, \mathcal{X}, according to a probability distribution p(x) describing where our data is.
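For a linear machine \hat{f}(x|\omega)=\omega^{T}x, this training error is minimised in closed form by the normal equations; a small sketch on made-up noiseless data:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])          # made-up true weights
X = rng.normal(size=(50, 2))            # rows are the x_i
y = X @ true_w                          # noiseless labels y_i = f(x_i)

# setting dE_T/dw = 0 gives the normal equations (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
train_error = np.sum((X @ w - y) ** 2)  # E_T(w) at the minimum
```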

Generalisation Error

We want to minimise the generalisation error which in this case we can measure as:
E_{G}( \omega ) =\underset{x\in \mathcal{X}}{\Sigma } p( x)\left(\hat{f}( x|\omega ) -f( x)\right)^{2} \ .
We can estimate this if we have some examples with known labels y_{i}=f(x_{i}) which we have not trained on.


Expected Generalisation Performance

The generalisation performance will depend on our training set, D.
The expected generalisation is when we average over all different data sets of size N drawn independently from p(x).
For each data set, D, we would learn a different set of weights \omega_{D} and get a different approximator \hat{f}(x|\omega_{D}).
In practice we only get one data set.


Mean Machine

To help understand generalisation we can consider the mean prediction with respect to machines trained with all data sets of size N,

\widehat{f_{m}}( x) =\mathbb{E}_{D}\left[\hat{f}( x|\omega _{D})\right].

We can define the bias to be generalisation performance of the mean machine,

B=\underset{x\in \mathcal{X}}{\Sigma } p(x)\left(\widehat{f_{m}}( x) -f( x)\right)^{2}.

Bias and Variance

We can write the expected generalisation as:

\overline{E_{G}} =\mathbb{E}_{D}[ E_{G}( \omega _{D})] =B+V,

which follows by adding and subtracting \widehat{f_{m}}(x) inside the square; the cross term vanishes because \mathbb{E}_{D}\left[\hat{f}( x|\omega _{D}) -\widehat{f_{m}}( x)\right] =0.

Here V is the variance, defined by V=\mathbb{E}_{D}\left[\underset{x\in \mathcal{X}}{\Sigma } p( x)\left(\hat{f}( x|\omega _{D}) -\widehat{f_{m}}( x)\right)^{2}\right].
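The bias–variance decomposition can be checked numerically; the target f(x)=\sin(\pi x) and the constant-fit machine below are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)       # made-up target function
xs = np.linspace(-1, 1, 101)          # stand-in for the input set X
p = np.full_like(xs, 1 / len(xs))     # uniform p(x)

def train(N=5):
    # a deliberately crude machine: fit a constant to N random samples,
    # so the prediction for data set D is just one number
    x = rng.uniform(-1, 1, N)
    return f(x).mean()

preds = np.array([train() for _ in range(5000)])  # one machine per D
f_mean = preds.mean()                             # the mean machine

B = np.sum(p * (f_mean - f(xs)) ** 2)             # bias of the mean machine
V = np.mean((preds - f_mean) ** 2)                # variance (x-independent here)
E_G = np.mean([np.sum(p * (c - f(xs)) ** 2) for c in preds])
# the Monte Carlo estimate of the expected generalisation matches B + V
```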


A complex machine can over-fit: fitting the training data well at the cost of poorer generalisation performance.

Dimensionality Reduction

We can simplify our machines by using fewer features.
We can project our data onto a lower dimensional sub-space.
We can use clustering to find exemplars and recode our data in terms of differences from the exemplars.

Feature Selection

We can try different combinations of features to find the best set, although it rapidly becomes intractable to do this in all ways.
We can use various heuristics to decide which features to keep, but no heuristic is fail-safe.

Explicit Regularisation

We can modify our error function to choose smoother functions:

E=\underset{n=1}{\overset{N}{\Sigma }}\left( \omega ^{T} x_{n} -y_{n}\right)^{2} +v||\omega ||^{2}
The second term is minimised when \omega_{i}=0; if \omega_{i} is large then f(x|\omega)=\omega^{T}x=\underset{i=1}{\overset{p}{\Sigma }}\omega_{i}x_{i} varies rapidly as we change x_{i}.
We can use other regularisers, such as the Lasso: E=\underset{n=1}{\overset{N}{\Sigma }}\left( \omega ^{T} x_{n} -y_{n}\right)^{2} +v\underset{i=1}{\overset{p}{\Sigma }}|\omega_{i}|.
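The quadratic (ridge) regulariser has a closed-form minimiser, \omega=(X^{T}X+vI)^{-1}X^{T}y; a sketch on made-up data showing the shrinkage effect:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                       # made-up design matrix
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(0, 0.1, size=40)

def ridge(X, y, v):
    # minimise sum_n (w^T x_n - y_n)^2 + v ||w||^2; setting the
    # gradient to zero gives w = (X^T X + v I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + v * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)      # ordinary least squares (v = 0)
w_reg = ridge(X, y, 100.0)    # heavy regularisation shrinks the weights
```

Increasing v trades training error for smoother (smaller-norm) weight vectors; the Lasso penalty instead drives some weights exactly to zero but has no closed form.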

Reading: Handbook of Medical Imaging Preface


This handbook presents concepts and digital techniques for processing and analyzing medical images after they have been generated or digitized.
It is organized into six sections that correspond to the fundamental classes of algorithms: enhancement, segmentation, quantification, registration, visualization, and a section that covers compression, storage, and communication.


Enhancement algorithms are used to reduce image noise and increase the contrast of structures of interest.
In images where the distinction between normal and abnormal tissue is subtle, accurate interpretation may become difficult if noise levels are relatively high. In many cases, enhancement improves the quality of the image and facilitates diagnosis.
Enhancement techniques are generally used to provide a clearer image for a human observer, but they can also form a preprocessing step for subsequent automated analysis.
Image enhancement methods described in this section span linear, nonlinear, fixed, adaptive, pixel-based, and multi-scale techniques.


Segmentation is the stage where a significant commitment is made during automated analysis by delineating structures of interest and discriminating them from background tissue.
The segmentation approach dictates the outcome of the entire analysis since measurements and other processing steps are based on segmented regions. Segmentation algorithms operate on the intensity or texture variations of the image using techniques that include thresholding, region growing, deformable templates, and pattern recognition techniques such as neural networks and fuzzy clustering. Hybrid segmentation and volumetric segmentation are also addressed in this section.


Quantification algorithms are applied to segmented structures to extract the essential diagnostic information such as shape, size, texture, angle, and motion.
A comprehensive chapter covers the choices and pitfalls of image interpolation, a technique included in many automated systems and used particularly in registration.


Registration of two images of the same part of the body is essential for many applications where the correspondence between the two images conveys the desired information.
Comparison of acquired images with digital anatomic atlas templates also requires registration algorithms. These algorithms must account for the distortions between the two images, which may be caused by differences between the imaging methods, their artifacts, soft tissue elasticity, and variability among subjects.


Visualization is a relatively new area that is contributing significantly to medicine and biology. While automated systems are good at making precise quantitative measurements, the complete examination of medical images is accomplished by the visual system and experience of the human observer.
The field of visualization includes graphics hardware and software specifically designed to facilitate visual inspection of medical and biological data.
The section starts with the evolution of visualisation techniques and presents the fundamental concepts and algorithms used for rendering, display, manipulation, and modelling of multidimensional data, as well as related quantitative evaluation tools. Fast surface extraction techniques, volume visualisation, and virtual endoscopy are discussed in detail, and applications are illustrated in two and three dimensions.

Compression, Storage, and Communication

Compression, storage, and communication of medical images are related functions for which demand has recently increased significantly.
Lossless image compression techniques ensure that all the original information will remain in the image after compression but they do not reduce the amount of data considerably. Lossy compression techniques can produce significant saving in storage but eliminate some information from the image.
Picture archiving and communication systems (PACS) are described, and techniques for preprocessing images before storage are discussed.


Reading: Reinforcement Learning: An Introduction Chapter 1


In this book we explore a computational approach to learning from interaction. Rather than directly theorizing about how people or animals learn, we primarily explore idealized learning situations and evaluate the effectiveness of various learning methods. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.

1.1 Reinforcement Learning

Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. These two characteristics – trial-and-error search and delayed reward – are the two most important distinguishing features of reinforcement learning.
The basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting over time with its environment to achieve a goal. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects – sensation, action, and goal – in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method.

  • Reinforcement learning is different from supervised learning. The object of that kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. An agent must be able to learn from its own experience.
  • Reinforcement learning is also different from unsupervised learning. Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely on examples of correct behavior, reinforcement learning is trying to maximize a reward signal instead of trying to find hidden structure.

One of the challenges that arise in reinforcement learning is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. For now, we simply note that the entire issue of balancing exploration and exploitation does not even arise in supervised and unsupervised learning, at least in their purest forms.
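The exploration–exploitation balance can be illustrated with an epsilon-greedy multi-armed bandit sketch; the arm rewards and epsilon value are made up for the example, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical 3-armed bandit
Q = np.zeros(3)                         # action-value estimates
counts = np.zeros(3)
eps = 0.1                               # exploration probability

for t in range(5000):
    if rng.random() < eps:
        a = int(rng.integers(3))        # explore: try a random action
    else:
        a = int(np.argmax(Q))           # exploit: best action so far
    r = rng.normal(true_means[a], 1.0)  # noisy reward, so each arm
    counts[a] += 1                      # must be tried many times
    Q[a] += (r - Q[a]) / counts[a]      # incremental sample average
```

With stochastic rewards, purely greedy selection can lock onto an inferior arm; the occasional random action keeps every arm's estimate improving.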
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast to many approaches that consider subproblems without addressing how they might fit into a larger picture.
Reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces. For learning research to make progress, important subproblems have to be isolated and studied, but they should be subproblems that play clear roles in complete, interactive, goal-seeking agents, even if all the details of the complete agent cannot yet be filled in.
One of the most exciting aspects of modern reinforcement learning is its substantive and fruitful interactions with other engineering and scientific disciplines. Reinforcement learning is part of a decades-long trend within artificial intelligence and machine learning toward greater integration with statistics, optimization, and other mathematical subjects. For example, the ability of some reinforcement learning methods to learn with parameterized approximators addresses the classical “curse of dimensionality” in operations research and control theory. More distinctively, reinforcement learning has also interacted strongly with psychology and neuroscience, with substantial benefits going both ways.

Finally, reinforcement learning is also part of a larger trend in artificial intelligence back toward simple general principles. Modern artificial intelligence now includes much research looking for general principles of learning, search, and decision making, as well as trying to incorporate vast amounts of domain knowledge.

Reading: Study on Comparison, Improvement and Application of Whitecap Automatic Identification Algorithm


Ocean whitecaps are a typical sea-surface phenomenon that is extremely significant and valuable for research on wave breaking. Whitecap automatic identification based on digital image processing is fast, efficient, low cost, and suited to large quantities of data. There are three kinds of whitecap automatic identification algorithm: AWE (automated whitecap extraction), ATS (adaptive thresholding segmentation), and IBCV (iterative between-class variance). The results of sea-surface images processed by these automatic identification algorithms are compared and analyzed. To address the uneven illumination of sea-surface images and unstable results, an illumination-correction algorithm using the top-hat transform is proposed to eliminate the negative impact of sunlight reflection, together with image enhancement to stabilize the processing. Experiments based on shipboard video verify that this modified method enhances the robustness of the original algorithms and improves the accuracy of the computed whitecap coverage (WC), which benefits automated processing of image sequences.


The extraction and identification of whitecaps is of significant importance to ocean–air interaction, ocean remote sensing, and many other areas.
The bottleneck of sea surface whitecap research is the lack of field data.

  • Schwendeman used shipborne video data to research the relationship between whitecap coverage and wind stress, wave slope, and turbulent dissipation rate
  • Bakhoday-Paskyabi proposed a histogram-based adaptive thresholding method for whitecaps using buoy camera data
  • Allashi built an omnidirectional whitecap imaging system that captures whitecap images with a fisheye camera to calculate whitecap coverage
  • Zhang [!!!] proposed an EM-based whitecap detection technique using recognition methods to achieve fast detection
  • Zhao [!!!] analysed sea-surface images captured by a CCD observation system and obtained live whitecap coverage percentages

The key technique of automatic recognition is to detect the whitecap coverage area and calculate the whitecap coverage percentage automatically. The paper proposes an improved thresholding-based recognition algorithm and tests it on field data.

Theory and Comparison

Whitecap extraction

Definition: WC\ =\ ( N_{W} /N_{A}) \ \times 100\%\
The number of whitecap pixels divided by the total number of pixels.
The optimal threshold is decided by the grey-scale histogram of each image. Pixels above the threshold are considered whitecap and the rest sea background.
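A minimal sketch of the coverage computation for a fixed threshold (the synthetic frame and threshold value are illustrative):

```python
import numpy as np

def whitecap_coverage(gray, threshold):
    # WC = (N_W / N_A) * 100%: whitecap pixels over total pixels
    n_w = np.count_nonzero(gray > threshold)
    return 100.0 * n_w / gray.size

# synthetic frame: dark sea (60) with a 10x10 bright whitecap patch (220)
frame = np.full((100, 100), 60, dtype=np.uint8)
frame[10:20, 10:20] = 220
wc = whitecap_coverage(frame, threshold=128)  # 100 of 10000 pixels -> 1.0
```

The three algorithms below differ only in how they choose the threshold; the coverage computation itself is this simple count.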

Algorithms detail

AWE: Automated Whitecap Extraction

AWE is a classic adaptive thresholding algorithm that chooses the best threshold using second and third derivatives. The algorithm operates on grey-scale images, with a normalised threshold ranging from 0.01 to 1. From the grey-level histogram, the function P(i) gives the number of pixels with grey level greater than i. The Percentage Increase of Pixels (PIP) function is obtained by differencing P(i):
PIP( i) \ =\ \dfrac{P( i) \ -\ P( i\ +\ 0.01)}{P( i)}
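The P(i) and PIP curves can be computed as follows; the 0.01-spaced threshold grid follows the description above, while the synthetic frame is made up:

```python
import numpy as np

def pip_curve(gray):
    # P(i): number of pixels with normalised grey level above i,
    # sampled on the 0.01-spaced threshold grid;
    # PIP(i) = (P(i) - P(i + 0.01)) / P(i) is its relative decrease
    g = gray.astype(np.float64) / 255.0
    ts = np.arange(0.01, 1.0, 0.01)
    P = np.array([np.count_nonzero(g > t) for t in ts], dtype=np.float64)
    with np.errstate(divide="ignore", invalid="ignore"):
        pip = (P[:-1] - P[1:]) / P[:-1]
    return ts[:-1], np.nan_to_num(pip)

# synthetic frame: sea at grey level 60, bright whitecap patch at 220
frame = np.full((100, 100), 60, dtype=np.uint8)
frame[10:20, 10:20] = 220
ts, pip = pip_curve(frame)   # PIP spikes where the histogram empties
```

AWE then analyses the derivatives of this curve to place the threshold just above the sea-background mode.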


ATS: Adaptive Thresholding Segmentation

Inspired by the AWE algorithm, Bakhoday-Paskyabi et al. proposed the adaptive thresholding segmentation algorithm ATS. ATS likewise determines the optimal threshold by differentiating a continuous grey-level histogram. For the normalised discrete histogram h, a continuous cumulative distribution function is introduced: H( k) \ =\ \Sigma ^{k}_{i\ =\ 0} h( i), the cumulative distribution of pixel values up to k, where H is treated as a continuous function of a variable x. Curvature analysis shows that the inflection point of the CDF is the optimal threshold; however, noise or oscillation in the tail region makes the correct inflection point hard to detect. Applying a triangle algorithm to H'(x) and H''(x), the optimal threshold is taken as the point at maximum distance from the line joining the peak and the tail.

IBCV: Iterative Between-Class Variance

IBCV is based on Otsu's optimal threshold selection. The discrete distribution of the normalised grey-level histogram is p(i) = n_i/N, where n_i is the number of pixels with grey level i and N is the total number of pixels in the region of interest. A threshold T divides the histogram into two classes, and the between-class variance is defined as BCV( T) \ =\ p_{1}( T) p_{2}( T)[ m_{2}( T) \ -m_{1}( T)]^{2}, where p_1 and p_2 are the probabilities of the two classes (water and whitecap regions) and m_1, m_2 are their mean intensity values. The optimal threshold is obtained by iteratively maximising BCV. IBCV divides the continuous grey-level histogram into three regions and iterates to obtain two thresholds: one maximising the mean-intensity difference between the first and second parts, the other between the second and third. The optimal threshold is the weighted average of the two. The algorithm is sensitive to illumination, however, so the authors recommend illumination correction.
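Otsu's criterion, which IBCV builds on, can be sketched directly as an exhaustive search (rather than the paper's iterative three-region scheme); the bimodal test image is made up:

```python
import numpy as np

def otsu_threshold(gray):
    # exhaustive search for the T maximising
    # BCV(T) = p1(T) p2(T) (m2(T) - m1(T))^2
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)
    best_t, best_bcv = 0, -1.0
    for t in range(1, 256):
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 == 0.0 or p2 == 0.0:
            continue               # threshold leaves one class empty
        m1 = (levels[:t] * p[:t]).sum() / p1
        m2 = (levels[t:] * p[t:]).sum() / p2
        bcv = p1 * p2 * (m2 - m1) ** 2
        if bcv > best_bcv:
            best_t, best_bcv = t, bcv
    return best_t

# bimodal image: sea around grey level 60, whitecap region at 220
img = np.full((50, 50), 60, dtype=np.uint8)
img[:10, :10] = 220
T = otsu_threshold(img)      # lands between the two modes
```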



Top-Hat Transform

Morphological image processing is a technique for extracting image components; a structuring element is the basic tool, and the aim is to highlight the desired target information. Let the input image be F(x, y) and the structuring element be B(u, v). The top-hat transform is a compound morphological operation: the opening removes bright details smaller than the structuring element while keeping large bright regions, and subtracting the opened image from the original removes those large bright areas and hence the effect of illumination. The top-hat transform is G=F-( F\circ B). This paper uses a disk of radius 20 as the structuring element; a disk is isotropic, which suits whitecaps whose orientation changes irregularly.
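A pure-numpy sketch of the grey-scale top-hat transform (a square structuring element instead of the paper's radius-20 disk, for simplicity):

```python
import numpy as np

def _morph(img, win, op):
    # grey-scale erosion (op=np.min) or dilation (op=np.max)
    # with a win x win square structuring element
    pad = win // 2
    p = np.pad(img, pad, mode="edge")
    shifts = [p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(win) for dx in range(win)]
    return op(np.stack(shifts), axis=0)

def top_hat(img, win=9):
    # G = F - (F opened by B): opening keeps large bright regions,
    # so the residual is small bright detail with the slowly varying
    # illumination removed
    opened = _morph(_morph(img, win, np.min), win, np.max)
    return img - opened

# ramp illumination plus a small 3x3 bright spot (the "whitecap")
img = np.tile(np.linspace(0, 100, 64), (64, 1))
img[30:33, 30:33] += 150
flat = top_hat(img, win=9)   # ramp removed, spot preserved
```

Because the spot is smaller than the structuring element, it survives the subtraction while the illumination gradient is flattened, which is exactly why the paper applies this before thresholding.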

Image Enhancement




whitecap image       Artificial annotation   AWE     ATS     IBCV
original algorithm   3.33%                   1.70%   1.88%   57.31%
improved algorithm   3.33%                   3.03%   3.36%   3.76%



  • The top-hat transform effectively reduces the influence of uneven illumination and improves recognition accuracy;
  • The derivative-based algorithms give unstable results; image enhancement reduces the sensitivity of the whitecap coverage to the choice of optimal threshold;
  • Experimental analysis on consecutive video frames shows that the modifications enhance the robustness of the three original algorithms under uneven illumination, effectively improve the accuracy of the computed WC, and facilitate automated processing of video sequences.