Textbook notes for CSC 329, Computer Science, University of Miami (Prof. Dr. Shahriar Negahdaripour).

Tensor Voting: Theory and Applications
Gérard Medioni (Institute for Robotics & Intelligent Systems, USC, Los Angeles, CA 90089, [email protected])
Chi-Keung Tang (Computer Science Department, HKUST, Hong Kong, [email protected])
Mi-Suen Lee (Philips Research, Briarcliff Manor, NY 10510, [email protected])
Abstract

We present a unified computational framework which properly implements the smoothness constraint to generate descriptions in terms of surfaces, regions, curves, and labelled junctions, from sparse, noisy, binary data in 2-D or 3-D. Each input site can be a point, a point with an associated tangent direction, a point with an associated normal direction, or any combination of the above. The methodology is grounded on two elements: tensor calculus for representation, and linear voting for communication: each input site communicates its information (a tensor) to its neighborhood through a predefined (tensor) field, and therefore casts a (tensor) vote. Each site collects all the votes cast at its location and encodes them into a new tensor. A local, parallel marching process then simultaneously detects features. The proposed approach is very different from traditional variational approaches, as it is non-iterative. Furthermore, the only free parameter is the size of the neighborhood, related to the scale. We have developed several algorithms based on the proposed methodology to address a number of early vision problems, including perceptual grouping in 2-D and 3-D, shape from stereo, and motion grouping and segmentation, and the results are very encouraging.

1 Introduction

In computer vision, we often face the problem of identifying salient and structured information in a noisy data set. From greyscale images, edges are extracted by first detecting subtle local changes in intensity and then linking locations based on the noisy signal responses. From binocular images, surfaces are inferred by first obtaining depth hypotheses for points and/or edges using local correlation measures, and then selecting and interpolating appropriate values for all points in the images. Similarly, in image sequence analysis, the estimation of motion and shape starts with local measurements of feature correspondences, which give noisy data for the subsequent computation of scene information. Hence, for a salient structure estimator to be useful in computer vision, it has to be able to handle the presence of multiple structures, and the interaction between them, in noisy, irregularly clustered data sets. In this paper, we present a novel and unified methodology for the robust inference of features from noisy data.

The organization of this paper is as follows: Section 2 gives an overview of previous work. Section 3 describes our overall tensor voting formalism. In Section 4 we present examples of feature inference from noisy 2-D and 3-D data. Then, in Section 5, we extend the basic formalism to solve early vision problems, such as shape from stereo and motion grouping and segmentation, and also visualization problems such as flow visualization. Finally, we conclude in Section 6.

2 Previous Work

Based on the type of input, we can broadly classify previous work on the inference of curves, regions, and surfaces into 3 areas, namely, dot clustering in 2-D, curve and contour inference in 2-D, and surface reconstruction in 3-D.

In 2-D, Zucker and Hummel [33] address the problem of region inference from dot clusters using an iterative relaxation labeling technique. Recently, Shi and Malik [24] propose to treat region inference as a graph partitioning problem, and present a recursive algorithm that uses the normalized cut as the global criterion to segment tokens into regions. Later, Parent and Zucker [20] use a relaxation labeling scheme that imposes co-circularity and constancy of curvature. Dolan and Weiss [4] demonstrate a hierarchical approach to grouping, relying on compatibility measures such as proximity and good continuation. Sha'ashua and Ullman [23] propose the use of a saliency measure to guide the grouping process, and to eliminate erroneous features in the image. The scheme prefers long curves with low total curvature, using an incremental optimization scheme (similar to dynamic programming).

In 3-D, the problem is to fit surfaces to a cloud of possibly noisy points. Successful approaches include the physics-based approaches [29], in which the initial pattern deforms iteratively to the desired shape, subject to an objective function. Poggio and Girosi's [21] network learning approach can be used when we know in advance that the pattern consists of a single surface. Fua and Sander [6] propose a local algorithm to describe surfaces from a set of points. Szeliski et al. [25] propose the use of a physically-based dynamic local process evolving from each data point (particle), and subject to various forces. They are able to handle objects of unrestricted topology, but assume a single connected object. Hoppe et al. [13] and Boissonnat [3] use computational geometry to address the problem, treating the data as vertices of a graph and constructing edges based on local properties. Another approach, alpha-shapes [5], has also attracted much attention in the computational geometry community. Recently, a new approach, known as the level-set approach [32], has produced very good results and attracted significant attention; it allows changes in topology. Other methodologies employ the perceptual saliency approach, such as the works by Thornber and Williams [31], by Sarkar and Boyer [22], and by Sha'ashua and Ullman [23]. Among all the methods, Guy and Medioni's work is one of the non-iterative approaches able to handle both curve inference in 2-D [9] and surface inference in 3-D from point, curve element, or surface patch element inputs [10]. Unlike the other methods mentioned here, their method is robust to a considerable amount of noise, has no limitation on the surface topology, detects orientation discontinuities, and is non-iterative.

3 The tensor voting formalism

An overall illustration of our method, summarizing its different components, is shown in Figure 1, which shows the 3-D version. The methodology is grounded on two elements: tensor calculus for data representation, and linear tensor voting for data communication. Each input site propagates its information in a neighborhood. The information is encoded in a tensor, and is determined by a predefined voting field. Each site collects the information cast there and analyzes it, building a saliency map for each feature type. Salient features are located at local extrema of these saliency maps, which can be extracted by non-maximal suppression.

Each input token is first encoded into a second order symmetric tensor. For instance, if the input token has only position information, it is transformed into an isotropic tensor (a ball) of unit radius.

In a first voting stage, tokens communicate their information with each other in a neighborhood, and refine the information they carry. After this process, each token is a generic second order symmetric tensor, which encodes the confidence of this knowledge (given by the tensor size) and curve and surface orientation information (given by the tensor orientations).

In a second stage, these generic tensor tokens propagate their information in their neighborhood, leading to a dense tensor map which encodes feature saliency at every point in the domain. In practice, the domain space is digitized into a uniform array of cells. In each cell, the tensor can be decomposed into elementary components which express different aspects of the information captured.

The resulting dense tensor map is then decomposed. Surface, curve, and junction features are then obtained by extracting, with subvoxel precision, local extrema of the corresponding saliency values along a direction. The final output is the aggregate of the outputs for each of the components.

Figure 1. Overall approach [pipeline diagram: input tokens (sparse) → Encode → tensor tokens (sparse) → Tensor Voting → tensor tokens (refined) → Tensor Voting → saliency tensor field (dense) → Decompose → surface / curve / junction saliency maps → Feature Extraction → surfaces, curves, points → layered description]

3.1 Representation

Our goal is to extract geometric structures such as regions, curves, surfaces, and the intersections between them. These are well defined mathematical entities with well known properties.

Points can simply be represented by their coordinates. A first order local description of a curve is given by the point coordinates and its associated tangent or normal. A second order description would also include the associated curvature.

A first order local description of a surface patch is given by the point coordinates and its associated normal. A second order description would also include the principal curvatures and their directions.

Here, however, we do not know in advance what type of entity (point, curve, surface) a token may belong to. Furthermore, because features may overlap, a location may actually correspond to both a point and a curve, or even to a surface, a curve, and a point at the same time. An example of such a case occurs at the intersection between 2 smooth curves: the intersection should be a point, that is, it represents a junction with no associated tangent information, but it also represents 2 curve elements, as the curves do not stop there.
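The initial encoding sketched above is simple to write down. The following NumPy sketch is illustrative only (the function names are ours, not the paper's): a position-only token becomes a unit ball, a token with a known normal becomes a stick along that normal, and a token with a known tangent becomes a plate spanning the plane of possible normals, as detailed in Section 3.4.

```python
import numpy as np

def encode_point():
    """Position-only token -> isotropic 'ball' tensor of unit radius."""
    return np.eye(3)

def encode_normal(n):
    """Token with a normal direction n -> 'stick' tensor n n^T."""
    n = np.asarray(n, float) / np.linalg.norm(n)
    return np.outer(n, n)

def encode_tangent(t):
    """Token with a tangent direction t -> 'plate' tensor I - t t^T,
    i.e. the plane of directions orthogonal to the tangent."""
    t = np.asarray(t, float) / np.linalg.norm(t)
    return np.eye(3) - np.outer(t, t)
```

Each encoding is a second order symmetric tensor: the ball has three equal eigenvalues, the stick one nonzero eigenvalue, and the plate two.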
3.2 Second order symmetric tensor

To capture first order differential geometry information and its singularities, a second order symmetric tensor is used. It captures both the orientation information and its confidence, or saliency. Such a tensor can be visualized as an ellipse in 2-D, or an ellipsoid in 3-D. Intuitively, the shape of the tensor defines the type of information captured (point, curve, or surface element), and the associated size represents the saliency. For instance, in 2-D, a very salient curve element is represented by a thin ellipse, whose major axis represents the estimated tangent direction, and whose length reflects the saliency of the estimation.

To express a second order symmetric tensor S, graphically depicted by an ellipse in 2-D and an ellipsoid in 3-D, we choose to take the associated quadratic form and diagonalize it, leading to a representation based on the eigenvalues λ1, λ2, λ3 and the eigenvectors ê1, ê2, ê3. In a more compact form, S = λ1ê1ê1ᵀ + λ2ê2ê2ᵀ + λ3ê3ê3ᵀ, where λ1 ≥ λ2 ≥ λ3 ≥ 0 are the eigenvalues, and ê1, ê2, ê3 are the eigenvectors corresponding to λ1, λ2, λ3 respectively. Note that, because S is a second order symmetric tensor, the eigenvalues are real and positive (or zero), and the eigenvectors form an orthonormal basis.

The eigenvectors correspond to the principal directions of the ellipsoid/ellipse, and the eigenvalues encode the size and shape of the ellipsoid/ellipse, as shown in Figure 2. S is a linear combination of outer product tensors and therefore a tensor.

Figure 2. An ellipsoid and its eigensystem

3.3 Tensor decomposition

All the tensors used so far are somewhat singular, in the sense that the associated ellipsoid has the shape of either a stick, a plate, or a ball. As a result of the voting procedure (to be described later), we produce arbitrary second order symmetric tensors, therefore we need to handle any generic tensor.

The spectrum theorem [8] states that any such tensor can be expressed as a linear combination of these 3 cases, i.e.,

    S = (λ1 − λ2) ê1ê1ᵀ + (λ2 − λ3)(ê1ê1ᵀ + ê2ê2ᵀ) + λ3(ê1ê1ᵀ + ê2ê2ᵀ + ê3ê3ᵀ)

where ê1ê1ᵀ describes a stick, (ê1ê1ᵀ + ê2ê2ᵀ) describes a plate, and (ê1ê1ᵀ + ê2ê2ᵀ + ê3ê3ᵀ) describes a ball. Figure 3 shows the decomposition of a general saliency tensor into the stick, plate, and ball components.

Figure 3. Tensor decomposition

At each location, the estimate of each of the 3 types of information, and their associated saliency, is therefore captured as follows:

point-ness: no orientation, saliency λ3
curve-ness: orientation ê3, saliency λ2 − λ3
surface-ness: orientation ê1, saliency λ1 − λ2

In 2-D, there is no surface-ness, and curve-ness is expressed by ê1 for the tangent orientation and λ1 − λ2 for curve saliency.

We have now explained the information encoded in a second order symmetric tensor, which consists of three independent elements and a measure of feature saliency.

3.4 Tensor communication

We now turn to our communication and computation scheme, which allows a site to exchange information with its neighbors, and infer new information.

Token refinement and dense extrapolation. The input tokens are first encoded as perfect tensors. In 3-D, a point token is encoded as a 3-D ball. A point associated with a tangent direction is encoded as a 3-D plate. A point associated with a normal direction is encoded as a 3-D stick. These initial tensors communicate with each other in order to
• derive the most preferred orientation information, or refine the initial orientation if given, for each of the input tokens (token refinement), and
• extrapolate the above inferred information at every location in the domain for the purpose of coherent feature extraction (dense extrapolation).
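The stick-plate-ball decomposition of Section 3.3 is easy to check numerically. The following NumPy sketch (illustrative, not the paper's code) splits an arbitrary second order symmetric tensor into the three components and their saliencies, and the usage below verifies that they reconstruct the original tensor.

```python
import numpy as np

def decompose(S):
    """Split a 3-D second order symmetric tensor into stick, plate, and
    ball components with saliencies (l1 - l2, l2 - l3, l3)."""
    w, V = np.linalg.eigh(S)               # eigenvalues in ascending order
    l3, l2, l1 = w                         # relabel so that l1 >= l2 >= l3
    e1, e2, e3 = V[:, 2], V[:, 1], V[:, 0]
    stick = np.outer(e1, e1)               # e1 e1^T
    plate = stick + np.outer(e2, e2)       # e1 e1^T + e2 e2^T
    ball = plate + np.outer(e3, e3)        # sums to the identity
    return {"surface": (l1 - l2, stick),
            "curve":   (l2 - l3, plate),
            "point":   (l3,      ball)}

# Usage: for any symmetric positive semidefinite S,
# (l1-l2)*stick + (l2-l3)*plate + l3*ball recovers S exactly,
# which is precisely the spectrum-theorem identity of Section 3.3.
```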
In the token refinement case, each token collects all the tensor values cast at its location by all the other tokens. The resulting tensor value is the tensor sum of all the tensor votes cast at the token location.

In the dense extrapolation case, each token is first decomposed into its independent elements, and broadcasts this information using an appropriate tensor field, which also defines a neighborhood, putting the corresponding tensor value at every location. In practice, values are entered at discrete cell locations. The tensor value at any given location in the domain is the tensor sum of all the tensor votes cast at this location. A somewhat subtle difference occurs in this second case, as ball tensors define isolated features, which therefore do not need to propagate their information, and thus do not vote.

While they may be implemented differently for efficiency, these 2 operations are equivalent to a voting process, and can be regarded as tensor convolution with voting kernels, which in turn produces tensors. This tensor convolution is in fact a simple alignment process, involving a translation followed by a rotation.

Therefore, in 3-D, it remains to describe the derivation of the 3-D voting kernels.

All voting kernels can be derived from the fundamental 2-D stick kernel, by rotation and integration. Figure 4 shows this 2-D stick kernel. In [18], we explain in mathematical terms that this voting kernel in fact encodes the proximity and the smoothness constraints. Note that a direction can be defined by either the tangent vector or the normal vector, which are orthogonal to each other. We can therefore define two equivalent fundamental fields, depending on whether we assign a tangent or a normal vector at the receiving site.

Figure 4. The fundamental 2-D stick field [tangent and normal orientations; intensity-coded strength (saliency); 2 views of the field strength of the 2-D stick field]

Derivation of the 3-D voting kernels. Denote the fundamental 2-D stick kernel by V_F. In 3-D, we need 3 voting kernels: the stick, the plate, and the ball voting kernels. The 3-D stick kernel is obtained by revolving the normal version of V_F 90 degrees about the z-axis (denote it by V′_F). Then, we rotate V′_F about the x-axis, and integrate the contributions by tensor addition during the rotation. The resulting 3-D stick kernel describes the orientation [1 0 0]ᵀ in world coordinates. The other orientations can be obtained by a simple rotation. Hence, w.l.o.g., the 3-D stick kernel is defined mathematically as

    ∫₀^π V′_F dα,  with β = 0, γ = 0    (1)

where α, β, γ are angles of rotation about the x, y, z axes respectively.

To obtain the plate kernel, we rotate the 3-D stick kernel obtained above about the z-axis, integrating the contributions by tensor addition. The 3-D plate kernel thus obtained describes a plate with normal orientation [0 0 1]ᵀ. Again, the other orientations can be obtained readily by a simple rotation. Denoting the 3-D stick kernel by V, the plate kernel is derived from V by:

    ∫₀^π V dγ,  with α = 0, β = 0    (2)

To obtain the ball kernel, we rotate the 3-D stick kernel about the y-axis and z-axis (the order does not matter), integrating the contributions by tensor addition:

    ∫₀^π ∫₀^π V dβ dγ,  with α = 0    (3)

3.5 Feature Extraction

In this section, we summarize the marching process for feature extraction in 3-D, by giving mathematical definitions. Detailed implementation can be found in [18].

At the end of the voting process, we have produced a dense tensor map, which is then decomposed into two dense vector maps (three in 3-D). In 3-D, each voxel of these maps holds a 2-tuple (s, v̂), where s is a scalar indicating strength and v̂ is a unit vector indicating direction:
• Surface map (SMap): s = λ1 − λ2, and v̂ = ê1 indicates the normal direction.
• Curve map (CMap): s = λ2 − λ3, and v̂ = ê3 indicates the tangent direction.
• Junction map (JMap): s = λ3, and v̂ is arbitrary.

These maps are dense vector fields, which are then used as input to our extremal algorithms in order to generate features such as junctions, curves, and surfaces.

[Figure: input, dense curve saliency map, dense orientation uncertainty map, for an input consisting of an ellipse made up of curve elements (tangents) and a curve made up of dots. Another example, a "pretzel", is shown in Figure 6.]

The definition of point extremality, corresponding to junctions, is straightforward: it is a local maximum of the scalar value s.

Surface extremality. Let each voxel in the given vector field hold a 2-tuple (s, n̂), where s indicates field strength and n̂ denotes the normal direction. The vector field is continuous and dense, i.e., (s, n̂) is defined for every point p in 3-D space.

A point is on an extremal surface if its strength s is locally extremal along the direction of the normal, i.e., ds/dn̂ = 0. This is a necessary condition for 3-D surface extremality. A sufficient condition, which is used in the implementation, is defined in terms of zero crossings along the line defined by n̂. We therefore define the gradient vector g as g = (∂s/∂x, ∂s/∂y, ∂s/∂z), and project g onto n̂, i.e., q = n̂ · g. Thus, an extremal surface is the locus of points with q = 0.

Curve extremality. Each voxel in the given potential vector field holds a 2-tuple (s, t̂), where s is the field strength and t̂ indicates the tangent direction. Suppose the field is continuous.
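On a discrete voxel grid, the surface extremality test q = n̂ · g reduces to a gradient computation and a per-voxel dot product. The sketch below is an assumption-laden illustration, not the paper's implementation (which extracts the zero crossings of q with subvoxel precision via a marching process); here central differences stand in for the continuous gradient, and the array layout is our choice.

```python
import numpy as np

def extremality(s, n):
    """q = n-hat . grad(s) on a voxel grid. s has shape (X, Y, Z); n has
    shape (X, Y, Z, 3) and holds a unit normal per voxel. An extremal
    surface is the locus of zero crossings of q along the normal."""
    gx, gy, gz = np.gradient(s)            # g = (ds/dx, ds/dy, ds/dz)
    g = np.stack([gx, gy, gz], axis=-1)
    return np.einsum('...i,...i->...', n, g)

# Toy field (our construction): strength peaks on the plane z = 3 and
# falls off quadratically; all normals point along z.
z = np.arange(7.0)
s = -((z - 3.0) ** 2 + np.zeros((5, 5, 7)))
n = np.zeros((5, 5, 7, 3))
n[..., 2] = 1.0
q = extremality(s, n)   # q changes sign across the z = 3 slice
```

In this toy example q vanishes exactly on the z = 3 slice and changes sign across it, so the zero-crossing test recovers the plane as the extremal surface.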
