Tensor Voting: Theory and Applications

Gérard Medioni, Institute for Robotics & Intelligent Systems, USC, Los Angeles, CA 90089, [email protected]
Chi-Keung Tang, Computer Science Department, HKUST, Hong Kong, [email protected]
Mi-Suen Lee, Philips Research, Briarcliff Manor, NY 10510, [email protected]

Abstract

We present a unified computational framework which properly implements the smoothness constraint to generate descriptions in terms of surfaces, regions, curves, and labelled junctions, from sparse, noisy, binary data in 2-D or 3-D. Each input site can be a point, a point with an associated tangent direction, a point with an associated normal direction, or any combination of the above. The methodology is grounded on two elements: tensor calculus for representation, and linear voting for communication: each input site communicates its information (a tensor) to its neighborhood through a predefined (tensor) field, and therefore casts a (tensor) vote. Each site collects all the votes cast at its location and encodes them into a new tensor. A local, parallel marching process then simultaneously detects features. The proposed approach is very different from traditional variational approaches, as it is non-iterative. Furthermore, the only free parameter is the size of the neighborhood, related to the scale. We have developed several algorithms based on the proposed methodology to address a number of early vision problems, including perceptual grouping in 2-D and 3-D, shape from stereo, and motion grouping and segmentation, and the results are very encouraging.

1 Introduction

In computer vision, we often face the problem of identifying salient and structured information in a noisy data set. From greyscale images, edges are extracted by first detecting subtle local changes in intensity and then linking locations based on the noisy signal responses. From binocular images, surfaces are inferred by first obtaining depth hypotheses for points and/or edges using local correlation measures, and then selecting and interpolating appropriate values for all points in the images. Similarly, in image sequence analysis, the estimation of motion and shape starts with local measurements of feature correspondences, which give noisy data for the subsequent computation of scene information. Hence, for a salient structure estimator to be useful in computer vision, it has to be able to handle the presence of multiple structures, and the interaction between them, in noisy, irregularly clustered data sets. In this paper, we present a novel and unified methodology for the robust inference of features from noisy data. The organization of this paper is as follows: Section 2 gives an overview of previous work. Section 3 describes our overall tensor voting formalism. In Section 4 we present examples of feature inference from noisy 2-D and 3-D data. Then, in Section 5, we extend the basic formalism to solve early vision problems, such as shape from stereo, and motion grouping and segmentation, and also visualization problems such as flow visualization. Finally, we conclude in Section 6.

2 Previous Work

Based on the type of input, we can broadly classify previous work on the inference of curves, regions, and surfaces into 3 areas, namely, dot clustering in 2-D, curve and contour inference in 2-D, and surface reconstruction in 3-D.

In 2-D, Zucker and Hummel [33] address the problem of region inference from dot clusters using an iterative relaxation labeling technique. Recently, Shi and Malik [24] propose to treat region inference as a graph partitioning problem, and present a recursive algorithm that uses the normalized cut as the global criterion to segment tokens into regions. Later, Parent and Zucker [20] use a relaxation labeling scheme that imposes co-circularity and constancy of curvature. Dolan and Weiss [4] demonstrate a hierarchical approach to grouping relying on compatibility measures such as proximity and good continuation. Sha'ashua and Ullman [23] propose the use of a saliency measure to guide the grouping process, and to eliminate erroneous features in the image.
The scheme prefers long curves with low total curvature by using an incremental optimization scheme (similar to dynamic programming).

In 3-D, the problem is to fit surfaces to a cloud of possibly noisy points. Successful approaches include the physics-based approaches [29], in which the initial pattern deforms iteratively to the desired shape, subject to an objective function. Poggio and Girosi's [21] network learning approach can be used when we know in advance that the pattern consists of a single surface. Fua and Sander [6] propose a local algorithm to describe surfaces from a set of points. Szeliski et al. [25] propose the use of a physically-based dynamic local process evolving from each data point (particle), and subject to various forces. They are able to handle objects of unrestricted topology, but assume a single connected object. Hoppe et al. [13] and Boissonnat [3] use computational geometry to address the problem by treating the data as vertices of a graph and constructing edges based on local properties. Another approach, alpha-shapes [5], has also attracted much attention in the computational geometry community. Recently, a new approach, known as the level-set approach [32], has produced very good results and attracted significant attention; it allows changes in topology. Other methodologies employ the perceptual saliency approach, such as the works by Thornber and Williams [31], by Sarkar and Boyer [22], and by Sha'ashua and Ullman [23]. Among all the methods, Guy and Medioni's work is one of the non-iterative approaches that is able to handle both curve inference in 2-D [9] and surface inference in 3-D from point, curve element, or surface patch element inputs [10], although individually. Unlike the other methods mentioned here, their method is robust to a considerable amount of noise, has no limitation on the surface topology, detects orientation discontinuities, and is non-iterative.

3 The tensor voting formalism

An overall illustration of our method, summarizing its different components, is shown in Figure 1, which shows the 3-D version. The methodology is grounded on two elements: tensor calculus for data representation, and linear tensor voting for data communication. Each input site propagates its information in a neighborhood. The information is encoded in a tensor, and is determined by a predefined voting field. Each site collects the information cast there and analyzes it, building a saliency map for each feature type. Salient features are located at local extrema of these saliency maps, which can be extracted by non-maximal suppression.

[Figure 1. Overall approach: input tokens (sparse) → Encode → tensor tokens (sparse) → Tensor Voting → tensor tokens (refined) → Tensor Voting → saliency tensor field (dense) → Decompose → surface, curve, and junction saliency maps → Feature Extraction → surfaces, curves, points → layered description.]

Each input token is first encoded into a second order symmetric tensor. For instance, if the input token has only position information, it is transformed into an isotropic tensor (a ball) of unit radius.
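To make this encoding step concrete, here is a minimal numpy sketch (ours, not the authors' implementation). The matrices used for the oriented cases (identity for a ball, n̂n̂ᵀ for a stick, I − t̂t̂ᵀ for a plate) anticipate the perfect tensors of Section 3.4 and are the standard choices in the tensor voting literature; treat them as assumptions of this sketch.

```python
import numpy as np

def encode_token(position, tangent=None, normal=None):
    """Encode one 3-D input token as a second order symmetric tensor.

    Sketch only: identity for a ball (position only), n n^T for a stick
    (normal given), I - t t^T for a plate (tangent given), as is standard
    in the tensor voting literature.
    """
    if normal is not None:                      # oriented surface element
        n = np.asarray(normal, float)
        n /= np.linalg.norm(n)
        T = np.outer(n, n)                      # stick: one dominant axis
    elif tangent is not None:                   # oriented curve element
        t = np.asarray(tangent, float)
        t /= np.linalg.norm(t)
        T = np.eye(3) - np.outer(t, t)          # plate: plane normal to t
    else:
        T = np.eye(3)                           # isotropic ball, unit radius
    return np.asarray(position, float), T

# Example: a position-only token and a surface-element token.
p_pos, p_T = encode_token([0.0, 0.0, 0.0])
s_pos, s_T = encode_token([1.0, 2.0, 0.5], normal=[0.0, 0.0, 1.0])
```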
In a first voting stage, tokens communicate their information with each other in a neighborhood, and refine the information they carry. After this process, each token is a generic second order symmetric tensor, which encodes both the confidence of this knowledge (given by the tensor size) and curve and surface orientation information (given by the tensor orientations).

In a second stage, these generic tensor tokens propagate their information in their neighborhood, leading to a dense tensor map which encodes feature saliency at every point in the domain. In practice, the domain space is digitized into a uniform array of cells. In each cell the tensor can be decomposed into elementary components which express different aspects of the information captured. The resulting dense tensor map is then decomposed. Surface, curve, and junction features are then obtained by extracting, with subvoxel precision, local extrema of the corresponding saliency values along a direction. The final output is the aggregate of the outputs for each of the components.

3.1 Representation

Our goal is to extract geometric structures such as regions, curves, surfaces, and the intersections between them. These are well defined mathematical entities with well known properties. Points can simply be represented by their coordinates. A first order local description of a curve is given by the point coordinates and its associated tangent or normal; a second order description would also include the associated curvature. A first order local description of a surface patch is given by the point coordinates and its associated normal; a second order description would also include the principal curvatures and their directions.

Here, however, we do not know in advance what type of entity (point, curve, surface) a token may belong to. Furthermore, because features may overlap, a location may actually correspond to both a point and a curve, or even to a surface, a curve, and a point at the same time. An example of such a case occurs at the intersection between 2 smooth curves: the intersection should be a point, that is, it represents a junction with no associated tangent information, but it also represents 2 curve elements, as the curves do not stop there.

3.2 Second order symmetric tensor

To capture first order differential geometry information and its singularities, a second order symmetric tensor is used. It captures both the orientation information and its confidence, or saliency. Such a tensor can be visualized as an ellipse in 2-D, or an ellipsoid in 3-D. Intuitively, the shape of the tensor defines the type of information captured (point, curve, or surface element), and the associated size represents the saliency. For instance, in 2-D, a very salient curve element is represented by a thin ellipse, whose major axis represents the estimated tangent direction, and whose length reflects the saliency of the estimation.

To express a second order symmetric tensor S, graphically depicted by an ellipse in 2-D and an ellipsoid in 3-D, we choose to take the associated quadratic form and to diagonalize it, leading to a representation based on the eigenvalues λ₁, λ₂, λ₃ and the eigenvectors ê₁, ê₂, ê₃. In a more compact form,

S = λ₁ê₁ê₁ᵀ + λ₂ê₂ê₂ᵀ + λ₃ê₃ê₃ᵀ, where λ₁ ≥ λ₂ ≥ λ₃ ≥ 0

are the eigenvalues, and ê₁, ê₂, ê₃ are the eigenvectors corresponding to λ₁, λ₂, λ₃ respectively. Note that, because S is a second order symmetric tensor, the eigenvalues are real and positive (or zero), and the eigenvectors form an orthonormal basis. The eigenvectors correspond to the principal directions of the ellipsoid/ellipse, and the eigenvalues encode the size and shape of the ellipsoid/ellipse, as shown in Figure 2. S is a linear combination of outer product tensors and, therefore, a tensor.

[Figure 2. An ellipsoid and its eigensystem: axes ê₁, ê₂, ê₃ with lengths λ₁, λ₂, λ₃.]
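As a quick numerical check of this representation (a sketch with arbitrary illustrative eigenvalues, not part of the paper), one can build S from a chosen eigensystem and recover the eigenvalues by diagonalizing the quadratic form:

```python
import numpy as np

lam = np.array([3.0, 1.5, 0.5])             # chosen so that λ1 ≥ λ2 ≥ λ3 ≥ 0
E, _ = np.linalg.qr(np.random.randn(3, 3))  # random orthonormal basis ê1, ê2, ê3

# S = λ1 ê1 ê1ᵀ + λ2 ê2 ê2ᵀ + λ3 ê3 ê3ᵀ
S = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(3))

w, V = np.linalg.eigh(S)           # eigh: for symmetric matrices, ascending order
assert np.allclose(w[::-1], lam)   # diagonalization recovers λ1 ≥ λ2 ≥ λ3
```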
3.3 Tensor decomposition

All the tensors used so far are somewhat singular, in the sense that the associated ellipsoid has the shape of either a stick, a plate, or a ball. As a result of the voting procedure (to be described later), we produce arbitrary second order symmetric tensors; therefore, we need to handle any generic tensor. The spectrum theorem [8] states that any such tensor can be expressed as a linear combination of these 3 cases, i.e.,

S = (λ₁ − λ₂)ê₁ê₁ᵀ + (λ₂ − λ₃)(ê₁ê₁ᵀ + ê₂ê₂ᵀ) + λ₃(ê₁ê₁ᵀ + ê₂ê₂ᵀ + ê₃ê₃ᵀ),

where ê₁ê₁ᵀ describes a stick, (ê₁ê₁ᵀ + ê₂ê₂ᵀ) describes a plate, and (ê₁ê₁ᵀ + ê₂ê₂ᵀ + ê₃ê₃ᵀ) describes a ball. Figure 3 shows the decomposition of a general saliency tensor into the stick, plate, and ball components.

[Figure 3. Tensor decomposition into stick, plate, and ball tensors.]

At each location, the estimate of each of the 3 types of information, and their associated saliency, is therefore captured as follows:
• point-ness: no orientation, saliency λ₃
• curve-ness: orientation ê₃, saliency λ₂ − λ₃
• surface-ness: orientation ê₁, saliency λ₁ − λ₂

In 2-D, there is no surface-ness, and curve-ness is expressed by ê₁ for the tangent orientation and λ₁ − λ₂ for curve saliency.

We have now explained the information encoded in a second order symmetric tensor, which consists of three independent elements and a measure of feature saliency.
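This decomposition translates directly into code. The sketch below (assuming numpy and the eigenvalue ordering above) returns the saliency and orientation of each component:

```python
import numpy as np

def decompose(S):
    """Split a 3x3 second order symmetric tensor into the stick, plate,
    and ball estimates listed above, as (saliency, orientation) pairs."""
    w, V = np.linalg.eigh(S)      # ascending eigenvalues
    lam, e = w[::-1], V[:, ::-1]  # reorder: λ1 ≥ λ2 ≥ λ3, columns ê1, ê2, ê3
    return {
        "surface": (lam[0] - lam[1], e[:, 0]),  # saliency λ1 - λ2, normal ê1
        "curve":   (lam[1] - lam[2], e[:, 2]),  # saliency λ2 - λ3, tangent ê3
        "point":   (lam[2], None),              # saliency λ3, no orientation
    }
```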
3.4 Tensor communication

We now turn to our communication and computation scheme, which allows a site to exchange information with its neighbors, and to infer new information.

Token refinement and dense extrapolation. The input tokens are first encoded as perfect tensors. In 3-D, a point token is encoded as a 3-D ball. A point associated with a tangent direction is encoded as a 3-D plate. A point associated with a normal direction is encoded as a 3-D stick. These initial tensors communicate with each other in order to
• derive the most preferred orientation information, or refine the initial orientation if given, for each of the input tokens (token refinement), and
• extrapolate the above inferred information at every location in the domain for the purpose of coherent feature extraction (dense extrapolation).

In the token refinement case, each token collects all the tensor values cast at its location by all the other tokens. The resulting tensor value is the tensor sum of all the tensor votes cast at the token location.

In the dense extrapolation case, each token is first decomposed into its independent elements, and broadcasts this information using an appropriate tensor field, which also defines a neighborhood, putting the corresponding tensor value at every location. In practice, values are entered at discrete cell locations. The tensor value at any given location in the domain is the tensor sum of all the tensor votes cast at this location. A somewhat subtle difference occurs in this second case: ball tensors define isolated features, which therefore do not need to propagate their information, and thus do not vote.

While they may be implemented differently for efficiency, these 2 operations are equivalent to a voting process, and can be regarded as a tensor convolution with voting kernels, which produces tensors in turn. This tensor convolution is in fact a simple alignment process, involving a translation followed by a rotation. Therefore, in 3-D, it remains to describe the derivation of the 3-D voting kernels.

All voting kernels can be derived from the fundamental 2-D stick kernel, by rotation and integration. Figure 4 shows this 2-D stick kernel. In [18], we explain in mathematical terms that this voting kernel in fact encodes the proximity and the smoothness constraints. Note that a direction can be defined by either the tangent vector or the normal vector, which are orthogonal to each other. We can therefore define two equivalent fundamental fields, depending on whether we assign a tangent or a normal vector at the receiving site.

[Figure 4. The fundamental 2-D stick field: tangent and normal orientations, and 2 views of the intensity-coded field strength (saliency).]

Derivation of the 3-D voting kernels. Denote the fundamental 2-D stick kernel by V_F. In 3-D, we need 3 voting kernels: the stick, the plate, and the ball voting kernels. The 3-D stick kernel is obtained by first revolving the normal version of V_F by 90 degrees about the z-axis (denote the result by V′_F). Then, we rotate V′_F about the x-axis, and integrate the contributions by tensor addition during the rotation. The resulting 3-D stick kernel describes the orientation [1 0 0]ᵀ in world coordinates. The other orientations can be obtained by a simple rotation. Hence, w.l.o.g., the 3-D stick kernel is defined mathematically as

∫₀^π V′_F dα, with β = 0, γ = 0, (1)

where α, β, γ are the angles of rotation about the x, y, z axes respectively.

To obtain the plate kernel, we rotate the 3-D stick kernel obtained above about the z-axis, integrating the contributions by tensor addition. The 3-D plate kernel thus obtained describes a plate whose normal orientation is [0 0 1]ᵀ. Again, the other orientations can be obtained readily by a simple rotation. Denoting the 3-D stick kernel by V, the plate kernel is derived from V by:

∫₀^π V dγ, with α = 0, β = 0. (2)

To obtain the ball kernel, we rotate the 3-D stick kernel about the y-axis and the z-axis (the order does not matter), integrating the contributions by tensor addition:

∫₀^π ∫₀^π V dβ dγ, with α = 0. (3)
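The analytic form of the fundamental field is deferred to [18]. As a rough illustration of a single 2-D stick vote, the sketch below uses the arc-of-circle alignment and a decay of the form e^(−(s² + cκ²)/σ²) commonly quoted for this framework in the tensor voting literature; the 45-degree cutoff and the constants c and σ are illustrative assumptions, not values from this paper.

```python
import numpy as np

def stick_vote_2d(voter_pos, voter_tangent, receiver_pos, sigma=5.0, c=1.0):
    """One 2-D stick vote (tangent version of the fundamental field).

    Voter and receiver are assumed to lie on a circular arc; the vote is
    the arc's tangent at the receiver, weighted by a decay in arc length s
    and curvature kappa. Sketch only; see [18] for the actual field.
    """
    t = np.asarray(voter_tangent, float)
    t /= np.linalg.norm(t)
    d = np.asarray(receiver_pos, float) - np.asarray(voter_pos, float)
    l = np.linalg.norm(d)
    if l == 0.0:
        return np.zeros((2, 2))
    # Signed angle from the voter's tangent to the join vector.
    phi = np.arctan2(t[0] * d[1] - t[1] * d[0], np.dot(t, d))
    if abs(phi) > np.pi / 4:                     # assumed high-curvature cutoff
        return np.zeros((2, 2))
    s = l * phi / np.sin(phi) if phi else l      # arc length
    kappa = 2.0 * np.sin(phi) / l                # arc curvature
    decay = np.exp(-(s**2 + c * kappa**2) / sigma**2)
    ca, sa = np.cos(2 * phi), np.sin(2 * phi)    # tangent rotates by 2*phi
    tv = np.array([[ca, -sa], [sa, ca]]) @ t
    return decay * np.outer(tv, tv)              # a decayed stick tensor
```

Summing such votes over all voters at a site is the tensor sum described above; the equivalent normal version of the field casts the vector orthogonal to tv instead.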
3.5 Feature Extraction

In this section, we summarize the marching process for feature extraction in 3-D by giving mathematical definitions. Detailed implementation can be found in [18].

At the end of the voting process, we have produced a dense tensor map, which is then decomposed into two dense vector maps (three in 3-D). In 3-D, each voxel of these maps holds a 2-tuple (s, v̂), where s is a scalar indicating strength and v̂ is a unit vector indicating direction:
• Surface map (SMap): s = λ₁ − λ₂, and v̂ = ê₁ indicates the normal direction.
• Curve map (CMap): s = λ₂ − λ₃, and v̂ = ê₃ indicates the tangent direction.
• Junction map (JMap): s = λ₃, and v̂ is arbitrary.

These maps are dense vector fields, which are then used as input to our extremal algorithms in order to generate features such as junctions, curves, and surfaces.

The definition of point extremality, corresponding to junctions, is straightforward: it is a local maximum of the scalar value s.
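These three maps follow mechanically from a batched eigendecomposition of the dense tensor map. A numpy sketch, assuming the map is stored as an (X, Y, Z, 3, 3) array of voxel tensors:

```python
import numpy as np

def saliency_maps(T):
    """Decompose a dense tensor map T of shape (X, Y, Z, 3, 3) into the
    SMap, CMap, and JMap 2-tuples (strength, direction) defined above."""
    w, V = np.linalg.eigh(T)          # batched; ascending eigenvalues
    lam = w[..., ::-1]                # per voxel: λ1 ≥ λ2 ≥ λ3
    e = V[..., ::-1]                  # per voxel columns: ê1, ê2, ê3
    smap = (lam[..., 0] - lam[..., 1], e[..., 0])   # normal direction ê1
    cmap = (lam[..., 1] - lam[..., 2], e[..., 2])   # tangent direction ê3
    jmap = (lam[..., 2], None)                      # direction is arbitrary
    return smap, cmap, jmap
```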
Surface extremality. Let each voxel in the given vector field hold a 2-tuple (s, n̂), where s indicates the field strength and n̂ denotes the normal direction. The vector field is continuous and dense, i.e., (s, n̂) is defined for every point p in 3-D space. A point is on an extremal surface if its strength s is locally extremal along the direction of the normal, i.e., ds/dn̂ = 0. This is a necessary condition for 3-D surface extremality. A sufficient condition, which is used in the implementation, is defined in terms of zero crossings along the line defined by n̂. We therefore define the gradient vector g as g = (∂s/∂x, ∂s/∂y, ∂s/∂z) and project g onto n̂, i.e., q = n̂ · g. Thus, an extremal surface is the locus of points with q = 0.
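The test q = n̂ · g is easy to state on a discrete grid. The sketch below computes q everywhere with finite differences via np.gradient, assuming voxel axis ordering (x, y, z), and leaves out the subvoxel zero-crossing marching of [18]:

```python
import numpy as np

def surface_extremality(s, n):
    """q = n̂ · g for every voxel, where g is the gradient of the strength
    map s (shape (X, Y, Z)) and n holds unit normals (shape (X, Y, Z, 3)).
    Extremal-surface voxels are those where q changes sign along n̂."""
    g = np.stack(np.gradient(s), axis=-1)   # g = (∂s/∂x, ∂s/∂y, ∂s/∂z)
    q = np.einsum('...i,...i->...', n, g)   # projection of g onto n̂
    return q
```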
Curve extremality. Each voxel in the given potential vector field holds a 2-tuple (s, t̂), where s is the field strength and t̂ indicates the tangent direction. Suppose the field is continuous …

[Figure, partially visible in the preview: input consisting of an ellipse made up of curve elements (tangents) and a curve made up of dots; panels labelled input, dense curve saliency map, and dense orientation uncertainty map. Another example, a "pretzel", is shown in Figure 6.]
