To appear in the ACM SIGGRAPH conference proceedings
Automatic Rigging and Animation of 3D Characters
Ilya Baran Jovan Popovic ´
Computer Science and Artiﬁcial Intelligence Laboratory
Massachusetts Institute of Technology
Animating an articulated 3D character currently requires manual
rigging to specify its internal skeletal structure and to deﬁne how
the input motion deforms its surface. We present a method for ani-
mating characters automatically. Given a static character mesh and
a generic skeleton, our method adapts the skeleton to the character
and attaches it to the surface, allowing skeletal motion data to an-
imate the character. Because a single skeleton can be used with a
wide range of characters, our method, in conjunction with a library
of motions for a few skeletons, enables a user-friendly animation
system for novices and children. Our prototype implementation,
called Pinocchio, typically takes under a minute to rig a character
on a modern midrange PC.
CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism—Animation
Keywords: Animation, Deformations, Geometric Modeling
1 Introduction Figure 1: The automatic rigging method presented in this paper
allowed us to implement an easy-to-use animation system, which
Modeling in 3D is becoming much easier than before. User-friendly
we called Pinocchio. In this example, the triangle mesh of a jolly
systems such as Teddy [Igarashi et al. 1999] and Cosmic Blobs cartoon character is brought to life by embedding a skeleton inside
(http://www.cosmicblobs.com/) have made the creation it and applying a walking motion to the initially static shape.
of 3D characters accessible to novices and children. Bringing these
static shapes to life, however, is still not easy. In a conventional
skeletal animation package, the user must rig the character man-
function. To make the optimization problem computationally feasi-
ually. This requires placing the skeleton joints inside the charac- ble, we ﬁrst embed the skeleton into a discretization of the charac-
ter and specifying which parts of the surface are attached to which ter’s interior and then reﬁne this embedding using continuous op-
bone. The tedium of this process makes simple character animation
more difﬁcult than it could be. timization. The skin attachment is computed by assigning bone
weights based on the proximity of the embedded bones smoothed
We envision a system that eliminates this tedium to make an- by a diffusion equilibrium equation over the character’s surface.
imation more accessible for children, educators, researchers, and Our design decisions relied on three criteria, which we also used
other non-expert animators. For example, a child should be able to
model a unicorn, click the “Quadruped Gallop” button, and watch to evaluate our system:
the unicorn start galloping. To support this functionality, we need • Generality: A single skeleton is applicable to a wide vari-
a method (as shown in Figure 1) that takes a character, a skeleton, ety of characters: for example, our method can use a generic
and a motion of that skeleton as input, and outputs the moving char- biped skeleton to rig an anatomically correct human model,
acter. The missing portion is the rigging: motion transfer has been an anthropomorphic robot, and even something that has very
addressed in prior work [Gleicher 2001]. little resemblance to a human.
Our algorithm consists of two main steps: skeleton embedding
and skin attachment. Skeleton embedding computes the joint posi- • Quality: The resulting animation quality is comparable to
tions of the skeleton inside the character by minimizing a penalty that of modern video games.
• Performance: The automatic rigging usually takes under one
∗e-mail: [email protected]
†e-mail: [email protected]
minute on an everyday PC.
A key design challenge is constructing a penalty function that pe-
nalizes undesirable embeddings and generalizes well to new char-
acters. For this, we designed a maximum-margin supervised learn-
ing method to combine a set of hand-constructed penalty functions.
To ensure an honest evaluation and avoid overﬁtting, we tested our
algorithm on 16 characters that we did not see or use during devel-
opment. Our algorithm computed a good rig for all but 3 of these
characters. For each of the remaining cases, one joint placement
hint corrected the problem.
We simplify the problem by making the following assumptions.
The character mesh must be the boundary of a connected volume.
1 To appear in the ACM SIGGRAPH conference proceedings
The character must be given in approximately the same orientation the legs of the character in Figure 1 would be too short if a skeleton
and pose as the skeleton. Lastly, the character must be proportioned extraction algorithm were used.
roughly like the given skeleton.
We introduce several new techniques to solve the automatic rig- Template Fitting Animating user-provided data by ﬁtting a tem-
ging problem: plate has been successful in cases when the model is fairly similar
to the template. Most of the work has been focused on human mod-
• A maximum-margin method for learning the weights of a lin- els, making use of human anatomy speciﬁcs, e.g. [Moccozet et al.
ear combination of penalty functions based on examples, as 2004]. For segmenting and animating simple 3D models of charac-
an alternative to hand-tuning (Section 3.3).
ters and inanimate objects, Anderson et al.  ﬁt voxel-based
∗ volumetric templates to the data.
• An A -like heuristic to accelerate the search for an optimal
skeleton embedding over an exponential search space (Sec- Skinning Almost any system for mesh deformation (whether sur-
face based [Lipman et al. 2005; Yu et al. 2004] or volume based
• Use of Laplace’s diffusion equation to generate weights for at- [Zhou et al. 2005]) can be adapted for skeleton-based deformation.
taching mesh vertices to the skeleton using linear blend skin- Teichmann and Teller  propose a spring-based method. Un-
fortunately, at present, these methods are unsuitable for real-time
ning (Section 4). This method could also be useful in existing
3D packages. animation of even moderate size meshes. Because of its simplicity
and efﬁciency (and simple GPU implementation), and despite its
Our prototype system, called Pinocchio, rigs the given charac- quality shortcomings, linear blend skinning (LBS), also known as
ter using our algorithm. It then transfers a motion to the character skeleton subspace deformation, remains the most popular method
using online motion retargetting [Choi and Ko 2000] to eliminate used in practice.
footskate by constraining the feet trajectories of the character to the Most real-time skinning work, e.g. [Kry et al. 2002; Wang et al.
feet trajectories of the given motion. 2007], has focused on improving on LBS by inferring the char-
acter articulation from multiple example meshes. However, such
2 Related Work
techniques are unsuitable for our problem because we only have a
Character Animation Most prior research in character anima- single mesh. Instead, we must infer articulation by using the given
tion, especially in 3D, has focused on professional animators; very skeleton as an encoding of the likely modes of deformation, not just
little work is targeted at novice users. Recent exceptions include as an animation control structure.
Motion Doodles [Thorne et al. 2004] as well as the work of Igarashi
To our knowledge, the problem of ﬁnding bone weights for LBS
et al. on spatial keyframing [2005b] and as-rigid-as-possible shape from a single mesh and a skeleton has not been sufﬁciently ad-
manipulation [2005a]. These approaches focus on simplifying an- dressed in the literature. Previous methods are either mesh reso-
imation control, rather than simplifying the deﬁnition of the artic- lution dependent [Katz and Tal 2003] or the weights do not vary
ulation of the character. In particular, a spatial keyframing system
smoothly along the surface [Wade 2000], causing artifacts on high-
expects an articulated character as input, and as-rigid-as-possible resolution meshes. Some commercial packages use proprietary
shape manipulation, besides being 2D, relies on the constraints to methods to assign default weights. For example, Autodesk Maya 7
provide articulation information. The Motion Doodles system has assigns weights based solely on the vertex proximity to the bone,
the ability to infer the articulation of a 2D character, but their ap- ignoring the mesh structure, which results in serious artifacts when
proach relies on very strong assumptions about how the character the mesh intersects the Voronoi diagram faces between logically
is presented. distant bones.
Skeleton Extraction Although most skeleton-based prior work 3 Skeleton Embedding
on automatic rigging focused on skeleton extraction, for our prob-
Skeleton embedding resizes and positions the given skeleton to ﬁt
lem, we advocate skeleton embedding. A few approaches to the inside the character. This can be formulated as an optimization
skeleton extraction problem are representative. Teichmann and
Teller  extract a skeleton by simplifying the Voronoi skele- problem: “compute the joint positions such that the resulting skele-
ton with a small amount of user assistance. Liu et al.  use ton ﬁts inside the character as nicely as possible and looks like the
repulsive force ﬁelds to ﬁnd a skeleton. In their paper, Katz and Tal given skeleton as much as possible.” For a skeleton with s joints (by
“joints,” we mean vertices of the skeleton tree, including leaves),
 describe a surface partitioning algorithm and suggest skele-
ton extraction as an application. The technique in Wade  is this is a 3s-dimensional problem with a complicated objective func-
most similar to our own: like us, they approximate the medial sur- tion. Solving such a problem directly using continuous optimiza-
face by ﬁnding discontinuities in the distance ﬁeld, but they use it tion is infeasible.
Pinocchio therefore discretizes the problem by constructing a
to construct a skeleton tree.
For the purpose of automatically animating a character, however, graph whose vertices represent potential joint positions and whose
skeleton embedding is much more suitable than extraction. For ex- edges are potential bone segments. This is challenging because the
ample, the user may have motion data for a quadruped skeleton, graph must have few vertices and edges, and yet capture all poten-
tial bone paths within the character. The graph is constructed by
but for a complicated quadruped character, the extracted skeleton
is likely to have a different topology. The anatomically appropriate packing spheres centered on the approximate medial surface into
skeleton generation by Wade  ameliorates this problem by the character and by connecting sphere centers with graph edges.
techniques such as identifying appendages and ﬁtting appendage Pinocchio then ﬁnds the optimal embedding of the skeleton into
templates, but the overall topology of the resulting skeleton may this graph with respect to a discrete penalty function. It uses the
still vary. For example, for the character in Figure 1, ears may discrete solution as a starting point for continuous optimization.
be mistaken for arms. Another advantage of embedding over ex- To help with optimization, the given skeleton can have a lit-
traction is that the given skeleton provides information about the tle extra information in the form of joint attributes: for example,
expected structure of the character, which may be difﬁcult to ob- joints that should be approximately symmetric should be marked as
tain from just the geometry. So although we could use an existing such; also some joints can be marked as “feet,” indicating that they
skeleton extraction algorithm and embed our skeleton into the ex- should be placed near the bottom of the character. We describe the
tracted one, the results would likely be undesirable. For example, attributes Pinocchio uses in a supplemental document[Baran and
2 To appear in the ACM SIGGRAPH conference proceedings
Figure 2: Approximate Medial Sur- Figure 3: Packed Spheres Figure 4: Constructed Graph Figure 5: The original and
face reduced quadruped skeleton
Popovic 2007a]. These attributes are speciﬁc to the skeleton but are spheres. In fact, this step typically takes less than 1% of the time of
independent of the character shape and do not reduce the generality the entire algorithm.
of the skeletons.
Graph Construction The ﬁnal discretization step constructs the
3.1 Discretization edges of the graph by connecting some pairs of sphere centers (Fig-
ure 4). Pinocchio adds an edge between two sphere centers if the
Before any other computation, Pinocchio rescales the character to spheres intersect. We would also like to add edges between spheres
ﬁt inside an axis-aligned unit cube. As a result, all of the tolerances
are relative to the size of the character. that do not intersect if that edge is well inside the surface and if
that edge is “essential.” For example, the neck and left shoulder
Distance Field To approximate the medial surface and to facili- spheres of the character in Figure 3 are disjoint, but there should
still be an edge between them. The precise condition Pinocchio
tate other computations, Pinocchio computes a trilinearly interpo-
lated adaptively sampled signed distance ﬁeld on an octree [Frisken uses is that the distance from any point of the edge to the surface
et al. 2000]. It constructs a kd-tree to evaluate the exact signed dis- must be at least half of the radius of the smaller sphere, and the
closest sphere centers to the midpoint of the edge must be the edge
tance to the surface from an arbitrary point. It then constructs the endpoints. The latter condition is equivalent to the requirement that
distance ﬁeld from the top down, starting with a single octree cell
and splitting a cell until the exact distance is within a tolerance τ of additional edges must be in the Gabriel graph of the sphere centers
the interpolated distance. We found that τ = 0.003 provides a good (see e.g. [Jaromczyk and Toussaint 1992]). While other conditions
can be formulated, we found that the Gabriel graph provides a good
compromise between accuracy and efﬁciency for our purposes. Be- balance between sparsity and connectedness.
cause only negative distances (i.e. from points inside the character)
are important, Pinocchio does not split cells that are guaranteed not Pinocchio precomputes the shortest paths between all pairs of
to intersect the character’s interior. vertices in this graph to speed up penalty function evaluation.
3.2 Reduced Skeleton
Approximate Medial Surface Pinocchio uses the adaptive dis-
tance ﬁeld to compute a sample of points approximately on the The discretization stage constructs a geometric graph G = (V,E)
medial surface (Figure 2). The medial surface is the set of C - 1 into which Pinocchio needs to embed the given skeleton in an op-
timal way. The skeleton is given as a rooted tree on s joints. To
discontinuities of the distance ﬁeld. Within a single cell1of our oc-
tree, the interpolated distance ﬁeld is guaranteed to be C , so it is reduce the degrees of freedom, for the discrete embedding, Pinoc-
necessary to look at only the cell boundaries. Pinocchio therefore chio works with a reduced skeleton, in which all bone chains have
traverses the octree and for each cell, looks at a grid (of spacing been merged (all degree two joints, such as knees, eliminated), as
shown in Figure 5. The reduced skeleton thus has only r joints.
τ) of points on each face of the cell. It then computes the gradient
vectors for the cells adjacent to each grid point—if the angle be- This works because once Pinocchio knows where the endpoints of
tween two of them is 120 or greater, it adds the point to the medial a bone chain are in V , it can compute the intermediate joints by
surface sample. We impose the 120 condition because we do not taking the shortest path between the endpoints and splitting it in ac-
cordance with the proportions of the unreduced skeleton. For the
want the “noisy” parts of the medial surface—we want the points
where skeleton joints are likely to lie. For the same reason, Pinoc- humanoid skeleton we use, for example, s = 18, but r = 7; with-
chio ﬁlters out the sampled points that are too close to the character out a reduced skeleton, the optimization problem would typically
surface (within 2τ). Wade discusses a similar condition in Chap- be intractable.
Therefore, the discrete skeleton embedding problem is to ﬁnd
ter 4 of his thesis .
the embedding of the reduced skeleton into G, represented by an r-
Sphere Packing To pick out the graph vertices from the medial tuple v = (v 1...,v )rof vertices in V , which minimizes a penalty
surface, Pinocchio packs spheres into the character as follows: it function f(v) that is designed to penalize differences in the embed-
sorts the medial surface points by their distance to the surface (those ded skeleton from the given skeleton.
that are farthest from the surface are ﬁrst). Then it processes these
points in order and if a point is outside all previously added spheres, 3.3 Discrete Penalty Function
adds the sphere centered at that point whose radius is the distance The discrete penalty function has great impact on the generality and
to the surface. In other words, the largest spheres are added ﬁrst, quality of the results. A good embedding should have the propor-
and no sphere contains the center of another sphere (Figure 3). tions, bone orientations, and size similar to the given skeleton. The
Although the procedure described above takes O(nb) time in the paths representing the bone chains should be disjoint, if possible.
worst case (where n is the number of points, and b is the ﬁnal num- Joints of the skeleton may be marked as “feet,” in which case they
ber of spheres inserted), worst case behavior is rarely seen because should be close to the bottom of the character. Designing a penalty
most points are processed while there is a small number of large function that satisﬁes all of these requirements simultaneously is
3 To appear in the ACM SIGGRAPH conference proceedings
Good embeddings (p ’s):
difﬁcult. Instead we found it easier to design penalties indepen- i
dently and then rely on learning a proper weighting for a global Bad embeddings (q ’i):
penalty that combines each term. b2
The Setup We represent the penalty function f as a linear com-
bination of k “basis” penalty functions: f(v) = i=1 γi i(v).
Pinocchio uses k = 9 basis penalty functions constructed by hand. Best Γ
They penalize short bones, improper orientation between joints,
length differences in bones marked symmetric, bone chains shar-
ing vertices, feet away from the bottom, zero-length bone chains, Margin
improper orientation of bones, degree-one joints not embedded
at extreme vertices, and joints far along bone-chains but close in
the graph [Baran and Popovic 2007a]. We determine the weights 0 b1
Γ = (γ 1...,γ )ksemi-automatically via a new maximum margin
approach inspired by support vector machines. Figure 6: Illustration of optimization margin: marked skeleton em-
Suppose that for a single character, we have several example em-
beddings in the space of their penaltiesi(b ’s)
beddings, each marked “good” or “bad”. The basis penalty func-
tions assign a feature vector b(v) = (b 1v),...,b (k)) to each
example embedding v. Let p ,.1.,p m be the k-dimensional fea-
Learning Procedure The problem of ﬁnding the optimal Γ does
ture vectors of the good embeddings and let 1 ,...,q ne the fea-
ture vectors of the bad embeddings. not appear to be convex. However, an approximately optimal Γ
is acceptable, and the search space dimension is sufﬁciently low
Maximum Margin To provide context for our approach, we re- (9 in our case) that it is feasible to use a continuous optimization
view the relevant ideas from the theory of support vector ma- method. We use the Nelder-Mead method [Nelder and Mead 1965]
chines. See Burges  for a much more complete tuto- starting from random Γ’s. We start with a cube [0,1] , pick random
rial. If our goal were to automatically classify new embeddings normalized Γ’s, and run Nelder-Mead from each of them. We then
into “good” and “bad” ones, we could use a support vector ma- take the best Γ, use a slightly smaller cube around it, and repeat.
chine to learn a maximum margin linear classiﬁer. In its sim- To create our training set of embeddings, we pick a training set
plest form, a support vector machine ﬁnds the hyperplane that of characters, manually choose Γ, and use it to construct skeleton
separates the p is from the q is and is as far away from them embeddings of the characters. For every character with a bad em-
as possible. More precisely, if Γ is a k-dimensional vector with bedding, we manually tweak Γ until a good embedding is produced.
▯Γ▯ = 1` the classiﬁcation margin of th´ best hyperplane normal to We then ﬁnd the maximum margin Γ as described above and use
Γ is 1 min i=1 Γ q −imax i=1 Γ p i . Recalling that the total this new Γ to construct new skeleton embeddings. We manually
2 T classify the embeddings that we have not previously seen, augment
penalty of an embedding v is Γ b(v), we can think of the maxi-
mum margin Γ as the one that best distinguishes between the best our training set with them, and repeat the process. If Γ eventually
“bad” embedding and the worst “good” embedding in the training stops changing, as happened on our training set, we use the found
set. Γ. It is also possible that a positive margin Γ cannot be found, in-
In our case, however, we do not need to classify embeddings, dicating that the chosen basis functions are probably inadequate for
ﬁnding good embeddings for all characters in the training set.
but rather ﬁnd a Γ such that the embedding with the lowest penalty
f(v) = Γ b(v) is likely to be good. To this end, we want Γ to For training, we used 62 different characters (Cosmic Blobs
distinguish between the best “bad” embedding and the best “good” models, free models from the web, scanned models, and Teddy
models), and Γ was stable with about 400 embeddings. The weights
embedding, as illustrated in Figure 6. We therefore wish to max-
imize the optimization margin (subject to ▯Γ▯ = 1), which we we learned resulted in good embeddings for all of the characters in
deﬁne as: our training set; we could not accomplish this by manually tuning
the weights. Examining the optimization results and the extremal
n T m T
i=1 Γ q − iin i=1 . i embeddings also helped us design better basis penalty functions.
Although this process of ﬁnding the weights is labor-intensive,
Because we have different characters in our training set, and be- it only needs to be done once. According to our tests, if the basis
cause the embedding quality is not necessarily comparable between functions are carefully chosen, the overall penalty function gener-
alizes well to both new characters and new skeletons. Therefore,
different characters, we ﬁnd the Γ that maximizes the minimum