One of the main issues that I'll be trying to solve is how to make the cameras behave like... well, cameramen. When an event occurs, filming the entire thing in a single wide shot would work, but it wouldn't be very cinematic. Instead, the cameras will move and cut to different angles. There are rules of composition and editing that I'll get to in another post, but let's first talk about choosing shots in the first place. I originally had a vague idea of using some sort of finite-state machine, until Joe pointed me in the direction of behavior trees, specifically a chapter in Artificial Intelligence for Games, by Ian Millington and John Funge. It seems like behavior trees will work well for my cameras. So thanks, Joe!
Let's get started by looking at this wacky diagram of a basic camera behavior tree:
This tree describes how a single camera might behave. At the top of the tree is a Proceed Loop followed by a Selector—in simpler terms, this means that the camera is in a constant loop of "selecting" which of the two subtrees (left or right) to follow. Conditions, the white rectangles in the tree, test something and then return either true or false. A selector will go through each of its children, and stop once one of them returns "true."
Essentially, the left subtree is behavior for when the camera is not assigned to an event, and the right subtree is behavior for when it is assigned. The Parallel box means that its children execute simultaneously, so in the left subtree, the Assert box will constantly check to make sure that there's no event assignment, and the camera will "wander" as long as that assertion is returning true. The little "wander" triangle represents a Sequence: if you think of a Selector as "OR," then think of a Sequence as "AND." Whereas a Selector tries its children in order until it finds one that returns true, a Sequence runs its children in order and stops as soon as one returns false. In this case, the children of Wander would be different actions that cause the camera to move sort of randomly through the environment.
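To make the OR/AND distinction concrete, here is a minimal sketch in Python. It is not the Unity code for this project: nodes are simplified to plain functions that return true or false, and every name in it (`selector`, `sequence`, `action`, `log`) is made up for illustration.

```python
from typing import Callable, List

# A node is modeled as any zero-argument callable returning True or False.
Node = Callable[[], bool]

def selector(children: List[Node]) -> Node:
    """OR: try children in order, succeed at the first one that returns True."""
    def run() -> bool:
        for child in children:
            if child():
                return True
        return False
    return run

def sequence(children: List[Node]) -> Node:
    """AND: run children in order, fail at the first one that returns False."""
    def run() -> bool:
        for child in children:
            if not child():
                return False
        return True
    return run

log: List[str] = []  # records which children actually ran

def action(name: str, result: bool) -> Node:
    """Hypothetical leaf node that logs its name and returns a fixed result."""
    def run() -> bool:
        log.append(name)
        return result
    return run

pick = selector([action("a", False), action("b", True), action("c", True)])
steps = sequence([action("x", True), action("y", False), action("z", True)])
```

Calling `pick()` runs "a" and "b" and then stops with success, so "c" never runs. Calling `steps()` runs "x" and "y" and then stops with failure, so "z" never runs.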
The right subtree is the interesting one. It starts with another Parallel box, which makes sure that the subtree only executes while the camera is still assigned to an event (once the event ends, we should go back to wandering around). So what are we executing? Another Selector, one which decides what kind of situation we're in. Many of the papers that I researched use the term "idiom" to refer to a series of individual shots for a certain purpose; many idioms depend on the number of actors in the scene. Thus, a simple way to break this down further would be into idioms for situations with only one actor (when I say "actor," I may be referring to any sort of object, not just a character), situations with two actors, and situations with more than two actors. Obviously there are more idioms than this, but this will be a good start.
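The Parallel-plus-Assert pattern that guards both subtrees can be approximated in a frame-by-frame loop: check the assertion every tick, and only keep running the guarded behavior while it holds. This is a rough Python sketch under that assumption. The `Camera` class, `run_guarded`, and the event name are hypothetical, and a real Parallel node would likely be more general than this.

```python
from typing import Callable, List, Optional

def run_guarded(guard: Callable[[], bool],
                step: Callable[[], None],
                max_ticks: int) -> int:
    """Tick `step` once per frame while `guard` holds; return ticks completed."""
    ticks = 0
    while ticks < max_ticks and guard():
        step()
        ticks += 1
    return ticks

class Camera:
    """Hypothetical camera state: an optional event assignment and a move log."""
    def __init__(self) -> None:
        self.event: Optional[str] = None
        self.moves: List[str] = []

camera = Camera()

def wander_step() -> None:
    camera.moves.append("wander")
    if len(camera.moves) == 3:
        camera.event = "explosion"  # simulate an event being assigned mid-wander

# Left subtree: wander only while the "no event assigned" assertion is true.
ticks = run_guarded(lambda: camera.event is None, wander_step, max_ticks=10)
```

Here the camera wanders for three ticks, the assertion fails on the fourth check, and control would return to the top-level Selector, which can then try the right subtree.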
This Selector (the one in the right branch of the tree) will be important. I'm going to call it the Cinematographer Node, since it plays the role of cinematographer: deciding what kinds of shots to use based on the current situation.
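As a sketch of what the Cinematographer Node decides, the Selector can be written as an ordered list of conditions on the actor count, each paired with an idiom. The idiom names are the ones used later in this post; the `BRANCHES` table and the `cinematographer` function are assumptions for illustration, written in Python rather than as Unity code.

```python
from typing import Callable, List, Optional, Tuple

# Each branch pairs a condition on the actor count with an idiom name.
# Branch order matches the diagram description: one, two, more than two.
BRANCHES: List[Tuple[Callable[[int], bool], str]] = [
    (lambda n: n == 1, "Follow"),
    (lambda n: n == 2, "Shot / Reverse Shot"),
    (lambda n: n > 2, "Crowd Shot"),
]

def cinematographer(actor_count: int) -> Optional[str]:
    """Selector: return the idiom of the first branch whose condition holds."""
    for condition, idiom in BRANCHES:
        if condition(actor_count):
            return idiom
    return None  # no branch matched, so the whole subtree fails
```

Adding a more specific idiom later would mean adding another entry to the table, ideally ahead of the more general branches so that it gets tested first.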
So let's say there are two actors in the scene, and the Cinematographer goes down the middle branch. As long as the event still involves 2 actors (by the assertion), the camera will enter the Shot / Reverse Shot idiom, which I've expanded at the bottom of the diagram. This idiom, often used in movies during conversations between two people, would involve the following series of shots:
- A wide shot with both characters in frame
- An over-the-shoulder (OTS) shot of character A (meaning a shot taken over character B's shoulder, so we see character A's face)
- The reverse shot (an over-the-shoulder shot of character B)
- A close-up of character A, without character B in the frame
- A close-up of character B, without character A in the frame
- Back to a wide shot
This cycle will repeat as long as there are still 2 actors involved. If a third actor joins the event, the assertion will fail, and the Cinematographer will go to the Crowd Shot idiom. Similarly, if actors leave until 1 actor remains, the Cinematographer will go to the Follow idiom, which involves filming the lone actor by itself. If each actor is still labeled as an event after they separate, we may need extra cameras to follow the individual actors, since they can no longer be contained in one frame. Once it's determined that a lone actor is no longer an event (sorry, lone actor, you're not important anymore), the entire right subtree would fail, and we'd go back to wandering. If extra cameras were added, they might just disappear.
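The repeating cycle and its guarding assertion can be sketched as a loop that re-checks the actor count before every shot. This is an illustrative Python sketch, not the actual implementation: `run_idiom`, the shot labels, and the simulated actor counts are all made up for the example.

```python
from typing import Callable, Iterator, List

# One pass through the idiom; the final "back to a wide shot" is the cycle restarting.
SHOT_REVERSE_SHOT: List[str] = [
    "wide shot of A and B",
    "OTS shot of A",
    "OTS shot of B",
    "close-up of A",
    "close-up of B",
]

def run_idiom(shots: List[str],
              still_valid: Callable[[], bool],
              max_shots: int) -> List[str]:
    """Cycle through `shots`, checking the assertion before each one."""
    taken: List[str] = []
    while len(taken) < max_shots and still_valid():
        taken.append(shots[len(taken) % len(shots)])
    return taken

# Simulated actor count at each check: a third actor joins at the eighth check.
counts: Iterator[int] = iter([2, 2, 2, 2, 2, 2, 2, 3])
taken = run_idiom(SHOT_REVERSE_SHOT, lambda: next(counts) == 2, max_shots=20)
```

In this run the camera takes seven shots, wrapping back to the wide shot on the sixth, and then the assertion fails, at which point the Cinematographer would get to choose again.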
One reason that this behavior tree structure will work well is that once it's set up, it will be easy to add more complicated idioms later—the Cinematographer Node will be happy to have more specific situations to test for. Right now I've got a test environment built in Unity, and I'm going to set up some simple, 1-actor events. The first version of the camera will only have a Follow idiom: for testing, this will involve cycling between a wide shot of the object, a moving shot (which would follow the object if it's moving as well), and a close-up of some feature of the object. After that, I can add idioms for dealing with multiple actors in an event.
There's one major thing missing from the behavior tree, however: while this tree will cause the camera to choose the right types of shots for each situation, it doesn't ensure that the shots are composed well. Worse, it doesn't prevent the camera from breaking cinematic rules while cutting between shots. I'll get to these principles of composition and editing in the next post, but basically, the idioms will need to have asserts inside of them, rather than just being a sequence of shots.
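One possible shape for "asserts inside the idiom" is to pair every shot with its own check and run the pair as a unit. The Python sketch below is only a guess at that structure. The checks are placeholders, since the actual composition and editing rules aren't defined in this post, and failing the whole idiom on a failed check is an assumption rather than a settled design.

```python
from typing import Callable, List, Tuple

Shot = str
Check = Callable[[Shot], bool]

def run_checked_idiom(steps: List[Tuple[Check, Shot]]) -> Tuple[bool, List[Shot]]:
    """Run shots in order; stop and report failure when a shot's assert fails."""
    taken: List[Shot] = []
    for check, shot in steps:
        if not check(shot):
            return False, taken
        taken.append(shot)
    return True, taken

# Placeholder checks standing in for real composition and editing rules.
always_ok: Check = lambda shot: True
never_ok: Check = lambda shot: False

ok, taken = run_checked_idiom([
    (always_ok, "wide shot"),
    (never_ok, "OTS shot of A"),   # pretend this cut would break a rule
    (always_ok, "OTS shot of B"),
])
```

Here the idiom stops after the wide shot because the second shot's check fails. An alternative would be to skip the offending shot and carry on with the rest of the sequence.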