The next thing I wanted to try was adding outlines to objects, in order to emphasize the edges of characters and parts of their design. Here's what I managed to do:

I'll walk you through the process I used to arrive at this effect.
First, this video, "Moebius-style 3D Rendering | Useless Game Dev", demonstrates a method of creating outlines by detecting edges with a Sobel filter.
Here's how it works: take a map of the screen (a depth map, normal map, or color map) and sample each pixel. For each pixel, look at the eight pixels surrounding it and note their values, multiplying the values of the four cardinally adjacent pixels by 2. To find an edge in the left-right direction, take the difference between the sum of the pixels on the left and the sum of the pixels on the right; to find an edge in the up-down direction, take the difference between the sum of the pixels on the top and the sum of the pixels on the bottom.
If the two sides are similar to one another, then the end result will be close to zero. If they're very different from one another, though, then the result will be far from zero, which can be taken to indicate the presence of an edge. These two values can then be combined in order to approximate edge detection in all directions.
This can also be thought of as treating the center pixel and its eight neighbors as a 3x3 matrix, multiplying it element-wise by a kernel matrix like the one shown in the image below, and summing the results.
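For reference, here's a rough sketch of that calculation in Python. This is the generic Sobel operation, not the shader graph itself; the numpy arrays are just the standard horizontal and vertical kernels.

```python
import numpy as np

# Standard Sobel kernels: left-right differences and up-down differences.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
sobel_y = sobel_x.T

def sobel_response(neighborhood):
    """neighborhood: 3x3 array of values sampled around the center pixel."""
    gx = np.sum(neighborhood * sobel_x)  # left-right edge strength
    gy = np.sum(neighborhood * sobel_y)  # up-down edge strength
    return np.hypot(gx, gy)              # combined edge strength in all directions
```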
I liked the look of the effect as shown in the video, but it was pretty light on actual implementation details, so I went in search of other resources.
In my version of Unity, screen shaders require the creation of a "full screen pass renderer feature." This basically allows you to apply any material to the rendered scene. Once that's in place, it's just a matter of creating a shader that gives the desired effect and making a material from it.
Let's go over what the shader graph for this material looks like. This section takes the height and width of the screen, finds their reciprocals, and multiplies them by a set value (let's call it the thickness, T). It then outputs each result into a separate 2D vector, with the other component of the vector set to zero: basically, the two vectors are [T/w, 0] and [0, T/h].
This creates offset values that can be added to or subtracted from the coordinates of each pixel in order to sample the surrounding pixels. If the thickness is exactly 1, the filter will sample the pixels directly adjacent to each center pixel, but if the thickness is increased, the filter will sample more distant pixels. This means the filter will detect edges even at points on the screen that are some distance from the actual edge, resulting in a thicker outline.
This next part might look a bit nasty, but all I'm doing is calculating the coordinates of each of the eight surrounding pixels for each center pixel, by adding or subtracting the X and Y offsets to and from the screen position map.
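Sketched out in Python (a rough stand-in for the graph; thickness, screen_width, and screen_height are placeholders for the corresponding graph inputs), the offsets and the eight sample coordinates look something like this:

```python
def neighbor_uvs(uv, thickness, screen_width, screen_height):
    """Return the UV coordinates of the eight pixels around `uv`,
    spaced `thickness` pixels apart."""
    x, y = uv
    ox = thickness / screen_width    # the [T/w, 0] offset
    oy = thickness / screen_height   # the [0, T/h] offset
    return {
        "left":       (x - ox, y),
        "right":      (x + ox, y),
        "up":         (x, y + oy),
        "down":       (x, y - oy),
        "up_left":    (x - ox, y + oy),
        "up_right":   (x + ox, y + oy),
        "down_left":  (x - ox, y - oy),
        "down_right": (x + ox, y - oy),
    }
```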
This next part is also a bit gnarly-looking. In the screenshot below, I'm first taking the pixels to the left and right of the center pixel, then using the "Scene Depth" node to sample the depth map at each of them. Then I'm performing the Sobel calculation on those depth values to find edges in the depth map along the X axis: multiplying the middle pixel on each side by 2, adding up the pixels on each side, then taking the difference between the two sides.
The depth map is basically a grayscale image that shows how far things are from the camera. There are a few variants of the depth map that can be sampled. I found I got the best results from sampling the "Linear 01" map, which looks like this:
The Linear 01 depth map ranges linearly from 1 (at the camera's far clipping plane, the maximum distance that can be rendered) to 0 (at the near clipping plane, the minimum distance that can be rendered). The objects in the image all appear almost entirely black because the far clipping plane is so far away relative to the objects, but there is in fact a difference in their depths.
I repeat these calculations to detect edges along the Y direction as well, then combine the X and Y edge detection values into a 2D vector, and take the length of that vector. The result will always be positive, and will be large if edges were detected in either the X or Y directions.
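Putting those steps together, the depth edge value for a single pixel works out to roughly the following (again a Python sketch, reusing the neighbor_uvs helper from the earlier sketch; sample_depth stands in for the Scene Depth node):

```python
import math

def depth_edge_value(uvs, sample_depth):
    """uvs: dict of neighbor UVs from neighbor_uvs(); sample_depth: maps a UV to a Linear 01 depth."""
    d = {name: sample_depth(uv) for name, uv in uvs.items()}

    # Sobel in X: weighted left column minus weighted right column.
    gx = (d["up_left"] + 2 * d["left"] + d["down_left"]) \
       - (d["up_right"] + 2 * d["right"] + d["down_right"])

    # Sobel in Y: weighted top row minus weighted bottom row.
    gy = (d["up_left"] + 2 * d["up"] + d["up_right"]) \
       - (d["down_left"] + 2 * d["down"] + d["down_right"])

    # Length of the (gx, gy) vector: always positive, large near edges in either direction.
    return math.hypot(gx, gy)
```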
I take this result and send it to the "in" channel of a "Smoothstep" node. The Smoothstep node is basically an inverse lerp clamped to the 0-1 range (with a smooth curve between the edges rather than a straight line): it has a lower and an upper edge, and returns 1 if the input is above the upper edge, 0 if it's below the lower edge, and something in between otherwise. The lower edge for my Smoothstep is 0, while the upper edge is a graph variable called "DepthThreshold." This lets me control how sensitive the edge detection is: the higher the depth threshold, the sharper an edge needs to be in order to be detected.
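For reference, a standard smoothstep behaves roughly like this (a Python sketch; the example values are made up):

```python
def smoothstep(lower, upper, x):
    """Returns 0 below `lower`, 1 above `upper`, and a smooth Hermite ramp in between."""
    t = min(max((x - lower) / (upper - lower), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

depth_threshold = 0.2  # the DepthThreshold graph variable (example value)
depth_edge = 0.35      # depth edge value for some pixel (example value)
edge = smoothstep(0.0, depth_threshold, depth_edge)  # 1.0 here, since 0.35 is above the threshold
```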

This result is then passed through a couple of math nodes. First, it's raised to the power of the "DepthTightening" variable, which pushes all the areas where the Smoothstep returned values below 1 toward zero (basically "tightening" the edge detection around the areas where very strong edges are found). The result is then multiplied by "DepthStrength," which can help bring back some of the edges that were weakened by the DepthTightening.
This result is then added to another number (you'll see what that number is in a bit), and the result is used as the t value for a Lerp node, which is then output to the base color. The A value of the Lerp is the original appearance of the scene, which I've sampled using the "Blit Source" option of the URP Sample Buffer node. The B value of the Lerp is a solid black color. Essentially, at spots where edges are detected, the t value will be greater than or equal to 1, and the solid black color will be rendered. If no edge is detected, the t value will be close to 0, and the original color of the scene will be rendered.
So, what's this mystery number that's getting added to the result of our depth map edge detection algorithm? Well, it's the result of the exact same series of calculations being performed on the normal map of the scene:

Basically, I'm detecting edges using both the normal map and the depth map, and if an edge is detected using either method, I'm painting the outline color over it.
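Putting those last few steps together, the per-pixel composition works out to roughly the following. This is a Python sketch: scene_color stands in for the Blit Source sample, the variable values are made-up examples, and the clamp on t is just to keep the sketch's math tidy, since anything at or above 1 renders as the outline color anyway.

```python
def lerp(a, b, t):
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

edge = 1.0               # Smoothstep output from the depth edge detection (example value)
depth_tightening = 4.0   # the DepthTightening graph variable (example value)
depth_strength = 1.5     # the DepthStrength graph variable (example value)

depth_edge_final = (edge ** depth_tightening) * depth_strength  # tighten, then scale
normal_edge_final = 0.0          # the same pipeline run on the normal map (example value)

scene_color = (0.6, 0.5, 0.4)    # Blit Source sample for this pixel (example value)
outline_color = (0.0, 0.0, 0.0)  # solid black

t = depth_edge_final + normal_edge_final  # at or above 1 wherever either map found an edge
t = min(max(t, 0.0), 1.0)                 # clamped here just to keep the lerp between the two colors
final_color = lerp(scene_color, outline_color, t)
```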
As a demonstration of the difference between the normal and depth edges: For this image, I've set the NormalStrength to 0, which effectively removes all edges detected on the normal map and leaves only edges detected on the depth map.
And for this image, I've set the DepthStrength to 0, leaving only edges detected on the normal map.
Both of these methods, on their own, "miss" locations where I would expect edges to show up. Pure depth map detection doesn't find edges where objects are in close proximity to or touching one another, and it also misses sharp creases and corners within an object, since the depth changes continuously there. Pure normal map detection, on the other hand, doesn't find edges where foreground and background objects have faces that are parallel to each other at their screen-space boundaries.
Now, there's one important part of the shader graph I haven't shown yet. If we go all the way back to the start, where I do the screen-dimension calculations, there's this section:
The main thing that's happening here is that I'm scaling the outline thickness by the scene depth map, so the thickness depends on how far objects are from the camera. The Divide and Clamp nodes are basically setting a lower bound on the thickness, so that at greater distances the outlines just get smaller rather than disappearing entirely.
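Sketched out in Python, my reading of that arrangement is roughly the following. This isn't the literal node wiring; the clamp bounds and the scaling factor are stand-ins.

```python
def outline_thickness(base_thickness, depth01, min_thickness=1.0, max_thickness=8.0):
    """Make the outline thinner as the pixel's Linear 01 depth grows, then clamp it
    so distant outlines shrink but never vanish entirely. (My reading of the
    Divide/Clamp section, not the exact graph.)"""
    scaled = base_thickness / max(depth01, 1e-4)  # apparent object size falls off with distance
    return min(max(scaled, min_thickness), max_thickness)
```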
Here's what the effect looks like: I'm panning the camera back and forth using flythrough mode. Pay attention to the width of the outlines: their thickness stays roughly proportional to the size of the objects no matter where the camera is. This makes the outlines feel more like a property of the objects themselves rather than a filter being applied for the viewer's benefit, which I think creates a greater sense of immersion.
Now, if I'm scaling the outline thickness based on object distance anyway, I'm not entirely certain a screen-space shader is the best option. A screen-space shader affects everything on the screen, which makes it hard to control (for example, giving different objects different outline thicknesses may be difficult to implement). It probably makes more sense to integrate the outline algorithm into each object's material instead, but I'm not quite sure how I would do that; maybe it's something to look into in the future. If I do decide to continue with the screen-space shader method, camera layering might also help: I could render objects in separate layers with separate cameras and separate screen shaders, then stack the layers on top of each other. The issue with that approach is that, as I understand it, each layer always draws on top of the layers below it, so the layers can't dynamically occlude one another (although that sounds a lot like the effect I want where one player character's model is always in front of the other, so maybe I should look into camera layering regardless).
Here's another issue with this method: if a flat surface is far from the camera and its normal is close to perpendicular to the camera's facing direction, the edge detection algorithm will often identify the entire far end of that surface as an edge, causing it to become darkened.
This is because, for surfaces farther in the distance, the depth difference between adjacent pixels becomes larger. I imagine there are lots of ways of fixing this: scaling up the depth threshold based on the depth and normal maps in the right way should do the trick (the false edges show up where a surface is viewed at a grazing angle, i.e. where the dot product between the surface normal and the view direction is close to 0, so raising the threshold as that dot product approaches 0 should suppress them, while true silhouette edges have a large enough depth jump to survive the higher threshold). In the game view, changing the camera's far plane position also seemed to do the trick, though that has minor effects on the rest of the scene as well.
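As a sketch of that threshold-scaling idea (this is just the proposed fix, not something I've actually wired into the graph; the grazing_boost factor is a made-up parameter, and normal and view_dir are assumed to be normalized vectors):

```python
def adjusted_depth_threshold(base_threshold, normal, view_dir, grazing_boost=4.0):
    """Raise the depth threshold where the surface is viewed at a grazing angle,
    since that's where neighboring pixels on the same surface legitimately differ
    a lot in depth and produce false edges."""
    # dot(normal, view_dir) is near 0 at grazing angles and near +/-1 when the
    # surface faces the camera head-on.
    facing = abs(sum(n * v for n, v in zip(normal, view_dir)))
    return base_threshold * (1.0 + grazing_boost * (1.0 - facing))
```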
Here's another issue, specifically with the part where I scale outline size based on depth: the scene view camera and the in-game camera seem to handle the depth map calculation differently. With a certain set of shader parameters, I managed to produce this in the game view:
When I switch back to the scene view, though, this is how things look:
Basically, the scene and game views just look different, and there isn't really a set of parameters that consistently produces sensible-looking results in both. That makes the editing process a little annoying. The outlining in the game view also depends heavily on the camera's FOV and near and far plane settings, so the shader graph parameters have to be retuned whenever I modify those camera settings.