March 19th, 2007
One of the most important requirements for a game engine is some form of aggressive culling. If you take a look at any major 3D FPS written within the past 10 years, all of them use some sort of culling algorithm. Traditionally, techniques such as BSPs were employed (made famous by the Quake series); however, the problem with BSPs is that they must be computed offline, and they impose some limits on your scene geometry. In other words, they are static.
This doesn’t work well in today’s games, where levels are often highly dynamic, possibly even destructible via physics.
One more recent approach that has surfaced to deal with dynamic environments is Portal Culling. The idea is fairly simple: divide the world into Zones or Sectors (call them what you will) - these are essentially “rooms”. For each opening into the sector (i.e. window, doorway, etc.) add a portal. Portals are used to connect different sectors together. Then, at runtime, find all the portals in the room your camera is in that are in the view frustum. For each of these, create a new frustum extending from the portal, and find all the portals visible to that. You do this recursively until you cannot find more portals, and then you check to see which of these portals are also visible to your original view frustum. If they are, then the sectors the portals belong to must be visible, and therefore you can render those sectors.
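To make the recursion concrete, here is a minimal 2D sketch. Note that it uses the common variant where the camera's own view is narrowed through each portal, rather than building a fresh frustum at every portal as described above. Everything in it is made up for illustration: the `Portal` class, `angular_span`, `visible_sectors`, and the sector layout. It also assumes no portal straddles the angle wrap-around directly behind the camera.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Portal:
    a: tuple      # first endpoint (x, y)
    b: tuple      # second endpoint (x, y)
    target: str   # name of the sector this portal opens into

def angular_span(camera, portal):
    """Angles (radians) subtended by the portal's endpoints, as seen from the camera."""
    angles = [math.atan2(y - camera[1], x - camera[0]) for x, y in (portal.a, portal.b)]
    return min(angles), max(angles)

def visible_sectors(sectors, start, camera, view):
    """Collect every sector reachable through portals inside the (lo, hi) view range."""
    visible = set()

    def visit(sector, lo, hi, path):
        visible.add(sector)
        for portal in sectors.get(sector, []):
            if portal.target in path:
                continue  # don't walk back through a sector already on this path
            p_lo, p_hi = angular_span(camera, portal)
            # Narrow the view to the part that passes through this portal.
            n_lo, n_hi = max(lo, p_lo), min(hi, p_hi)
            if n_lo < n_hi:
                visit(portal.target, n_lo, n_hi, path | {portal.target})

    visit(start, view[0], view[1], {start})
    return visible

# Hypothetical layout: A opens into B; B opens into C (off to the side) and D (straight ahead).
sectors = {
    "A": [Portal((5, -1), (5, 1), "B")],
    "B": [Portal((5, -1), (5, 1), "A"),
          Portal((10, 3), (10, 5), "C"),
          Portal((10, -1), (10, 1), "D")],
    "C": [],
    "D": [],
}
camera = (0.0, 0.0)
view = (-math.pi / 4, math.pi / 4)  # 90 degree field of view along +x
print(sorted(visible_sectors(sectors, "A", camera, view)))
```

In this layout the portal into C sits inside the camera's original 90 degree view, but it falls outside the narrowed view through the A-to-B doorway, so C gets culled while D does not.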
Because portals are arbitrary, their position and the sectors they are linked to can be changed dynamically. Also, because there is no pre-processing, portal culling works well in dynamic scenes where portals can be added/moved/resized/removed/etc. However, there are some weaknesses:
Overdraw: Because portal culling uses a bounding box intersection with the view frustum, there is a possibility of a portal being in the view frustum but not actually visible. A good example of this is to imagine a room with two doorways, opposite each other. In the middle of the room is a large block - large enough to obscure the opposing door when standing behind it. Portal culling, because it works only with portal and sector intersections, would have no knowledge of the block, and therefore would consider the opposing yet occluded doorway to be visible, potentially causing a very large overdraw (if there are several visible rooms beyond that one). Now, it’s true that you could pre-process the scene and therefore take into account the large block - however, this would defeat the purpose of dynamic portal culling.
Large Portals: In order for us to create a frustum at each portal, we must be given a location to set our camera. However, the portal itself is not a singular point - it might be a massive 24km wide window. Obviously, in such a case a 45 degree viewing angle (or whatever view angle you are using) is not sufficient. While there are ways around this (upping the angle to 180 degrees), most require artist intervention (which prevents dynamic creation of portals) and can lead to over/underdraw.
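For reference, the bounding box versus frustum check that all of this rests on can be sketched as below. This is just one common way of doing it, not code from any particular engine: the function name `aabb_in_frustum` and the plane layout `(nx, ny, nz, d)`, with "inside" meaning `n·p + d >= 0`, are assumptions of mine. Notice that nothing in it knows about occluding geometry, which is exactly the overdraw problem described above.

```python
def aabb_in_frustum(box_min, box_max, planes):
    """Conservative test: False only if the box lies fully outside some plane."""
    for nx, ny, nz, d in planes:
        # Pick the box corner furthest along the plane normal.
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return False  # even the best corner is behind this plane
    return True

# A stand-in "frustum": six planes bounding the cube [-1, 1]^3 (assumed for the example).
planes = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]
print(aabb_in_frustum((0, 0, 0), (0.5, 0.5, 0.5), planes))  # box inside the volume
print(aabb_in_frustum((2, 2, 2), (3, 3, 3), planes))        # box well outside
```

The test is only a handful of multiplies per plane, which is why it is so cheap, and also why it can only ever answer "possibly visible" rather than "actually visible".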
It is obvious then, that the weakness in portal culling is the fact that we are using simple frustum checks, which do not accurately represent the geometry in our world. What can we do about that? Well, starting with DX9, we have access to hardware occlusion.
Hardware occlusion, in theory, is pretty simple. Render all occluders to a render surface, and save the zbuffer info. Next, test the bounding box of the object you want to check against the zbuffer, and determine how many of its pixels would actually be drawn (depth < occluder’s depth). DX9 implemented a way to do this directly on the hardware, bypassing slow drivers, which makes this test fairly quick if used properly. For our purposes, we can replace all of our frustum checks with simple hardware occlusion queries. What are the advantages? Well:
Occluder Awareness: Because hardware occlusion automatically handles any occluding geometry, the scenario presented earlier with the large box in the room would be handled correctly - the opposite doorway would be considered not in view and culled away. This is done automatically - because the box is in a visible room, we can safely assume that the box is an occluder and quickly render it to the occluder surface. If the box moves, the occlusion surface is updated accordingly and everything works without problems. This is especially useful for things like doors that can open/shut arbitrarily.
No More View Frustums: Because we are doing away with view frustum checks, we are also removing their inherent weakness (requiring a position). Now, portals of any size can be used without presenting a problem. Also, because we are including occluding geometry in the check, there’s no reason to create multiple view frustums and check them against the main view.
Essentially, hardware occlusion-based portal culling gives us virtually NO overdraw, which can make a big difference in complicated scenes.
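To show what the occlusion test boils down to, here is a tiny CPU-side simulation of the "how many pixels would actually be drawn" idea. It is not the DX9 API or anything from the engine, just a toy depth buffer; `make_depth_buffer`, `draw_occluder`, `count_visible_pixels`, the 8x8 resolution and the rectangles are all invented for the example. Rectangles are `(x0, y0, x1, y1)` in pixels, and smaller depth means closer to the camera.

```python
FAR = 1.0  # depth value of an empty pixel

def make_depth_buffer(width, height):
    """A tiny occlusion surface where every pixel starts at the far plane."""
    return [[FAR] * width for _ in range(height)]

def draw_occluder(depth, rect, z):
    """Write an occluder's depth into the buffer wherever it is the closest surface."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            if z < depth[y][x]:
                depth[y][x] = z

def count_visible_pixels(depth, rect, z):
    """Count pixels of a screen-space bounding rect that would pass the depth test."""
    x0, y0, x1, y1 = rect
    return sum(1 for y in range(y0, y1) for x in range(x0, x1) if z < depth[y][x])

# The block-in-the-room scenario: a big block close to the camera, a doorway behind it.
depth = make_depth_buffer(8, 8)
draw_occluder(depth, (2, 2, 6, 6), 0.3)                  # the large block
hidden_door = count_visible_pixels(depth, (3, 3, 5, 5), 0.8)
partial_door = count_visible_pixels(depth, (4, 4, 8, 8), 0.8)
print(hidden_door, partial_door)
```

A portal whose count comes back as zero is fully hidden, so the sector behind it can be skipped. The doorway entirely behind the block yields zero, while the one poking out past the block's edge still has visible pixels.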
So, by now you’re wondering, “what’s the catch? Why isn’t everyone using this then?” Well, the answer, put simply, is that hardware occlusion queries are not free. In fact, they are fairly expensive. First, you burn some fillrate rendering to your occlusion surface (though that surface can be very tiny). Second, you must access the zbuffer to check whether an object is occluded or not. Third, because this is all done on the GPU, the CPU and GPU aren’t going to be synced up, and the CPU generally has to wait for the GPU to catch up.
However, these issues can be overcome. First, only do occlusion checks when they are necessary. Second, batch all GPU calls together in order to reduce lag.
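The batching point can be illustrated with a mock. This is purely a toy model of the CPU/GPU hand-off, not real driver behaviour: `MockGpu`, `issue_query`, `get_result` and the pixel counts are all hypothetical, and the `visible_pixels` argument simply stands in for whatever the hardware would have counted. Each time the CPU asks for a result that isn't ready, the mock counts one sync point (a stall).

```python
class MockGpu:
    """Toy stand-in for a GPU: queued queries only resolve when the CPU forces a sync."""

    def __init__(self):
        self.pending = []      # (query_id, visible_pixels) pairs not yet resolved
        self.results = {}
        self.sync_points = 0   # how many times the CPU had to wait

    def issue_query(self, query_id, visible_pixels):
        self.pending.append((query_id, visible_pixels))

    def get_result(self, query_id):
        if query_id not in self.results:
            self.sync_points += 1           # CPU stalls until the GPU catches up
            self.results.update(self.pending)
            self.pending.clear()
        return self.results[query_id]

portals = {"door_a": 120, "door_b": 0, "window": 45}  # made-up pixel counts

# Naive: issue one query, then immediately wait for it.
naive = MockGpu()
for name, pixels in portals.items():
    naive.issue_query(name, pixels)
    naive.get_result(name)

# Batched: issue everything first, read the results back afterwards.
batched = MockGpu()
for name, pixels in portals.items():
    batched.issue_query(name, pixels)
visible = [name for name in portals if batched.get_result(name) > 0]

print(naive.sync_points, batched.sync_points, visible)
```

In this toy model the naive loop stalls once per portal, while the batched version stalls once in total, which is the whole point of grouping the GPU calls.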
In an all-out speed battle, frustum-based portal culling will almost always outperform hardware occlusion-based portal culling (checking a bounding box against a frustum is pretty damned fast). However, something I haven’t mentioned before is implementation time. Frustum portal culling is a fair bit more difficult to implement robustly. Occlusion portal culling took me roughly a day and a half to implement and tweak. That’s pretty quick for something so vital!
As a note, I don’t actually use DX directly. Sylvain added hardware occlusion to yesterday’s TrueVision 6.5 beta release, and it’s just a matter of a few calls to the engine. However, I believe the calls mirror DX’s fairly closely.