Design Considerations for Mobile Game Engine Development

MEngine is built around a set of general design principles meant to accommodate the specificities of mobile devices. This post goes through the different design choices made during development.

Renderer-agnostic design

The engine is designed to run without any active renderer. This design decision comes from the fact that on certain mobile targets, a render context can be destroyed at any time.

To achieve this loose coupling, game objects relying on context-specific OpenGL data have two internal representations:

  • The ‘Core’ data, storing the attributes, states and other core dependencies of our object
  • The ‘Render’ data, aggregated with the core game object and tied to the lifecycle of the current render context, containing all the OpenGL data structures (texture IDs, shader programs, VBOs, etc.)

A ‘Core’ Texture Object has the following structure:
namespace Mintaka {
	namespace Internal {
		namespace Render { struct Texture; }
		namespace Core {
			struct Image;
			struct Texture :
				public Mintaka::Core::ITexture,
				public Mintaka::Internal::Core::ITexture,
				public Mintaka::Internal::Render::IGraphicContextResponder {
			
				// the 'Render' texture object
				Render::Texture * rtx;
				
				// Image objects store the pixel data and states, and manage its lifecycle.
				// They provide an abstraction to the raw data, can release the memory region
				// if needed and recreate it on demand. They can also pin an image in memory
				// if the texture is meant to be updated in realtime.
				Image * ImageData;
				
				// Texture creation flags
				const uint MinF, MagF, WrapMode;
				
				// Constructors / Destructor ...
				// public interface definition (Mintaka::Core::ITexture) ...
				// private interface definition (Mintaka::Internal::Core::ITexture) ...
				// OnContextCreated() / OnContextLost() handlers (Mintaka::Internal::Render::IGraphicContextResponder) ...
			};
		}
	}
}
The ‘Render’ texture object storing the OpenGL states:
namespace Mintaka {
	namespace Internal {
		namespace Core { struct Texture; }
		namespace Render {
			struct Texture {
			private:
				// OpenGL's texture id coming from current context
				const uint _textureId;
			public:
				Texture(const Mintaka::Internal::Core::Texture & texture);
				~Texture();
				void Enable(const uint slot) const;
			};
		}
	}
}

Each instance of a class implementing the IGraphicContextResponder interface is registered in a context manager. When something happens to the current render context, the engine’s ContextManager notifies all the registered internal ‘Core’ objects of the change.
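
A rough sketch of what this mechanism might look like (the OnContextCreated() / OnContextLost() names come from the Texture excerpt above; the ContextManager internals are an assumption):
namespace Mintaka {
	namespace Internal {
		namespace Render {
			// Implemented by every 'Core' object owning context-dependent resources
			struct IGraphicContextResponder {
				virtual void OnContextCreated() = 0;
				virtual void OnContextLost() = 0;
				virtual ~IGraphicContextResponder() {}
			};

			// Hypothetical manager: keeps a flat list of responders and
			// broadcasts render context lifecycle events to them
			struct ContextManager {
				void Register(IGraphicContextResponder * responder);
				void Unregister(IGraphicContextResponder * responder);
				void NotifyContextLost();     // responders drop their OpenGL handles
				void NotifyContextCreated();  // responders may re-acquire resources
			};
		}
	}
}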

It is important to note that the ‘Render’ objects’ default creation behaviour is lazy initialization, effectively delaying OpenGL resource acquisition until the last moment. For large resources, however, this behaviour can be overridden in the object’s OnContextCreated() implementation.
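
As a purely illustrative sketch (the handler bodies below are assumed, not the engine’s actual code), the ‘Core’ texture’s handlers could boil down to:
// Hypothetical handler bodies for the Core::Texture shown above
void Mintaka::Internal::Core::Texture::OnContextLost() {
	// the OpenGL context is gone, so the 'Render' twin is now meaningless:
	// drop it and fall back to the context-free 'Core' representation
	delete rtx;
	rtx = 0;
}

void Mintaka::Internal::Core::Texture::OnContextCreated() {
	// default behaviour: do nothing and let the first draw call that needs
	// the texture recreate the Render::Texture lazily from ImageData;
	// a large texture could instead rebuild eagerly right here:
	// rtx = new Render::Texture(*this);
}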

Strict separation between engine and game clients

The engine does not expose any of its internal data structures to the game clients, nor does it take references to client-allocated objects. It exclusively relies on proxy objects to manipulate its internal structures. This greatly limits unintended external side effects such as memory corruption and spurious object deletion.

This separation is achieved using a structured API design, where the client can only access a public API exposing interfaces to the underlying game objects. Clients request the creation of game objects by passing stack-allocated descriptor structures to factory functions.

Let’s take the lighting system as an example. Here is the client interface of a simple light:
namespace Mintaka {
	namespace Core {
		// Descriptor structure storing the relevant information for a specific light
		struct DirectionalLightDesc {
			const glm::vec3 lightVector;
			const Color color; // Color is a 32bit RGBA color type where each channel is stored on 8 bits
			const float zNear;
			const float zFar;
			inline DirectionalLightDesc(const glm::vec3 & lightVector, const Color color, const float shadowCamZNear, const float shadowCamZFar) :
				lightVector(glm::normalize(lightVector)), color(color), zNear(shadowCamZNear), zFar(shadowCamZFar) {}
		};

		// the Light interface the client can manipulate
		struct ICamera;
		struct IDirectionalLight {
			// Shadowmap manipulation methods
			virtual void UpdateShadowContext(const ICamera & cam) = 0;
			virtual const ICamera & GetShadowmapCam() const = 0;
			
			// color manipulation methods
			virtual const Color GetColor() const = 0;
			virtual void SetColor(const Color color) = 0;
			
			// light direction manipulators
			virtual const glm::vec3 GetDir() const = 0;
			virtual void SetDir(const glm::vec3 & dir) = 0;
		};

		// the Light Manager interface: the engine's core module holds the light manager implementation
		struct ILightManager {
			// Factory methods
			virtual IDirectionalLight & Create(const DirectionalLightDesc & desc) = 0;
			virtual ITargetLight & Create(const TargetLightDesc & desc) = 0;
			virtual IPointLight & Create(const PointLightDesc & desc) = 0;
			// ...
		};

	}
}
Now, if a game client wants to create a directional light source, it will do the following:
IDirectionalLight & sun = engine.LightManager().Create(DirectionalLightDesc(glm::vec3(0.0f, -2.0f, -1.0f), Color(1.0f, 1.0f, 0.8f), 1.0f, 100.0f));
The definition of the internal DirectionalLight object looks like this:
namespace Mintaka {
	namespace Internal {
		namespace Core {
			struct DirectionalLight : public Mintaka::Core::IDirectionalLight {
				Camera lightCam;
				glm::vec3 lightVector; // not const: mutated by SetDir()
				glm::mat4 lightVP;
				Color color;
				inline DirectionalLight(const Mintaka::Core::DirectionalLightDesc & desc) :
					lightCam(0, Mintaka::Core::CameraDesc(glm::vec3(0, 0, 0), glm::vec3(0, 0, 0), glm::vec3(0, 0, 1), 45, desc.zNear, desc.zFar)),
					lightVector(desc.lightVector),
					color(desc.color) {}
				void UpdateShadowContext(const Mintaka::Core::ICamera & cam);
				inline const Camera & GetShadowmapCam() const { return lightCam; }
				
				inline const Color GetColor() const { return color; }
				inline void SetColor(const Color color) { this->color = color; }
				
				inline const glm::vec3 GetDir() const { return lightVector; }
				inline void SetDir(const glm::vec3 & dir) { lightVector = dir; }
			};
			
			struct LightManager : public Mintaka::Core::ILightManager {
				LightManager(const EngineParameters & params);
				~LightManager();
				DirectionalLight & Create(const Mintaka::Core::DirectionalLightDesc & desc);
				TargetLight & Create(const Mintaka::Core::TargetLightDesc & desc);
				PointLight & Create(const Mintaka::Core::PointLightDesc & desc);
			};
		}
	}
}

Note how the private DirectionalLight::GetShadowmapCam() function returns a private Camera object that is automatically exposed as an ICamera interface when the call comes through the public IDirectionalLight interface. This design emphasizes the use of covariant return types to simplify internal implementations of the public and private interfaces: depending on the call site, the same implementation returns an object with either public or private access.
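
Stripped down to the covariance mechanics alone, the pattern looks like this (the GetViewProjection() member is invented for the example; only the class names come from the engine):
#include <glm/glm.hpp>

struct ICamera {
	virtual const glm::mat4 & GetViewProjection() const = 0;
	virtual ~ICamera() {}
};

// internal concrete type, derives from the public interface
struct Camera : public ICamera {
	glm::mat4 viewProjection;
	const glm::mat4 & GetViewProjection() const { return viewProjection; }
};

struct IDirectionalLight {
	// public side: callers only ever see an ICamera
	virtual const ICamera & GetShadowmapCam() const = 0;
	virtual ~IDirectionalLight() {}
};

struct DirectionalLight : public IDirectionalLight {
	Camera lightCam;
	// covariant override: code holding a DirectionalLight gets the full Camera,
	// code going through IDirectionalLight gets the same object as an ICamera
	const Camera & GetShadowmapCam() const { return lightCam; }
};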

The cost of calling through these inheritance chains is mitigated by the fact that the engine maintains a 1-to-1 relationship between interfaces and their concrete implementations, enabling the compiler to effectively bypass the call chain using Link Time Optimisation.

One other strength of this approach is that, without any game client modification, the engine can plug a thin abstraction layer behind these proxy objects to forward the method calls to another engine instance, possibly on another network endpoint, for debugging purposes.
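
For illustration, such a forwarding layer could be another implementation of the same interface that serializes each call instead of executing it locally (everything below, transport included, is invented for the example):
// entirely hypothetical transport: ships a call as (object id, method id, payload)
struct DebugTransport {
	void Send(const uint objectId, const uint methodId,
	          const float a, const float b, const float c);
};
enum { MethodId_SetDir = 1 }; // made-up method identifiers

struct ForwardingDirectionalLight : public Mintaka::Core::IDirectionalLight {
	DebugTransport & link;
	const uint remoteId; // id of the matching light on the remote engine instance

	ForwardingDirectionalLight(DebugTransport & l, const uint id) : link(l), remoteId(id) {}

	void SetDir(const glm::vec3 & dir) {
		// encode the call and ship it to the other engine instance
		link.Send(remoteId, MethodId_SetDir, dir.x, dir.y, dir.z);
	}
	// ... the remaining IDirectionalLight methods are forwarded the same way
};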

Internal Allocation Policies

To keep general heap allocation at a minimum, the engine relies internally on static memory pools. These pools can then be used as single purpose heaps or as dedicated containers for the internal engine objects.

The benefits are multiple:

  • Strong memory locality for objects of the same type
  • Strict separation of memory regions, greatly improving the debugging experience
  • The flexibility of dynamic allocation/deallocation patterns without the latency cost they incur on most mobile devices
  • A clear picture of the amount of memory allocated, letting developers revise their allocation budgets accordingly

The static size of each pool can be controlled from a configuration file. Coupled with the fact that certain modules have fallback mechanisms when they near their allocated limit, this allows for interesting runtime behaviour control, such as limiting the number of particle effects on a given device by forcing the particle backing buffer to a small size at startup, based on the device's capabilities.

These memory pools are exposed using different containers (a rough sketch of the typed variant follows the list):

  • MemoryPool: bounded, untyped memory pool, can act as a ring buffer.
  • MemoryPool<>: bounded, typed memory pool, can act as a ring buffer.
  • BucketList<>: unbounded, typed memory pool.
  • HandleContainer<>: bounded, typed memory pool with indirect data access using handles, supports vectorized updates.
  • DynamicPackedhandleContainer: bounded, untyped memory pool of variable-size objects with underlying packing, metadata tagging and vectorized updates.
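
As a rough sketch (method names assumed, the real container is not shown in this post), the surface of the typed, bounded variant could look like:
namespace Mintaka {
	namespace Utils {
		// hypothetical surface of the bounded, typed pool: the capacity is fixed
		// at construction (typically read from the configuration file), so no
		// heap growth ever happens after startup
		template <typename T>
		struct MemoryPool {
			explicit MemoryPool(const uint capacity);
			~MemoryPool();

			T *  Allocate();               // returns 0 when the pool is exhausted
			void Release(T * object);
			uint Used() const;             // current allocation count
			uint Capacity() const;         // static budget of this pool

			// optional ring buffer behaviour: the oldest entries get recycled
			void SetRingMode(const bool enabled);
		};
	}
}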

Some of these general-purpose containers are exposed to the game clients through the Utils namespace. One of the advantages of not leaking any internal data to the game clients is that clients are free to use whatever memory allocation policy they deem best suited to their needs.

Threading support

MEngine's threading environment can be described as:

  • 1 Input thread
  • 1 Main/Render thread
  • N worker threads (spawned by a ThreadPool, N being NBCores - 1)
  • 1 audio dispatch thread
  • N audio threads (N depends on the audio library used)

On multicore devices, MEngine uses a generic ThreadPool to dispatch and execute computation-intensive tasks. A task is defined as a pair of function + data. Tasks are grouped into TaskGroups, which are scheduled to run on the ThreadPool. The ThreadPool dispatches the tasks of the different active TaskGroups among the running worker threads. When all the tasks in a TaskGroup are complete, the ThreadPool executes a completion function in the main thread's context; this is when the results are synchronized with the global data representation.
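
In code, a task submission could look roughly like the following (the function + data pairing and the main-thread completion step match the description above; IThreadPool, TaskGroup and their methods are assumed names):
struct ParticleBatch;                   // data a single task operates on
void UpdateParticleBatch(void * data);  // task function, runs on a worker thread
void OnParticlesUpdated(void * data);   // completion function, runs on the main thread

void ScheduleParticleUpdate(Mintaka::Core::IThreadPool & pool, ParticleBatch * batches, const uint count) {
	// one TaskGroup per logical job, each task being a (function, data) pair
	Mintaka::Core::TaskGroup group(OnParticlesUpdated, batches);
	for (uint i = 0; i < count; ++i)
		group.Add(UpdateParticleBatch, &batches[i]);

	// tasks are dispatched among the worker threads; once they have all
	// completed, the completion function runs in the main thread's context,
	// where the results are merged back into the global data representation
	pool.Schedule(group);
}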

Internally, the engine uses asynchronous tasks to load resources and to perform physics and particle updates. Small tasks with strong data coupling, such as periodic position updates, GUI animations and camera movements, are instead executed through a synchronous execution queue on the main thread.
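
The synchronous path can be as simple as a queue of (function, data) pairs drained once per frame on the main thread (a minimal sketch, names assumed):
#include <vector>
#include <utility>

// hypothetical synchronous queue: jobs pushed here never leave the main
// thread, they are simply deferred until the next frame update
struct SyncQueue {
	typedef void (*JobFn)(void * data);
	std::vector< std::pair<JobFn, void *> > jobs;

	// e.g. a GUI animation step or a camera movement update
	void Push(JobFn fn, void * data) { jobs.push_back(std::make_pair(fn, data)); }

	// called once per frame by the main loop, before rendering
	void Drain() {
		for (size_t i = 0; i < jobs.size(); ++i)
			jobs[i].first(jobs[i].second);
		jobs.clear();
	}
};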

This hybrid threading model is the result of multiple attempts to produce a fully threaded engine on mobile platforms. It combines the simplicity of a generally single-threaded environment with the flexibility of asynchronous execution. This design choice is strongly influenced by the fact that a general-purpose 3D game will be heavily fill-rate limited on mobile devices.

The ThreadPool is exposed to the game client.

Renderer: Design Practices

Limited Overdraw

Overdraw is possibly the main bottleneck on various low- and mid-range mobile devices, especially on tablets. On these devices, standard incremental scene composition results in degraded performance.

MEngine's rendering pipeline relies on aggregated scene composition:

  1. Parts of the game logic are performed on the GPU; they produce either direct pixel data or intermediary data states.
  2. The results of these GPU computations are stored in specific-purpose off-screen buffers (SPOSBs).
  3. When rendering a frame, the 'gather' shader takes the necessary SPOSBs as input and uses them to produce the final image.

The key point here is that these intermediary buffers are only updated on demand by the engine, based on low-frequency timers or user input.
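
On the CPU side, that policy can reduce to a small piece of per-buffer bookkeeping (a sketch with assumed names; the actual trigger conditions vary per buffer):
// hypothetical bookkeeping for one specific-purpose off-screen buffer
struct SPOSB {
	uint  fbo;            // framebuffer object backing the buffer
	bool  dirty;          // set by game logic or user input
	float updatePeriod;   // low-frequency refresh interval, in seconds
	float elapsed;

	// called once per frame, but only renders into the buffer when needed
	void UpdateIfNeeded(const float dt) {
		elapsed += dt;
		if (!dirty && elapsed < updatePeriod)
			return;              // keep the previous content, zero GPU cost
		RenderIntoBuffer();      // binds fbo and redraws the buffer's content
		dirty = false;
		elapsed = 0.0f;
	}

	void RenderIntoBuffer();
};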

For example, the terrain rendering shader performs raw topography rendering and game data overlaying in the same draw call. The overlaying logic gets its information from an SPOSB containing the following game state for each terrain cell (a packing sketch follows the list):

  • Fog of war: is the cell visible or not
  • Is this cell adjacent to a game entity, and if so:
    • what type of entity: building / fleet / other object?
    • the entity's team
  • If an entity is selected and this cell is 'in range', the real distance in time units from this cell to that entity
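
A plausible packing of that state into one RGBA8 texel per cell could look like this (the channel layout below is illustrative, not the engine's actual encoding):
// hypothetical per-cell encoding into a single RGBA8 texel of the SPOSB
struct CellState {
	bool  visible;      // fog of war
	uint  entityType;   // 0 = none, 1 = building, 2 = fleet, 3 = other
	uint  entityTeam;
	float rangeCost;    // distance in time units, negative when not 'in range'
};

void PackCell(const CellState & cell, unsigned char * texel) {
	texel[0] = cell.visible ? 255 : 0;                       // R: fog of war mask
	texel[1] = (unsigned char)(cell.entityType & 0xFF);      // G: adjacent entity type
	texel[2] = (unsigned char)(cell.entityTeam & 0xFF);      // B: team of that entity
	// A: range cost clamped to [0, 254], 255 meaning 'out of range'
	const float cost = cell.rangeCost;
	texel[3] = (cost < 0.0f) ? 255 : (unsigned char)(cost > 254.0f ? 254.0f : cost);
}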

This however comes with its share of trade-offs:

  • First, the number of possible SPOSBs is directly dependent on the device's capabilities (some early iPhones are limited to 4 active texture slots). In that case, multiple SPOSBs can be interlaced in the color channels of a single input texture.
  • Then, there is the ever-changing, driver-dependent limit on shader program size, which bounds the complexity of the final gather shader.
  • Finally, if the update frequency of an SPOSB is too high, all the benefits of the technique are negated, since it imposes a greater burden on the often already saturated GPU.

Upscaling

On low-end devices, the engine can run in 'Upscale Mode' if the hardware can't maintain the desired frame rate. In this mode, the frame is rendered at a lower resolution into a frame buffer, then mapped onto a quad covering the screen. Various sampling methods can be applied to the source texture to enhance the final image quality.
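
In GLES2 terms, the mechanics amount to standard render-to-texture (sketch below; the half-resolution factor and the function name are illustrative):
#include <GLES2/gl2.h>

// hypothetical setup for 'Upscale Mode': the scene is rendered into a
// half-resolution texture, then stretched over a quad covering the screen
void SetupUpscaleTarget(const int screenW, const int screenH, GLuint & fbo, GLuint & colorTex) {
	const int lowW = screenW / 2, lowH = screenH / 2;   // illustrative scale factor

	glGenTextures(1, &colorTex);
	glBindTexture(GL_TEXTURE_2D, colorTex);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, lowW, lowH, 0, GL_RGB, GL_UNSIGNED_BYTE, 0);
	// the filtering mode is one of the 'sampling methods' affecting final quality
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

	glGenFramebuffers(1, &fbo);
	glBindFramebuffer(GL_FRAMEBUFFER, fbo);
	glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
	// (depth renderbuffer attachment omitted for brevity)
	glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

// Per frame: render the scene into fbo with a lowW x lowH viewport, then bind
// the default framebuffer, restore the full viewport and draw a fullscreen
// quad sampling colorTex, letting the GPU's texture filtering do the upscaling.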

Language Bindings

When it comes to cross-platform development, the choice of a development language should be made with particular care. In the case of MEngine, C++ was chosen for its generalized support on the target architectures. Even though the project was started in 2013, the chosen standard was C++03, due to a lack of support for newer versions in one of the cross-compiling tool chains. The use of templates is restricted to very specific internal parts of the code, and they are forbidden in interfaces and APIs.

MEngine exposes its client API through 3 wrappers:

  • C#: Used by the various editor tools of the content integration pipeline
  • Java: Main binding on Android devices
  • Objective-C: For iOS & Mac OS X

These wrappers obey a very strict argument-passing rule: all API function arguments are primitive types. This is possible because the engine doesn't leak internal structures and doesn't reference client objects. The wrappers simply redefine the client API interfaces in their target language and handle the lifecycle of these objects.
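
As an illustration, a wrapper-facing entry point could be a flat C shim taking an opaque handle and primitives only (the function and lookup helper below are invented; the Color(float, float, float) constructor is the one used in the client example above):
// hypothetical C-linkage shim used by the wrappers: nothing but an opaque
// handle and primitive values ever cross the language boundary
extern "C" void MEngine_DirectionalLight_SetColor(const unsigned int lightHandle,
                                                  const float r, const float g, const float b) {
	// the handle is resolved back to the engine-side proxy interface
	Mintaka::Core::IDirectionalLight & light = ResolveLight(lightHandle); // assumed lookup helper
	light.SetColor(Color(r, g, b));
}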
