Author here, thanks for your perspective. Here are some thoughts:
> approach of separating the simulation and presentation layers isn't all that uncommon
I agree that some level of separation is not that uncommon, but games usually depend on things from their respective engine, especially on datatypes (e.g. Vector3) or math libraries. The reason I mention that our game is unique in this way is that its non-rendering code does not depend on any Unity types or DLLs. And I think that is quite uncommon, especially for a game made in Unity.
> Most games don't ship on the mono backend, but instead on il2cpp
I think this really depends. If we take absolute numbers, roughly 20% of Unity games on Steam use IL2CPP [1]. Of course many simple games won't be using it, so the sample is skewed if we want to measure "how many players play games with IL2CPP tech". But there are still many, and higher performance of managed code would certainly have an impact.
We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.
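For illustration, this is the kind of thing [FieldOffset] buys us: exact control of a struct's bytes, e.g. to match a GPU buffer layout or to overlay fields union-style. A made-up sketch, not our actual types:

```csharp
using System.Runtime.InteropServices;

// Explicit layout pins every field to a byte offset, so the struct can be
// uploaded to a GPU buffer as-is. Name and fields are illustrative.
[StructLayout(LayoutKind.Explicit, Size = 16)]
public struct GpuInstanceData
{
    [FieldOffset(0)]  public float X;
    [FieldOffset(4)]  public float Y;
    [FieldOffset(8)]  public float Z;
    // the last 4 bytes are reused union-style depending on the entity type
    [FieldOffset(12)] public uint  PackedColor;
    [FieldOffset(12)] public float Scale;
}
```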
Also, having managed code makes the game "hackable". Some modders use IL injection to hook into places our APIs don't expose. This is good and bad, but so far it has allowed modders to progress faster than we expected, so it's a net positive.
> In modern Unity, if you want to achieve performance, you'd be better off taking the approach of utilizing the burst compiler and HPC#
Yeah, and I really wish we didn't need to do that. Burst and HPC# are messy and add a lot of unnecessary complexity and artificial limitations.
The thing is, if Mono and .NET were both equally "slow", then sure, let's do some HPC# tricks to get high performance. But they are not: modern .NET is fast, and Unity devs cannot take advantage of it, which is frustrating.
By the way, the final trace with parallel workers was just C#'s worker threads and thread pool.
> Profiling the editor is always a fool's errand
Maybe, but we (devs) spend 99% of our time in the editor. And perf gains in the editor usually translate to the Release build with very similar percentage gains (I know this is generally not true, but in my experience it is). We have done many significant optimizations before, and measurements from the editor were always a useful indicator.
What is not very useful is Unity's profiler, especially with "deep profile" enabled. It adds a constant cost per method call, greatly exaggerating the cost of small methods. So we have our own tracing system that does not do this.
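For illustration, the core of such a tracing system can be as simple as this (a minimal sketch, not our actual tracer): only the code you explicitly wrap pays any cost, so small methods are not exaggerated the way "deep profile" exaggerates them.

```csharp
using System;
using System.Diagnostics;

// Explicit, coarse-grained tracing scope measured with Stopwatch timestamps.
readonly struct TraceScope : IDisposable
{
    readonly string _name;
    readonly long _start;

    public TraceScope(string name)
    {
        _name = name;
        _start = Stopwatch.GetTimestamp();
    }

    public void Dispose()
    {
        double ms = (Stopwatch.GetTimestamp() - _start) * 1000.0 / Stopwatch.Frequency;
        Console.WriteLine($"{_name}: {ms:F3} ms"); // a real tracer would record into a preallocated buffer
    }
}

// usage: using (new TraceScope("Pathfinding.Update")) { /* traced work */ }
```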
> I've seen a lot of mention around GC through this comment section, and professional Unity projects tend to go out of their way to minimize these at runtime
Yes, minimizing allocations is key, but there are many cases where they are hard to avoid. Things like string processing for UI generate a lot of garbage every frame. And there are APIs that simply don't have allocation-free options. CoreCLR would allow us to further cut down on allocations and would make better APIs available.
Just the fact that the current GC is non-moving means that memory consumption creeps up over time due to fragmentation. We have had numerous reports of "memory leaks" where players see memory consumption grow after repeated load/quit-to-menu cycles.
Even if we got fast CoreCLR C# code execution, these issues would remain, so an improved GC would be next on the list.
>We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.
FieldOffset is supported by IL2CPP at compile time [0]. You can also install new DLLs and force the player to restart if you want downloadable mod support.
It's true that you can't do reflection for serialization, but there are better, more performant alternatives for that use case, in my experience.
> You can also install new DLLs and force the player to restart if you want downloadable mod support.
I am not aware of an easy way to load (managed) mods as DLLs into an IL2CPP-compiled game. I am thinking about `Assembly.LoadFrom("Mod.dll")`.
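For reference, this is the kind of loading I mean, as it works under Mono today (a minimal sketch; the IMod entry-point interface is hypothetical):

```csharp
using System;
using System.Reflection;

public interface IMod { void Initialize(); }  // hypothetical mod entry point

public static class ModLoader
{
    public static void LoadMod(string dllPath)
    {
        // Under Mono the JIT compiles the mod's IL at runtime; an IL2CPP
        // player has no JIT, so there is nothing to execute this IL with.
        Assembly assembly = Assembly.LoadFrom(dllPath);
        foreach (Type type in assembly.GetTypes())
        {
            if (typeof(IMod).IsAssignableFrom(type) && !type.IsAbstract)
                ((IMod)Activator.CreateInstance(type)).Initialize();
        }
    }
}
```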
Can you elaborate how this is done?
> there are better, more performant alternatives for that use case, in my experience.
We actually use reflection to emit optimal code for generic serializers that avoid boxing and increase performance.
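Roughly, the idea looks like this (a heavily simplified sketch with illustrative names, not our actual code): the delegate is built once per type, so the hot path does no reflection and no boxing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;
using System.Reflection;

public static class SerializerCache<T>
{
    // Built once per type via expression trees, then called like a hand-written method.
    public static readonly Action<BinaryWriter, T> Write = Build();

    static Action<BinaryWriter, T> Build()
    {
        ParameterExpression w = Expression.Parameter(typeof(BinaryWriter), "w");
        ParameterExpression v = Expression.Parameter(typeof(T), "v");
        var calls = new List<Expression>();
        foreach (FieldInfo f in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            // Sketch handles only fields with a matching BinaryWriter.Write
            // overload (int, float, ...); the real system recurses into nested
            // types and handles versioning.
            MethodInfo write = typeof(BinaryWriter).GetMethod("Write", new[] { f.FieldType });
            calls.Add(Expression.Call(w, write, Expression.Field(v, f)));
        }
        return Expression.Lambda<Action<BinaryWriter, T>>(Expression.Block(calls), w, v).Compile();
    }
}

// usage: SerializerCache<PlayerState>.Write(writer, state);
```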
There may be alternatives; we explored things like FlatBuffers and their variants, but nothing came close to our system in terms of ease of use, versioning support, and performance.
If you have some suggestions, I'd be interested to see what options are out there for C#.
> FieldOffset is supported by IL2CPP at compile time
You are right, I misremembered this one: you cannot get it via reflection, but it works.
>I am not aware of an easy way to load (managed) mods as DLLs into an IL2CPP-compiled game. I am thinking about `Assembly.LoadFrom("Mod.dll")`.
Ah, I was thinking native DLLs (which is what we're using on a project I'm working on). I think you're right that it's impossible for an IL2CPP-built player to interoperate with a managed (Mono) DLL.
>If you have some suggestions [re: serialization], I'd be interested to see what options are out there for C#.
We wrote a custom, garbage-free JSON serializer/deserializer that uses a fluent API style. We also explored a custom codegen solution (similar to FlatBuffers or protobuf) but abandoned it because the expected perf (and ergonomic) benefits would have been minor. The trickiest part with Unity codegen is generating code that creates little to no garbage.
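To give a rough idea of the style (an illustrative sketch, not our exact API; comma handling is elided):

```csharp
using System.Text;

// Buffer is reused across calls, so steady-state serialization produces no
// garbage. (A truly allocation-free version would also format numbers into a
// char buffer itself, since older runtimes allocate inside StringBuilder.Append(int).)
public sealed class JsonWriter
{
    readonly StringBuilder _sb = new StringBuilder(64 * 1024);

    public JsonWriter BeginObject()    { _sb.Append('{'); return this; }
    public JsonWriter Key(string name) { _sb.Append('"').Append(name).Append("\":"); return this; }
    public JsonWriter Value(int value) { _sb.Append(value); return this; }
    public JsonWriter EndObject()      { _sb.Append('}'); return this; }
    public override string ToString()  => _sb.ToString(); // the one unavoidable allocation
}

// usage: new JsonWriter().BeginObject().Key("hp").Value(100).EndObject()
```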
Re serialization: We have custom binary serialization that essentially dumps the game state into a binary stream. No allocations, no copies, no conversions. Our saves can be big, >100 MB uncompressed, so there is no room for waste.
The big advantage is that it reads data directly from the game's classes, so there is no boilerplate needed, no protobufs, no schema. And it supports versioning; adding or removing members works mostly without limitations.
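Very roughly, the core write path looks like this (a simplified sketch with made-up names, assuming the Span APIs available in Unity's .NET Standard 2.1 profile):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SaveIo
{
    const int Version = 3; // bumped when layout changes; the reader branches on it

    // Writes an array of unmanaged structs as raw bytes: no per-element
    // conversion, no temporary buffers, no boxing.
    public static void WriteArray<T>(Stream stream, T[] items) where T : unmanaged
    {
        Span<byte> header = stackalloc byte[8];
        BitConverter.TryWriteBytes(header, items.Length);
        BitConverter.TryWriteBytes(header.Slice(4), Version);
        stream.Write(header);
        stream.Write(MemoryMarshal.AsBytes(items.AsSpan())); // reinterpret, don't copy
    }
}
```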
I think it's a cool system, maybe I should write a blog post about it :)
What I agree on is that if we had modern .NET available we'd get a free 2-3x improvement; that would definitely be great. BUT having said that, if you're into performance but unwilling to use the tools available, then that's on you.
From the article it seems that you're using some form of threading to create things, but you don't really specify which and/or how.
The default C# implementations are usually quite poor performance-wise. If you used, for example, the default thread pool, I can say with confidence that I've achieved a 3x speedup over it with my own thread pool implementation, which would yield about the same 30s -> 12s reduction.
Burst threading/scheduling is also a lot better than the standard one. If I feed it a logic-heavy method (so no vectorization), I can beat it by a bit, but not close to the 3x over the normal thread pool.
But if your generation is number-heavy (vs. logic-heavy), then with Burst you could probably drop that calculation time down to 2-3 seconds (much the same as if you used Vector256<T> numerics).
Finally, you touch on GC; that's definitely a problem. The Mono variant has been upgraded over time, but C# remains C#, which was never meant for gaming. Even if we had access to the modern GC, there would still be issues with it. The same goes for all the other C# libraries: they never considered gaming a target, where what we want is extremely fast access/latency with no hiccups. C# in the business world doesn't really care if it loses 16ms (or 160ms) here and there due to garbage; it's usually not a problem there.
Coding in Unity means going over every instance of allocation outside of startup and eliminating it. You mention APIs that still need to allocate, which I've never run into myself. Again, modern .NET isn't going to simply make those go away.
Sure, we could use Burst to speed up some strategic parts, but that would not help with the core of the game.
To give some context, things are very complex in our game: we have fully dynamic terrain with terrain physics (land-slides), advanced path-finding for hundreds of vehicles (each entity has its own width and height clearance), trains, conveyors and pipes carrying tens or even hundreds of thousands of individual products, machines, rockets, ships, automated logistics, etc. There is no one thing that could be Bursted for a 3x gain. At this point, we'd have to rewrite the entire game in C++.
So what's the reason we use C#? Productivity, ease of debugging and testing, and resilience to bugs (e.g. a null dereference won't kill the program). Messing with C++ or even Burst would cost us more time, and to be honest, the game would possibly not even exist at that point.
Could you share some details about your custom thread pool that got the 3x speedup? What was the speedup from? It is highly unlikely that a custom thread pool would have any significant impact on the benchmark in our case. As you can see from Figure 3, threaded tasks run for about 25% of the total time, and even with Mono, all tasks are reasonably well balanced between threads. Thread utilization is surely over 90% (there is always slight inefficiency towards the end as threads finish up, but that's hundreds of ms). An "oracle" thread pool could speed things up by 10% of 25%, so that is not it.
Vectorization could help too, but the majority of the code is not easily vectorizable. It's all kinds of workloads: loading data, deserialization, initialization of entities, map generation, precomputation of various things. I highly doubt that automatic vectorization of code generated by IL2CPP would bring more than a 20% speedup here. The speedup from Burst would mostly come from the elimination of inefficient code generated by Mono's JIT, not from vectorization.
For now, we are accepting the Mono tax to be more productive. But I am hoping that Unity will deliver on the CoreCLR dream. In the meantime, my post was meant to raise awareness and stir up some discussion, like this one, which is great. I've read lots of interesting thoughts in this comments section.
>Sure, we could use Burst to speed up some strategic parts... the game would possibly not even exist at that point.
Yeah, the thing with Burst is that it's a lot easier to work with if you start with it than having to replace/upgrade code later, especially if you're not familiar with it. A big issue is usually that you create structs with data that reference other structs, etc.; all those need to be untangled to really make use of Burst.
I myself am also a big C# fan; it is a lot easier than using C. Unity has a lot of issues, but there's a reason it's so widely adopted and used. (I myself am currently working on a Unity C# tool that I believe will speed up code development significantly.)
Your game does sound as if it's a VERY ripe target for Burst usage based on the elements that you describe, but the real question should be whether you need it. For example, if you're already running at 60 fps on whatever your mid-target hardware is, at whatever max + N% load/size for a game instance, then you don't need it. But if you're only hitting 40 fps and design-wise want to increase e.g. your map size by 2x, then it might be something to look into. Also, if you look at e.g. Factorio, they spend a LOT of time optimizing systems, but of course you first need to launch the game (which is and should be the priority).
If you have, for example, 25 systems (e.g. pathfinding, trains, pipes, etc.) and they're evenly balanced, then as you say, you won't increase your game speed by 2x by converting just one of those. BUT say your pipes are being processed in 4ms per frame and you adopt other strategies like only processing them every Nth frame or doing M pipes per frame (sketched below); at that point, using Burst to get that 4ms down to 0.5ms might be a really worthwhile target to make your game play better. The same goes for all your systems, where the upgrade will have a cumulative effect.
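A tiny sketch of the "M pipes per frame" idea (Pipe is a stand-in type; the real split would be per system):

```csharp
using System;

class Pipe { public void Update() { /* move products, equalize pressure, ... */ } }

class PipeSystem
{
    int _cursor;

    // Called once per frame: updates only `perFrame` pipes, so a full pass
    // over all pipes takes pipes.Length / perFrame frames.
    public void Tick(Pipe[] pipes, int perFrame)
    {
        for (int n = 0; n < Math.Min(perFrame, pipes.Length); n++)
        {
            pipes[_cursor].Update();
            _cursor = (_cursor + 1) % pipes.Length; // wrap around to the start
        }
    }
}
```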
I highly suggest learning just the basics of Burst in your spare time and trying it out on something basic to get the feel of it. As with all code/libraries it'll unfortunately take some time to figure out how to effectively use it.
Roughly speaking:
- You don't have to have SOA data, but it helps. At the start, just convert methods over 1:1.
- You have to convert most C# container types to Burst-compatible ones. For example, in struct Vehicle { Wheel[] wheels } you need to change Wheel[] over to NativeArray<Wheel>, and the Wheel struct itself also needs to avoid complex types, etc. (see the sketch after this list).
Other types such as NativeSlice are also very useful; instead of storing the wheels, you can just reference a slice of a larger array instead.
- After you have the basics going, you can try out SOA along with more math/less logic so that the code can be vectorized; once you see that big speedup for certain types of code, it's hard to go back.
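Here is a minimal example of such a 1:1 conversion (Wheel and the constants are made up):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public struct Wheel { public float Rpm; public float Wear; }

// A plain parallel-for job over an unmanaged struct array; Burst compiles
// Execute to native code because nothing managed is touched inside it.
[BurstCompile]
public struct UpdateWheelsJob : IJobParallelFor
{
    public NativeArray<Wheel> Wheels;
    public float DeltaTime;

    public void Execute(int i)
    {
        Wheel w = Wheels[i];
        w.Wear += w.Rpm * 0.0001f * DeltaTime; // plain math, no managed objects
        Wheels[i] = w;
    }
}

// usage: new UpdateWheelsJob { Wheels = wheels, DeltaTime = dt }
//            .Schedule(wheels.Length, 64).Complete();
```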
>Could you share some details about your custom thread pool that got the 3x speedup? What was the speedup from? It is highly unlikely that a custom thread pool would have any significant impact on the benchmark in our case. As you can see from Figure 3, threaded tasks run for about 25% of the total time, and even with Mono, all tasks are reasonably well balanced between threads. Thread utilization is surely over 90% (there is always slight inefficiency towards the end as threads finish up, but that's hundreds of ms). An "oracle" thread pool could speed things up by 10% of 25%, so that is not it.
My thread pool itself is pretty standard: it spins up some heavy threads and uses ManualResetEvent to trigger them. Its advantage lies in pre-registering simple Action calls (with/without parameters) to set methods that are invoked when the thread runs, plus more gaming-related options for whether we wait on thread completion, interleave with other threads, etc.
A big plus is its self-optimization function: it adjusts the thread count against the total time runs take, the total number of items being processed for the given workload, etc., so it automatically finds very good sizes for all those parameters on the target computer, instead of just assuming e.g. 32, 64, or 128 inner elements and launching the maximum available threads on the PC (as thread pools usually do).
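For the curious, a stripped-down sketch of that pattern (illustrative, not my actual implementation; no self-optimization shown): persistent threads parked on events, with work pre-registered per thread.

```csharp
using System;
using System.Threading;

sealed class PinnedPool : IDisposable
{
    readonly ManualResetEventSlim[] _start;
    readonly Action[] _work;   // pre-registered, one slot per thread
    readonly CountdownEvent _done;
    volatile bool _running = true;

    public PinnedPool(int threadCount)
    {
        _start = new ManualResetEventSlim[threadCount];
        _work = new Action[threadCount];
        _done = new CountdownEvent(threadCount);
        for (int i = 0; i < threadCount; i++)
        {
            int id = i;
            _start[id] = new ManualResetEventSlim(false);
            new Thread(() =>
            {
                while (true)
                {
                    _start[id].Wait();   // park until triggered
                    _start[id].Reset();
                    if (!_running) return;
                    _work[id]();         // run the pre-registered action
                    _done.Signal();
                }
            }) { IsBackground = true }.Start();
        }
    }

    public void Register(int thread, Action action) => _work[thread] = action;

    public void RunAndWait()             // trigger all threads, block until done
    {
        _done.Reset();
        foreach (var e in _start) e.Set();
        _done.Wait();
    }

    public void Dispose()
    {
        _running = false;
        foreach (var e in _start) e.Set();
    }
}
```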
>Vectorization could help too, but the majority of the code is not easily vectorizable. It's all kinds of workloads: loading data, deserialization, initialization of entities, map generation, precomputation of various things. I highly doubt that automatic vectorization of code generated by IL2CPP would bring more than a 20% speedup here. The speedup from Burst would mostly come from the elimination of inefficient code generated by Mono's JIT, not from vectorization.
Yeah, if it's startup/generation code that's mostly bypassed by loading a game, then it's not worth switching over. Do note that code compiled by Burst will in general be more optimized than Mono's output just due to better tooling, but it's usually not worth moving over only for that, given the amount of work involved. The real wins come in if some generation step that runs often is taking too long, or during gameplay, where you can replace elements that take N milliseconds to calculate every frame and drop them down to 1/10th - 1/100th of the time they used to take.
Per the separation, I think this was far more common both in older Unity games and in professional settings.
For games shipping on Mono on Steam, that statistic isn't surprising to me given the amount of indie games on there and Unity's prevalence in that environment. My post in general can be read in a professional setting (i.e., career game devs). The IL injection is a totally reasonable consideration, but it does (currently) lock you out of platforms where AOT is a requirement. You can also support mods/DLC via Addressables, and there has been improvement in modding tools for il2cpp; however, you're correct that it's not nearly as easy.
Going to completely disagree that Burst and HPC# are unnecessary and messy. This is for a few reasons. The restrictions that HPC# enforces are essentially the same ones you already have if you want to write performant C# code, as you simply use Unity's allocators for your memory up front and then operate on that. Depending on how you do this, you can either eliminate your per-frame allocations or likely eliminate some of the fragmentation you were referring to. Modern .NET is fast, of course, but it's not Burst-compiled HPC# fast. There are so many things that the compiler and LLVM can do based on those assumptions. Agreed that C# strings are always a pain if you actually need to interpolate things at runtime. We always try to avoid these as much as we can, and intern common ones.
The fragmentation you mention after large operations is (in my experience) indicative of save/load systems, or possibly level-init code, doing tons of allocations and causing the heap to froth up. That, or tons of reflection stuff, which is also usually a no-no for runtime perf code. The memory profiler used to have a helpful fragmentation view for that, but Unity removed it, unfortunately.
> Modern .NET is fast, of course, but it's not Burst-compiled HPC# fast.
Sure, but the fact that modern .NET is competitive with Burst makes this disappointing. If I'm going to go through the trouble of writing code in a different (and not portable!) way, then it had better be significantly faster. Especially when most code cannot be written as Burst jobs unless you use their (new) ECS.
Yeah to me, Burst+Jobs and Compute shaders are so easy to work with in Unity, I haven't felt the need to squeeze more perf out of C# in a long time.
For modding and OTA stuff I just use a scripting language with good interop (I made OneJS partially for this purpose). No more AOT issue and no more waiting for domain reload, etc.
> Going to completely disagree that Burst and HPC# are unnecessary and messy.
Making managed code Burst-compatible comes with real constraints that go beyond "write performant C#". In Burstable code, you generally can't interact with managed objects or GC-dependent APIs, so the design is pushed towards unmanaged structs in native collections. And this design spreads: the more logic is to be covered by Burst, the more things have to be broken down into native containers of unmanaged structs.
I agree that designing things in a data-oriented way is good, but why force this additional boundary and these special types on devs instead of just letting them write it in C#? Writing Burstable code can increase complexity: one has to manage memory/lifetimes, data layout, job-friendly boundaries, copying data between native and managed collections, etc., not just "write fast C#".
In a complex simulation game, my experience is that there are definitely things that fit the "raw data, batch processing" model, but not all gameplay/simulation logic does. Things like inheritance, events, graphs, AI (the dumb "game" version, no NN), UI, exceptions, etc. And on top of it all, debugging complications.
Wouldn't you be relieved by the announcement: "C# is now as fast as Burst, have fun!"? You'd be able to do the same data-oriented design where necessary, but keep all the other things standing by when needed. It's so close, yet so far!
> The fragmentation you mention
What you say makes sense. I've actually spent a lot of time debugging this, and I did find some "leaks" where references to "dead objects" were keeping them from being GC'd. But after sorting all these out, Unity's memory profiler was showing that "Empty Heap Space" was the culprit; that one kept increasing after every iteration. My running theory is that the heap just gets more and more fragmented, and some static objects randomly scattered around it keep it from being shrunk. ¯\_(ツ)_/¯
From my experience, performance gains seen in Debug builds in Unity/C#/Mono nearly always translate into gains in Release mode. I know that this is not always true, but in this context that's my experience.
Setting up release benchmarks is much more complex and we develop the game in Debug mode, so it is very natural to get the first results there, and if promising, validate them in Release.
Also, since our team works in Debug mode, even gains that only speed things up in Debug mode are valuable for us. That said, I haven't encountered a case where I saw a 20%+ perf gain in Debug mode that did not translate to Release mode.
MaFi Games | Senior ASP.NET (full-stack) | Contract | Remote
Hi, I'm the co-founder of MaFi Games, the indie studio behind Captain of Industry. We're looking for an experienced full-stack ASP.NET engineer to help us grow our community website, including features like a modding database, blog, and forum.
Some reasons you'd enjoy working with us:
- A multicultural, collaborative, and innovative work environment where your voice is heard.
- Fully remote job with flexible working hours and vacation schedule.
- High quality C# code base, code reviews, tests.
- High work satisfaction, work with a talented team on a popular video game with a wonderful community.
MaFi Games | Senior SWE/Game dev | Contract preferred | Remote | C#
Hi, I’m the co-founder of MaFi Games – an indie studio behind the game Captain of Industry. We are a small but passionate team who gave up their jobs at Google and Nvidia to pursue building the best factory simulation game possible, and we need more hands!
We are looking for an experienced software engineer to grow the team and accelerate our progress. We strongly prefer candidates with a background in game development, experience with 3D graphics, and performance optimizations.
Some reasons you’d enjoy working with us:
* A multicultural, collaborative, and innovative work environment where your voice is heard.
* Fully remote job with flexible working hours and vacation schedule.
* Long-term full-time collaboration (not a fixed-term contract).
* High quality C# code base, code reviews, tests.
* High work satisfaction, work on a popular video game with a wonderful community.
Official support for animations, yes! This feels so nostalgic to me; I wrote an L-system generator with support for exporting animated PNGs 11 years ago! They only worked in Firefox, and Chrome used to have an extension for them. Too bad I had to take the website down.
Back then, there were no C# libraries for it, but it's actually quite easy to make an APNG from PNGs directly by writing chunks with the correct headers; no encoders needed (assuming the input PNGs are already encoded).
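From memory, the chunk plumbing is roughly this (a sketch, not production code):

```csharp
using System.IO;

static class ApngChunks
{
    // Every PNG chunk is: 4-byte big-endian length, 4-byte ASCII type, data,
    // then CRC32 computed over (type + data).
    public static void WriteChunk(Stream s, string type, byte[] data)
    {
        byte[] t = { (byte)type[0], (byte)type[1], (byte)type[2], (byte)type[3] };
        WriteBigEndian(s, (uint)data.Length);
        s.Write(t, 0, 4);
        s.Write(data, 0, data.Length);
        WriteBigEndian(s, ~Crc32(Crc32(0xFFFFFFFFu, t), data));
    }

    // Standard PNG CRC32 (reflected polynomial 0xEDB88320).
    static uint Crc32(uint crc, byte[] bytes)
    {
        foreach (byte b in bytes)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc;
    }

    static void WriteBigEndian(Stream s, uint v)
    {
        s.WriteByte((byte)(v >> 24));
        s.WriteByte((byte)(v >> 16));
        s.WriteByte((byte)(v >> 8));
        s.WriteByte((byte)v);
    }
}

// Assembly order (from memory, simplified):
//  1. copy the first PNG's signature + IHDR (+ palette chunks if present),
//  2. write acTL (frame count, play count),
//  3. write fcTL, then the first frame's IDAT chunks unchanged,
//  4. for each following frame: fcTL, then its IDAT payloads re-wrapped as
//     fdAT chunks (a 4-byte sequence number prepended to the raw data),
//  5. write IEND.
// Decoders that don't know acTL/fcTL/fdAT ignore them and show the first frame.
```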
While I welcome that there is now PNG with animations, I am less impressed with how Mozilla chose to push for it.
Using PNG's magic numbers and pretending to existing software that it is just a normal PNG? That is the same mindset that led to HTML becoming tag soup. After all, HTML with a <blink> tag is still HTML, no?
I think they could have achieved animated PNG standardization much faster with a more humble and careful approach.
This is awesome! I like the idea of abstracting the factory building with a code-like structure. I wonder if a supplemental 2D image (mini-map style) as an input to the policy would help with the spatial reasoning?
I work on a similar factory game (Captain of Industry) and I have always wanted an agent that can play the game for testing and balancing reasons. However, a pixels-to-mouse-actions RL policy (similar to DeepMind's StarCraft agent) always seemed like a very hard and inefficient approach. Using a code-like API seems so much better! I might try to find some time to port this framework to COI :) Thanks for sharing!
Regarding the 2D image - the issue is that these frontier models don't tend to support supplemental image inputs, and the ones that do aren't sufficiently well trained on (high-precision) Factorio visuals to add that much information.
I see, integrating image inputs can be very challenging in this case as the models work with text input. I was not even thinking about the full isometric image, but just some simple 2D map where each pixel can be color-coded based on the entity type. I guess the problem is that these maps would look like nothing the models were trained on, so as you say, it might not provide any value.
The reason I was suggesting this is that I worked in robotics making RL policies, and supplying image data (be it maps, lidar scans, etc.) was a common practice. But our networks were custom-made to ingest this data and trained from scratch, which is quite different from this approach.
Indeed, I think the trade-off here is that the more "pure Factorio" types of images we give to the agents, the more likely it is that they've seen them during training (from Google etc.). However, the signal-to-noise ratio is low, and hence the current models get confused as the map complexity (amount of entities) and level of detail grow. If we start to create custom images, we can reduce the unneeded noise, but we then risk giving something completely OOD to the agent (unless we train a visual encoder), and the performance also tanks.
MaFi Games | Senior SWE/game dev | Contract or full-time | $70-110k | Remote | C#
EDIT: This posting is no longer active, thanks to all the applicants for applying!
I’m the co-founder of MaFi Games – an indie studio behind the game Captain of Industry. We are a small but passionate team who gave up their jobs at Google/Nvidia to pursue building the best factory simulation game possible, and we need more hands!
We are looking for an experienced software engineer to grow the team and accelerate our progress. We strongly prefer candidates with a background in game development or with experience in desktop UI, 3D graphics, and performance optimizations.
Some reasons you’d enjoy working with us:
* A multicultural, collaborative, and innovative work environment where your voice is heard.
* Fully remote job with flexible working hours and vacation schedule.
* High quality C# code base, code reviews, tests.
* High work satisfaction, work on a popular video game with a wonderful community.
I'm currently exploring game development and would love to contribute to your team as an intern. Although I have no prior experience in game development, I bring over five years of experience as a FullStack JavaScript Developer, with strong skills in UI/UX design and a passion for learning new technologies.
Could you please let me know if there are any internship opportunities available?
I am very interested in MaFi Games.
MaFi Games | Senior SWE/game dev | Contract or full-time | $70-110k | Remote (World) | C#
EDIT: This posting is no longer active, thanks to all the applicants for applying!
I’m the co-founder of MaFi Games – an indie studio behind the game Captain of Industry. We are a small but passionate team who gave up their jobs at Google/Nvidia to pursue building the best factory simulation game possible, and we need more hands!
We are looking for an experienced software engineer or game developer to grow the team and accelerate our progress. We are also looking for part-time UI/UX designers.
Some reasons you’d enjoy working with us:
* A multicultural, collaborative, and innovative work environment where your voice is heard.
* Fully remote job with flexible working hours and vacation schedule.
* High quality C# code base, code reviews, tests.
* High work satisfaction, work on a popular video game with a wonderful community.
* No bureaucracy, no politics, no perf reviews, 1 regular meeting per week.
Why did this post disappear from the front page? I understand that this is now a heated topic, but I think it is good for people to know about things like this.
Can you share how large the team responsible for .NET Modernization is?
> migrating legacy native C++ code to C# backed by CoreCLR.
Yes please! Surely things like Quaternion.Lerp don't have to be C++ code under CoreCLR.
Feel free to get in touch in case I could be of any help :)