Jakub's tech blog

Odin compilation speed tips

Compilation speed is really important. If you hit build and it takes so long it pulls you out of the flow and you start scrolling twitter, that’s a problem.

So what can we do to compile Odin programs faster? Here I’ll try to explain some ways of troubleshooting Odin compile times and ways to improve them.

TL;DR

Use a faster linker, watch what your imports pull in, skip -debug when you don't need it, and for small tools odin build my_tool -linker:radlink -microarch:native is a reasonable default.

Small programs

Compilation speed of large projects is probably the most important thing when measuring compiler performance.

However, in Odin I often write many small utility “scripts” instead of using shell/batch/python. It’s very convenient: I get access to the core libraries, and running is as simple as odin run my_tool.

So let’s consider the following program. It’s just a simple hello world, but all small tools need to print to the console, so it’s at least a little representative of the real world.

package main

import "core:fmt"

main :: proc() {
    fmt.printfln("Hello %s %i!", "World", 123)
}

To measure the compilation speed, we can run odin build with the -show-timings flag.
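With the hello world above in a hello directory (the same name used in the commands later in this post), that looks like:

odin build hello -show-timings

On my machine, the result is something like this: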

Total Time                         -   270.198 ms - 100.00%
initialization                     -     7.618 ms -   2.81%
parse files                        -    12.968 ms -   4.79%
type check                         -    66.492 ms -  24.60%
LLVM API Code Gen (   54 modules ) -    82.512 ms -  30.53%
msvc-link                          -   100.602 ms -  37.23%

This is not too bad! But interestingly, some of my projects (with tens of thousands of lines of code, and a lot more in dependencies) don’t compile that much slower.

So can we do better?

Linker

In the example above, almost 40% of the time is spent in the linker. So let’s try a different one.

Using -linker:radlink gets it down to about 15% on my machine. If you’re on Linux, you can also try mold.
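Switching is just an extra flag on the same build command:

odin build hello -show-timings -linker:radlink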

In general, a faster linker can also improve compile times when the binary gets large (e.g. large dependencies, or big #load files).
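If you haven’t seen it, #load is the directive that embeds a file into the binary at compile time, so a single line like the following (with a hypothetical path) can easily add megabytes for the linker to chew through:

logo_png := #load("assets/logo.png") // file contents baked into the binary as []u8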

More Timings

There is also a -show-more-timings flag, which lists timings for all the internal compilation stages.

In most cases it doesn’t tell you that much (it’s mostly valuable for compiler developers), and most of the time is spent in the LLVM Object Generation stage.

However, you still might want to look at the results if you need to debug why something is compiling slower than expected - sometimes you hit a slow path.

For example, I ran into a case where parsing a file with gigantic embedded arrays of integer constants, generated by the sokol shader compiler, was adding ~3 seconds to my compile time.

Always profile, especially when things go wrong.

Internal Debug Messages

There is a hidden flag called -show-debug-messages, and while it’s not intended for regular users, it’s extremely useful: it dumps a LOT of statistics to stderr that are useful for debugging and profiling.

So let’s measure Hello World again:

odin build hello -show-timings -linker:radlink -show-more-timings -show-debug-messages

When you scroll a bit past the LOC/s sections (which are also extremely useful!), you’ll see something like this:

Peak Memory Size: 271.000 MiB

Total Lines     - 80727
Total Tokens    - 395951
Total Files     - 162
Total Packages  - 26
Total File Size - 2572637

This is an overview of all the things the compiler had to parse to compile your program.

But 26 packages and 80k lines of code seems like a lot for a Hello World. All of that was pulled in by core:fmt and its dependencies. Let’s see if we can do something about it.

Dependencies

A good way to diagnose the dependencies is to add the -keep-temp-files flag to the build command and look at which .ll IR files are generated. The list of files tells you which packages actually got included in the build, and the file sizes roughly reflect the compile time.
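For example, something like this on Linux/macOS (on Windows you can just sort the kept files by size in Explorer):

odin build hello -keep-temp-files
ls -lS *.ll    # list the kept IR files, largest first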

base:runtime only

We can do something like the following to write data directly to stderr. The base:runtime package is always included by default, as it contains builtin implementations and other required features, so there’s not much better we can do.

package main

import "base:runtime"

main :: proc() {
    runtime.print_string("Hello World 123!")
}

The results look a lot nicer:

Peak Memory Size: 76.703 MiB

Total Lines     - 9211
Total Tokens    - 59321
Total Files     - 31
Total Packages  - 3
Total File Size - 290122
Total Time                         -    58.230 ms - 100.00%
initialization                     -     7.189 ms -  12.34%
parse files                        -     2.853 ms -   4.90%
type check                         -     5.993 ms -  10.29%
LLVM API Code Gen (   31 modules ) -    14.681 ms -  25.21%
rad-link                           -    27.510 ms -  47.24%

Only 9k LOC, 3 packages, and 60 milliseconds to compile the entire program! That’s really nice.

But there’s a problem: we lost all of core:fmt’s nice formatting functionality. This is a big issue, because it’s very cumbersome to do all the formatting by hand.

μ-fmt experiment

This led me to write an experimental ufmt (micro-fmt) package. It does only the bare minimum, but it covers 90% of my own core:fmt use cases.

package main

import "ufmt"

main :: proc() {
    ufmt.printfln("Hello %s %i!", "World", 123)
}

The entire implementation is <200 lines of code and depends only on base:runtime. It also compiles in 60 milliseconds and includes only 9k LOC.

There are only tprintf, printf and printfln, and only a handful of format qualifiers are supported (such as the %s and %i used above).
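To give a rough idea of the shape of such a package, here’s a sketch (not the actual ufmt code). It assumes base:runtime’s print_string, print_int and print_byte helpers, only handles %s and %i, and prints straight to stderr:

package ufmt

import "base:runtime"

// Minimal printf-style printing straight to stderr.
// Only %s (string) and %i (int) are handled; everything else is printed as-is.
printf :: proc(format: string, args: ..any) {
    arg_index := 0
    i := 0
    for i < len(format) {
        if format[i] == '%' && i + 1 < len(format) {
            switch format[i + 1] {
            case 's':
                if s, ok := args[arg_index].(string); ok do runtime.print_string(s)
                arg_index += 1
                i += 2
                continue
            case 'i':
                if n, ok := args[arg_index].(int); ok do runtime.print_int(n)
                arg_index += 1
                i += 2
                continue
            }
        }
        runtime.print_byte(format[i])
        i += 1
    }
}

printfln :: proc(format: string, args: ..any) {
    printf(format, ..args)
    runtime.print_byte('\n')
}

A tprintf variant would build the result into a temporarily allocated string instead of writing it out directly.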

Here’s the initial ufmt version as a github gist.

Update: I expanded the initial version and now it lives in the source tree of my game engine. I added a very lightweight RTTI crawl for printing any values, including structs, arrays, etc. automatically. It’s not 100% complete, but it’s reasonably usable and I’ll continue to improve it. Here’s the link.

Of course, using this has no effect if you import a package which depends on core:fmt. This is pretty annoying; currently there is no good way to see what exactly your program is importing and why, apart from looking at the temp object files.

On my branch of the compiler I experimented with printing a graphviz graph of all the packages and their imports, but that’s not official. Update: see this PR for the graph generator, and a userspace solution using core:odin/parser.
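The userspace approach is roughly: parse each package with core:odin/parser and walk its import declarations. Here’s a sketch of the idea; the helper names (parse_package_from_path, the imports and fullpath fields) are assumptions from memory and may differ slightly between Odin versions, and it doesn’t recurse into core: or vendor: packages:

package list_imports

import "core:fmt"
import "core:odin/parser"

main :: proc() {
    // Parse every .odin file in the given package directory (hypothetical path).
    pkg, ok := parser.parse_package_from_path("my_package")
    if !ok {
        fmt.eprintln("failed to parse package")
        return
    }
    // Each parsed file keeps a list of its import declarations.
    for _, file in pkg.files {
        for imp in file.imports {
            fmt.printfln("%s imports %s", file.fullpath, imp.fullpath)
        }
    }
}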

LLVM

In general, the slowest part of the compilation pipeline is the LLVM backend. Even with all optimizations disabled it takes quite a while.

Odin already codegens each package independently to utilize all the CPU threads (LLVM cannot be multithreaded at a finer granularity). And as a general rule, LLVM scales very poorly with the amount of code it has to generate. For example, -disable-assert and -no-bounds-check can help by a very tiny bit because there’s slightly less to generate (so I don’t think it’s worth it for debug builds).

On the other hand, -debug has a HUGE impact. Generating the PDB is no small feat, and it can make compile times take 20-80% longer in my experience.

Optimization

Obviously, it’s not a good idea to compile with -o:speed/size/aggressive if you want good compilation speed. But I still want debug builds that run fast!

However, -o:minimal is almost completely free: all it does is enable inlining of #force_inline procedures. It’s the default when NOT compiling with -debug.

Similarly, using -microarch:native adds no overhead. I’m pretty sure it just triggers a slightly different code path in the LLVM lowering passes, but it can possibly yield some runtime perf benefits.
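Putting those together, a debug build that still runs reasonably fast could look something like this (note that -o:minimal has to be passed explicitly here, since it’s only the default when -debug is off):

odin build my_tool -debug -o:minimal -microarch:native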

Parapoly Variants

Another issue which can appear in larger codebases is an explosion of duplicated code from parametric polymorphism (parapoly) variants.

For example if you have code like this:

some_large_procedure :: proc($N: int, ...) {
    // ... lot of code
}

The compiler will literally need to generate the code for each N separately. So while the parsing cost is paid once, the type checker and the backend have to work extra hard.
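For instance, in the same schematic style as the snippet above, every distinct constant you pass in (the values here are made up) spawns its own copy:

some_large_procedure(16, ...)
some_large_procedure(32, ...) // a second full copy gets type checked and codegen'd
some_large_procedure(64, ...) // ...and a third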

The overhead is totally negligible even with hundreds of variants when the procedure body is small. But once the amount of generated code gets larger, you start running into all kinds of bottlenecks, one of them being LLVM (as always).

What to do about this

There aren’t any diagnostics in the Odin compiler itself to check whether this is happening. What you can do, however, is compile your code and check the symbols. For example, like this on Windows:

odin build my_program -build-mode:lib
dumpbin /SYMBOLS my_program.lib > symbols.txt

Or with objdump on Linux. You’re looking for a long list of repeated procedure symbols, something like: my_package::my_procedure:proc(my_param:$$value, ...).
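On Linux that might look something like this, assuming the static library comes out as my_program.a (nm works just as well as objdump here):

odin build my_program -build-mode:lib
objdump -t my_program.a | grep my_procedure > symbols.txt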

To help mitigate this issue, you could try putting all the common code into a non-parapoly procedure and calling it indirectly. Or, if you don’t actually need the parapoly specialization and only use it for “optimization”, just get rid of it.
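One way to apply that first suggestion (with made-up names) is to keep the parapoly wrapper thin and push the bulk of the work into a single non-parapoly procedure, so only the small wrapper gets duplicated per N:

some_large_procedure :: proc($N: int, data: []f32) {
    buf: [N]f32                             // only this part really needs the compile-time N
    some_large_procedure_impl(buf[:], data) // the shared body, generated once
}

some_large_procedure_impl :: proc(buf: []f32, data: []f32) {
    // ... the "lot of code" lives here and is only compiled once
}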

Conclusion

GingerBill says the Odin compiler can still be a lot faster. And while there’s plenty of room for optimization, it’s not too bad in its current state.

For a simple hello world, we got nearly 5x faster compile times by compiling with the right flags and being careful about dependencies. Of course, it’s different in real, bigger projects, but I hope I shed some light on the various ways you can debug these issues, and some general rules to follow.

I recommend the following command as a reasonable default for compiling lightweight tools:

odin build my_tool -linker:radlink -microarch:native

Thank you for reading!

Also big thanks to all my Patrons! <3

Edit 2026-01-22: Added a section about parapoly, and more -o:minimal info.

Edit 2026-01-23: A small section about checking dependency IR

Edit 2026-02-10: Linked graph generator and new ufmt version