Yep. A linker in the best case would run as fast as cat. Paste the binaries together, done. Disk I/O was a problem back when we used spinning rust but less so now.
What takes time is rewriting stuff as you go. Running the relocation tables to insert addresses into the code is cheap. Dead-stripping whole sections is fairly cheap; dead-stripping individual basic blocks within functions takes a lot more analysis and thus time.
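To make the "cheap" part concrete, here is a rough sketch of both operations in Python. Everything in it is made up for illustration: the `refs` graph, the `(offset, symbol, addend)` relocation records and the 64-bit absolute fixup are assumptions, not any real object file format. The point is only that section-level dead-stripping is a plain reachability walk and relocation is a linear pass of small writes.

```python
import struct

def live_sections(roots, refs):
    """Mark phase: return every section reachable from the root sections.

    refs maps a section name to the section names its relocations point at
    (a hypothetical, pre-digested form of the relocation tables).
    """
    live, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(refs.get(name, ()))
    return live

def apply_relocations(image, relocs, addresses):
    """Patch 64-bit little-endian absolute addresses into image, in place.

    Each relocation is (offset, symbol, addend), an invented record layout.
    """
    for offset, symbol, addend in relocs:
        struct.pack_into("<Q", image, offset, addresses[symbol] + addend)

# "unused" is never referenced from the root, so it gets dropped.
refs = {"main": ["helper", "msg"], "helper": [], "unused": ["helper"]}
kept = live_sections(["main"], refs)

image = bytearray(16)
apply_relocations(image, [(0, "helper", 0), (8, "msg", 4)],
                  {"helper": 0x401000, "msg": 0x402000})
```

Doing the same thing per basic block needs the control flow inside each function, which the linker normally doesn't have to hand, hence the extra analysis.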
Deduplicating constant strings is a good idea but involves streaming them all into a hashtable of some sort. If you also want strings to share common suffixes (tail merging), that's more work.
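A minimal sketch of the two levels, assuming NUL-terminated byte strings and a flat string table. The function names are hypothetical and the suffix sharing uses the sort-by-reversed-string trick, which is one way to do it and not necessarily what any particular linker does.

```python
def dedup_strings(strings):
    """Lay out each distinct string once; return (table, offsets)."""
    table = bytearray()
    offsets = {}  # the hashtable: string -> offset in the table
    for s in strings:
        if s not in offsets:
            offsets[s] = len(table)
            table += s + b"\0"
    return bytes(table), offsets

def tail_merge(strings):
    """Like dedup_strings, but a string that is a suffix of another string
    points into the tail of that string instead of getting its own copy."""
    table = bytearray()
    offsets = {}
    prev = None
    # Sorted by reversed bytes, descending: any string that is a suffix of
    # another one comes directly after a string that ends with it.
    for s in sorted(set(strings), key=lambda b: b[::-1], reverse=True):
        if prev is not None and prev.endswith(s):
            offsets[s] = offsets[prev] + len(prev) - len(s)
        else:
            offsets[s] = len(table)
            table += s + b"\0"
        prev = s
    return bytes(table), offsets

strings = [b"hello", b"lo", b"hello", b"world", b"o"]
plain_table, plain_offsets = dedup_strings(strings)
merged_table, merged_offsets = tail_merge(strings)
```

With this input the plain table is 17 bytes and the merged one is 12. The extra cost is the sort, on top of the hashing you were already paying for.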
Deduplicating, dead-stripping and rewriting debug information takes time. Debug builds can carry many gigabytes of DWARF to rewrite.
Oddly enough, the fact that the linker is scriptable, as in you can give it a program that it interprets, doesn't seem to be a significant cost. Probably because the script in question is quite short and somewhat limited in functionality.
Historically lld was very fast because it didn't bother doing any of the debug munging or other deduplication. It ran fast but the output binary was big.
I'm several years out of the linker performance game now so I don't know the current status. In particular I don't know where mold or lld stand in terms of quality of output vs their own performance.