I think this finally answers a question I've had since I started learning programming. All my life I've been told "you can't convert a shared library to a static library". When you try to probe for more details, all you get is some mumbling about how a shared library is already prelinked for mapping into memory so the relocation table is lost. When you press further about the precise information loss that prevents you from reconstructing it, you get the equivalent of "you just can't ok!".
That was never satisfying to me because recompilation (decompiling to source code or IR that can be recompiled for another platform) is an academic field of study [1], so the problem of converting a shared library to a static one should be easier. After all, you have all the assembly and symbol information right there, it seemed like all you needed to do was massage it the right way.
Shared objects are compiled so that the exact same code works when loaded at any address. This typically requires base+offset addressing and is less performant, but the same physical pages holding the object can be mapped into different processes at different addresses, i.e. shared.
A static object addresses its code and data directly and does not waste a register, but the addresses in its code have to be modified depending on where it is loaded, so the code cannot be shared between different processes.
You can't convert a static library to shared because you can't.
You can convert a shared library to static but there is no point.
My understanding is that a shared library is closer to an executable than it is to an object file, so it is in that sense that it is "prelinked and ready to map into memory" just as an executable is. (Once loaded, dyld still has to do fixups in the GOT, but that's not touching the text segment.)
I mostly agree with your definitions, but you _can_ create a shared library out of a static library assuming the static library was compiled with -fPIC, no? I mean I've never tried it but most linkers seem to allow it [1], and given that creating a full executable is similar to creating a shared library, I don't see why it wouldn't work.
And if the code wasn't compiled with -fPIC, I still feel that it should be possible in theory since the whole point of object files is that they support relocation. Linking a non-PIC object to produce a shared object would just require patches within the text segment at runtime, right? Modern linkers probably don't like doing this for security reasons. But otherwise it doesn't seem to be some fundamental limitation, just modern OSs and linkers not supporting an obscure use case that carries security risk.
When people say that you can't convert a shared library to a static one (or equivalently statically link against a shared library), they usually cite the lack of relocation table info as the reason. Like they have some hierarchy in mind that goes from source code -> object file -> executable/shared library, and so they say it's "obvious" that you can't go the other way. Which is mostly true in terms of the "golden path" and what you'd want to limit yourself to for production, but in a hacker sense you can obviously go the other way with assemblers and decompilers.
So if someone claims that it's impossible to convert a shared library into a static library, there better be good justification for such a hard claim. And on the face of it, the claim doesn't seem watertight because the _stronger_ claim that you can't convert an executable back into recompilable code is falsified by the existence of decompilers that lower down to LLVM IR.
I don't think the answers from others in this thread are particularly satisfying either, at the time I'm writing this.
Thinking about it a little bit, I'd say the major challenge for converting from shared to static library is that shared libraries have resolved relocations from text to other segments.
In order to create a static library, you need to make it so that text and data can be moved relative to each other again.
At a minimum, this requires disassembling the text segment to find PC-relative addresses and recreate relocations for them. That doesn't sound impossible, but there may be pitfalls I'm not thinking of right now that make it hard to find all necessary fix ups.
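To make that concrete, here is a minimal sketch in Python of what "recreate relocations" could look like. Everything in it is an assumption for illustration: the segment addresses are made up, and instead of a real disassembler it only recognizes one x86-64 encoding, `lea rax, [rip + disp32]`. A byte-pattern scan like this can match in the middle of other instructions, which is exactly the kind of pitfall mentioned above. The addend follows the `R_X86_64_PC32` formula (S + A - P).

```python
import struct

# Hypothetical layout of a loaded shared library (all values are assumptions).
TEXT_VA = 0x1000      # where the text segment is mapped
DATA_VA = 0x4000      # where the data segment is mapped
DATA_SIZE = 0x100

LEA_RAX_RIP = b"\x48\x8d\x05"   # lea rax, [rip + disp32]
INSN_LEN = 7                     # 3 opcode/ModRM bytes + 4 displacement bytes


def recover_pc32_relocs(text: bytes):
    """Return (patched_text, relocs) for RIP-relative LEAs that point into data."""
    patched = bytearray(text)
    relocs = []
    pos = text.find(LEA_RAX_RIP)
    while pos != -1 and pos + INSN_LEN <= len(text):
        field = pos + 3                                   # offset of disp32
        (disp,) = struct.unpack_from("<i", text, field)
        target = TEXT_VA + pos + INSN_LEN + disp          # RIP = next instruction
        if DATA_VA <= target < DATA_VA + DATA_SIZE:
            # R_X86_64_PC32 computes S + A - P, with P the address of the field,
            # so the addend carries the -4 between the field and the next insn.
            addend = (target - DATA_VA) - 4
            relocs.append({"offset": field, "section": ".data", "addend": addend})
            struct.pack_into("<i", patched, field, 0)     # a linker refills this
        pos = text.find(LEA_RAX_RIP, pos + 1)
    return bytes(patched), relocs


def relink(text: bytes, relocs, text_va: int, data_va: int) -> bytes:
    """Apply the recovered relocations for a new text/data placement."""
    out = bytearray(text)
    for r in relocs:
        value = data_va + r["addend"] - (text_va + r["offset"])   # S + A - P
        struct.pack_into("<i", out, r["offset"], value)
    return bytes(out)
```

Calling `relink` with the original addresses should reproduce the original bytes, and calling it with a different text/data distance should yield a displacement that still lands on the same data offset. That round trip is the property a delinker needs.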
My interpretation is it’s basically a hack because C programmers can’t make their libs reliable/memory safe, and it throws some logs under attacker’s feet.
There was this software called MagicErmine or statifier that could convert between the two.
I don't see what ASLR has to do with the inability to convert a shared to static library though. In fact a shared library must be position independent so if anything it would make the job easier. For going from static (non-PIC) to shared, W^X is probably why modern dynamic linkers don't support patching the text segment.
Tools like Statifier to convert an executable + its dylibs into a statically linked executable are sort of an in-between hack that does the equivalent of prelinking. But that's not converting a shared library to a static library, because the result isn't an object file that can be linked.
I guess I never stated the punchline explicitly, but boricj's tool seems to prove that there is in fact no such theoretical limitation in converting from shared -> static library. It's a bit hacky of course, but you can indeed reconstruct relocation tables with decompiler-like analysis to go back from executable -> object file for any function, and this also implies you can go from shared library -> static library.
> I guess I never stated the punchline explicitly, but boricj's tool seems to prove that there is in fact no such theoretical limitation in converting from shared -> static library. It's a bit hacky of course, but you can indeed reconstruct relocation tables with decompiler-like analysis to go back from executable -> object file for any function, and this also implies you can go from shared library -> static library.
As far as the traditional linker goes, sure. This whole shtick relies on the fact that a platform has ABI conventions, so with a naive toolchain any translation unit (and function) compiled in isolation must be interoperable as-is, because the linker is completely oblivious to whatever's going on within an object file section (relocations excluded). Even then, platform-specific considerations might get in the way.
For artifacts that went through inter-procedural and especially link-time optimization, this breaks down. The former will take advantage of the fact that anything goes as long as the visible interface of an object file section is ABI compliant, the latter effectively turns an entire program into one big translation unit. Among other things, functions may exhibit non-standard calling conventions, which wreaks havoc as soon as you start trying to meld delinked bits with freshly built ones, because the optimizer is unlikely to make the same choices in its custom calling conventions.
I have a user who is decompiling a link-time optimized artifact built over 15 years ago. In order to graft newly compiled code, they binary patched the original compiler to basically add support for __usercall, in order to coerce the toolchain into making the same optimization decisions. If you're up against an artifact built with a non-traditional toolchain with optimizations on, delinking alone most likely won't cut it and additional magic will be required.
Thank you for chiming in! LTO wouldn't really matter for "delinking" exported symbols of shared libraries though, would it? The exported functions must necessarily follow platform ABI convention, and so long as those are copied over, things would seem to work fine.
I guess the one catch with shared libraries is that if shared library A itself depends on another dylib B, then calls within A to functions of B would go via PLT indirection. So then creating an object file out of this would involve not just moving code and creating relocation entry but also possibly patching the assembly itself to remove the call indirection.
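For the simple case, a rough sketch of that step in Python. It assumes the calls are plain `call rel32` instructions into PLT stubs and that a disassembler has already found where they start. The addresses and the stub-to-symbol table are hypothetical. In that case no code needs rewriting: the call keeps its encoding and only gains a relocation against the imported symbol, shaped like `R_X86_64_PLT32` with the usual -4 addend. Calls that go through the GOT directly would need different handling, which this sketch does not attempt.

```python
import struct

# Hypothetical addresses; a real tool would read these from the ELF headers.
TEXT_VA = 0x1000
PLT_STUBS = {0x0F10: "malloc", 0x0F20: "free"}   # PLT stub address -> imported symbol

CALL_REL32 = 0xE8
CALL_LEN = 5


def plt_calls_to_relocs(text: bytes, call_offsets):
    """Turn `call <stub@plt>` into relocation records against the symbol.

    call_offsets lists where call instructions start; finding them is the
    disassembler's job and is assumed to be done already.
    """
    patched = bytearray(text)
    relocs = []
    for off in call_offsets:
        assert text[off] == CALL_REL32, "not a call rel32"
        (rel,) = struct.unpack_from("<i", text, off + 1)
        target = TEXT_VA + off + CALL_LEN + rel
        symbol = PLT_STUBS.get(target)
        if symbol is None:
            continue   # internal call, not through the PLT
        # The field sits 4 bytes before the next instruction, hence addend -4.
        relocs.append({"offset": off + 1, "symbol": symbol, "addend": -4})
        struct.pack_into("<i", patched, off + 1, 0)
    return bytes(patched), relocs
```

The static linker that later consumes the object file would then bind the call straight to the symbol, or route it through its own PLT if the symbol ends up dynamic.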
If you're delinking the entire LTO-optimized shared library as one big blob, then sure. LTO effectively turns a program/library into one huge translation unit, but if you're not cutting across it then the platform ABI mandates the observable boundary.
Static libraries can't be converted to shared because all shared library code on x86 needs to be compiled with the -fPIC (position-independent code) flag.
Shared libraries could be converted to static. You would lose the ability of static libraries to only include part of the library.
You can make a shared library without PIC, however most *nix systems no longer allow that and try to prevent it in various ways.
16- and 32-bit Windows code defaults to non-PIC, and thus considerable work has been done to ensure system libraries do not overlap in addresses.
What happens when addresses overlap? When the runtime linker detects a conflict (or just wants to load a non-PIC library/executable at a different address), it uses the "relocation data" shipped with the code, which essentially records where the pointers are and how to rewrite them when the code moves.
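A minimal sketch of that rewriting step in Python, as a toy model rather than any real loader's format. It assumes the relocation data is just a list of offsets at which 32-bit little-endian absolute pointers live, and that every such pointer was computed for a preferred base address. The function and parameter names are made up for illustration.

```python
import struct


def rebase(image: bytes, preferred_base: int, actual_base: int, reloc_offsets):
    """Rewrite every absolute 32-bit pointer listed in reloc_offsets."""
    delta = actual_base - preferred_base
    out = bytearray(image)
    for off in reloc_offsets:
        (ptr,) = struct.unpack_from("<I", out, off)
        # Wrap to 32 bits, as pointer arithmetic would on a 32-bit target.
        struct.pack_into("<I", out, off, (ptr + delta) & 0xFFFFFFFF)
    return bytes(out)
```

Bytes that are not listed in the relocation data are left alone, which is why the table has to be complete: a pointer the table misses keeps pointing at the old address.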
[1] https://rev.ng/