Right, so it should be much easier w/ access to every neuron and activation. But the general approach is still an experimental one: you use your existing knowledge of physics and biology to work out what is activating different structures (and neurons) in the brain. I agree w/ the approach of trying to assign some functionality to individual 'neurons', but I don't think using GPT-4 to do so is the most appealing way to go about it, considering GPT-4 is the very structure we're trying to decode in the first place.
On the other hand, I find it plausible that it's fundamentally impossible to assign some functionality to individual 'neurons' due to the following argument:
1. Let's assume that for a system computing a specific function, there is an NN configuration (weights) such that at some fully connected layer there is a well-defined functionality for specific individual neurons: #1 represents A, #2 represents B, #3 represents C, etc.
2. The exact same system outcome can be represented by infinitely many other weight combinations, which effectively amount to an invertible linear transformation (essentially any invertible linear transformation) of the data vector at this layer, e.g. one where #1 represents 0.1A + 0.3B + 0.6C, #2 represents 0.5B + 0.5C, and #3 represents 0.4B + 0.6C; in that case the functionality A (or B, or C) is not represented by any individual neuron (see the sketch after this list);
3. When the system is trained, it's simply not likely that we happen to land on the best-case configuration where the theoretically separable functionality is actually separated among individual 'neurons'.
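To make point 2 concrete, here's a minimal numpy sketch (the dimensions and variable names are my own, purely illustrative) showing that an invertible "mixing" of a hidden layer leaves the network's input-output behaviour untouched while destroying any per-neuron meaning. Strictly, the exact equivalence below holds for the linear part of the computation; a pointwise nonlinearity between the layers restricts the symmetry, but large linear subspaces in real models retain this kind of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear slice of a network: x -> h = W1 @ x -> y = W2 @ h.
W1 = rng.normal(size=(3, 4))        # hidden layer with 3 'neurons'
W2 = rng.normal(size=(2, 3))        # readout layer

# Mix the hidden basis with any invertible M and undo it in the next layer.
M = np.array([[0.1, 0.3, 0.6],
              [0.0, 0.5, 0.5],
              [0.0, 0.4, 0.6]])     # the exact mixture from point 2 (det != 0)
W1_mixed = M @ W1                   # neuron #1 now computes 0.1A + 0.3B + 0.6C, etc.
W2_mixed = W2 @ np.linalg.inv(M)    # next layer absorbs the inverse

x = rng.normal(size=4)
h, h_mixed = W1 @ x, W1_mixed @ x

print(np.allclose(W2 @ h, W2_mixed @ h_mixed))  # True: identical outputs
print(h)        # the 'clean' per-neuron activations...
print(h_mixed)  # ...vs the mixed ones an interpretability probe would actually see
```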
Biological minds do get this separation because each connection carries a metabolic cost; but the way we train our models (both older perceptron-like layers and modern transformer/attention ones) allows linking everything to everything, so the natural outcome is that functionality simply does not get cleanly split across individual 'neurons', and each 'neuron' tends to represent some mix of multiple functionalities.
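If you wanted to mimic that metabolic cost in training, the standard move is a sparsity penalty on the weights. A hedged PyTorch sketch of the idea (the hyperparameters are arbitrary, and this is obviously not how GPT-4 was trained):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
l1_strength = 1e-2  # the 'metabolic cost' per unit of connection weight

x = torch.randn(64, 4)
target = torch.randn(64, 3)

for step in range(200):
    opt.zero_grad()
    task_loss = torch.nn.functional.mse_loss(model(x), target)
    # Charge every connection for existing; gradients push unneeded weights toward 0.
    wiring_cost = model.weight.abs().sum()
    (task_loss + l1_strength * wiring_cost).backward()
    opt.step()

print((model.weight.abs() < 1e-3).float().mean())  # fraction of near-dead connections
```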
Your last point, that these models' neurons are all connected in some way, makes me somewhat sceptical of this research by OpenAI, and suggests their analysis technique may need to be more fractal or expansive: covering groups of neurons, and moving all the way up to the entire model.