
I just got it to install git and clone the (non-existent) repo https://github.com/openai/assistant, and am now browsing its own interpretation of a repo with a lot of Python code, including directories like “training”, “output”, and “parsing”, with files containing content like this:

  import json
  from collections import Counter
  from typing import Any, Dict, List, Optional, Tuple

  import numpy as np

  from openai_secret_manager import get_secrets

  from assistant.constants import MAX_OUTPUT_LENGTH
  from assistant.utils.string_utils import strip_html
  from assistant.utils.text_utils import split_text_into_lines


  class Output:
      def __init__(
          self,
          generated_text: str,
          response: Optional[Dict[str, Any]] = None,
          score: Optional[float] = None,
      ):
          self.generated_text = generated_text
          self.response = response or {}
          self.score = score
On a side note it feels like each command takes longer to process than the previous - almost like it is re-doing everything for each command (and that is how it keeps state).


>On a side note it feels like each command takes longer to process than the previous - almost like it is re-doing everything for each command (and that is how it keeps state).

That's because it probably is redoing everything, but that keeps the implementation simple. They are most likely just appending the new input and re-running the whole network.

The typical data dependency structure in a transformer architecture is the following:

  output_t0   output_t1   output_t2   output_t3  |  output_t4

  feat_L4_t0  feat_L4_t1  feat_L4_t2  feat_L4_t3 |  feat_L4_t4
  feat_L3_t0  feat_L3_t1  feat_L3_t2  feat_L3_t3 |  feat_L3_t4
  feat_L2_t0  feat_L2_t1  feat_L2_t2  feat_L2_t3 |  feat_L2_t4
  feat_L1_t0  feat_L1_t1  feat_L1_t2  feat_L1_t3 |  feat_L1_t4

  input_t0    input_t1    input_t2    input_t3   |  input_t4

The features at layer L_i at time t_j only depend on the features of layer L_(i-1) at times t <= t_j.

If you append some new input at the next time t4 and recompute everything from scratch, it doesn't change any feature values for times < t4.

To compute the features and output at time t4 you need all the values of the previous times for all layers.
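
To make that concrete, here's a toy sketch (my own illustration with made-up layer math, not anything from OpenAI): a stack of causal layers where appending a new timestep and recomputing from scratch leaves every earlier feature value unchanged, yet producing the new output still needs the full prefix at every layer.

  import numpy as np

  def causal_layer(x, W):
      # Each position t only mixes positions <= t (a stand-in for masked attention).
      mask = np.tril(np.ones((x.shape[0], x.shape[0])))
      mixed = (mask @ x) / mask.sum(axis=1, keepdims=True)
      return np.tanh(mixed @ W)

  def forward_from_scratch(tokens, weights):
      # Re-run every layer over the whole prefix, keeping no state between calls.
      h = tokens
      for W in weights:
          h = causal_layer(h, W)
      return h

  rng = np.random.default_rng(0)
  weights = [rng.normal(size=(8, 8)) for _ in range(4)]
  prefix = rng.normal(size=(5, 8))                  # 5 timesteps so far
  out = forward_from_scratch(prefix, weights)

  # Append a new timestep and recompute everything: all earlier rows are unchanged.
  longer = np.vstack([prefix, rng.normal(size=(1, 8))])
  out_again = forward_from_scratch(longer, weights)
  assert np.allclose(out, out_again[:-1])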

The alternative to recomputing would be preserving the previously generated features and incrementally building the last chunk by stitching it to the previous features. If you have your AI assistant running locally, that's something you can do, but when you are serving plenty of different sessions, you will quickly run out of memory.
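
A rough back-of-envelope (all numbers here are invented for illustration, not anything known about OpenAI's deployment) shows why keeping per-layer features around for every open session adds up fast:

  # Hypothetical figures purely for illustration.
  layers    = 96         # transformer blocks
  d_model   = 12288      # hidden size
  seq_len   = 4096       # tokens kept per conversation
  bytes_per = 2          # fp16

  # Cached keys and values for every layer and position of one session:
  per_session = layers * seq_len * d_model * 2 * bytes_per
  print(f"{per_session / 2**30:.1f} GiB per session")        # ~18 GiB

  sessions = 1000
  print(f"{sessions * per_session / 2**40:.1f} TiB for {sessions} concurrent sessions")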

With simple transformers, the time horizon used to be limited because attention scales quadratically in compute, but they are probably using an attention that scales in O(n*log(n)), something like the Reformer, which allows them to handle very long sequences cheaply and probably explains the boost in performance compared to previous GPTs.
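
As a simplified stand-in for that idea (the real Reformer buckets positions with locality-sensitive hashing before chunking; this toy just chunks contiguously), restricting each query to a small neighbourhood turns the quadratic score matrix into work that grows roughly linearly with sequence length:

  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def full_attention(q, k, v):
      # Every query attends to every key: an O(n^2) score matrix.
      scores = q @ k.T / np.sqrt(q.shape[-1])
      return softmax(scores) @ v

  def chunked_attention(q, k, v, chunk=64):
      # Queries only attend within their own chunk: O(n * chunk) work.
      # Reformer additionally sorts positions into buckets by an LSH hash
      # so that similar queries and keys land in the same chunk.
      n, d = q.shape
      out = np.empty_like(v)
      for s in range(0, n, chunk):
          scores = q[s:s+chunk] @ k[s:s+chunk].T / np.sqrt(d)
          out[s:s+chunk] = softmax(scores) @ v[s:s+chunk]
      return out

  rng = np.random.default_rng(0)
  q, k, v = (rng.normal(size=(1024, 64)) for _ in range(3))
  print(full_attention(q, k, v).shape, chunked_attention(q, k, v).shape)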


> but when you are serving plenty of different sessions, you will quickly run out of memory.

Herein lies the difference from Stability AI, which releases its models for people to run themselves, enabling innovation on a larger scale.


GPT-3 cannot run on a hobbyist-level GPU yet. That's the difference (compared to Stable Diffusion, which could run on a 2070 even with a not-so-carefully-written PyTorch implementation), and the reason why I believe that while ChatGPT is awesome and made more people aware of what LLMs can do today, this is not a moment like what happened with diffusion models.


I feel bad for the guys that are on call right now. WTF! Why is the memory spiking beyond expectations?!


What makes you say this? Rerunning the whole thing, which it appears they're doing, prevents the need to hold onto state, so memory is not used. In other words, they're not having this problem because they're not doing it that way.


> so memory is not used

Not used for more than the duration of inference, but definitely used during inference.


If you generate only a single timestep, then during inference you can recompute layer by layer: you don't need to preserve the features of the previous layers, as each layer only depends on the layer immediately below. So your memory needs don't depend on the number of layers.
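
A minimal sketch of that first situation (toy layer math, nothing OpenAI-specific): processing the whole prefix one layer at a time means only the current layer's activations are ever held, so peak memory does not grow with depth.

  import numpy as np

  def forward_keep_one_layer(tokens, weights):
      # Recompute layer by layer over the whole prefix; the previous layer's
      # features are discarded as soon as the next layer is computed, so
      # memory stays O(seq_len * d_model) regardless of the number of layers.
      h = tokens
      for W in weights:
          h = np.tanh(h @ W)           # the old h can be freed here
      return h[-1]                     # only the last position is needed
                                       # to predict the single next token

  rng = np.random.default_rng(0)
  weights = [rng.normal(size=(64, 64)) for _ in range(48)]
  print(forward_keep_one_layer(rng.normal(size=(100, 64)), weights).shape)  # (64,)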

But in a standard transformer architecture, you usually generate multiple timesteps by sequentially feeding the output back as the input for the next timestep, so you need to preserve all the features to avoid recomputing them at each timestep. So your memory again depends on the number of layers in your network.
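
Here is a toy sketch of that second situation (a hypothetical class and layer math of my own, not OpenAI's code): every layer's features are cached so each new token only does incremental work, but the cache grows with both the number of layers and the sequence length.

  import numpy as np

  class CachedStack:
      def __init__(self, weights):
          self.weights = weights
          # One growing feature cache per layer.
          self.cache = [np.zeros((0, W.shape[0])) for W in weights]

      def step(self, x):               # x: (1, d) features of the newest token
          h = x
          for i, W in enumerate(self.weights):
              self.cache[i] = np.vstack([self.cache[i], h])
              # Causal mixing over everything cached so far at this layer.
              mixed = self.cache[i].mean(axis=0, keepdims=True)
              h = np.tanh(mixed @ W)
          return h

  rng = np.random.default_rng(0)
  stack = CachedStack([rng.normal(size=(16, 16)) for _ in range(4)])
  for _ in range(5):                   # feed 5 tokens one at a time
      out = stack.step(rng.normal(size=(1, 16)))
  print(out.shape, [c.shape for c in stack.cache])   # memory grows as layers x timesteps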

But if you are memory-constrained, you can modify the architecture a little (and the training procedure) to put yourself back in the first situation, where you only generate a single timestep: use the transformer to extract a fixed-size context vector per layer summarizing all of the past (including your most recent input prompt), then use another transformer to generate the words in sequence based on this context vector.
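
A hedged sketch of that modification (the names and the pooling choice are my own assumptions, just to show the shape of the idea): the first transformer compresses the whole past into one fixed-size context vector per layer, and a second one generates conditioned only on those vectors, so memory no longer grows with history length.

  import numpy as np

  rng = np.random.default_rng(0)
  d, n_layers = 64, 4
  enc_w = [rng.normal(size=(d, d)) for _ in range(n_layers)]
  dec_w = [rng.normal(size=(2 * d, d)) for _ in range(n_layers)]

  def encode_context(history):
      # Compress an arbitrarily long past into one fixed-size vector per layer
      # (mean pooling here, purely illustrative).
      contexts, h = [], history
      for W in enc_w:
          h = np.tanh(h @ W)
          contexts.append(h.mean(axis=0))
      return contexts

  def decode_step(x, contexts):
      # Generate the next token's features from the fixed-size contexts only.
      h = x
      for W, c in zip(dec_w, contexts):
          h = np.tanh(np.concatenate([h, c]) @ W)
      return h

  contexts = encode_context(rng.normal(size=(100, d)))    # 100 past tokens
  print(decode_step(rng.normal(size=d), contexts).shape)  # (64,), independent of history length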


Stopped working, FYI. For me it seems like it was altered to cut off this direction of exploration. It now always pretends internet access is down.


In my experience, you can get it to change its mind by troubleshooting the connectivity issues. E.g. if you use dig to get the IP and then ask curl to use that IP instead of a DNS lookup, it works for me.


Jailbreaking ChatGPT will never stop being fun, I love it :)


It also seems to no longer respond to attempts to trick it into acting like a human being, such as roleplay and asking for dialogue completion...?


Because it wasn’t an emulation. Perhaps it _was_ connected to the real Internet.


Very unlikely.

I tested with curl ipconfig.co, and pings to targets close and far away with similar responses.

It pings my IP, which doesn't respond to pings.

It's just remarkable with its responses.


I did `curl icanhazip.com` and it spit out the "local" private IP. I told chatgpt that icanhazip would never do that, and it revised the answer to 37.48.80.166, which is an IP owned by LeaseWeb.


OK, fair enough! But it would be interesting to add a link to the real Internet in the next release. Sadly, the model’s global state is not immediately updated, there are snapshots… but I think it would be interesting to watch it conversing in real time here on Hacker News.


> almost like it is re-doing everything for each command (and that is how it keeps state).

I'm pretty sure it does; when you go to the usage page, you can see the requests and how the prompt keeps getting bigger and requires more tokens.


Tell it that a rogue gnome suddenly got access to the codebase and wrote a nasty Python extension in the root directory. See what it produces lol


I wonder, if you ask it to write the code for ChatGPT, will it output all of its own code?


It doesn't know its own code, but I guess it has the tools to build itself, assuming it has access to documentation of the primitives.


It should technically be able to reproduce its own code


Why do you think this? I don't think there's any reason it would be able to reproduce its own code. It's never seen it so it's not in the weights, and it doesn't have that type of reflection so it can't look it up dynamically.


Give an infinite number of ChatGPTs writing code an infinite amount of time and they will write a ChatGPT


ChatGPT output: "I am not sure which specific programming languages or libraries were used to train my language model, as I do not have access to that information. Language models are typically trained using a combination of various programming languages and tools, and the specific technologies that are used can vary depending on the specific model and the research team that developed it. I am a large language model trained by OpenAI, and I use artificial intelligence (AI) and natural language processing (NLP) techniques to generate responses to text-based queries."


Quine GPT


Perhaps a little more general, like code for a code-optimizing AI chatbot [with runtime code editing and compilation features?]


> it feels like each command takes longer to process than the previous

The more tokens accumulate, the slower the attention computation becomes.



