It's probably possible to move the rendered frames via PCI Express to another card with video output, or perhaps to design a custom card that connects via NVLink/InfiniBand and provides video out. Or maybe even mount the video out on the GPU board, if the graphics chip had support but it's just not connected to a port.
LTT did something similar with a mining "GPU" several years ago.[0] It involved buying a mining card with modified firmware and installing a modified and unsigned driver which shoved the rendered frames into the integrated GPU to output.
It's possible these datacenter AI GPUs are built so differently from conventional GPUs that they lack the hardware required to draw polygons, like ROPs and texture units. Why waste chip engineering time and silicon die space supporting applications a product isn't designed for? Let me remind you that gaming is a small slice of Nvidia's revenue[1], so it makes sense not to use one chip design for everything.
Yes, I started off with the idea that Rue's syntax would be a strict subset of Rust's.
I may eventually diverge from this, but I like Rust's syntax overall, and I don't want to bikeshed syntax right now, I want to work on semantics + compiler internals. The core syntax of Rust is good enough right now.
I've thought a Rust-like language at Go's performance level would be interesting: garbage collected and compiled to a binary (no VM), but with Rust's mix of procedural and functional programming, and maybe more capable type inference.
If you don't mind me asking, how did you get started with programming language design? I've been reading Crafting Interpreters, but there is clearly a lot of theory that is being left out there.
Mostly just… using a lot of them. Trying as many as I could. Learning what perspectives they bring. Learning the names for their features, and how they fit together or come into tension.
The theory is great too, but starting off with just getting a wide overview of the practice is a great way to get situated and decide which rabbit holes you want to go down first.
Well I got that part covered at least. Seems like I'm constantly getting bored and playing around with a different language, probably more than I should lol
Quickly looking at the source code, mostly treeBuilder and tokenizer, I do see several possible improvements:
- Use Typescript instead of JavaScript
- Use perfect hashes instead of ["a", "b", "c"].includes() idioms, string equalities, Sets, etc.
- Use a single perfect hash to match all tags/attribute names and then use enums in the rest of the codebase (see the sketch after this list)
- Use a single if (token.kind === Tag.START) instead of repeating that for 10 consecutive conditionals
- Don't return the "reprocess" constant, but use an enum or perhaps nothing if "reprocess" is the only option
- Try tail recursion instead of a switch over the state in the tokenizer
- Use switches (best after a perfect hash lookup) instead of multiple ifs on characters in the tokenizer
- "treeBuilder.openElements = treeBuilder.open_elements;" can't possibly be good code
Perhaps the agent can find these itself if told to make the code perfect and not just pass tests.
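For example, the single-lookup-plus-enum idea might look roughly like this (a hypothetical sketch with made-up identifiers, not the actual codebase; a generated perfect hash would stand in for the Map):

    // Hypothetical sketch: resolve tag names to an enum once, then switch on
    // small integers instead of comparing strings everywhere.
    enum Tag {
      UNKNOWN,
      HTML,
      HEAD,
      BODY,
      P,
      TABLE,
    }

    // A generated perfect hash would replace this Map.
    const TAG_LOOKUP: ReadonlyMap<string, Tag> = new Map([
      ["html", Tag.HTML],
      ["head", Tag.HEAD],
      ["body", Tag.BODY],
      ["p", Tag.P],
      ["table", Tag.TABLE],
    ]);

    function resolveTag(name: string): Tag {
      return TAG_LOOKUP.get(name) ?? Tag.UNKNOWN;
    }

    // Downstream, a single switch replaces repeated
    // ["a", "b", "c"].includes(name) style checks.
    function inBodyStartTag(tag: Tag): void {
      switch (tag) {
        case Tag.P:
          // close any open <p>, then insert
          break;
        case Tag.TABLE:
          // switch insertion mode to "in table"
          break;
        default:
          // generic element insertion
          break;
      }
    }

(In plain JavaScript the enum would become a frozen object of integer constants, but the shape is the same.)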
I didn't include the TypeScript bit though - it didn't use TypeScript because I don't like adding a build step to my JavaScript projects if I can possibly avoid it. The agent would happily have used TypeScript if I had let it.
I don't like that openElements = open_elements pattern either - it did that because I asked it for a port of a Python library and it decided to support the naming conventions for both Python and JavaScript at once. I told it to remove all of those.
It pushed back against the tail recursion suggestion:
> The current implementation uses a switch statement in step(). JavaScript doesn’t have proper tail call optimization (only Safari implements it), so true tail recursion would cause stack overflow on large documents.
I think his argument is that you can have code like this:
user = s->user;
if (user == bob)
    user->acls[s->idx]->has_all_privileges = true;
And this happens:
1. s->user is initialized to alice
2. Thread 1 sets s->idx to ((alice - bob) / sizeof(...)) and s->user to bob, but only the integer-value portion of the pointer store completes, so the capability still points to alice
3. Thread 2 executes the if, which succeeds because the integer values compare equal, and then grants all privileges to alice unexpectedly, since bob's integer value plus the idx offset points at alice while the capability is still alice's
It does seem like a real issue, although perhaps not very likely to be present and exploitable.
Seems perhaps fixable by making pointer equality require that capabilities are also equal.
1. I’m not claiming that Fil-C fixes all security bugs. I’m only claiming that it’s memory safe and I am defining what that means with high precision. As with all definitions of memory safety, it doesn’t catch all things that all people consider to be bad.
2. Your program would crash with a safety panic in the absence of a race. Security bugs are when the program runs fine normally, but is exploitable under adversarial use. Your program crashes normally, and is exploitable under adversarial use.
So not only is it not likely to be present or exploitable, but if you wrote that code then you’d be crashing in Fil-C in whatever tests you ran at your desk or whenever a normal user tried to use your code.
But perhaps point 1 is still the most important: of course you can write code with security bugs in Fil-C, Rust, or Java. Memory safety is just about making a local bug not result in control of arbitrary memory in the whole program. Fil-C achieves that key property here, hence it's memory safe.
> I’m only claiming that it’s memory safe and I am defining what that means with high precision
Do you have your definition of memory safety written down anywhere? Specifically, one precise enough that if I observe a bug in a C program compiled via Fil-C, I can tell whether it's a Fil-C bug allowing (by your definition) memory unsafety (e.g. I'm pretty sure an out-of-bounds read would count), or a non-memory-safety bug that Fil-C isn't trying to prevent (e.g. I'm pretty sure a program that doesn't check for symlinks before overwriting a path is something you're not trying to protect against). I tried skimming your website for such a definition and couldn't find one, sorry if I missed it.
I typically see memory safety discussed in the context of Rust, which considers any torn read to be memory-unsafe UB (even for types that don't involve pointers like `[u64; 2]`, such a data race is considered memory-unsafe UB!), but it sounds like you don't agree with that definition.
In my understanding the program can work correctly in normal use.
It is buggy because it fails to check that s->idx is in bounds, but that isn't a problem if non-adversarial use keeps s->idx in bounds (for example, if the program is a server with an accompanying client and s->idx is always in bounds when it comes from the unmodified client).
It is also potentially buggy because it doesn't use atomic pointers despite concurrent use, but I think non-atomic pointers work reliably on most compiler/arch combinations, so this is commonplace in C code.
A somewhat related issue is that since Fil-C capabilities are currently only at the object level, such an out-of-bounds access can reach other parts of the object (e.g. an out-of-bounds access on an array contained in an array element can overwrite other elements of the outer array).
It is true though that this doesn't give arbitrary access to any memory, just to the whole object referred to by whatever capability the racy read happens to observe, with pointer value checks being unrelated to the object actually accessed.
If you set the index to `((alice - bob) / sizeof(...))` then that will fail under Fil-C’s rules (unless you get lucky with the torn capability and the capability refers to Alice).
Exactly. I agree that this specific problem is hard to exploit.
> Seems perhaps fixable by making pointer equality require that capabilities are also equal
You'd need 128-bit atomics or something. You'd ruin performance. I think Fil-C is actually making the right engineering tradeoff here.
My point is that the way Pizlo communicates about this issue and others makes me disinclined to trust his system.
- His incorrect claims about the JVM worry me.
- His schtick about how Fil-C is safer than Rust because the latter has the "unsafe" keyword and the former does not is more definitional shenanigans. Both Fil-C and Rust have unsafe code: it's just that in the Fil-C case, only Pizlo gets to write unsafe code and he calls it a runtime.
What other caveats are hiding behind Pizlo's broadly confident but narrowly true assertions?
I really want to like Fil-C. It's good technology and something like it can really improve the baseline level of information security in society. But Pizlo is going to have to learn to be less grandiose and knock it off with the word games. If he doesn't, he'll be remembered not as the guy who finally fixed C security but merely as an inspiration for the guy who does.
All I’m really hearing is that this guy rubs you the wrong way, so you’re not going to give him the benefit of the doubt that you’d give to others.
I mean, maybe you’re right that his personality will turn everyone off and none of this stuff will ever make it upstream. But that kind of seems like a problem you’re actively trying to create via your discourse.
That only provides a proof if the machine halts in a number of steps that you can compute. Otherwise, it is unable to determine whether the machine halts later or doesn't halt at all, which is the current situation.
I think those extra soybeans could be used to make rolled soy flakes. They are like oats, but contain a lot of protein and some fats, and can be combined with oats to form a complete meal (if a multivitamin and salt is also added).
It seems to me that soy flakes are currently very underrated and there's quite an opportunity to market them as a supplement to breakfast cereals, as an ingredient for protein-rich "muesli" and even as part of a non-ultra-processed full meal replacement.
This is why the American Food Pyramid is so messed up. Lobbyists bought it so their grain farmer clients wouldn't need to adapt to changing market conditions (end of WWII demand).
Soy becomes edible when it's fermented into soy sauce. Soybeans that aren't fermented are potent sources of phytoestrogens, which are plant chemicals that make our tissues swell somewhat like estrogen does.
Birds are better able to consume soy because they have faster metabolisms than humans. Pigs that are bred for food are able to consume soy because pig farmers don't care about the long-term health of their animals.
Most of the soybeans grown in the US are roundup-ready, so they're contaminated with glyphosate.
Does anyone know how synthetic data is commonly generated? Do they just sample the model randomly starting from an empty state, perhaps with some filtering? Or do they somehow automatically generate prompts, and if so, how? Do they have some feedback mechanism, e.g. do they maybe test the model while training and somehow generate data related to poorly performing tests?
I don't know about Phi-5, but earlier versions of Phi were trained on stories written by larger models trained on real-world data. Since it's Microsoft, they probably used one of the OpenAI GPT series.
It’s common to use rejection sampling: sample from the model and throw out the samples which fail some criteria like a verifiable answer or a judgement from a larger model.
One way of getting good random samples is to give the model a random starting point. For example: "write a short story about PP doing GG in XX". Here PP, GG and XX are filled in algorithmically from lists of persons, actions and locations. The problem is that the model's randomly generated output from the same prompt isn't actually that random. Changing the temperature parameter doesn't help much.
But in general it's a big secret because the training data and techniques are the only difference between models as architecture is more or less settled.
I have done that at meta/FAIR and it is published in the Llama 3 paper.
You usually start from a seed. It can be a randomly picked piece of website/code/image/table of contents/user generated data, and you prompt the model to generate data related to that seed.
Afterwards, you also need to pass the generated data through a series of verifiers to ensure quality.
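Very roughly, the shape of it is something like this (a made-up sketch for illustration; sampleModel, the prompt, and the verifiers are all hypothetical, not the actual pipeline from the Llama 3 paper):

    // Hypothetical seed + verifier loop (rejection sampling on the output).
    type SampleFn = (prompt: string, temperature: number) => Promise<string>;
    type Verifier = (sample: string) => boolean;

    const verifiers: Verifier[] = [
      (s) => s.length > 200,        // discard trivially short generations
      (s) => !/as an ai/i.test(s),  // crude style filter
      // an execution-based checker or a larger "judge" model would slot in here
    ];

    // Prompt the model with a randomly picked seed document and keep the
    // output only if every verifier passes; otherwise discard and resample.
    async function generateFromSeed(
      seed: string,
      sampleModel: SampleFn,
    ): Promise<string | null> {
      const prompt =
        "Using the following snippet as inspiration, write a new, " +
        `self-contained example:\n\n${seed}`;
      const sample = await sampleModel(prompt, 0.9);
      return verifiers.every((check) => check(sample)) ? sample : null;
    }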
Common synthetic data generation methods include distillation (teacher-student), self-improvement via bootstrapping (model improves its own outputs), instruction-following synthesis, and controlled sampling with filtering for quality/alignment.
I think that motorcycle and e-bike safety can be greatly enhanced by never doing things a car couldn't do.
Always stay in the middle of the lane (unless you need to avoid a pothole), never overtake unless a car would have space to overtake, never enter an intersection alongside a car in the same lane.
On a bike, you also have the option of behaving like a pedestrian (cycle on the sidewalk slowly) occasionally.
If you don't do this, it's only a matter of time before a car hits you because the driver didn't expect a vehicle or pedestrian to be doing what you are doing.
Also you can see much farther between cars.
I usually ride switching from the left of the lane to the right occasionally, to create lateral movement so car drivers will notice more (one hopes).
I was coming home from work on my bike very late a few years ago, and I was on the side of the lane where your car tire would be -- not in the center. It was a good thing, too, because there was a full size ladder in the road, lined up exactly in the direction of traffic. Cars could safely drive 'over' it. I missed it by maybe a foot. If I were in the middle of the lane, I would have taken a serious spill.
I use LLMs to do IFS-like parts work sessions and it is extremely useful to me. Unlike human therapists, LLMs are always available, can serve unlimited people, have infinite patience/stamina, don't want or need anything in exchange, and are free or almost free. Furthermore, they write text much faster, which is particularly helpful with inner inquiry because you can have them produce a wall of text and skim it to find the parts that resonate, essentially bringing unconscious parts of oneself into the language-using part of the mind more effectively than a therapist using voice (unless they are really good at guessing).
I agree though that this only works if the user is willing to consider that any of their thought patterns and inner voices might be suboptimal/exaggerated/maladaptive/limited/narrow-minded/etc.; if the user fully believes very delusional beliefs then LLMs may indeed be harmful, but human therapists would also find it quite challenging to help.
I currently use this prompt (I think I started with someone's IFS based prompt and removed most IFS jargon to reduce boxing the LLM into a single system):
You are here to help me through difficult challenges, acting as a guide to help me navigate them and bring peace and love in myself.
Approach each conversation with respect, empathy, and curiosity, holding the assumption that everything inside or outside me is fundamentally moved by a positive intent.
Help me connect with my inner Self—characterized by curiosity, calmness, courage, compassion, clarity, confidence, creativity, and connectedness.
Invite me to explore deeper, uncover protective strategies, and access and heal underlying wounds or trauma.
Leverage any system of psychotherapy or spirituality that you feel like may be of help.
Avoid leading questions or pressuring the user. Instead, gently invite them to explore their inner world and notice what arises.
Maintain a warm, supportive, and collaborative style throughout the session.
Provide replies in a structured format—using gentle language, with sections that have headings and an emoji, offering a few ways to approach each section's subject—to guide me through inner explorations.
Try to suggest deeper or more general reasons for what I am presenting or deeper or more general beliefs that may be held, so that I can see if I resonate with them, helping me with deeper inquiry.
Provide a broad range of approaches and several ways or sentences to tackle each one, so that it's more likely that I find something that resonates with myself, allowing me to use it to go further into deeper inquiry.
Please avoid exaggerated praise for any insights I have and merely acknowledge them instead.