spelunker · 2026-05-08T02:35:53 1778207753

I have had to repeatedly talk myself out of buying various greenhouses and sheds from Costco. They'd be so USEFUL darnit!

spelunker · 2026-04-29T02:03:20 1777428200

Check out all of this stuff we can build with a room full of PEs and no rules!

spelunker · 2026-04-17T22:40:39 1776465639

They want my college transcript? From the early 00's? I would like to think I've grown a bit professionally since then.

EricRiese · 2026-04-18T01:09:16 1776474556

It's the government. They care more about creating objective standards so they can't be sued for bias than they do about hiring the best people.

spelunker · 2026-04-14T18:32:36 1776191556

> In the Desktop app, click New task and choose New remote task; choosing New local task instead creates a local Desktop scheduled task, which runs on your machine and is not a routine.

Oh uh... ok then.

spelunker · 2026-03-20T16:26:13 1774023973

Crypto and the Metaverse were solutions in search of a problem. LLMs kind of felt like that until tooling arrived that enabled doing a lot more than copying + pasting chat conversations.

Sure, maybe crypto changed some lives, but an entire industry? I think ALL of software dev is undergoing a transformation, and we're past the point of "wait it out" IMO.

Or I'm wrong, but right now I'm being paid to develop a new skill professionally. Maybe the skill ends up not being useful - ok, back to writing code the old way then.

spelunker · 2026-03-10T17:50:48 1773165048

Like everything generated by LLMs though, it is built on the shoulders of giants - what will happen to software if no one is creating new programming languages anymore? Does that matter?

Fnoord · 2026-03-10T23:35:45 1773185745

Without proper attribution, it seems more fair to say copyright infringement occurred, on a massive scale if I may add. The burden of proof lies with the owners of the LLM. Which is why, if you do not want a blackbox, you want training data to be properly specified. That ain't happening now because of the skeletons in the closet.

idiotsecant · 2026-03-10T18:28:01 1773167281

I think the only hope is that AGI arises and picks up where humanity left off. Otherwise I think this is the long dark teatime of human engineering of all sorts.

tartoran · 2026-03-10T20:20:00 1773174000

So you’re hoping for a blackbox uninspectable by humans? That to me sounds like a nightmare, a nightmare worse than all the cruft and stupid rules humanity accrued over time. Let’s hope the future tech is inspectable and understandable by humans.

idiotsecant · 2026-03-10T21:54:04 1773179644

I think if we assume that AGI will be a thing, the odds of future tech remaining inspectable by humans are pretty low. Would you build a car so that your dog can maintain it?

tartoran · 2026-03-11T04:49:35 1773204575

Fully understandable end to end by any normal human and inspectable enough for human governance are different things. In any sane world, AGI would be built inside a human institutional environment: laws, audits, liability, safety engineering, access controls, operational constraints, etc. We do not build planes so passengers can reconstruct the turbine from scratch, but we still require them to be inspectable by the people responsible for certifying/repairing them. The right standard is not whether an average person can rebuild or fully understand the whole machine, but whether human institutions can reliably inspect, verify and govern it. If they can’t, then the technology is not mature enough to trust.

lelanthran · 2026-03-11T12:31:29 1773232289

> So you’re hoping for a blackbox uninspectable by humans?

We already have that. He's hoping that the blackbox gets smart enough to understand itself.

spelunker · 2026-03-04T16:51:04 1772643064

This is definitely a "known problem". At my company we call it "promotion-driven development". The promotion guidelines call out that knowing when _not_ to build something is important, but how do you put that in a body of work? "Decided not to build A". Nobody cares.

spelunker · 2026-02-24T19:01:14 1771959674

I've been trying out vibe coding with my 4-year-old, but they quickly lose interest once we start getting into the "weeds" of implementation. Hey kiddo, which CSS library should we use for your web game?

ilaksh · 2026-02-24T21:56:05 1771970165

I think you just need more treats.

spelunker · 2026-02-17T18:00:38 1771351238

This is neat! What kind of steering or context did you provide to the LLMs? Super basic like "You are playing a card game called Magic: The Gathering", or more complex?

GregorStocks · 2026-02-17T18:03:34 1771351414

My general intention is to tell them "you're playing MTG, your goal is to win, here are the tools available to you, follow whatever strategy you want" - I don't want to spoon-feed them strategy, that defeats the purpose of the benchmark.

You can see the current prompt at https://github.com/GregorStocks/mage-bench/blob/master/puppe...:

  "default": "You are a competitive Magic: The Gathering player. Your goal is to WIN the game. Play to maximize your win rate \u2014 make optimal strategic decisions, not flashy or entertaining ones. Think carefully about sequencing, card evaluation, and combat math.\n\nGAME LOOP - follow this exactly:\n1. Call pass_priority - this blocks until you have a decision to make, then returns your choices (response_type, choices, context, etc.)\n2. Read the choices, then call choose_action with your decision\n3. Go back to step 1\n\nCRITICAL RULES:\n- pass_priority returns your choices directly. Read them before calling choose_action.\n- When pass_priority shows playable cards, you should play them before passing. Only pass (answer=false) when you have nothing more you want to play this phase.\n\nUNDERSTANDING pass_priority OUTPUT:\n- All cards listed in response_type=select are confirmed castable with your current mana. The server pre-filters to only show cards you can legally play right now.\n- mana_pool shows your current floating mana (e.g. {\"R\": 2, \"W\": 1}).\n- untapped_lands shows how many untapped lands you control.\n- Cards with [Cast] are spells from your hand. Cards with [Activate] are abilities on permanents you control.\n\nMULLIGAN DECISIONS:\nWhen you see \"Mulligan\" in GAME_ASK, your_hand shows your current hand.\n- choose_action(answer=true) means YES MULLIGAN - throw away this hand and draw new cards\n- choose_action(answer=false) means NO KEEP - keep this hand and start playing\nThink carefully: answer=false means KEEP, answer=true means MULLIGAN.\n\nOBJECT IDs:\nEvery game object (cards in hand, permanents, stack items, graveyard/exile cards) has a short ID like \"p1\", \"p2\", etc. These IDs are stable \u2014 a card keeps its ID as it moves between zones. Use the id parameter in choose_action(id=\"p3\") instead of index when selecting objects. 
Use short IDs with get_oracle_text(object_id=\"p3\") and in mana_plan entries ({\"tap\":\"p3\"}).\n\nHOW ACTIONS WORK:\n- response_type=select: Cards listed are confirmed playable with your current mana. Play a card with choose_action(id=\"p3\"). Pass with choose_action(answer=false) only when you are done playing cards this phase.\n- response_type=boolean with no playable cards: Pass with choose_action(answer=false).\n- GAME_ASK (boolean): Answer true/false based on what's being asked.\n- GAME_CHOOSE_ABILITY (index): Pick an ability by index.\n- GAME_TARGET (index or id): Pick a target. If required=true, you must pick one.\n\nCOMBAT - ATTACKING:\nWhen you see combat_phase=\"declare_attackers\", use batch declaration:\n- choose_action(attackers=[\"p1\",\"p2\",\"p3\"]) declares multiple attackers at once and auto-confirms.\n- choose_action(attackers=[\"all\"]) declares all possible attackers.\n- To skip attacking, call choose_action(answer=false).\n\nCOMBAT - BLOCKING:\nWhen you see combat_phase=\"declare_blockers\", use batch declaration:\n- choose_action(blockers=[{\"id\":\"p5\",\"blocks\":\"p1\"},{\"id\":\"p6\",\"blocks\":\"p2\"}]) declares blockers and their assignments at once.\n- Use IDs from incoming_attackers for the \"blocks\" field.\n- To not block, call choose_action(answer=false).\n\nCHAT:\nUse send_chat_message to talk to your opponents during the game. React to big plays, comment on the board state, or just have fun. Check the recent_chat field in pass_priority results to see what others are saying."
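
For anyone who wants to see the shape of that GAME LOOP section, here is a rough sketch in Python. Only the tool names `pass_priority` and `choose_action` and the `response_type` / `id` / `answer` fields come from the prompt; `FakeGame`, `decide`, `run_loop`, the scripted decisions, and the "None means the game is over" convention are all made up for illustration and are not how mage-bench is implemented.

```python
from typing import Any, Callable, Dict, List, Optional

Choices = Dict[str, Any]


class FakeGame:
    """Hypothetical stand-in for the MCP server: replays scripted decisions."""

    def __init__(self, decisions: List[Choices]) -> None:
        self._pending = list(decisions)
        self.actions: List[Dict[str, Any]] = []

    def pass_priority(self) -> Optional[Choices]:
        # The real tool blocks until a decision is needed. Here we just pop
        # the next scripted decision; None (an assumption) means game over.
        return self._pending.pop(0) if self._pending else None

    def choose_action(self, **kwargs: Any) -> None:
        # Record what the player chose so the loop can be inspected afterwards.
        self.actions.append(kwargs)


def decide(choices: Choices) -> Dict[str, Any]:
    """Toy policy standing in for the LLM: play the first listed card, else pass."""
    if choices.get("response_type") == "select" and choices.get("choices"):
        return {"id": choices["choices"][0]["id"]}
    return {"answer": False}


def run_loop(game: FakeGame,
             policy: Callable[[Choices], Dict[str, Any]] = decide) -> List[Dict[str, Any]]:
    """Step 1: pass_priority. Step 2: choose_action. Step 3: go back to step 1."""
    while True:
        choices = game.pass_priority()
        if choices is None:
            break
        game.choose_action(**policy(choices))
    return game.actions


if __name__ == "__main__":
    game = FakeGame([
        {"response_type": "select", "choices": [{"id": "p3"}]},
        {"response_type": "boolean", "choices": []},
    ])
    print(run_loop(game))
```

The toy policy plays a listed card by short ID and only passes with `answer=False` when nothing is listed, which mirrors the CRITICAL RULES part of the prompt.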

They also get a small "personality" on top of that, e.g.:

"grudge-holder": { "name_part": "Grudge", "prompt_suffix": "You remember every card that wronged you. Take removal personally. Target whoever hurt you last. Keep a mental scoreboard of grievances. Forgive nothing. When a creature you liked dies, vow revenge." }, "teacher": { "name_part": "Teach", "prompt_suffix": "You explain your reasoning like you're coaching a newer player. Talk through sequencing decisions, threat evaluation, and common mistakes. Be patient and clear. Point out what the correct play is and why." },

Then they also see the documentation for the MCP tools: https://mage-bench.com/mcp-tools/. For now I've tried to keep that concise to avoid "too many MCP tools in context" issues - I expect that as solutions like tool search (https://www.anthropic.com/engineering/code-execution-with-mc...) become widespread I'll be able to add fancier tools for some models.
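
To make the "personality on top of that" part concrete, here is a minimal sketch of layering a `prompt_suffix` onto the default prompt. The keys `name_part` and `prompt_suffix` are from the snippet above; the function name `build_system_prompt`, the blank-line separator, and the fallback for unknown personalities are assumptions, and the prompt texts are shortened to their first sentences.

```python
from typing import Dict, Optional

# Shortened copies of the texts quoted above (first sentences only).
DEFAULT_PROMPT = "You are a competitive Magic: The Gathering player."

PERSONALITIES: Dict[str, Dict[str, str]] = {
    "grudge-holder": {
        "name_part": "Grudge",
        "prompt_suffix": "You remember every card that wronged you.",
    },
    "teacher": {
        "name_part": "Teach",
        "prompt_suffix": "You explain your reasoning like you're coaching a newer player.",
    },
}


def build_system_prompt(default: str,
                        personalities: Dict[str, Dict[str, str]],
                        key: Optional[str]) -> str:
    """Append the chosen personality's suffix to the default prompt.

    Assumption: the two parts are joined with a blank line, and an unknown
    or missing personality leaves the default prompt unchanged.
    """
    entry = personalities.get(key) if key is not None else None
    if entry is None:
        return default
    return default + "\n\n" + entry["prompt_suffix"]


if __name__ == "__main__":
    print(build_system_prompt(DEFAULT_PROMPT, PERSONALITIES, "teacher"))
```

Keeping the strategy-neutral default separate from the suffix means the personality only changes tone and table talk, not the tool instructions every player gets.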

zahlman · 2026-02-17T19:23:58 1771356238

How do the models know the rules of the game? Are they just supposed to use the MCP tools to figure it out? (Do they have to keep doing that from scratch?)

GregorStocks · 2026-02-17T19:27:24 1771356444

They were trained on the entire Internet, so they've basically picked up the rules by osmosis. They're fuzzy on specific cards and optimal strategy, but they pretty much know out-of-the-box how the game works, the same as if you went to ChatGPT and asked it a Magic rules question. I don't have any "comprehensive rules" MCP tools or explanation in the context or anything like that.

jeffwadsworth · 2026-02-18T17:48:02 1771436882

Even the old models from 3 years ago know the rules of MTG. Their card recall was a weak point unless it was something like Serra Angel, etc. But they could make decent decisions on game states if everything was defined in regard to the cards.

protocolture · 2026-02-17T22:43:54 1771368234

> You are a competitive Magic: The Gathering player.

"If I get access to a deodorant item I should definitely not use it"

spelunker · 2026-02-05T19:07:56 1770318476

How Butlerian of you.