Hacker News | xdotli's submissions
1.ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
3 points by xdotli 20 days ago | past | 1 comment
2.Chaos of Agent (baulab.info)
1 point by xdotli 46 days ago | past | 1 comment
3.Native CLI scaffolds consistently outperform OpenCode when using the same model (arxiv.org)
1 point by xdotli 46 days ago | past | 1 comment
4.We compare model quality in Cursor (cursor.com)
2 points by xdotli 46 days ago | past
5.Automatically Learning Skills for Coding Agents (gepa-ai.github.io)
4 points by xdotli 63 days ago | past
6.We Reached 74.8% on terminal-bench with Terminus-KIRA (krafton-ai.github.io)
2 points by xdotli 63 days ago | past
7.Self-generated skills don't do much for AI agents, but human-curated skills do (theregister.com)
2 points by xdotli 64 days ago | past | 3 comments
8.First Agent Skills Hackathon by the Authors of SkillsBench (skillathon.ai)
2 points by xdotli 69 days ago | past | 1 comment
9.The First Agent Skills Benchmark (huggingface.co)
1 point by xdotli 69 days ago | past | 1 comment
10.GPT-5.2 got worse on Terminal Bench 2.0, and so did GPT-5.2 Pro (twitter.com/xdotli)
1 point by xdotli 4 months ago | past | 1 comment
11.Claude Skills as a Meta Tool (leehanchung.github.io)
2 points by xdotli 5 months ago | past
12.Show HN: Chat with Claude Code on iMessage with Instaline (twitter.com/xdotli)
2 points by xdotli 7 months ago | past | 4 comments
13.Show HN: PokemonGym – 387 milestones designed to test agents and LLMs (twitter.com/xdotli)
1 point by xdotli on April 5, 2025 | past
14.Show HN: BenchFlow – run AI benchmarks as an API (github.com/benchflow-ai)
24 points by xdotli on March 21, 2025 | past | 1 comment
15.Ask HN: Which CRM can help manually curated leads and automate lead discovery?
1 point by xdotli on Feb 20, 2025 | past | 3 comments
