Hacker News | rao-v's comments

I got to say people also seem to be missing really simple tricks with RAG that help. Using longer chunks and appending the file path to the chunk makes a big difference.

Having said that, generally agree that keyword searching via rg and using the folder structure is easier and better.
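
For anyone who wants to try the chunking trick above, here is a minimal sketch. It assumes character-based chunks and prepends the path as a comment line; the function name `chunk_with_path`, the `# file:` header format, and the default sizes are illustrative assumptions, not taken from any particular RAG library.

```python
def chunk_with_path(path: str, text: str, chunk_chars: int = 4000, overlap: int = 400) -> list[str]:
    """Split text into long, overlapping chunks, each prefixed with its file path.

    Hypothetical helper: sizes are in characters, and the header format is
    an arbitrary choice made for this sketch.
    """
    if overlap >= chunk_chars:
        raise ValueError("overlap must be smaller than chunk_chars")
    chunks = []
    step = chunk_chars - overlap
    for start in range(0, max(len(text), 1), step):
        body = text[start:start + chunk_chars]
        if not body:
            break
        # The path travels with the chunk, so the embedding (and the model
        # reading the retrieved chunk) sees where the text came from.
        chunks.append(f"# file: {path}\n{body}")
        if start + chunk_chars >= len(text):
            break
    return chunks
```

Each returned string is what you would embed and store, so a query that mentions a module or directory name can match on the path as well as the content.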


> I got to say people also seem to be missing really simple tricks with RAG that help. Using longer chunks and appending the file path to the chunk makes a big difference.
>
> Having said that, generally agree that keyword searching via rg and using the folder structure is easier and better.

It depends on the task no? Codebase RAG for example has arguably a different setup than text search. I wonder how much the FS "native" embedding would help.


Wait, so they don’t have test parity with git? How do they know that they, umm … did the actual thing they were trying to do?

I have heard that you can speed up your favorite compression algorithm by 1000x, if you are not so concerned about what happens when you try to decompress it.

Also gotta love the write-only disk as a hardware analogy. Insane write speeds and infinite capacity...

It's just a lossy compression scheme.

I ran the test suite specifically for git's CLI as that was the target I wanted to build towards (Anthropic's C compiler failed to make an operating system since that was never in their original prompts/goals)

The way it gets organized is there are "scripts" which encompass different commands (status, diff, commit, etc.). However, each of these scripts contains several hundred distinct assertions covering flags and arguments.

The test suite was my way of validating I not only had a feature implemented but also "valid" by git's standards


This is the same as all the folks asking for and hawking quantized models.

It doesn't matter if the parent model is GPT GOD mode mythos opus 100x Ultra. What matters is the performance of the quantized model.


Hey - I'd love for you to add a documented / standard way to use this inside Docker so we can build on it for various agentic efforts. I've solved getting bubblewrap to work inside a Docker container once for the nanobot project, but the folks there are dragging their feet on incorporating sandboxing.

https://github.com/HKUDS/nanobot/pull/1940


I've been testing this on Docker today, including the credential injection, env vars, net calls control. I will add more docs but one interesting use case would be to have something like `zerobox --profile nanoclaw -- nanoclaw`, or something similar.

I'd like to hear your thoughts.


I'll give it a shot later today, but basically you need a pretty specific seccomp profile (see my example - I pulled from the podman repo) to allow bubblewrap to run inside an unprivileged Docker container.

I’d love to see this. It was a frustrating learning curve for me to realize that I couldn’t STEP export work from OpenSCAD to something like Fusion.

Build123d is much better (supports STEP export and import) but a tightly integrated CAD frontend would be ideal!


PythonSCAD now has STEP export:

https://github.com/pythonscad/pythonscad


Wait, to usefully import and export STEP you need to be BREP-based, right? I thought SCAD’s engine was fundamentally incompatible (only really one open source BREP engine out there - OpenCascade)

Pretty much, that's why for DXF export I rolled my own:

https://github.com/WillAdams/gcodepreview/blob/main/dxfexpor...

(shows a DXF w/ arcs)

https://github.com/WillAdams/gcodepreview/blob/main/gcpdxf.p...

(shows a .py file for making DXFs)

See the PDF at the top level for more information.


Yes, AFAIK you are correct that SCAD is incapable of outputting clean STEP files.

I messed with this at one point and gave up when I realized every device would have a permanent externally addressable IP within a block that is basically linked to me (good luck trying to change your IPv6 /48 every month or whatever you get with consumer IP addresses)

It’s probably not a big deal and NAT etc. is no protection but it gave me the heebie jeebies.


You know your external IPv4 address rarely changes and is also basically linked to you too, right?

> your external IPv4 address rarely changes

Bad generalization. I'm sure policy about this differs a lot, but my consumer ISP definitely reassigns my home's v4 address periodically. I don't track it closely, but it seems that when my ONT power cycles more often than not it pulls a new v4 address.

Now, basing my privacy/security on this would be bad, but to GP's point, if I was using a static v6 block, not only would this address never change, each device in my LAN would have an extra identifier attached to it. External hosts wouldn't merely be able to identify "my house", but traffic from "my phone", "my kid's switch", and "my spouse's phone" would all have distinct addresses.

Of course, my ISP doesn't do v6 at all, so there's no dilemma :')


That's also a poor generalization, though. Some ISPs rotate customer subnets, and devices can rotate their randomized IPs.

That's why I specified if one was using a static v6 network. There are several reasons why this might not be true, from ipv6 CGNAT like what cell providers do, to ISP rotation, to randomization in your own network, to NATing from the private network if you wanted.

But it does seem like it would be far more likely de facto for an ISP to not randomly rotate v6 networks, except maybe to discourage hosting on consumer connections?


> using a static v6 block, not only would this address never change, each device in my LAN would have an extra identifier attached to it.

This is not true.

The IPv6 stack allocates at least 3 addresses:

- Link-local
- "Permanent" address derived from the subnet and MAC
- Temporary address that changes several times per day

The default address for new connections is always the temporary address. So IP-based tracking from outside your network will be no better than it was before from one day to the next—the /64 will be the only constant here, just as your router's WAN IPv4 is for v4 connections.
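
A rough sketch of the two ideas above, using only Python's standard `ipaddress` module. It assumes the classic modified EUI-64 scheme for the MAC-derived address (many current stacks use an opaque stable identifier instead), and the helper names `eui64_address` and `stable_prefix` are made up for illustration. The addresses in the usage note come from the 2001:db8::/32 documentation range.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the MAC-derived (modified EUI-64) address inside a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to get a 64-bit interface ID.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(int(net.network_address) + int.from_bytes(iid, "big"))


def stable_prefix(addr: str) -> ipaddress.IPv6Network:
    """Return the /64 an address sits in, i.e. the part that stays constant
    while the temporary interface ID rotates."""
    return ipaddress.IPv6Network(f"{addr}/64", strict=False)
```

For example, `eui64_address("2001:db8:1:2::/64", "00:11:22:33:44:55")` gives `2001:db8:1:2:211:22ff:fe33:4455`, and any two temporary addresses from the same LAN map to the same `stable_prefix`, which is all a remote host can correlate on from one day to the next.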


Ah, handy! Though it can't always be true, at least for manual configuration ;-) I have two VPSes with v6 addresses (the others don't have it configured...), and both only have LL and their permanent Internet addresses.

My understanding is v6 has two different autoconf schemes, DHCPv6 and a more "native" solution. Do these both always result in interfaces having multiple (routable) addresses?

Most of my IPv6 experience has been setting it up on aforementioned VPS, and being rewarded with slow OS updates, since NetBSD's default CDN, Fastly, blackholes PMTUD, so I had to drop the MTU on the interface just to get v6 TCP connections to work at all[0]. And for point-to-point networking in an overlay VPN, where I just discovered that Chromium has an 11-year outstanding "bug" where it refuses to perform AAAA lookups if you don't have public IPv6 routing.

[0] I could switch mirrors, but the bandwidth drop isn't quite bad enough for me to bother...


Man... I typed that reply on my phone and dropped the ball formatting it lol.

> My understanding is v6 has two different autoconf schemes, DHCPv6 and a more "native" solution. Do these both always result in interfaces having multiple (routable) addresses?

The answer to that is "yes," but only insofar as DHCP is _not_ the norm for IPv6 networks. If you're planning to use DHCP to assign network addresses in an IPv6 range, you would run it in addition to using automatic configuration, and DHCPv6 would be responsible only for the "permanent" IPv6 address. Automatically-configured addresses (via RA with SLAAC or whatever) would still create the temporary address that you'd use for outbound internet connectivity, and the DHCP address hangs around for your use in DNS and for hosting "permanent" services like a webserver or whatever.

You've hit on one of the subtler problems of IPv6 being that it requires more things being let through the edge firewall[0], but given a stateful IPv6 firewall on the client side, the onus is on the hosting service's admin to ensure that works correctly (AFAIK).

[0]: http://shouldiblockicmp.com/


If you had v6, they'd probably also reassign your IPv6 prefix delegation, too.

Also, v6 supports "privacy extensions", essentially randomizing the host portion of the address and periodically rotating it, so it is not accurate to say your address would never change.


Is there a decent open video embedding model out there? I’d love to play with this without uploading video.

I put together a little script to search for and list installed litellm versions on my systems here: https://github.com/kinchahoy/uvpowered-tools/blob/main/inven...

It's very much not production grade. It might miss sneaky ways to install litellm, but it does a decent job of scanning all my conda, .venv, uv and system environments without invoking a Python interpreter or touching anything scary. Let me know if it misses something that matters.

Obviously read it before running it etc.
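
To give a feel for the general approach (this is not the linked script, just a sketch under my own assumptions): installed wheels leave a `litellm-<version>.dist-info` directory in `site-packages`, so you can read versions off directory names without running any interpreter from those environments. The function name `find_litellm_versions` is hypothetical, and this will miss egg-info installs, editable installs, and vendored copies.

```python
import re
from pathlib import Path

# Matches directory names such as "litellm-1.2.3.dist-info".
DIST_INFO = re.compile(r"^litellm-(?P<version>[^-]+)\.dist-info$", re.IGNORECASE)


def find_litellm_versions(root: str) -> list[tuple[str, str]]:
    """Walk root and return (containing directory, version) for each
    litellm dist-info directory found. Only reads directory names."""
    found = []
    for entry in Path(root).rglob("*.dist-info"):
        match = DIST_INFO.match(entry.name)
        if match and entry.is_dir():
            found.append((str(entry.parent), match.group("version")))
    return sorted(found)
```

Point it at a home directory or a projects folder, e.g. `find_litellm_versions("/home/me/projects")`, and expect it to be slow on large trees since it walks everything.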


Is there a non-transformer-based entity extraction solution that's not brittle? My understanding is that the cutting edge in entity extraction (e.g. spaCy) is just small BERT models, which rock for certain things, but don't have the world knowledge to handle typos / misspellings etc.

Exactly. I genuinely do not understand how any significant user of python can handle white space delimitation. You cannot copy or paste anything without busywork, your IDE or formatter dare not help you till you resolve the ambiguity.

One day https://github.com/mathialo/bython one day!


> You cannot copy or paste anything without busywork

Sounds like a tool issue. My editor (Neovim with a few plugins) can handle copying/pasting with indentation just fine.


> your IDE or formatter dare not help you

Get the ones that do help you! Problem solved, enjoy your clean reading experience!


The problem is that if you copy random code from the internet it cannot figure out the right indentation level - whitespace has meaning in python. What IDE can automagically handle this?

Do you mean "figure out that I'm pasting at level 3, so all pasted code should have +3 levels of indents" like plugins like this one do?

https://marketplace.visualstudio.com/items?itemName=hyesun.p...

(Sublime has the same, so does vim, see comment above, and so do many real IDEs)

Or do you mean something different?


This is nice, but it's not always the case that +3 indent is the right solution (e.g. if I'm copying already indented code it may be over indented).

It's basically a non-problem in most other languages, and an IDE formatter hook will always clean up the code and organize it correctly in a way that you cannot get in Python.


Have you not used any of such IDEs/plugins? It's not X+3 indent, it's "starting at +3", so if you have lines with +10 indent (overindented) copied and paste them at +3 indent, they all get indents cut by 7 levels and end up at the same +3 level as expected.
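
The "starting at +3" behavior described above can be approximated in a couple of lines with the standard `textwrap` module. This is a sketch of the idea, not how any particular editor plugin implements it; `reindent_for_paste` and its parameters are names invented here.

```python
import textwrap


def reindent_for_paste(snippet: str, target_level: int, indent_unit: str = "    ") -> str:
    """Strip the snippet's common leading whitespace, then indent every
    non-blank line so the block starts at target_level."""
    dedented = textwrap.dedent(snippet)
    return textwrap.indent(dedented, indent_unit * target_level)
```

Relative indentation inside the snippet is preserved, so an over-indented block is pulled back to the target level as a unit. One limitation: if the first line was copied without its leading whitespace, there is no common prefix to strip and the result will be over-indented, which is the ambiguity the earlier comments are arguing about.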

looks cool..

Alternatively, I've several times used 'pass' as block terminator for my personal code.


I tried crush when it first came out - the vibes were fun but it didn’t seem to be particularly good even vs aider. Is it better now?


Disclaimer: I work for Charm, so my opinion may be biased.

But we did a lot of work on improving the experience across UX, performance, and the actual reliability of the agent itself.

I'd suggest giving it a try.


Will do thanks - any standout features or clever things for me to look out for?


