It provides failover between multiple providers of the same model, with routing based on price, performance, or availability, and you can define presets that allow automated failover to entirely different models if you want. It also offers middleware/plugins (such as optional auto-compaction or web search) and detailed logging, and it makes all the models available via either OpenAI- or Anthropic-compatible API endpoints.
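To make the failover idea concrete, here's a rough sketch of trying providers in preset order. The function name, the provider callables, and the error type are all made up for illustration; this is not the tool's actual API.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the preset has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first success.

    Hypothetical helper: a real gateway would also weigh price, latency,
    and availability when ordering the providers.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for the example; real ones would call remote APIs.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"


preset = [("primary", flaky_provider), ("fallback", backup_provider)]
used, reply = complete_with_failover("hi", preset)
```

A preset that fails over to an entirely different model would just be a longer list, with the other model's providers at the end.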
Tuning the GC thread count is often useful on multi-tenant systems or machines with many cores, because Java sizes its thread pools by default according to the number of logical cores. If the server has 16 or more cores, that default is very rarely what you want, especially if you run multiple JVMs on the same host.
Not JVM options, but these are often also good to tune:
You can get into difficulty with Kubernetes here: your JVM will detect all the cores on the node, but you may have set a resource limit on the pod (or whatever), so it’ll assume it can spend more time doing stuff than it actually can. It’s often necessary to tune some things to prevent excessive switching etc.
Modern JVMs will detect orchestrator-set cgroup limits and size themselves accordingly. If you, for example, set a cpu limit for a pod to “1”, the JVM will size itself as if it was running on a single core machine.
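As a rough illustration of that sizing, here's a sketch of deriving a CPU count from a cgroup v2 `cpu.max` value, which holds either `<quota> <period>` or `max <period>`. The function name and the rounding are my assumptions; the JVM's real logic considers more inputs (cpusets, and overrides such as `-XX:ActiveProcessorCount`).

```python
import math


def effective_cpus(cpu_max: str, host_cpus: int) -> int:
    """Estimate usable CPUs from the contents of a cgroup v2 cpu.max file.

    Hypothetical helper, not the JVM's actual algorithm: it only looks at
    the CPU quota and ignores cpusets and other limits.
    """
    quota, period = cpu_max.split()
    if quota == "max":  # no quota set, so fall back to the host core count
        return host_cpus
    limit = math.ceil(int(quota) / int(period))  # round fractional CPUs up
    return max(1, min(limit, host_cpus))


# A pod CPU limit of "1" typically shows up as quota == period.
one_cpu_pod = effective_cpus("100000 100000", host_cpus=64)
unlimited_pod = effective_cpus("max 100000", host_cpus=64)
```

With no limit set, the runtime falls back to the full core count of the node, which is the situation where explicit tuning matters most.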
Nah, they made the JVM container-aware some versions ago. I do remember dealing with this in the early Java 8 days; I think Java 10 is when it got fixed, and then it was backported to later releases of Java 8.
I can organize related windows by task. If I have two things going on that both involve, say, a Finder window, a Safari window, and some other assorted things, I can switch between tasks as a group with one gesture. With cmd-tab I'd pull up both Safari windows or both Finder windows, and then maybe need cmd-` to switch to the correct one.
When I'm in the appropriate space with only those related windows, the exposé gestures are also much more usable than when everything is jumbled together.
Because it makes you have to think before moving. If I am on Chrome and want to go to my code editor, I have to press CMD+Tab, see what position the code editor is in and press CMD+Tab x times to go there.
If I use spaces, I know exactly where my editor is and where my browser is; each is one key press away and is always there. I use AeroSpace and I divide my spaces using Alt + the QWERTY keys: Q = Chrome, W = code editor, E and R = programs open for what I am working on (e.g. Postman or Obsidian), and T = MS Teams.
My dock on macOS is always hidden because I don't need it, and now I have more screen real estate.
For me, I use spaces constantly to help me organise/compartmentalise what I’m doing. It lets you group related windows, where command tab only brings you one window at a time.
One example would be if I’m working on a document that draws on others I have written. Put all three in a space and that piece of work is nicely organised.
When I have all my windows in one space I find it messy and stressful and it’s harder to find what I want.
Overall spaces are more compatible with the way I think than command tab.
I personally don't, even when I'm doing heavy multitasking on a 13" laptop. Only exception is if something needs to be full-screen.
It can make sense if you're keeping a lot of non-full-size windows on a larger screen and working on separate tasks that are in the same application, meaning cmd-tab won't help.
None of them, but I prefer ones written with engineering rigor and security in mind. Having an unvetted plugin ecosystem with code that runs unsandboxed is laughably naive.
"Better" isn't just about increasing benchmark numbers. Often, it's more important that a system fails safely than how often it fails. Automatic speech recognition that guesses when the input is unclear will occasionally be right and therefore have a lower word error rate, but if it's important that the output be correct, it might be better to insert "[unintelligible]" and have a human double-check.
It's better in terms of WER. It's not better in terms of not making shit up that sounds plausible.
Probably the answer is simply to tweak the metric so it's a bit smarter than WER: allow "unclear" output which is penalised less than actually incorrect answers. I'd be surprised if nobody has done that.
Ideally, you'd be able to specify exactly what you want. Do you want to write out filled pauses ("aaah", "umm")? Do you want a transcription of the disfluencies (restarts, etc.), or just a cleaned-up version?
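Something like the following is what I have in mind: a minimal sketch of a word-level edit distance where substituting the marker "[unintelligible]" for a word costs less than substituting a wrong word. The function name and the 0.5 penalty are arbitrary assumptions, not an established metric.

```python
UNCLEAR = "[unintelligible]"


def abstention_aware_wer(reference: str, hypothesis: str,
                         unclear_cost: float = 0.5) -> float:
    """Word error rate where an UNCLEAR substitution costs `unclear_cost`.

    Hypothetical metric for illustration: insertions, deletions, and wrong
    substitutions cost 1.0, as in plain WER.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimal cost of aligning the ref words seen so far with hyp[:j]
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [prev[0] + 1.0]  # delete every reference word so far
        for j in range(1, len(hyp) + 1):
            r, h = ref[i - 1], hyp[j - 1]
            if r == h:
                sub = 0.0
            elif h == UNCLEAR:
                sub = unclear_cost  # abstaining is penalised less
            else:
                sub = 1.0
            curr.append(min(prev[j - 1] + sub,   # match or substitution
                            prev[j] + 1.0,       # deletion
                            curr[j - 1] + 1.0))  # insertion
        prev = curr
    return prev[-1] / len(ref)


wrong_guess = abstention_aware_wer("take two pills daily", "take ten pills daily")
abstained = abstention_aware_wer("take two pills daily",
                                 "take [unintelligible] pills daily")
```

Setting `unclear_cost=1.0` gives back ordinary WER, so the penalty is the one knob that says how much an honest "unclear" is worth compared to a confident wrong guess.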
TL;DR: you no longer need to go on a treasure hunt through your notes by typing stuff into the search bar. Having your own graphRAG system + LLM on your notes is basically a "Google" for your own notes. Any question you have: if you have a note for it, it will bubble up. The annoying thing is that false positives will also bubble up.
----
Full reaction:
Yes but perhaps not in a way you might expect. Qwen's reasoning ability isn't exactly groundbreaking. But it's good enough to weave a story, provided it has some solid facts or notes. GraphRAG is definitely a good way to get some good facts, provided your notes are valuable to you and/or contain some good facts.
So the added value is that you now have a supercharged information retrieval system on your notes, with an LLM that can stitch loose facts together reasonably well, like a librarian would. It's also very easy to spot hallucinations if you recognize your own writing well, which I do.
The second thing is that I have a hard time rereading all my notes. I write a lot of notes and don't have the time to reread any of them, so oftentimes I forget my own advice. Now that I have a supercharged information retrieval system on my notes, whenever I ask a question, the graphRAG + LLM searches for the notes most relevant to my question. I've found that 20% of what I wrote is incredibly useful and is stuff that I forgot.
And there are nuggets of wisdom in there that are quite nuanced. For me specifically, I've seen insights into how I relate to work that I should do more with. I'll probably forget most things again, but I can reuse my system, and at some point I'll remember what I actually need to remember. For example, one thing I read was that work doesn't feel like work for me if I get to dive in, zoom out, dive in, zoom out. Given the way I work as a person, that means I'm always resting and always have energy for the task that I'm doing. Another thing that it got me to do was to reboot a small meditation practice by using implementation intentions (e.g. "if I wake up then I meditate for at least a brief amount of time").
What also helps is to have a bit of a back and forth with your notes and then copy/paste the whole conversation into Claude to see if Claude has anything in its training data that might give some extra insight. It could also be that it just helps by firing off 10 search queries and finding a blog post that is useful to the conversation that you've had with your local LLM.