
I'm building something similar[1] but for the web. Imagine creating your own command palette/context menu for any web page. ChatGPT really opened up a lot of mind-blowing possibilities, and the speed of innovation in this space makes me both excited and anxious.

Now if the Chrome store stops taking 3-4 days to approve an update, that'd be great!

[1] https://sublimegpt.com



How about the inverse? Having ChatGPT open a browser and perform actions (visiting a website, summarizing it, giving you the best answer, triggering actions, setting up an ads campaign, replying to comments on a social media platform, etc.). Then you don't even need to open a browser window; the agent can do it for you headlessly. You only need to tell it what to do via chat. Kind of like on-demand, chat-triggered Selenium tests.


> How about the inverse?

Having GPT fill in (or 'out', if you prefer) the myriad web forms I'm currently wrestling with as a job-seeker would be amazing. There are only so many times each day I can copy and paste from my CV (résumé) without turning to drink.


Sure, wire a language model to the internet and allow it to run arbitrary code. What could possibly go wrong!


I think https://uilicious.com is trying to do something like that. Haven't tried it yet, but they market themselves as "GPT writes selenium tests for you".


Cool. I wonder if there’s going to be a marketplace for ChatGPT/LLM tools.

Then you download/subscribe/connect your LLM interface to your tools. I guess that's kind of like Slack.


I think that's what LangChain is trying to do. You can use Zapier's natural language actions with LangChain to connect with thousands of services.


Is the webpage content passed to ChatGPT, or is this more intended to be a way to easily use ChatGPT?

On the first part: I've been trying to build a tool that parses webpages using ChatGPT, but I'm struggling to figure out the best way to pass the website content over. Some options I have tried:

* Raw HTML - expensive, and in a lot of cases doesn't fit in prompt input

* OCR - works better than I would have expected, but can struggle with certain fonts, and a lot of the webpage structure is lost



Let me know if you got it working. I'm looking for such a thing too!

Maybe stripping the styling and JavaScript from webpages would work? Did you do the OCR as part of the complete model, or did you make it a separate step? Machine learning is usually much better in one step.
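
For what it's worth, here is a rough sketch of the "strip the styling and JavaScript" idea using only Python's standard-library `html.parser`. The names `TextExtractor` and `strip_page` are made up for illustration, and this is not what anyone in this thread has confirmed using; it only shows that the visible text can be kept while `<script>` and `<style>` bodies are dropped, which should shrink the prompt considerably.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <noscript> content."""

    # Tags whose contents are never shown to the reader (an assumption of this sketch).
    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def strip_page(html):
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

You would call `strip_page(raw_html)` and pass the result to the model instead of the raw HTML. Most of the page structure is lost this way, so it trades layout information for a much smaller input.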


I did OCR as a separate step (essentially 1. load webpage, 2. screenshot, 3. OCR, 4. OCR output + question into ChatGPT). What does it mean to do it all as one step / how would I go about doing that with ChatGPT?

For more context: I have this set up as an API that I feed a URL + TypeScript definitions to, and have ChatGPT output information from the website in the specified TypeScript definition.

For example, I can use {product_price: float, product_name: str} + a URL as the input, and fairly accurately get product price info across ALL product websites. It's kind of amazing that it's able to do this much just based upon the TypeScript variable names + raw OCR output.
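
A minimal sketch of how a schema-plus-page-text setup like this could be wired up, assuming the model is asked to reply with JSON only. Everything here (`SCHEMA`, `build_prompt`, `parse_reply`, the prompt wording) is hypothetical and not the actual API described above; the call to the model itself is left out on purpose.

```python
import json

# Hypothetical schema mirroring the example above: field name -> Python type.
SCHEMA = {"product_price": float, "product_name": str}


def build_prompt(schema, page_text):
    """Combine the field definitions and the page text (e.g. OCR output) into one prompt."""
    fields = ", ".join(f"{name}: {typ.__name__}" for name, typ in schema.items())
    return (
        "Extract the following fields from the page text and reply with JSON only.\n"
        f"Fields: {{{fields}}}\n"
        f"Page text:\n{page_text}"
    )


def parse_reply(schema, reply):
    """Parse the model's JSON reply and coerce each field to the type in the schema."""
    data = json.loads(reply)
    result = {}
    for name, typ in schema.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        result[name] = typ(data[name])
    return result
```

Validating the reply against the schema is the useful part: the model will sometimes return a price as a string or omit a field, and it is better to fail loudly than to store a bad record.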


> What does it mean to do it all as one step / how would I go about doing that with ChatGPT?

Wait till they make the image input available via the API, I guess


That makes sense, and was my plan, but the costs for GPT-4 are a bit higher than is economically viable for most of my use cases.


Have you already tried this: https://github.com/mozilla/readability ?


https://github.com/Nemoden/gogpt

https://github.com/NotBrianZach/bza (my concept, wip, read along repl as opposed to io focused cli tool)


Fascinating stuff, thank you for sharing.

You could create a graph of the intersections/references of meaning across different books. Those could represent an external memory for the agent/LLM, that it can retrieve/navigate via prompts.
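
To make the idea a bit more concrete, here is a toy sketch of such a graph, assuming the concepts per book have already been extracted somehow (by the LLM or by hand). The function names and the data shape are made up for illustration.

```python
from collections import defaultdict


def build_concept_graph(book_concepts):
    """Map each concept to the set of books mentioning it, keeping only shared concepts."""
    books_by_concept = defaultdict(set)
    for book, concepts in book_concepts.items():
        for concept in concepts:
            books_by_concept[concept].add(book)
    # A concept is an "intersection" only if more than one book mentions it.
    return {c: books for c, books in books_by_concept.items() if len(books) > 1}


def related_books(graph, book):
    """Return the other books that share at least one concept with the given book."""
    related = set()
    for books in graph.values():
        if book in books:
            related |= books - {book}
    return related
```

The agent could then be prompted with the neighbours of whatever it is currently reading, rather than with the full text of every book.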


my brother thought something like that might be useful for playing dnd with it.

he says it tends to lose character cohesion as conversation goes on

and of course obviously could be useful if a character shows up in a scene it just looks up their wiki entry essentially

sounds complicated lol

i just wanna read math book with it

i could imagine it would be fun to create a wiki with friends and then like randomly roll characters into your session from your shared universe or something

a lot of possibilities



