
I'm attempting to replicate this. Searching for the last sentence (which is behind the paywall) brings the article up in Google right away, so I think you're right. However, using Googlebot's user agent doesn't work, so the check must be slightly more sophisticated. The result in Google is also not paywalled, though going directly to the link is. So maybe they use a simpler strategy and just mess with the parameters. This is the result from Google: http://online.wsj.com/news/articles/SB1000142405270230388060...
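For anyone who wants to repeat the user-agent test, here is a minimal sketch of how such a request could be built in Python. The target URL and the `build_request` helper are hypothetical placeholders, not anything from the site in question, and the request is only constructed, never sent. As noted above, a user-agent swap alone did not get past the paywall.

```python
import urllib.request

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request that presents the given User-Agent header.

    Hypothetical helper for illustration; it only builds the request object.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL (an assumption), standing in for the article link.
req = build_request("https://example.com/article")

# To actually fetch, one would call urllib.request.urlopen(req) and compare
# the body against a fetch made with a normal browser user agent.
```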


Searching Google for

    "cache:http://online.wsj.com/news/articles/SB10001424052702303880604579405852448992982?"
gets me the full-text article. Could they just be stripping the header from the page and displaying that? I know it detects cached Google pages in some circumstances.
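The query above can also be turned into a search URL programmatically. This is a rough sketch: the `cache_search_url` helper is a hypothetical name, and the `https://www.google.com/search?q=` endpoint form is an assumption about how the query is submitted. It only builds the URL and does not check whether Google serves a cached copy.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Article URL exactly as quoted in the query above.
ARTICLE_URL = (
    "http://online.wsj.com/news/articles/"
    "SB10001424052702303880604579405852448992982?"
)


def cache_search_url(url):
    """Build a Google search URL for the query 'cache:<url>'.

    Hypothetical helper; urlencode percent-escapes the colon, slashes,
    and question mark so the whole article URL stays inside the q parameter.
    """
    return "https://www.google.com/search?" + urlencode({"q": "cache:" + url})


search_url = cache_search_url(ARTICLE_URL)

# Decode the query string again to confirm the round trip is lossless.
decoded_query = parse_qs(urlsplit(search_url).query)["q"][0]
```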



