
fabulist on Feb 26, 2014 | on: Mt. Gox Receives Subpoena From Federal Prosecutor:...

I'm attempting to replicate this. Searching for the last sentence (which is behind the paywall) brings it up in Google right away, so I think you're right. However, using Googlebot's user agent doesn't work, so the check must be slightly more sophisticated. The result in Google is also not paywalled, though going directly to the link is. So maybe they use a simpler strategy and just mess with the parameters. This is the result from Google: http://online.wsj.com/news/articles/SB1000142405270230388060...
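The comparison described above (a plain visit, a visit with Googlebot's user agent, and a visit that appears to come from a Google result) could be set up roughly like this. It is only a sketch under assumptions: the helper name `build_variants`, the browser user-agent string, the `Referer` value, and the example URL are all hypothetical, and the code only builds the request objects without sending anything.

```python
from urllib.request import Request

# Assumed header values for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_variants(url):
    """Return named Request objects that differ only in User-Agent and Referer."""
    return {
        # Going directly to the link with an ordinary browser identity.
        "direct": Request(url, headers={"User-Agent": BROWSER_UA}),
        # Same URL, but identifying as Googlebot.
        "googlebot": Request(url, headers={"User-Agent": GOOGLEBOT_UA}),
        # Same URL, but claiming to arrive from a Google results page.
        "from_google": Request(
            url,
            headers={"User-Agent": BROWSER_UA, "Referer": "https://www.google.com/"},
        ),
    }


if __name__ == "__main__":
    # Hypothetical URL; substitute the page being compared.
    for name, req in build_variants("http://example.com/article").items():
        print(name, dict(req.header_items()))
```

Keeping everything identical except one header per variant makes it easier to tell which signal, if any, the server is reacting to.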

nwh on Feb 26, 2014

Searching at Google for

    "cache:http://online.wsj.com/news/articles/SB10001424052702303880604579405852448992982?"

gets me the full-text article. Could they just be stripping the header from the page and displaying that? I know it detects cached Google pages in some circumstances.
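For anyone repeating the search, the query above can be turned into a search URL like this. A minimal sketch: the helper name `cache_search_url` and the `https://www.google.com/search?q=` endpoint are assumptions, and it presumes the `cache:` operator behaves as described in this comment.

```python
from urllib.parse import urlencode


def cache_search_url(article_url):
    """Build a Google search URL whose query is 'cache:<article_url>'."""
    query = "cache:" + article_url
    # urlencode percent-escapes ':', '/', and '?' so the whole article URL
    # travels as a single 'q' parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    article = (
        "http://online.wsj.com/news/articles/"
        "SB10001424052702303880604579405852448992982?"
    )
    print(cache_search_url(article))
```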
