
> Edit: In particular, it seems to me that papers in their control group are available on SciHub (correct me if I’m wrong), just nobody bothered to download them in the time window they analyzed. Which directly contradicts the “limited access” part of their claim.

I was not sure about this, but if this is the case then their conclusion seems to be unrelated to their findings.



I'm pretty sure about this now.

I just looked into the actual data set [1] and checked a few papers from it, e.g. [2], the first paper in scopus_nature.csv. They're definitely available on Sci-Hub, at least at the moment [3], and there's no reason to believe they weren't available back then.

Also, here's what they say about the dataset, emphasis mine:

> As a quality control, we performed a random sampling of all the articles retrieved, excluding those already present in the first data set. As in this second data set, the number of Sci-hub downloads is precisely equal to zero, we regard it as a control group (nC = 4,015) from which we are going to estimate comparisons for our experimental group (nE = 4,646).

Downloads equal to zero != not available.

[1] https://osf.io/xb9yn/

[2] https://www.scopus.com/inward/record.url?eid=2-s2.0-85016141...

[3] https://sci-hub.tw/10.1038/541123a


So it seems that this paper's result comes down to "more downloads => more citations".

Of course, that is hardly a surprising result. I think most authors have read most papers they cite (although things like reviewer/editor shenanigans could make this less than 100%).

Of course, such a pedestrian result does not grab attention.


> Downloads equal to zero != not available.

I'm wondering about this too. What other reasons could there be for the papers not being downloaded? Boring titles? Preprints easily available? Open access?

> and there's no reason to believe they weren't available back then.

Is it possible that the articles really were not available, since they used data from Sep 2015 to Feb 2016, when Sci-Hub was still in an early version?


> Is it possible that the articles really were not available, since they used data from Sep 2015 to Feb 2016, when Sci-Hub was still in an early version?

That's within the realm of possibility, but it doesn't sound like the case given how they described their methodology. Establishing it would have required a catalog of the papers available on Sci-Hub too, but the dataset they used only contained download logs, so they couldn't have known whether a zero-download paper was available or not (unless I misunderstood what the first dataset is about).
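
To make the point concrete, here is a minimal sketch of the distinction, not the authors' actual pipeline. The function name `classify`, the status labels, and the DOIs are all made up for illustration, and the `catalog` argument stands for a hypothetical list of papers available on Sci-Hub, which as far as I can tell the study did not have.

```python
from collections import Counter


def classify(sampled_dois, download_log, catalog=None):
    """Label each sampled DOI by what the data can actually tell us.

    download_log: iterable of DOIs, one entry per logged download.
    catalog: optional set of DOIs known to be available (hypothetical;
             a download log alone does not contain this information).
    """
    counts = Counter(download_log)
    labels = {}
    for doi in sampled_dois:
        if counts[doi] > 0:
            labels[doi] = "downloaded"
        elif catalog is None:
            # With logs only, zero downloads cannot be told apart
            # from "not available".
            labels[doi] = "unknown"
        elif doi in catalog:
            labels[doi] = "available, zero downloads"
        else:
            labels[doi] = "not available"
    return labels


# Made-up DOIs, purely for illustration.
log = ["10.0000/a", "10.0000/a", "10.0000/b"]
sample = ["10.0000/a", "10.0000/c", "10.0000/d"]

logs_only = classify(sample, log)
with_catalog = classify(sample, log, catalog={"10.0000/a", "10.0000/c"})
```

With logs only, every zero-download paper ends up as "unknown"; only the extra catalog lets you separate "available, zero downloads" from "not available", which is the separation the "limited access" claim would need.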



