Scientific reputation systems, and holding people responsible for false claims

Prompted by yesterday’s HackFS kick-off panel discussion, I’ve been thinking: what if we could “git replicate” a set of scientific claims, just like we can “git clone” a code repository to run it?

I’m envisioning a system where running this single command:

  1. downloads the code
  2. sets up the dev environment, tooling, container, etc., spinning up a cloud instance if there are compute or memory constraints
  3. downloads required data
  4. reproduces all the figures, graphs, p-value calculations, etc.
    Think of it as a heavily augmented Jupyter notebook or MATLAB Live Editor session.
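
To make the four steps a bit more concrete, here is a rough Python sketch of the planning half of such a tool. `git replicate` doesn't exist; the manifest fields, the step strings and the "move to the cloud if local memory is too small" rule are assumptions made for illustration, and the sketch only produces a plan rather than executing anything.

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    """Hypothetical replication manifest shipped alongside a paper's code."""
    code_repo: str                 # where the analysis code lives
    container_image: str           # pinned environment, e.g. a container image tag
    datasets: dict[str, str]       # dataset name -> content hash / address
    targets: list[str]             # figures, tables, p-values to regenerate
    min_memory_gb: float = 0.0     # memory the full analysis needs


def build_plan(m: Manifest, local_memory_gb: float) -> list[str]:
    """Turn a manifest into an ordered list of the four steps."""
    plan = [f"clone {m.code_repo}"]                        # 1. download the code
    if m.min_memory_gb > local_memory_gb:                 # 2. environment, maybe remote
        plan.append(f"provision cloud instance with >= {m.min_memory_gb} GB RAM")
    plan.append(f"start container {m.container_image}")
    # 3. fetch each dataset by its content hash so the bytes can be checked
    plan += [f"fetch {name} ({h})" for name, h in sorted(m.datasets.items())]
    # 4. regenerate every figure / statistic
    plan += [f"reproduce {t}" for t in m.targets]
    return plan


if __name__ == "__main__":
    m = Manifest(
        code_repo="https://example.org/lab/analysis.git",   # placeholder URL
        container_image="example/analysis-env:1.0",          # placeholder image
        datasets={"D": "sha256:<digest-of-D>"},
        targets=["figure_2", "figure_3", "table_1_pvalues"],
        min_memory_gb=64,
    )
    for step in build_plan(m, local_memory_gb=16):
        print(step)
```

Pinning data by content hash rather than by URL is what would let a replicator confirm they are running on exactly the bytes the author used.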

There are several hurdles to replication: scientists don’t want to disclose their code; when they do, there are bureaucratic hoops to jump through to get it; and when it is available, it often doesn’t work as advertised, etc.

In the KERNEL showcase presentation, we had an Ecosystem slide where we toyed with the idea of “scientific data objects”: wrappers around the raw data that also include the hypotheses, research papers, computations, and results obtained using that data. This kind of packaged object is an abstraction that allows us to build a very interesting system, which I’ll describe here.

Let’s say scientist Rahul uses dataset D to arrive at hypothesis H, and publishes a preprint claiming H. The dataset can be his own creation or someone else’s. He then creates a scientific data object using his datawallet, and the claim, along with the hash of the object, goes on-chain, cemented in history.
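
A minimal sketch of what "the claim along with the hash of the object" could look like, assuming the object is serialized as canonical JSON and hashed with SHA-256. The field names, the `ScientificDataObject` and `claim_record` names, and the choice of hash are hypothetical; the point is only that changing any part of the object (data, code version, or results) changes the hash that was recorded.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScientificDataObject:
    """Hypothetical wrapper: raw data plus everything derived from it."""
    dataset_hash: str               # content hash of dataset D (the raw data)
    hypotheses: tuple[str, ...]     # e.g. ("H",)
    paper_uri: str                  # link to the preprint
    code_commit: str                # exact version of the analysis code
    result_hashes: tuple[str, ...]  # hashes of the published figures / numbers

    def content_hash(self) -> str:
        # Canonical JSON (sorted keys, no extra whitespace) so the same
        # object always produces the same SHA-256 digest.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def claim_record(author: str, claim: str, obj: ScientificDataObject) -> dict:
    """The small record that would be written on-chain: claim + object hash.
    The object itself can be stored off-chain."""
    return {"author": author, "claim": claim, "object_sha256": obj.content_hash()}


if __name__ == "__main__":
    obj = ScientificDataObject(
        dataset_hash="sha256:<digest-of-D>",
        hypotheses=("H",),
        paper_uri="https://example.org/preprint",  # placeholder
        code_commit="<commit-id>",
        result_hashes=("<digest-of-figure-3>",),
    )
    print(claim_record("Rahul", "H", obj))
```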

Priya is a scientist at another lab who finds Rahul’s result interesting, and wants to verify a particular figure which seems a bit fishy - maybe the data seems too regular (don’t trust, verify!). So she runs the replicate command, with parameters set such that her computer only runs the code to reproduce that particular figure. She can see where all the data is coming from, how it’s being cleaned, the random seeds being used, the algorithm and parameter settings … the entire chain of computation is verifiable.
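
Running "only the code for one figure" amounts to walking the dependency graph backwards from that figure. The sketch below uses a made-up five-step pipeline and a stand-in `run_step` (both hypothetical). It selects just the steps the target needs, re-runs them with the recorded seed, and compares the hash of every intermediate output with the published hashes, so the first mismatching step in run order is where the computation diverges.

```python
import hashlib
import random
from graphlib import TopologicalSorter

# Hypothetical analysis pipeline: step -> set of steps it depends on.
PIPELINE = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "fit_model": {"clean"},
    "figure_2": {"clean"},
    "figure_3": {"fit_model"},
}


def steps_for(target: str, graph: dict[str, set[str]]) -> list[str]:
    """Only the steps `target` depends on (transitively), in run order."""
    needed, stack = set(), [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(graph[step])
    return list(TopologicalSorter({s: graph[s] for s in needed}).static_order())


def run_step(name: str, inputs: list[bytes], seed: int) -> bytes:
    """Stand-in for a real step: output depends only on inputs and the seed."""
    rng = random.Random(f"{seed}:{name}")  # the seed is part of the published record
    h = hashlib.sha256(name.encode())
    for blob in inputs:
        h.update(blob)
    h.update(repr(rng.random()).encode())
    return h.digest()


def replicate(target: str, graph: dict[str, set[str]], seed: int) -> dict[str, str]:
    """Re-run only what `target` needs; return the hash of each step's output."""
    outputs: dict[str, bytes] = {}
    for step in steps_for(target, graph):
        deps = [outputs[d] for d in sorted(graph[step])]
        outputs[step] = run_step(step, deps, seed)
    return {step: out.hex() for step, out in outputs.items()}


if __name__ == "__main__":
    published = replicate("figure_3", PIPELINE, seed=42)  # what the author posted
    mine = replicate("figure_3", PIPELINE, seed=42)       # Priya's re-run
    print([s for s in published if published[s] != mine[s]])  # empty list: all steps match
```

Note that `figure_2` is never executed when only `figure_3` is requested, which is what keeps a targeted replication cheap.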

Once we have a way to actually see and play around with the internals of research, not just the carefully crafted, hush-hush papers, AND everyone agrees this is a good idea and starts participating… well, the sky is the limit.

If Priya is satisfied with the replication, she can put her money (reputation points) where her mouth is by approving Rahul’s work on-chain. Today, people point out scientific fraud, data mishandling, or even unintentional errors on Twitter, where the callouts make the rounds and eventually get memory-holed. With such a system, all approvals and disapprovals stay on-chain.
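
A real deployment would sit on an actual blockchain. The toy log below only illustrates the property that matters here: approvals and disapprovals can't be quietly edited or deleted later, because each entry commits to the hash of the previous one. The class name, the reviewer names, the `stake` field (reputation points put on the line) and the verdict values are illustrative assumptions, and there is no consensus or signing.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class EndorsementLog:
    """Toy append-only log of verdicts on scientific data objects."""
    entries: list = field(default_factory=list)

    def append(self, reviewer: str, object_hash: str, verdict: str, stake: int) -> str:
        if verdict not in ("approve", "disapprove"):
            raise ValueError("verdict must be 'approve' or 'disapprove'")
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"reviewer": reviewer, "object": object_hash,
                "verdict": verdict, "stake": stake, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edit to an old entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


if __name__ == "__main__":
    log = EndorsementLog()
    log.append("Priya", "<object-sha256>", "approve", stake=10)
    log.append("Sam", "<object-sha256>", "disapprove", stake=5)
    print(log.verify())
```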

We could even build a crude system where, for certain kinds of very quantifiable research predictions, these kudos and disapprovals are collated into a confidence distribution, much like prediction markets produce today. Metaculus implements this specific piece, including for AI research: people post very specific questions, and participants must make their predictions before a particular date, after which predictions are frozen. Each question has a set resolution date, when people find out how good their predictions were, and each participant’s forecasting ability (a number representing calibration) is updated accordingly.
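
For the "forecasting ability gets updated" part, one standard way to score probabilistic forecasts on yes/no questions is the Brier score. The sketch below scores each participant's resolved forecasts, turns that score into a weight, and uses the weights to collate current probabilities into a single community estimate. The Brier score is a swapped-in choice for illustration (not necessarily what Metaculus uses), and the weighting rule and names are hypothetical.

```python
from typing import Optional

History = list[tuple[float, int]]  # (forecast probability, outcome 0/1) per resolved question


def brier(history: History) -> float:
    """Mean of (probability - outcome)^2; 0 is perfect, always saying 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in history) / len(history)


def weight(history: History) -> float:
    """Skill relative to always answering 0.5, floored at 0 (an assumed rule)."""
    return max(0.0, 1 - brier(history) / 0.25)


def community_forecast(forecasts: dict[str, float],
                       histories: dict[str, History]) -> Optional[float]:
    """Skill-weighted average of each participant's current probability."""
    weights = {name: weight(histories[name]) for name in forecasts}
    total = sum(weights.values())
    if total == 0:
        return None  # nobody has beaten a coin flip yet
    return sum(forecasts[n] * w for n, w in weights.items()) / total


if __name__ == "__main__":
    histories = {
        "Priya": [(0.9, 1), (0.8, 1), (0.2, 0)],  # confident and right
        "Sam": [(0.5, 1), (0.5, 0)],              # no better than a coin flip
    }
    print(community_forecast({"Priya": 0.7, "Sam": 0.2}, histories))
```

With these made-up histories, Priya's Brier score is 0.03 (weight 0.88) and Sam's is 0.25 (weight 0), so the community estimate follows Priya.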

[Image: example of an active question on Metaculus]

[Image: a past question that has been resolved]

These are some very crude ideas I had, and I’m sharing them to get our collective brains swirling 🙂 Please let me know what you think, any disagreements with specific sections, or pointers to existing examples!