Scientific reputation systems, and holding people responsible for false claims

Prompted by yesterday’s HackFS kick-off panel discussion, I’ve been thinking: what if we could “git replicate” a set of scientific claims, just like we can “git clone” a code repository to run it?

I’m envisioning a system where just running this command

  1. downloads the code
  2. sets up the dev environment, tooling, container, etc. It can also spin up a cloud instance if there are compute or memory constraints
  3. downloads required data
  4. reproduces all the figures, graphs, p-value calculations etc.
    Like a very augmented Jupyter notebook or MATLAB live editor instance.
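The four steps above could be sketched in a few lines; this is a minimal Python sketch, where the manifest layout, container choice, and the `fetch-data`/`make` targets are all illustrative assumptions, not a real tool:

```python
# Hypothetical sketch of the steps a "git replicate" run might execute.
# Everything here (targets, container name, commands) is an assumption.

def replication_plan(repo_url: str, target: str = "all") -> list[str]:
    """Return the ordered shell commands a replicate run would execute."""
    return [
        f"git clone {repo_url} code",          # 1. download the code
        "docker build -t replication ./code",  # 2. reproducible environment
        "docker run replication fetch-data",   # 3. pull datasets from a manifest
        f"docker run replication make {target}",  # 4. regenerate figures / stats
    ]

# A verifier could scope the run to a single output, e.g. one figure:
for step in replication_plan("https://example.org/rahul/paper.git", target="figure3"):
    print(step)
```

Scoping the final step to a single target is what would let a verifier reproduce just one figure rather than the whole paper.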

There are several hurdles to replication: scientists don’t want to disclose code; when they do, there are bureaucratic hoops to jump through to get it; and when it’s finally available, it often doesn’t work as advertised.

In the KERNEL showcase presentation, we had an Ecosystem slide where we toyed with the idea of “scientific data objects” which would be wrappers around the raw data, but include hypotheses, research papers, computations and results obtained using this data. This kind of packaged object is an abstraction that allows us to build a very interesting system, which I’ll describe here.

Let’s say scientist Rahul uses dataset D to obtain hypothesis H, and publishes a preprint claiming H. This dataset can be his own creation, or someone else’s. He then creates a science data object using his datawallet, and the claim, along with the hash of the object, goes on-chain, cemented in history.
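As a concrete sketch of how the claim gets “cemented in history”: hash a canonical serialization of the data object and record the digest on-chain. The object schema and the placeholder content hashes below are made up for illustration:

```python
# Sketch: hashing a "science data object" so the claim can be anchored on-chain.
# The field names and placeholder digests are illustrative assumptions.
import hashlib
import json

data_object = {
    "dataset": "sha256:placeholder-of-dataset-D",
    "hypothesis": "H: the claimed effect holds",
    "code": "sha256:placeholder-of-analysis-code",
    "results": ["figure1.png", "table2.csv"],
}

# Canonical serialization, so the same object always yields the same digest.
canonical = json.dumps(data_object, sort_keys=True).encode()
object_hash = hashlib.sha256(canonical).hexdigest()
print(object_hash)  # this digest is what would be recorded on-chain
```

Anyone holding the object can recompute the digest and check it against the on-chain record.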

Priya is a scientist at another lab who finds Rahul’s result interesting, and wants to verify a particular figure which seems a bit fishy - maybe the data seems too regular (don’t trust, verify!). So she runs the replicate command, with parameters set such that her computer only runs the code to reproduce that particular figure. She can see where all the data is coming from, how it’s being cleaned, the random seeds being used, the algorithm and parameter settings … the entire chain of computation is verifiable.

Once we have a way to actually see and play around with the internals of research (not just the carefully crafted, hush-hush papers), and everyone agrees this is a good idea and starts participating, the sky is the limit.

If Priya is satisfied with the replication, she can put her money (reputation points) where her mouth is by approving Rahul’s work on-chain. Instead of science fraud, data mishandling, or even unintentional errors being pointed out on Twitter, where they make the rounds and eventually get memory-holed, with such a system all the approvals and disapprovals stay on-chain.
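A minimal sketch of what staking reputation on a verdict might look like, using an in-memory ledger as a stand-in for the chain; the class names, the stake mechanics, and the required-comment rule are all assumptions:

```python
# Toy stand-in for an on-chain record of staked replication verdicts.
# All names and rules here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Verdict:
    reviewer: str
    object_hash: str
    approved: bool
    stake: int   # reputation points put at risk
    comment: str # required, so every dis/approval stays transparent

@dataclass
class Ledger:
    verdicts: list = field(default_factory=list)

    def record(self, v: Verdict) -> None:
        if not v.comment:
            raise ValueError("verdicts must carry a comment")
        self.verdicts.append(v)

    def score(self, object_hash: str) -> int:
        """Net stake-weighted approval for one data object."""
        return sum((v.stake if v.approved else -v.stake)
                   for v in self.verdicts if v.object_hash == object_hash)
```

Requiring a comment on every verdict is one way to keep brigaded up/down votes accountable.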

We can even build a crude system where (for some specific kinds of very quantifiable research predictions), these kudos/disapprovals can be collated to form a confidence distribution, like we have in prediction markets today. Metaculus implements this specific piece for AI research, where people post very specific questions, and participants are required to make predictions before a particular date, after which predictions are frozen. Questions have a set date for resolution, when people get to know how good their predictions were, and each person’s personal forecasting ability (a number representing calibration) gets updated accordingly.
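The calibration update can be sketched with a simple proper scoring rule. Metaculus’s actual scoring formula differs, so the Brier score below is just a stand-in to show the shape of the mechanism:

```python
# Sketch: updating a forecaster's calibration after questions resolve.
# The Brier score is a stand-in; it is not Metaculus's actual rule.

def brier(prediction: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (prediction - outcome) ** 2

def update_calibration(history: list[tuple[float, int]]) -> float:
    """Mean Brier score over resolved questions (lower is better)."""
    return sum(brier(p, o) for p, o in history) / len(history)

# Three resolved predictions: (forecast probability, actual outcome)
print(update_calibration([(0.8, 1), (0.3, 0), (0.9, 1)]))
```

Frozen predictions plus a fixed resolution date make the score un-gameable after the fact, which is the property the on-chain version would want too.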

Example of an active question on Metaculus, and a past question that has been resolved (screenshots):

These are some very crude ideas I had, and I’m sharing them with the intention of swirling our collective brains 🙂 Please let me know what you think, any disagreements with specific sections, or existing examples!


Some interesting thoughts Kinshuk!

So when Priya tries to verify Rahul’s research, can she see the code leading to the figure? If so, it doesn’t solve the issue of researchers being reluctant to share code!

(For this, I think you could make it so that the code itself is private and hidden, but any person can use it without seeing the source? This way others can verify without researchers getting self-conscious about sharing code.)

Furthermore, what is to stop a rival research group from “downvoting” my on-chain work and getting their friends to “upvote” their own on-chain data objects?

(For this, you could make it so that you’re only able to dis/approve work with comments etc. so that it is all transparent. I think this may be what you’re already suggesting?)


Great feedback! The devil really will be in the details of implementation. We should probably leave the decision to openly share content with the content creator, while putting incentives in place that favor open over closed procedures for discovery. Of course, this only applies to data that should be shared in the first place (non-personal, non-identifiable).

Sometimes researchers care less about the code itself than about trust in the result. Data and associated code that follow pre-defined standards (e.g., BIDS datasets that will reliably generate a specific result when executed with a BIDS App) can be kept closed source but trusted through attestation of a specific result produced by a third-party agent (smart contract).

In most cases, scientists will likely share their code + data if it means they gain community trust in their research outputs. Instead of up/down voting, reputation might be more informative if tied to the utility of the output of the researcher. This can be measured through replication count, contribution count, and how connected the scholar’s outputs are to the rest of the knowledge graph that forms over time.
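One way to combine those three signals (replication count, contribution count, graph connectivity) into a single number; the weights and the log damping are purely illustrative assumptions:

```python
# Sketch of a utility-based reputation score from the three signals above.
# The weights and log1p damping are illustrative choices, not a proposal.
import math

def reputation(replications: int, contributions: int, graph_links: int,
               weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the signals; log1p damps runaway counts."""
    signals = (replications, contributions, graph_links)
    return sum(w * math.log1p(s) for w, s in zip(weights, signals))
```

The logarithm means the tenth replication counts for less than the first, so reputation rewards breadth of verified, connected output rather than raw volume.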

Here is a graphic that explains a little how such a procedure can be structured. Each step in the discovery chain involves an act of on-chain notarization by updating relevant metadata. Actual assets should be stored off-chain utilizing a decentralized database architecture like that of Ocean Protocol. In short, it is basically a structure of pointers that allows traversing a knowledge graph multiple ways to answer queries about the research itself (metascience!).
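A toy version of that pointer structure: on-chain metadata entries pointing at off-chain storage and at their parent objects, so provenance queries become graph traversals. The field names and storage URIs are assumptions:

```python
# Toy pointer graph: metadata entries reference off-chain assets and parents.
# Field names and the placeholder storage locations are assumptions.

graph = {
    "dataset:raw1":   {"stored_at": "ipfs://placeholder-raw",
                       "derived_from": []},
    "dataset:clean1": {"stored_at": "ipfs://placeholder-clean",
                       "derived_from": ["dataset:raw1"]},
    "artifact:paper": {"stored_at": "ipfs://placeholder-paper",
                       "derived_from": ["dataset:clean1"]},
}

def provenance(node: str) -> list[str]:
    """Walk the pointers back to the raw sources (a metascience query)."""
    chain = [node]
    for parent in graph[node]["derived_from"]:
        chain += provenance(parent)
    return chain
```

A query like `provenance("artifact:paper")` traces a publication all the way back to the raw dataset it rests on.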

The Open Science Framework has already identified a robust structure for research projects that very much inspired the following:

  1. Researcher makes a claim by initializing a research project object. The claim is initialized by completing a form for the title, questions, design, and other important components of starting a project.

  2. The experiment is executed by following a protocol, executing code for an experiment (e.g., deployment of an in-browser psychophysics experiment), or running an existing algorithm on either pre-collected or newly-collected data.

  3. A research project can contain many derived datasets from a source raw dataset. Similarly, many artifacts (e.g., publications or other scholarly content) can be created from derived datasets. Artifacts should link hypotheses with derived data + scholarly prose. Artifacts should be encoded in a form that enables them to be read, rendered, and interacted with in the browser. Something like .tex for PDFs, or .ipynb for notebooks.
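The three steps above could be encoded as a single project object along these lines; the schema is an illustrative assumption, not an OSF or on-chain format:

```python
# Sketch of a research project object following the three-step structure above.
# All field names are illustrative assumptions.

project = {
    "claim": {  # step 1: initialize the project with a claim
        "title": "Effect of X on Y",
        "questions": ["Does X raise Y in population Z?"],
        "design": "preregistered two-arm experiment",
    },
    "execution": {  # step 2: the protocol or code that was run
        "protocol": "protocol.md",
        "code": "experiment.ipynb",
    },
    "outputs": {  # step 3: raw -> derived datasets -> artifacts
        "raw_datasets": ["raw.csv"],
        "derived_datasets": {"clean.csv": "raw.csv"},   # derived -> its source
        "artifacts": {"paper.tex": ["clean.csv"]},      # artifact -> derived data
    },
}
```

Each nested mapping in `outputs` is a pointer, so the same provenance traversals work here: an artifact links to derived data, which links back to its raw source.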