Technical Excursions on Simulations for Science Token Communities

The OpSci Open Web Fellowship is a 12-week program that provides a stipend for talented undergraduates, students, and postdoctoral scholars to work on open-source software development that aligns with OpSci’s mission.
Jakub is an Open Web Fellow and works on simulations for science communities that coordinate value flow using abstracted representations of digital science objects, called tokens. You can find a spotlight on Jakub’s experience in the OpSci Open Web Fellowship here.

What is OpSci?

OpSci is a community of inspiring people who actively work on solving the challenges facing science. Looking into the future, OpSci is synonymous with a framework in which science is and should be conducted, leading to a fair playing field where researchers are rewarded for collaboration and not constrained by institutions.

About Me

I am currently an undergraduate mathematics and physics student with a particular interest in neuroscience and machine learning. I found out about OpSci through my KERNEL interview and was instantly captivated by its mission and the incredibly smart people in the community. I believe the decentralization of science will play a key role in future scientific advancements and am eternally grateful to play a small part in this journey.

Project Summary

If you haven’t already, please read the non-technical community spotlight before going further as it includes important background on the project.

Implementation

This research falls within the field of token engineering and, as in other engineering disciplines, my project followed three main steps: analysis, design, and verification. For more details on token engineering, I recommend the blog post Towards a Practice of Token Engineering by Trent McConaghy, which served as a starting point for this project. The verification tool of choice was the TokenSPICE simulator, which was originally developed for running simulations of the Ocean Protocol but is general enough to be adapted for almost any token engineering use case. I have been participating in the weekly TokenSPICE hacking sessions hosted by the Token Engineering Academy to get familiar with the tool and to get feedback from the community on my work.

A significant part of my fellowship was devoted to the analysis of the current science value flow, consisting of conversations with my mentors, participation in community discussions, and research into funding data from organizations such as the NIH and NSF. This analysis formed the basis for the development of the baseline model in TokenSPICE, which reflects the characteristics of the current science ecosystem (e.g. previously funded research projects are more likely to get funded in the future, see success rate plot below).
Research proposal success rates for renewal vs new proposals (data available at https://reporter.nih.gov/)

All subsequent models were designed with the intention to solve the limitations of the baseline model. All designs were first captured as a schema depicting the agents within the ecosystem and the value flow between them (see example below).


Schema of the public/private open science model

In the schema above, researchers fall into one of three categories, namely data, algorithm, or compute providers, denoting the type of knowledge asset they produce. Public researchers submit proposals for research funded by the DAO Treasury (and hence by the open science community), which requires that all value created by the research project be publicly available (grant funding is equivalent to public goods funding in this case). The bottom part of the schema shows researchers in the private sector who participate in the open science ecosystem to gain access to previously locked data silos and to get some value out of their own research assets.

All stakeholders in the schema were then transformed into agents in TokenSPICE and connected in netlists, which specify the value flow between individual agents. Each simulation has a KPIs.py script that specifies which metrics to track. Below is an example of the code in the three main scripts forming a simulation netlist.

KPIs.py specifies which key performance indicators we want to track during the simulation. These are then logged into a csv file which can then be used to create plots.

def netlist_createLogData(state):
   """pass this to SimEngine.__init__() as argument `netlist_createLogData`"""
   s = [] #for console logging
   dataheader = [] # for csv logging: list of string
   datarow = [] #for csv logging: list of float
 
   r_dict = {}
   for r in state.public_researchers.keys():
       r_dict[r] = state.getAgent(r)
       s += ["; %s OCEAN=%s" % (r , prettyBigNum(r_dict[r].OCEAN(),False))]
       s += ["; %s proposals=%s" % (r, r_dict[r].no_proposals_submitted)]
       s += ["; %s proposals funded=%s" % (r, r_dict[r].no_proposals_funded)]
       s += ["; research type=%s; asset type=%s" % (r_dict[r].research_type, r_dict[r].asset_type)]
       dataheader += ["%s_knowledge_access" % r]
       datarow += [r_dict[r].knowledge_access]

   #return the console strings plus the csv header and row
   return s, dataheader, datarow
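
To make the logging step more concrete, here is a minimal sketch of how a header/row pair like the one built above could be appended to a csv file and read back for plotting. It uses only Python's standard library. The helper names `append_log_row` and `read_column`, the file name, and the sample values are assumptions made for illustration; this is not TokenSPICE's own logging code.

```python
import csv
import os
import tempfile

def append_log_row(path, dataheader, datarow):
    """Append one row of KPI values, writing the header first if the file is new.

    Hypothetical helper for illustration only; not part of TokenSPICE.
    """
    if len(dataheader) != len(datarow):
        raise ValueError("dataheader and datarow must have the same length")
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(dataheader)
        writer.writerow(datarow)

def read_column(path, column):
    """Read one KPI column back as floats, e.g. to pass to a plotting library."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]

# Made-up values for two researchers over two ticks
log_path = os.path.join(tempfile.mkdtemp(), "kpis.csv")
header = ["researcher0_knowledge_access", "researcher1_knowledge_access"]
append_log_row(log_path, header, [1.0, 1.0])
append_log_row(log_path, header, [1.5, 1.0])
print(read_column(log_path, "researcher0_knowledge_access"))
```

The column names follow the `"%s_knowledge_access" % r` pattern from the snippet above, so a plotting script only needs to know the researcher names to find the matching columns.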

SimStrategy.py is used to set the default parameters for a simulation, including the number of researchers, the prices of knowledge assets, the length of the simulation, etc. Think of it as a configuration file where you can change different variables to see how the simulation changes depending on the initial state.

class SimStrategy(SimStrategyBase.SimStrategyBase):
   def __init__(self):
       #===initialize self.time_step, max_ticks====
       super().__init__()
 
       #===set base-class values we want for this netlist====
       self.setTimeStep(S_PER_HOUR)
       self.setMaxTime(30, 'years') #typical runs: 10 years, 20 years, 150 years
 
       #===new attributes specific to this netlist===
       # self.TICKS_BETWEEN_PROPOSALS = 6480
       self.PRICE_OF_ASSETS = 1000 # OCEAN
       self.RATIO_FUNDS_TO_PUBLISH = 0.4 # 40% of grant funding will go towards "doing work" & publishing
       self.TRANSACTION_FEES = 0.1
       self.FEES_TO_STAKERS = 0.1
       self.NO_PUBLIC_RESEARCHERS = 5
       self.NO_PRIVATE_RESEARCHERS = 10
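
Since SimStrategy acts like a configuration file, one way to explore different initial states is to build several strategy objects that differ in a single parameter. The sketch below shows the idea with a stand-in class so that it runs without TokenSPICE. `ToyStrategy` and `make_strategy` are hypothetical names introduced here; only the attribute names are taken from the snippet above.

```python
class ToyStrategy:
    """Stand-in for SimStrategy with a few of the attributes shown above.

    Hypothetical: the real class derives from SimStrategyBase in TokenSPICE.
    """
    def __init__(self):
        self.PRICE_OF_ASSETS = 1000  # OCEAN
        self.TRANSACTION_FEES = 0.1
        self.NO_PUBLIC_RESEARCHERS = 5
        self.NO_PRIVATE_RESEARCHERS = 10

def make_strategy(**overrides):
    """Return a ToyStrategy with selected defaults replaced."""
    ss = ToyStrategy()
    for name, value in overrides.items():
        if not hasattr(ss, name):
            # Catch typos early instead of silently adding a new attribute
            raise AttributeError(f"unknown parameter: {name}")
        setattr(ss, name, value)
    return ss

# A small sweep over asset prices; each strategy would seed one simulation run
sweep = [make_strategy(PRICE_OF_ASSETS=p) for p in (500, 1000, 2000)]
for ss in sweep:
    print(ss.PRICE_OF_ASSETS, ss.NO_PUBLIC_RESEARCHERS)
```

Rejecting unknown parameter names is a deliberate choice: a misspelled override would otherwise leave the default in place and produce a run that looks valid but tests nothing.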

SimState.py is where the agents are initialized and “wired up”.

class SimState(SimStateBase.SimStateBase):
   '''
   SimState for the Web3 Open Science Public Funding Profit Sharing Model
   '''
   def __init__(self, ss=None):
       #initialize self.tick, ss, agents, kpis
       super().__init__(ss)
 
       #now, fill in actual values for ss, agents, kpis
       if self.ss is None:
           from .SimStrategy import SimStrategy
           self.ss = SimStrategy()
       ss = self.ss #for convenience as we go forward
       #Instantiate and connect agent instances. "Wire up the circuit"
       # new_agents: Set[AgentBase.AgentBase] = set()
       researcher_agents = []
       self.researchers: dict = {}
       new_agents = []
       public_researcher_agents = []
       private_researcher_agents = []
       self.public_researchers: dict = {}
       self.private_researchers: dict = {}
 
       #################### Wiring of agents that send OCEAN ####################
       new_agents.append(VersatileDAOTreasuryAgent(
           name = "dao_treasury", USD=0.0, OCEAN=500000.0))
 
       # Public researcher agents
       for i in range(ss.NO_PUBLIC_RESEARCHERS):
           new_agents.append(VersatileResearcherAgent(
               name = "researcher%x" % i, evaluator = "dao_treasury",
               USD=0.0, OCEAN=10000.0, research_type='public',
               receiving_agents = {"market": 1.0}))
           researcher_agents.append(VersatileResearcherAgent(
               name = "researcher%x" % i, evaluator = "dao_treasury",
               USD=0.0, OCEAN=10000.0, research_type='public',
               receiving_agents = {"market": 1.0}))
           public_researcher_agents.append(VersatileResearcherAgent(
               name = "researcher%x" % i, evaluator = "dao_treasury",
               USD=0.0, OCEAN=10000.0, research_type='public',
               receiving_agents = {"market": 1.0}))

These three scripts are then specified in a netlist.py file, which is used to start the simulation.


"""
Netlist to simulate the Open Science Ecosystem, with no EVM
"""
#This puts all key interfaces in one module.
#Users just refer to netlist.SimStrategy, netlist.SimState, etc., versus
# having to import directly from supporting modules.
from .SimStrategy import SimStrategy
from .SimState import SimState
from .KPIs import KPIs, netlist_createLogData, netlist_plotInstructions

An example of two plots from the public funding simulation is shown below.
Number of knowledge assets published by researchers to the knowledge market. Note that public researchers funded by the DAO treasury (researchers 0 and 1 above) continuously generate new knowledge that is shared within the ecosystem, while the remaining private researchers only share their assets when they have an economic incentive to do so.
Value flowing back to the open science community to fund more public research projects. The model gives researchers an incentive to exchange knowledge on the private market, but because prices there are higher, the public market is more active overall and hence collects more fees. (To clarify, the public market doesn’t imply “free” services, but rather lower barriers to publishing and sharing, with all knowledge assets belonging to the open science community rather than a single private entity.)

Model Summaries

1 Baseline

This model is based on the current status quo of scientific funding and value flow. It consists of researchers competing for funding from an exhaustible funding agency via research proposals. The research grant is then spent on 1. research costs (e.g. equipment, data, etc.) and 2. getting the research published in a journal.

In this model, the knowledge curators (journals) lock up most of the value from the research and have full control over who gets access to the published knowledge assets. As a result, researchers who have received a grant in the past have a much higher chance of receiving grants in the future. In reality this advantage depends on many factors (e.g. expertise in the field, the ability to produce high-quality proposals, reputation), but in the simulation it is modeled by a single variable called knowledge_access.
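
The feedback loop behind knowledge_access can be illustrated with a toy sketch. It assumes that the funder picks a winner with probability proportional to knowledge_access and that winning a grant raises the winner's index by a fixed amount. Both rules, the function names, and the numbers are assumptions made for illustration, not the logic used in the baseline netlist.

```python
import random

def pick_winner(knowledge_access, rng):
    """Pick one researcher with probability proportional to knowledge_access."""
    names = list(knowledge_access)
    weights = [knowledge_access[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def run_funding_rounds(n_researchers=5, n_rounds=200, boost=1.0, seed=0):
    """Simulate repeated funding rounds and count grants per researcher."""
    rng = random.Random(seed)
    knowledge_access = {f"researcher{i}": 1.0 for i in range(n_researchers)}
    grants = {name: 0 for name in knowledge_access}
    for _ in range(n_rounds):
        winner = pick_winner(knowledge_access, rng)
        grants[winner] += 1
        # Assumed rule: a funded researcher gains access, raising future odds
        knowledge_access[winner] += boost
    return grants

grants = run_funding_rounds()
print(sorted(grants.values(), reverse=True))
```

Because each win increases the winner's weight in later rounds, early winners tend to accumulate grants, which is the kind of path dependence the baseline model is meant to capture.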


A schema of the baseline model (current scientific research pipeline)

This model has a number of limitations, including fixed parameters and overall low resolution, but it correctly shows the problems with the current scientific value flows (flow linearity, centralization of value).

2 Profit-Sharing Models

This model is the simplest representation of how a web3 scientific ecosystem could function. Essentially, it is a variation of the Web3 Sustainability Loop in which researchers still compete for funding from an exhaustible funding agency, but instead of publishing their results to centralized knowledge curators, they publish to the web3 knowledge market. This allows them to retain ownership of their data, articles, algorithms, etc. while still sharing their work with the scientific community.


schema of the web3 profit sharing model and a graph showing researcher tokens over time

The profit-sharing models offer a relatively high-resolution overview of how a decentralized scientific community could function. They show that researchers can get the stability of a monthly income by participating in the open knowledge market, which rewards them based on both the quality and quantity of the results they publish. Furthermore, they retain full ownership of the knowledge assets they produce.

These models inevitably have some limitations. In their current form, all researchers are almost guaranteed to get funded at least once (assuming they buy into the knowledge market to maintain a competitive knowledge_access index). While this ensures fair competition, it also assumes that all researchers are doing research of comparable quality and importance and that none have malicious incentives.

3 Public Funding Models

This model takes what works from the profit-sharing model but applies it to an open science ecosystem in which funding goes only to public research, i.e. research projects that don’t belong to any individual but are available for the entire community to use. Note that this does not mean the knowledge assets produced by these projects are free (although they are quite cheap); it only means the assets are owned by the community. Whenever somebody buys access to public data, all of the tokens spent therefore go to the DAO Treasury (in future variations of this model, the tokens might be distributed across multiple stakeholders within the ecosystem).

schema showing one of the public funding models with community members

In this model, knowledge assets are split into three categories: data, algorithms, and compute services. Each researcher is assigned one of these asset categories to produce. These researchers are referred to as Data Providers, Algorithm Providers, and Compute Providers, respectively. In addition to setting the output knowledge asset of a researcher, these types also determine the assets that the researcher might need to buy. For instance, an Algorithm Provider could be a theorist trying to find a new pattern in other people’s data, so they would make use of the knowledge market to buy the data they need. A Data Provider, on the other hand, might either collect new data themselves or transform their existing data with an algorithm from the marketplace. Lastly, the Compute Provider can be thought of as a private research organization that has collected a very large dataset (so large that it would not be efficient to store on IPFS), so it allows other people to run computations on its data as a cloud service.
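
The three provider types can be sketched as a small data structure. The mapping below is read off the description above and is only an illustration: the names `AssetType`, `MAY_BUY`, and `Researcher` are hypothetical and do not come from the netlist, and the text does not say what Compute Providers buy, so that entry is left empty.

```python
from dataclasses import dataclass
from enum import Enum

class AssetType(Enum):
    DATA = "data"
    ALGORITHM = "algorithm"
    COMPUTE = "compute"

# Assumed mapping from a provider's own asset type to what it may buy
MAY_BUY = {
    AssetType.ALGORITHM: {AssetType.DATA},   # theorist buying other people's data
    AssetType.DATA: {AssetType.ALGORITHM},   # transforming existing data with an algorithm
    AssetType.COMPUTE: set(),                # not specified in the text; left empty here
}

@dataclass
class Researcher:
    name: str
    asset_type: AssetType

    def produces(self):
        """The asset category this researcher publishes to the market."""
        return self.asset_type

    def can_use(self, asset):
        """Whether this researcher would buy the given asset category."""
        return asset in MAY_BUY[self.asset_type]

theorist = Researcher("researcher0", AssetType.ALGORITHM)
print(theorist.produces().value, theorist.can_use(AssetType.DATA))
```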

So far, this model has been the most effective in terms of longevity: after 30 years, the treasury is usually not depleted. However, it has some limitations that will be addressed in newer versions:

  • private agents publish/buy assets until they run out of funds, which is not realistic
  • there is a fixed number of researchers, but in reality we should expect a growth of the community
  • we are not tracking most of the high resolution metrics that this model includes (like the performance of different types of researchers).
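
The longevity criterion used above (whether the treasury still holds funds after 30 years) can be written as a simple balance check. The sketch below is a deliberately coarse toy, not the TokenSPICE model: it assumes constant yearly grant spending and constant yearly revenue flowing back to the treasury. The function names and the yearly figures are made up for illustration; only the starting balance of 500,000 OCEAN is taken from the SimState snippet.

```python
def treasury_balance_over_time(initial, yearly_grants, yearly_revenue, years):
    """Return the treasury balance at the end of each year, stopping at depletion."""
    balances = []
    balance = initial
    for _ in range(years):
        balance += yearly_revenue - yearly_grants
        if balance <= 0:
            balances.append(0.0)  # treasury depleted, no further funding possible
            break
        balances.append(balance)
    return balances

def is_sustainable(balances, years):
    """True if the treasury survived every simulated year with funds left."""
    return len(balances) == years and balances[-1] > 0

# Made-up yearly figures; revenue nearly offsets grant spending
healthy = treasury_balance_over_time(500000.0, 50000.0, 40000.0, 30)
# Made-up yearly figures; revenue covers much less of the spending
strained = treasury_balance_over_time(500000.0, 50000.0, 20000.0, 30)
print(is_sustainable(healthy, 30), is_sustainable(strained, 30))
```

In the actual simulations both flows vary over time and depend on agent behavior, so a check like this is only useful as a sanity test on logged treasury balances.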

Closing thoughts

The work presented provides a foundation for future work on science token engineering. In particular, DARC-SPICE now comes with a number of environments that can be modified quite easily. Future work will focus on updating the baseline model to the resolution seen in some of the later netlists, so that it better simulates the current status quo of scientific funding. Furthermore, we will soon update DARC-SPICE to the current version of TokenSPICE with Brownie, which should make running simulations easier.

If you would like to get involved, join the OpSci Discord channel or message me if you want to contribute to the development of DARC-SPICE.

Resources & Materials