Open Questions

Send your thoughts via twitter or mail

Work in progress: a few things I'm thinking about.

What are the fundamental units of scientific contribution?

We need to break down what it actually means to contribute to science.

Should we count software and datasets, alongside the research paper, as first-class contributions? The creators of numpy and matplotlib had to drop out of academia to create what are now two of the most used software libraries in science. That’s a bad sign.

The creator of IPython (Jupyter) had a similar experience:

“I did get straight-out blunt comments from many, many colleagues, and from senior people and mentors who said: Stop doing this, you’re wasting your career, you’re wasting your talent. [...] go back to physics and mathematics and write papers.” - The Scientific Paper is Obsolete

Beyond software, Internet-native publications are becoming more accepted as actual scientific work. Data publishing, or contributing directly to a public knowledge graph, is still in the stone ages. Mathematica, as so often, has the most seamless, if insular, implementation of data publishing I’ve seen so far.

Nano-publishing in a computer-readable format, along the lines of BEL, biopax, micropubs or nanopubs, is a good development.

Structuring knowledge is hard, even for experts. We haven’t found the right incentives to make encoding knowledge a cool activity and that’s why we can’t have nice things.

The 2014 Micropublications paper has many more figures and examples

Honorable mentions: Micropublication Biology, PolyPlexus

Serializing ourselves to computable formats

Is the convergence of methodology in science leading to a monoculture?

A lot of scientific fields seem to be turning into “just” applied math, using similar computer science and statistical methods.

If you saw the screens of an economist, a neuroscientist and a biologist with blacked-out plot labels and legends, you probably couldn't tell who is who. Yet there are only a few people who are moderately competent with current software tools (most of them hardly usable).

Should everybody else become irrelevant or be coerced into a makeshift data scientist after they spent a decade becoming a domain expert? What types of ideas become less thinkable after such a shift?

When and why is composing text from fragments so hard?

I have a lot of declarative, atomic ideas (statements) in my digital notes, but composing transcluded text into a non-fiction essay takes a lot of work. I don't expect abstraction and composition to be as powerful for writing as they are for programming, but I’m surprised how hard it is nevertheless.

Good prose needs a lot of narrative glue in between the ideas. Context, perspective, a narrative arc, the red thread: all of these seem to clash with composing. It’s not easy to get from composed text fragments to something that’s cohesive. Is there a first-principles explanation of when and why text composition is difficult?

If you have solved this and thread seamlessly through intertextual hyperspace, send me a note.

Will end-user applications ever be truly programmable? If so, how?

I share this question with Patrick Collison:

End-user computing is becoming less a bicycle and more a monorail for the mind.
As a consequence, we need ever more domain-specific software.  
[..] the popularity of macros and browser plugins strongly suggest that users are smart and want more control.

One problem is that the computer-human merge was so far dominated by computer primitives. Are we becoming more like computers from the 1950s than computers are becoming like humans from 2020?

Human-computer interaction (HCI) seems stuck and hasn’t translated well outside of the lab. When was the last time a media lab made something that people actually use?


Developing good abstractions, notations, visualizations, and so forth, is improving the user interfaces for ideas. This helps both with understanding ideas for the first time and with thinking clearly about them.
[...] Distillation is also hard. It’s tempting to think of explaining an idea as just putting a layer of polish on it, but good explanations often involve transforming the idea. - Research Debt

Where are the innovations in desktop interfaces?

Maybe the Silicon Valley canon of “easy to use” and “intuitive interfaces” is a defensive measure for not having good enough ideas for consumers to actually care about learning them.

For example, is the desktop metaphor suitable for the 21st century? Do you have thousands of interlinked photos, files, conversations and notes on your physical desk top? I don’t think so.

Should we just assume that the average consumer is a sheep and too lazy to learn new ways to use her computer? As an artist, metaphors are my currency. Still, I don’t want my imagination to be a captive of the tyranny of inferential steps.

Yes, people are lazy but they also want to be productive. Non-technical people are adopting developer workflows. Command-line inspired interfaces are on the rise. GUIs with hundreds of features crammed into toolbars with arbitrary taxonomies seem antiquated and don’t scale well.

My new approach is to make tools for pros and let the rest follow.


What are possible economies of creating communal computer-readable knowledge graphs?

How will we reward contributors and investors in community-owned distributed public knowledge graphs?

WIP: I haven’t been able to think through this yet and it shows...

If we run knowledge graphs on smart contracts, then there’s no third party that can mediate disputes about who owns a piece of knowledge, an assertion [1], and who should get paid for its usage.

If it’s a permissionless, blockchain-based system and I copy structured data from a public knowledge base (like Yago) and claim ownership before the actual owners know about the network, I’ll get the benefits in perpetuity if the network succeeds. If Yago doesn’t participate in the protocol, they get ripped off.

Digital Rights Management (DRM) can’t do much if it’s anonymous and un-erasable on a strong enough blockchain [2]. In Milton Friedman’s words: “We have to make it profitable for bad people to do the right thing”.

If the community decides that’s wrong, it has to do a hard fork of the project.

In a scheme where contributors get paid depending on how often their contribution was leveraged to answer user queries, contributing common sense and well-known knowledge - the roots of the epistemological tree - becomes very lucrative.
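A minimal sketch of such a usage-based payout, to make the incentive concrete. Everything here is an assumption for illustration: the flat `FEE_PER_QUERY`, the equal split among the assertions used, and the names `payouts`, `owner_of` and `query_log` are hypothetical and not taken from any existing protocol.

```python
from collections import defaultdict

FEE_PER_QUERY = 1.0  # hypothetical flat fee a user pays per answered query


def payouts(query_log, owner_of):
    """Split each query's fee equally among the assertions used to answer it.

    query_log: list of lists, each holding the assertion ids used for one query
    owner_of:  dict mapping assertion id -> contributor name
    """
    earned = defaultdict(float)
    for used in query_log:
        if not used:
            continue  # a query answered without any assertion pays nobody
        share = FEE_PER_QUERY / len(used)
        for assertion_id in used:
            earned[owner_of[assertion_id]] += share
    return dict(earned)


# Toy data: one common-sense assertion and two niche ones (all made up).
owner_of = {
    "water-is-liquid": "alice",
    "niche-enzyme-fact": "bob",
    "niche-alloy-fact": "carol",
}
query_log = [
    ["water-is-liquid", "niche-enzyme-fact"],
    ["water-is-liquid", "niche-alloy-fact"],
    ["water-is-liquid"],
]
result = payouts(query_log, owner_of)
print(result)
```

In this toy log the common-sense assertion is used by every query, so its contributor collects more than the two niche contributors combined, which is the dynamic described above.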

Will there be a gold rush dynamic where, as soon as one project seems to be winning, people rush to fill in the gaps, ignoring copyright and ownership of whatever they upload? Then again, the market might even that out: common knowledge is less of a business advantage.

I don’t see a clear solution to this, but I also don’t know much about crypto. Proof of stake and proof of work don’t seem to make sense if the resource (future human knowledge) is, unlike a good currency, technically infinite. Paying a gas price to upload knowledge seems unworkable.

Projects could encode how long a piece of knowledge can have value. Something like: “10 years after upload it becomes a public good”.

Further Reading: The Underlay

[1] Imagine the most atomic, granular piece of knowledge: a relation between two things, a statement. Something like “the boiling point of water is 100C”, which results in a triple of {water, boiling-point, 100C}. We can uniquely identify this triple. There’s not much ambiguity. Computers can build on that to answer questions, e.g. “Will water boil at 50C?”*. These atoms, among many other things, make services like Siri (Wolfram Alpha) and Google Assistant work.

These private collections of knowledge bases produce a lot of value for their owners and we should create more community-owned and extendable ones.
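The triple from note [1] can be sketched in a few lines. This is a toy under stated assumptions: the `Triple` type, the `KB` list and the `boils_at` helper are hypothetical names, the value is stored as degrees Celsius, and pressure and everything else a real system would need are ignored.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    value: float  # degrees Celsius in this toy example


# Toy knowledge base holding the single assertion from the note above.
KB = [Triple("water", "boiling-point", 100.0)]


def boils_at(substance: str, temperature_c: float) -> bool:
    """Return True if the temperature reaches the stored boiling point."""
    for t in KB:
        if t.subject == substance and t.predicate == "boiling-point":
            return temperature_c >= t.value
    raise KeyError(f"no boiling point known for {substance!r}")


print(boils_at("water", 50.0))   # False
print(boils_at("water", 100.0))  # True
```

The point is only that a uniquely identifiable statement is enough for a program to answer a question nobody typed in verbatim.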

[2] I feel new work in Indistinguishability Obfuscation could open up new ways to ensure DRM. According to a friend, it can make two programs of the same size indistinguishable. “Give someone a capability and decide how they will use it.” I haven’t made sense of it yet.