A Future Query

Welcome to Schnitzel, the arbitrarily named search engine of the soon-enough future.

Instead of diving head first into the puzzle pieces required for the next level of search – API standards, ++interoperability, fine-grained differential privacy, [..., ..., ...] – we start right at the end: a query.


I have been hosting book clubs on and off for years. With current tools the whole process has a lot of friction, and in the end only a small fraction of the people who could and would meet up around common interests actually do. There is a lot of potential intellectual energy in the system - how can we release more of it?

We could reframe "hosting a book club on meetup.com" as a search problem:

1 Hey Schnitzel, who is currently in Cambridge, UK and has read both "Sapiens: A Brief History of Humankind" and "Antifragile: Systems that Gain from Disorder"?

2 /Schnitzel: List of 85 Persons: [Pablo S., Maria R., Markus E., Xing Z., A. P., T. G., details hidden, details hidden, ...]

3 Who of the above is open to "learning groups" or "book discussions" or "coffee meetings"? (2)

/Schnitzel: List of 40 Persons : [...]

4 That's a lot. Filter out people with an event RSVP to show-up ratio of under 0.5

/Schnitzel: This special attribute about Entities in your query is aggregated across the hosts whose invites they previously accepted. It is anonymized. There are 25 Persons with non-null entries in the current List. You will be charged 15 credits total and the data is only usable for this query context. Proceeds will go to a clean water charity. (3)

Yes, I agree

/Schnitzel: ... List of 28 Persons: [...]

5 Pick the Set of 15, including myself, that maximizes variance among reading histories

/Schnitzel: List of 15 Persons: [...]

6 Great, send them invites for the best common date in the next three weeks. Location Waterstone's Bookstore. Title: "Sapiens meets Antifragile"

/Schnitzel: Invites cost 1 credit per person, which will be restored when accepted. Invites flagged as inappropriate will result in a penalty.

Yes, I agree


Queries like this are not far out. The raw data to answer them is already online, scattered across different places. We already intend this exact information exchange, we just don't do it effectively.

If we have clear laws about data-ownership and ways to stream ours from the APIs of our apps, we can decide to expose slices of it. Call it YOU-DB – a service that exposes your merged data streams from apps and devices with highly granular, sane permission sets.
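To make "highly granular permission sets" a bit more concrete, here is a minimal sketch of how a YOU-DB record might decide what to expose. Everything in it is an assumption made up for illustration: the names `Permission`, `YouDB` and `expose`, the audience labels, and the idea of storing one value per granularity level.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permission:
    """One rule: which attribute may be seen, by whom, at what precision."""
    attribute: str    # e.g. "location", "books_read"
    audience: str     # hypothetical labels: "anyone", "peers", ...
    granularity: str  # e.g. "city" vs "exact"


@dataclass
class YouDB:
    data: dict
    permissions: list = field(default_factory=list)

    def expose(self, attribute, requester_audience):
        """Return the slice of `attribute` visible to the requester, else None."""
        for p in self.permissions:
            if p.attribute == attribute and p.audience in ("anyone", requester_audience):
                value = self.data.get(attribute)
                # Attributes stored per granularity level expose only the allowed level.
                if isinstance(value, dict):
                    return value.get(p.granularity)
                return value
        return None


me = YouDB(
    data={
        "location": {"city": "Cambridge, UK", "exact": (52.2053, 0.1218)},
        "books_read": ["Sapiens", "Antifragile"],
    },
    permissions=[
        Permission("location", "peers", "city"),
        Permission("books_read", "anyone", "titles"),
    ],
)
```

With these rules a peer-to-peer query sees only the city, and anybody outside the "peers" audience gets nothing back for location at all. The default is closed: an attribute without a matching rule is never exposed.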

It only becomes spooky if access to personal data is not controlled by the individual.

Let's take a look at the query above. Where does the data come from?

1

  • Cambridge, UK – location that YOU-DB updates from your phone once a week
  • read book ( X and Y ) – gathered from Goodreads, Amazon / Kindle / Audible Orders, uploaded photo of bookshelf, or filled in by hand
  • if you replaced "read" with "interested in" it could map the query to your wishlist, or the fact that you follow the author, or an article about it that you clipped, bookmarked or annotated
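As a rough sketch of query 1, assume each profile exposes a city and a set of book titles per source. The field names `city` and `book_sources`, the source labels, and the exact-title matching are all hypothetical simplifications; real titles would need normalizing first.

```python
def merged_books(sources):
    """Union of titles across sources such as Goodreads, order history or manual entry."""
    return set().union(*sources.values())


def matches_query(profile, city, required_titles):
    """True if the profile is in `city` and has read every required title."""
    books = merged_books(profile.get("book_sources", {}))
    return profile.get("city") == city and set(required_titles) <= books


SAPIENS = "Sapiens: A Brief History of Humankind"
ANTIFRAGILE = "Antifragile: Systems that Gain from Disorder"

pablo = {
    "city": "Cambridge, UK",
    "book_sources": {
        "goodreads": {SAPIENS},
        "kindle_orders": {ANTIFRAGILE},
    },
}
maria = {"city": "Cambridge, UK", "book_sources": {"manual": {SAPIENS}}}
```

Pablo matches because the two books come from different sources and only the merged view contains both. Maria has read only one of them and drops out.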

2 I think the sanest preference, even for peer-to-peer queries, is to have your name hidden or only display your first name.

3 Here you choose how open you are to messages.

4 This is tricky, because the data is about you, but you do not control it. Still, it is information that an event creator has a right to share. A possible solution is that an attribute entry about a Person has to expire after a number of days.
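The expiry idea could look roughly like this. It is a sketch under assumptions: the function name `show_up_ratio`, the entry format and the 365-day default are invented here, not part of any existing service.

```python
from datetime import datetime, timedelta


def show_up_ratio(entries, now, max_age_days=365):
    """RSVP-to-show-up ratio over non-expired entries.

    `entries` is a list of (recorded_at, showed_up) pairs reported by hosts.
    Returns None when no entry is fresh enough, i.e. a null attribute.
    """
    cutoff = now - timedelta(days=max_age_days)
    fresh = [showed_up for recorded_at, showed_up in entries if recorded_at >= cutoff]
    if not fresh:
        return None
    return sum(fresh) / len(fresh)


now = datetime(2020, 1, 1)
entries = [
    (datetime(2019, 12, 1), True),
    (datetime(2019, 11, 1), False),
    (datetime(2017, 1, 1), False),  # too old, silently dropped
]
```

A no-show from three years ago no longer counts against you, and somebody whose entries have all expired simply shows up as null rather than as a bad guest.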

5 We want a diverse group. This operation needs to access some Schnitzel-internal or 3rd party service to get a quantified representation of every book read by the people in the possible set (perhaps some embedding of learned features). I believe those services will become commonplace.
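One way to sketch the "maximize variance" step, assuming every person has already been reduced to a single vector (say, the average embedding of the books they read). The variance of a group is proportional to the sum of squared pairwise distances, so the code below greedily adds whoever increases that sum the most. It is a heuristic, not guaranteed to find the optimal set, and `pick_diverse` and the toy vectors are made up for illustration.

```python
import math


def pick_diverse(embeddings, k, must_include):
    """Greedily pick k people, starting from `must_include`.

    Each step adds the candidate with the largest sum of squared distances
    to the people already chosen.
    """
    chosen = [must_include]
    candidates = set(embeddings) - {must_include}
    while len(chosen) < k and candidates:
        best = max(
            sorted(candidates),  # sorted so that ties are broken deterministically
            key=lambda name: sum(
                math.dist(embeddings[name], embeddings[c]) ** 2 for c in chosen
            ),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


people = {
    "me": (0.0, 0.0),
    "a": (0.1, 0.0),
    "b": (5.0, 0.0),
    "c": (0.0, 5.0),
    "d": (0.2, 0.1),
}
group = pick_diverse(people, k=3, must_include="me")
```

In this toy example the readers who sit almost on top of "me" are skipped in favour of the two who are far away in different directions.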

6 Best date – find the most common open spot for people who expose parts of their calendar to queries.

Sending invites is sketched along the lines of LinkedIn's InMail. If it's spam, the sender pays for it.
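The "best date" part is the easy bit once calendars expose free slots. A minimal sketch, assuming each invitee shares a set of open dates as ISO strings (the function name and data layout are hypothetical):

```python
from collections import Counter


def best_common_slot(open_slots_by_person):
    """Return (slot, number_of_people_free) for the most widely open slot.

    Ties go to the earliest slot, which works because ISO date strings
    sort chronologically.
    """
    counts = Counter(
        slot for slots in open_slots_by_person.values() for slot in slots
    )
    if not counts:
        return None, 0
    return min(counts.items(), key=lambda item: (-item[1], item[0]))


calendars = {
    "Pablo": {"2020-03-01", "2020-03-05"},
    "Maria": {"2020-03-05"},
    "Markus": {"2020-03-01", "2020-03-05", "2020-03-09"},
}
```

People who expose nothing from their calendar just don't vote. They still get the invite and can decide for themselves.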

+ We could make our book club query more meaningful. So far we searched in rough semantic categories like books. Books just point to interest. A book might have hundreds of features that interest us to different degrees. We can build a model trained on many books and query its feature embeddings. Now we can go hyper-granular in finding awesome meetup guests. E.g.: who read many hours of "contemporary tech-dystopian sci-fi"*


For anybody who feels that this is far-fetched, remember: when we put information about ourselves on the web, we often want to be discovered - by the right person or community. Right means being above some threshold for mutual context. A 3am hand-waving definition of context is the distance between you and that person in some feature space.
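Still hand-waving, but in code: if you and another person are each a vector in some feature space, "enough mutual context" could be a cap on the distance between the two. Cosine distance and the 0.3 cutoff below are arbitrary choices for the sketch, not a claim about what the right metric or threshold is.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0  # treat an empty profile as unrelated to everything
    return 1.0 - dot / norm


def has_mutual_context(u, v, max_distance=0.3):
    """True if the two profiles are close enough in feature space."""
    return cosine_distance(u, v) <= max_distance
```

Two people with proportional interest profiles, such as `(1, 2, 3)` and `(2, 4, 6)`, land at a distance of roughly zero, while profiles with no overlap at all, such as `(1, 0)` and `(0, 1)`, land at exactly 1.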

The way we get to the next level in search is through personal privacy**. With that in place, our knowledge, hobbies, interests, worries and feelings can be queried if we wish, and connected to those of others vastly better.


*The feature activations of a trained neural network don't map cleanly to our notions of contemporary, tech, dystopian or sci-fi, but for every trained model we could batch and manually label some dimension X as roughly corresponding to a human notion of Y.

**It is possible to single out individuals from aggregate queries, but there are hacks around it (injecting minimal noise into the result, for example).
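One well-known version of the noise hack is the Laplace mechanism from differential privacy: add noise drawn from a Laplace distribution to a count before returning it. The sketch below is a toy, and picking this mechanism, the name `noisy_count` and the default `epsilon` are my assumptions rather than a spec for Schnitzel. It uses the fact that the difference of two independent exponential samples is Laplace-distributed.

```python
import random


def noisy_count(true_count, epsilon=1.0, sensitivity=1.0, rng=random):
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    Smaller epsilon means more noise and more privacy. Sensitivity is how
    much one person can change the count (1 for a plain head count).
    """
    rate = epsilon / sensitivity  # expovariate takes a rate, i.e. 1 / scale
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


rng = random.Random(0)  # seeded only so the example is reproducible
samples = [noisy_count(40, rng=rng) for _ in range(10000)]
```

A single answer is off by a person or two, so nobody can tell whether one specific individual is in the list of 40, yet averaged over many queries the count stays accurate.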
