Guest post: Initial thoughts on process, requirements, and architecture

Overview: I suggest we start with MediaWiki as a base to avoid reinventing the wheel, and that our major change be to add federation. I cover some major styles for doing that, a couple of initial thoughts on how the “ratings” / editorial system might work, and then a miscellaneous grab bag of other thoughts.

0. Start with MediaWiki / Wikipedia

I don’t think we need to reinvent the wheel: the MediaWiki code base has been proven at scale and its UI/UX is understood by a large base of users. In terms of both content and code, I think it makes the most sense to begin here. However, this does mean adopting their technology choices from the start.

I’m not previously familiar with Everipedia. At a glance at their Wikipedia page, it may have some interesting technological elements, but I tend to be extremely skeptical of the utility of the “blockchain everything” approach. Also, it sounds like it is less suitable as a base content source because it is a more stale fork, but the design is worth considering.

1. Add federation

The core change we want relative to MediaWiki is to not have a single centralized source of truth. Therefore, some variety of federation should be added. This can happen on a couple of levels: for instance, a single server can support various forks / branches of any article, while different servers may choose to support different sets of the forks / branches. So there is a conceptual level, which may still be hosted together, as well as a network level, which may or may not differ in content.

From a functional level, I envision this as being similar to git in design: we have a large set of objects, which take significant storage and bandwidth, and we have branches / commits which are pointers to particular versions of these objects.
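To make the git analogy concrete, here is a minimal sketch of what such a store could look like. It is an illustration under my own assumptions, not a proposed design: the class name `ObjectStore`, the branch naming scheme, and the JSON commit record are all hypothetical. Objects are stored by the hash of their content, and a branch is nothing more than a name pointing at a commit.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical git-like store: content-addressed objects plus branch pointers."""

    def __init__(self):
        self.objects = {}   # object id (SHA-256 hex digest) -> raw bytes
        self.branches = {}  # branch name -> id of the latest commit object

    def put(self, data: bytes) -> str:
        """Store raw bytes under the hash of their content and return that id."""
        oid = hashlib.sha256(data).hexdigest()
        self.objects[oid] = data
        return oid

    def commit(self, branch: str, article_text: str, message: str) -> str:
        """Store a new article version and move the branch pointer to it."""
        content_id = self.put(article_text.encode("utf-8"))
        record = {
            "content": content_id,
            "parent": self.branches.get(branch),  # None for the first commit
            "message": message,
        }
        commit_id = self.put(json.dumps(record, sort_keys=True).encode("utf-8"))
        self.branches[branch] = commit_id
        return commit_id

    def fork(self, source: str, new_branch: str) -> None:
        """A fork is only a second pointer to the same commit; no content is copied."""
        self.branches[new_branch] = self.branches[source]

    def read(self, branch: str) -> str:
        """Return the article text the branch currently points to."""
        record = json.loads(self.objects[self.branches[branch]])
        return self.objects[record["content"]].decode("utf-8")


store = ObjectStore()
first = store.commit("Encyclopedias/main", "An encyclopedia is a reference work.", "initial")
store.fork("Encyclopedias/main", "Encyclopedias/alice")
store.commit("Encyclopedias/alice", "An encyclopedia is a compendium of knowledge.", "reword")
```

After this runs, the two branches point at different commits, but both versions of the text live in the same pool of objects. That is the property that would let different servers carry different sets of branches while exchanging the same underlying objects.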

From an implementation perspective, the two major approaches I see are USENET or BitTorrent: having either a decentralized (multiple large independent servers) or distributed (all or most users being part of the network) approach. It would also be possible to do a hybrid system with both elements. A USENET-style system would be the simplest technologically, since we could presume that each server has enough space to store an entire set of the objects. With a BitTorrent-style system we have to deal with machines that have intermittent availability and far more limited resources, so we would need to handle sharding and similar concerns from the start.

2. “Ratings” / editorial control thoughts

We want to support both whitelist and blacklist modes for a given server: either take all submissions except those from bad actors (identified by list or by behavior), or take only submissions from preapproved entities (other servers with a recognized key, or users submitting with a recognized key).
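As a rough sketch of the two modes, assuming submitters are identified by some key string: the names `Mode` and `SubmissionPolicy` are hypothetical, and the behavior-based detection of bad actors mentioned above is left out entirely, so only list membership is checked.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    BLACKLIST = "blacklist"  # accept everyone except listed keys
    WHITELIST = "whitelist"  # accept only listed keys


@dataclass
class SubmissionPolicy:
    """Hypothetical per-server policy; `keys` holds server or user key identifiers."""

    mode: Mode
    keys: set = field(default_factory=set)

    def accepts(self, submitter_key: str) -> bool:
        """Decide whether a submission signed with this key is taken."""
        if self.mode is Mode.WHITELIST:
            return submitter_key in self.keys
        return submitter_key not in self.keys


open_server = SubmissionPolicy(Mode.BLACKLIST, {"key-of-known-spammer"})
closed_server = SubmissionPolicy(Mode.WHITELIST, {"key-of-partner-server"})
```

The same data structure serves both modes; only the interpretation of the list flips, which would keep the server configuration small.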

A given rating / editorial set is likely to only have an “opinion” on a small subset of articles and commits, so an overall view is likely to be composed of prioritized lists of these which are then combined: so if we have a view which has Alice, Bob and Carol in that order, and we wish to display the article for “Encyclopedias”, we first look to see if there’s a stated current version in Alice’s set, then Bob’s, etc. The first one with a pointer to a given version is used. We probably want to be able to compose such views with other views as well, so we might have that initial list alpha, and create a new level view which goes in order alpha, beta, gamma, etc. More sophisticated versions could also be imagined, such as voting weights rather than first view encountered or voting weights which vary by domain.
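The first-match lookup and the composition of views can be sketched as follows. This is only an illustration under assumptions of mine: each rating set is modeled as a plain mapping from article title to a version id, the `View` class and the version id strings are made up, and the weighted-voting variants are not shown.

```python
from typing import Optional


class View:
    """Hypothetical prioritized view over rating sets and/or other views."""

    def __init__(self, sources):
        # Each source is either a dict (article title -> version id) or another View.
        self.sources = list(sources)

    def resolve(self, title: str) -> Optional[str]:
        """Return the version from the first source that has an opinion, else None."""
        for source in self.sources:
            if isinstance(source, View):
                version = source.resolve(title)
            else:
                version = source.get(title)
            if version is not None:
                return version
        return None


# Alice has no opinion on "Encyclopedias", so Bob's pointer is used.
alice = {"Wikis": "commit-a1"}
bob = {"Encyclopedias": "commit-b1"}
carol = {"Encyclopedias": "commit-c1", "Federation": "commit-c2"}
alpha = View([alice, bob, carol])

# Views compose: a higher-level view consults a rating set first, then falls back to alpha.
beta = {"Federation": "commit-x1"}
combined = View([beta, alpha])
```

Here `alpha.resolve("Encyclopedias")` gives Bob’s version, and `combined.resolve("Federation")` gives the one from `beta` because it is consulted before `alpha`. An article nobody has rated resolves to `None`, and what to display in that case is an open question.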

From a standards perspective, we essentially want the backend to be able to do an operation like a git synchronization: exchange objects and commits / branches / tags. From a UX perspective on reading, we want to select a particular authority source and then be able to browse as normal. For editing, so that a user does not need to manage their own key (and can edit from any device with username / password), they’ll need a particular server (or coordinated set of servers) which they log into and which manages their key. We can then think of their identity as being Server:Username, that is, “Username name From Server servername”.

Process / venue: I have said before that I think moderated WordPress is not a good tool for in-depth technical discussions; I think forums and mailing lists are far superior. While WordPress is good for presenting more finished products for a broader discussion, a forum or list is better for back-and-forth discussion that is both more rapid and more permanent. These issues will become more apparent as time goes on. However, I think it’s also clear that we’re going to be on moderated WordPress instead. This is workable, but barely: it is inefficient and less productive. Basic things like asking a simple question are made far slower, and these barriers will lead to fewer people being involved and less discussion happening. As a result, meaningful progress will be made by smaller groups in private instead of in public discussions, with results presented as essentially a fait accompli, followed by a public discussion period so other people still feel involved. This is a handwavey short version of the argument; I am just noting my general prediction here.

To be clear, we’re fine starting out here; I just think there will come a point as we actually get into this when we should migrate the leading edge technical discussion to something with less friction and so I’m making the point now so the idea is already planted once we start hitting some of that friction.

Separating process, requirements, and design: Although I have combined elements of all three here, I believe that in general things would be more effective if there are clear delineations between process, requirements, and architecture discussions. While people like to just “jump in and go” because it feels more effective, in the long run process discussions are some of the most important choices to be made and should be highlighted, taken seriously, and disseminated widely. Requirements should be accessible to non-technical audiences and codified; again, while this type of work tends to be ignored in favor of “we know what we mean to do” approaches, actually discussing and nailing down something here will have significant payback in the long term. Architecture discussions are inherently technical and will have a smaller audience, but again are very important as a starting point for the project.

Importance of design and documentation: I’m going to be banging on this drum a lot – I am strongly in favor of actually planning what is to be done and documenting what has been done, how, and why. This will ultimately save a significant amount of effort, lead to far higher chances of actually building what’s intended, and create a higher quality and more maintainable result.

If anyone thinks this is obvious and non-controversial, then they do not understand what I am suggesting compared to the typical approaches of many modern software projects.

Integral nature of “reference client”: While I understand the desire to just make standards and have an ecosystem emerge, the initial implementation is inseparable from the standards project without making the standards meaningless, and that implementation is almost certainly going to be the primary one used and a core driver of how the system operates in practice. This is not to negate the importance and desirability of having separate, high-quality standards; such standards fit in well with my view of the importance of design and documentation even without any other implementation. But standards “on paper” always miss something that is found when actually trying to implement and run them.

Out of town for a week: On a personal note, I’m going to be out of town for about a week starting tomorrow. So these thoughts are less polished than I might like and I won’t follow up as quickly to responses as I’d like, but I wanted to put something together and get it out there as an initial test of the process as much as anything. Consider this a mic test. I’ll catch up on what’s been written once I get back.

Categorized as Strategy

By coinaday

Interested in requirements, architecture, code review and QA.


  1. Hi Coinaday, first, thanks for writing our second long blog post (not by me). I really appreciate the time and thought you put into this.
    I wish I didn’t have to sound so negative in the following, but I guess I must. Still, I’m grateful you raised a lot of very important issues that we can discuss at length.

    The problem is that the most basic features of the Encyclosphere, as an encyclopedia network, are already decided. I mean, I’m not interested in working on anything that doesn’t have the following features:

    • Is a network of encyclopedic content. In other words, it ties together articles from all over the Internet, not just one domain or one organization.
    • Supports, from the beginning, all existing encyclopedic content.
    • Is based on neutral standards.
    • Does not force users to use one piece of software, because that exerts a centralizing tendency.
    • Is fully decentralized and leaderless.

    MediaWiki. If that’s all true, then it doesn’t make a lot of sense to say that we should begin with MediaWiki. We want to support existing encyclopedias from the get-go, and we don’t want to force users to use one piece of software anyway.

    Federation vs. atomization. As to federation, that’s one way to decentralize. The other way is, for lack of any term I know of, atomization. A federated system involves having multiple independent servers that work similarly and are interoperable. A network of MediaWikis looks like an interesting idea; and since there are already lots of installations of MediaWiki it would look like a natural way forward.

    The problem with this idea, however, is that MediaWiki forces users and project managers into many choices they might not want to make, and it simply isn’t very good software (IMHO), as fully-featured as it might be. It also excludes any encyclopedias that don’t use MediaWiki (or forces them into an ancillary role). Finally, and this might be the thing that bothers me most about the idea, it forces contributors to the Encyclosphere to register with someone or other’s encyclopedia project, which they just might not want to do. I should be able to put my own damn encyclopedia article on my blog, or wherever, if I want to, without asking anybody’s permission.

    As the intro has it, this means we’ll have a Blogosphere-type system. This is sort of like Bittorrent, as you say. Could we hybridize this by including whole entire servers? Maybe. Let’s not rule it out. But the basic network is one of individual feeds, rather than of servers, I’d like to think. But let’s keep thinking about this, it’s not easy.

    Ratings & editorial control. Ratings, yes. Editorial control, only via decentralized ratings and individual choice of app builders. If there were servers, then it would make sense for one to blacklist another. But if we’re talking about an “atomized” and not federated system that permits individuals to participate without asking permission of anyone, then what do we do with illicit content (copyvio, privacy vio, illegal, etc.)? I have an idea, FWIW: we create data fields permitting feeds and/or user accounts to specify particular articles and feeds/accounts as being illicit, for some specified reason. So both articles and feeds (maybe, later, users) would have addresses/IDs of some sort, and I might then put into my feed: “ 0 copyvio”. That would indicate that that particular article is illicit (0) for the reason “copyvio”. I’m not saying this is the format or anything, but something roughly like that would enable decentralized oversight without privileging anyone.

    I was thinking that we would have ratings as part of feeds, along with everything else. This keeps the locus of control on the individual user. Then there is no need for the Knowledge Standards Foundation to decide on how the ratings are to be used: that is up to app developers. KSF will not be developing an app. If you want to develop an app that makes use of the rating data in just the way you please, go right ahead!

    Discussion venue. I’ve used and moderated both mailing lists and forums a lot, but both are increasingly ill used and understood by the next generation, especially mailing lists. My thinking is that group blog posts will draw new people in, letting them understand that they can contribute.

    I know some people really don’t like moderation. It does add friction, for sure. But the conversations we’re having about the standards are necessarily long-form and long-term. Adding some moderation will incentivize people to stay polite and on-topic. Moderation is also just necessary, because unmoderated discussions always suck.

    We have an unmoderated (but monitored) chat room for shorter, more free-form chat, useful for more in-the-moment planning. But long-form…well, just see what you think after a month or two. We can revisit.

    That’ll have to be enough for now. Much more to do!

  2. Atomization seems better suited to face the opposition that will necessarily come from existing sources already dominated by bigger players who have an agenda to push.

    I like the idea of starting by somehow collecting all existing encyclopedia content – this is simply the fastest way to get meaningful weight.

    The question is how. Who decides on the standards for whether blogosphere content qualifies as being of encyclopedic quality/type?
