Guest post: Initial thoughts on process, requirements, and architecture
Overview: I suggest we start with Mediawiki as a base to avoid reinventing the wheel, and that our major change be adding federation. Below are some thoughts on the major styles for doing that, a couple of initial thoughts on how the “ratings” / editorial system might work, and a miscellaneous grab bag of other thoughts.
0. Start with Mediawiki / Wikipedia
I don’t think we need to reinvent the wheel: the Mediawiki code base has been proven at scale and its UI/UX is understood by a large base of users. In terms of both content and code, I think it makes the most sense to begin here. However, this does result in adopting their technology choices from the start.
I wasn’t previously familiar with Everipedia. From a glance at its Wikipedia page, it may have some interesting technological elements, but I tend to be extremely skeptical of the utility of the “blockchain everything” approach. It also sounds less suitable as a base content source, being a more stale fork, but its design is worth considering.
1. Add federation
The core change we want relative to Mediawiki is to not have a single centralized source of truth, so some form of federation needs to be added. This can happen on a couple of levels: for instance, a single server can host various forks / branches of any article, while different servers may choose to carry different sets of those forks / branches. So there is a conceptual level (forks, which may still be hosted together) and a network level (servers, which may or may not differ in content).
Functionally, I envision this as being similar to git in design: we have a large set of objects, which take significant storage and bandwidth, and we have branches / commits, which are lightweight pointers to particular versions of these objects.
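To make the git analogy concrete, here is a minimal sketch, assuming content-addressed objects (keyed by the SHA-256 of their bytes) and a flat table of named refs. The `ObjectStore` class, its methods, and the ref naming scheme (`Encyclopedias/main`) are hypothetical illustrations; real commits would also record parent commits, authorship, and timestamps, which this omits.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical git-like store: immutable objects keyed by hash, plus movable refs."""
    objects: dict[str, bytes] = field(default_factory=dict)  # object ID -> content (the bulky part)
    refs: dict[str, str] = field(default_factory=dict)       # ref name -> object ID (cheap pointers)

    def put(self, content: bytes) -> str:
        # Content addressing: the ID is derived from the bytes, so the same
        # revision gets the same ID everywhere and is only stored once here.
        oid = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(oid, content)
        return oid

    def set_ref(self, name: str, oid: str) -> None:
        # Moving a branch only rewrites a pointer; no content is copied.
        if oid not in self.objects:
            raise KeyError(f"unknown object {oid}")
        self.refs[name] = oid

    def resolve(self, name: str) -> bytes:
        return self.objects[self.refs[name]]


store = ObjectStore()
v1 = store.put(b"Encyclopedias are reference works.")
v2 = store.put(b"Encyclopedias are comprehensive reference works.")
store.set_ref("Encyclopedias/main", v1)
store.set_ref("Encyclopedias/alice-fork", v2)
print(store.resolve("Encyclopedias/main"))
```

The split mirrors the paragraph above: forks of an article cost almost nothing to create, since only the `refs` table grows.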
From an implementation perspective, the two major approaches I see are USENET-style or BitTorrent-style: either a decentralized approach (multiple large independent servers) or a distributed one (all or most users being part of the network). A hybrid system with elements of both would also be possible. A USENET-style system would be the simplest technologically, since we could presume each server has enough space to store an entire set of the objects. With a BitTorrent-style system, we have to deal with machines that have intermittent availability and far more limited resources, so we need to handle sharding and related problems from the start.
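For the BitTorrent-style case, one common way to decide which peers hold which objects is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration, not something settled here. The sketch assumes peers and objects are identified by plain strings and that each object should live on `replicas` peers; `score`, `holders`, and the peer names are hypothetical.

```python
import hashlib


def score(peer_id: str, object_id: str) -> int:
    # Deterministic pseudo-random weight for each (peer, object) pair.
    digest = hashlib.sha256(f"{peer_id}|{object_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def holders(object_id: str, peers: list[str], replicas: int = 3) -> list[str]:
    """Rendezvous hashing: the `replicas` highest-scoring peers store the object."""
    ranked = sorted(peers, key=lambda p: score(p, object_id), reverse=True)
    return ranked[:replicas]


peers = [f"peer{i}" for i in range(10)]
before = holders("Encyclopedias@v2", peers)

# When one holder drops offline, the surviving holders keep their slots and
# only the vacated slot is refilled, so little data has to move.
online = [p for p in peers if p != before[0]]
after = holders("Encyclopedias@v2", online)
print(before, after)
```

Any node can compute `holders` locally without coordination, which helps when machines come and go; a real system would still need to detect departures and re-replicate.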
2. “Ratings” / editorial control thoughts
We want to support both whitelist and blacklist modes for a given server: either accept all submissions except those from bad actors (identified by list or by behavior), or accept only submissions from preapproved entities (other servers with a recognized key, or users submitting with a recognized key).
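A minimal sketch of the two modes, assuming submitters (servers or users) are identified by a key string; real keys would be public keys with signature verification, which is out of scope here. `AcceptancePolicy`, `Mode`, and the behavior-based `flagged_keys` set are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    BLACKLIST = "blacklist"  # accept everything except listed or flagged keys
    WHITELIST = "whitelist"  # accept only listed (preapproved) keys


@dataclass
class AcceptancePolicy:
    mode: Mode
    listed_keys: set[str] = field(default_factory=set)
    # Hypothetical: keys flagged automatically based on behavior (used in blacklist mode).
    flagged_keys: set[str] = field(default_factory=set)

    def accepts(self, submitter_key: str) -> bool:
        if self.mode is Mode.WHITELIST:
            return submitter_key in self.listed_keys
        return submitter_key not in self.listed_keys | self.flagged_keys


open_server = AcceptancePolicy(Mode.BLACKLIST, {"key:spammer"})
open_server.flagged_keys.add("key:edit-warrior")
curated_server = AcceptancePolicy(Mode.WHITELIST, {"key:serverB", "key:alice@serverA"})

for key in ["key:spammer", "key:edit-warrior", "key:serverB", "key:newcomer"]:
    print(key, open_server.accepts(key), curated_server.accepts(key))
```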
A given rating / editorial set is likely to have an “opinion” on only a small subset of articles and commits, so an overall view is likely to be composed of prioritized lists of these sets which are then combined. For example, if we have a view consisting of Alice, Bob and Carol in that order, and we want to display the article for “Encyclopedias”, we first check whether Alice’s set names a current version, then Bob’s, and so on; the first one with a pointer to a version is used. We probably also want to be able to compose such views with other views, so we might call that initial list alpha and create a higher-level view that consults alpha, beta, gamma, etc. in order. More sophisticated versions could also be imagined, such as weighted voting instead of taking the first opinion encountered, or voting weights that vary by domain.
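The first-match lookup and view composition described above can be sketched as follows, assuming each rating set is simply a map from article title to an endorsed version ID. `RatingSet`, `View`, and the version strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class RatingSet:
    """One editor's (or group's) opinions: article title -> endorsed version ID."""
    name: str
    picks: dict[str, str] = field(default_factory=dict)


@dataclass
class View:
    """Ordered priority list whose members are rating sets or other views."""
    name: str
    members: list[Union[RatingSet, View]]

    def resolve(self, article: str) -> Optional[str]:
        for member in self.members:
            if isinstance(member, RatingSet):
                version = member.picks.get(article)
            else:
                version = member.resolve(article)  # nested view: recurse
            if version is not None:
                return version  # first member with an opinion wins
        return None  # nobody in this view has an opinion on the article


alice = RatingSet("Alice", {"Encyclopedias": "v3"})
bob = RatingSet("Bob", {"Encyclopedias": "v2", "Wikis": "v7"})
carol = RatingSet("Carol", {"Usenet": "v1"})
alpha = View("alpha", [alice, bob, carol])
beta = View("beta", [RatingSet("Dave", {"Usenet": "v9", "Git": "v4"})])
top = View("top", [alpha, beta])

for title in ["Encyclopedias", "Wikis", "Usenet", "Git", "Mediawiki"]:
    print(title, top.resolve(title))
```

This assumes views never contain themselves; a real implementation would need cycle detection. The weighted-voting variant would replace the early return with a tally across all members.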
From a standards perspective, we essentially want the backend to be able to perform an operation like a git synchronization: exchanging objects and commits / branches / tags. From a UX perspective on reading, we want users to select a particular authority source and then browse as normal. For editing, so that users don’t need to manage their own keys (and can edit from any device with a username / password), they’ll need a particular server (or coordinated set of servers) that they log into and that manages their key. We can then think of their identity as Server:Username, that is, “user Username from server Server”.
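A minimal sketch of the synchronization step, assuming objects are content-addressed by SHA-256 and refs are plain name-to-ID maps. Both are assumptions, and this is much simpler than git’s real fetch protocol. The `pull` function and the `serverB/...` ref naming are hypothetical.

```python
import hashlib


def pull(local_objects: dict[str, bytes], local_refs: dict[str, str],
         remote_name: str,
         remote_objects: dict[str, bytes], remote_refs: dict[str, str]) -> set[str]:
    """Hypothetical one-way sync: fetch missing objects, then record the remote's refs."""
    # Objects are immutable and content-addressed, so "what do I still need?"
    # reduces to a set difference on IDs.
    missing = remote_objects.keys() - local_objects.keys()
    # Validate everything before mutating anything, so a bad peer can't leave
    # us half-updated. An object's ID must match the hash of its content.
    for oid in missing:
        if hashlib.sha256(remote_objects[oid]).hexdigest() != oid:
            raise ValueError(f"object {oid} failed hash check")
    for ref, oid in remote_refs.items():
        if oid not in local_objects and oid not in remote_objects:
            raise ValueError(f"ref {ref} points at unknown object {oid}")
    for oid in missing:
        local_objects[oid] = remote_objects[oid]
    # Remote refs are namespaced (like git's remote-tracking branches), so a
    # peer's pointers never silently overwrite our own branches.
    for ref, oid in remote_refs.items():
        local_refs[f"{remote_name}/{ref}"] = oid
    return set(missing)


text = b"Encyclopedias are reference works."
oid = hashlib.sha256(text).hexdigest()
server_b_objects = {oid: text}
server_b_refs = {"Encyclopedias": oid}

mine_objects: dict[str, bytes] = {}
mine_refs: dict[str, str] = {}
fetched = pull(mine_objects, mine_refs, "serverB", server_b_objects, server_b_refs)
again = pull(mine_objects, mine_refs, "serverB", server_b_objects, server_b_refs)
print(len(fetched), len(again), mine_refs)
```

The second `pull` transfers nothing, since every object is already present; only the refs are refreshed.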
Process / venue: I have said before that I think moderated WordPress is not a good tool for in-depth technical discussions, and that forums and mailing lists are far superior for them, for a variety of reasons. While WordPress is good for presenting more finished products for broader discussion, a forum or list better supports back-and-forth discussion that is both more rapid and more permanent. These issues will become more apparent as time goes on. However, it’s also clear that we’re going to be on moderated WordPress instead. This is workable, barely, but inefficient and less productive. Basic things like asking a simple question are made far slower, and these barriers will lead to fewer people being involved and less discussion happening. The result will be meaningful progress made by smaller groups in private rather than in public discussion, with results presented as essentially a fait accompli followed by a public discussion period so other people still feel involved. This is a hand-wavy short version of the argument; I’m just noting my general prediction here.
To be clear, we’re fine starting out here; I just think there will come a point, as we actually get into this, when we should migrate the leading-edge technical discussion to something with less friction. I’m making the point now so the idea is already planted once we start hitting some of that friction.
Separating process, requirements, and design: Although I have combined elements of all three here, I believe things would generally be more effective with clear delineations between process, requirements, and architecture discussions. People like to just “jump in and go” because it feels more effective, but in the long run process decisions are some of the most important choices to be made, and they should be highlighted, taken seriously, and disseminated widely. Requirements should be codified and accessible to non-technical audiences; again, while this type of work tends to be skipped in favor of a “we know what we mean to do” approach, actually discussing and nailing down requirements will pay off significantly in the long term. Architecture discussions are inherently technical and will have a smaller audience, but they are again very important as a starting point for the project.
Importance of design and documentation: I’m going to be banging on this drum a lot – I am strongly in favor of actually planning what is to be done and documenting what has been done, how, and why. This will ultimately save a significant amount of effort, lead to far higher chances of actually building what’s intended, and create a higher quality and more maintainable result.
If anyone thinks this is obvious and uncontroversial, then they don’t understand how what I am suggesting differs from the typical approach of many modern software projects.
Integral nature of “reference client”: While I understand the desire to just write standards and let an ecosystem emerge, the initial implementation cannot be separated from the standards project without making the standards meaningless, and that implementation will almost certainly be the primary one in use and a core driver of how the system operates in practice. This is not to negate the importance and desirability of having separate, high-quality standards; such standards fit well with my view of the importance of design and documentation even without any other implementation. But standards “on paper” always miss something that is only found when actually implementing and running them.
Out of town for a week: On a personal note, I’m going to be out of town for about a week starting tomorrow. So these thoughts are less polished than I might like, and I won’t respond as quickly as I’d like, but I wanted to put something together and get it out there as an initial test of the process as much as anything. Consider this a mic test. I’ll catch up on what’s been written once I get back.