E-publishing on the Web:
Promises, Pitfalls, and Payoffs for Bioinformatics

an editorial appearing in Bioinformatics in June 1999 (15: 429-431)

Communication lies at the heart of every scientific discipline, and the central method of scholarly communication has traditionally been the journal. Consequently, it goes without saying that on-line publishing will have a great impact on scholarship in general, perhaps as great as that of the invention of the printing press centuries ago. As part of this larger development, bioinformatics will, of course, be greatly affected, but perhaps less than other disciplines, as it is such a new field and is already well integrated with the on-line world.

There are many aspects of journals and publishing -- refereeing and editing, the visual presentation of a single article, organizing many articles together into an issue, and distributing the journal. Most directly, the web changes distribution, making it possible for an article to be delivered at much lower cost and much more quickly than through the mail. This will have some obvious effects on how we read and access articles -- e.g. reducing the need for library Xeroxing and for sending out reprint-request cards. However, it will also encourage a host of other changes affecting all aspects of scientific communication. Some are subtle incremental improvements, while others are fairly radical. Let's go through some of these, focusing on the least radical and most likely scenarios.

How E-publishing will Affect the Editing of an Article

One of the most important services academic journals perform is quality control, which takes place by refereeing and editing. How will the web change this? At some level it will speed things up, e.g. helping move articles to reviewers more quickly. However, the limiting step will still be the time it takes someone to give careful thought to the submitted work.

LARGE DATASETS. Moreover, the complexity of on-line information may actually make the reviewing process more involved and time-consuming. Increasingly, publications are associated with large datasets or complex websites that only meaningfully exist on-line. Examples include complete genomes, crystal structures, prediction servers, alignment databases, simulation codes, etc. Often these are THE substantial contribution of the publication, rather than the writing itself. How are they reviewed? Currently, not as thoroughly as the article text. Thus, a referee is probably more likely to overlook broken links on a web server than trivial typos in an abstract.

As noted in earlier editorials, quality control issues are particularly important with regard to genomics. When a new genome is sequenced, the substance of the work is not the short summary in a glossy magazine, but the actual sequence and functional annotation for thousands of genes, which reside in website datafiles. What if there are substantial mistakes in these datafiles, such as incorrectly predicted genes or wrongly assigned functions? Beyond being errors in themselves, these mistakes will readily propagate to future analyses, as genome annotation is largely a process of cross-referencing and transferring functional information from the literature.

NO REVIEW. This discussion of refereeing, of course, presupposes that on-line journals will continue with anonymous peer review and that this system can be successfully extended to deal with the complexity of on-line information. There is no reason why it cannot. However, there are a number of more radical ideas for using the transition to web publishing to substantially change (or even abolish) traditional review. Some proposals envision articles being submitted to newsgroups or websites, and then third-party reviewers (not necessarily anonymous) stamping selected ones with an academic society's seal of approval. In the extreme, formal scholarly review could disappear altogether, and the quality of articles would be assessed purely in terms of their marketplace utility. The e-biomed system proposed by the NIH is a hybrid of some of these elements, having both refereed and unrefereed sections.

How E-publishing will Affect the Presentation of an Article

HYPERTEXT. Once an article has been submitted, reviewed, and accepted, the next role of a journal is standardizing its appearance, format and vocabulary to make it comprehensible to a wide range of readers. How will on-line publishing affect this? Because of the much lower distribution costs of the internet, the length of on-line articles will be less restricted, even in the most selective journals. Hopefully, this will encourage a more thorough and explanatory writing style. Counterbalancing this tendency to verbosity are the possibilities of hypertext, which will allow authors to link their articles to supplementary material on their own websites or in external databases. This will enable them to condense the main text, making it less technical by moving details to linked sections. New web technologies, such as XML, may also encourage a more segmented and structured, 'fact-box' style of presentation.

The dynamic nature of the web will also enable authors, editors, and readers to continually "update" an article -- allowing its message to evolve over time -- by posting criticisms, corrections, and links to new results. Journals may have to work hard to ensure that this complex tangle of links and content remains stable over time. This is relevant to the review process, since one can erase a mistake after publication on one's own website (cheating, in a sense) but not in a journal article.

ARCHIVING. One important function of a paper journal is archiving, making today's knowledge available for the future. Will a web journal be able to perform this essential function? The important issue here is the actual computer format of the article. How can we be assured that an article written today in format X will be interpretable a half-century from now? This concern is especially relevant to Adobe's PDF format, which is currently the format of most on-line journals (and, to a lesser extent, to HTML). PDF is owned and controlled by Adobe. What will happen if Adobe goes out of business? Or if future PDF versions don't sufficiently support the current features?

How E-publishing will Affect the Organization of Many Articles

INTERACTIVITY. Once an article has been accepted and laid out in standard fashion, the next issue in publishing is how to position it with respect to the multitude of other articles in the journal and in the literature overall. For print journals, the title page, table of contents and index accomplish this task in a limited way, ordering articles by date, author, and subject. However, the presentation is static. On the web, one might imagine a dynamic table of contents, arranging articles according to the reader's research interests, download frequency, and (combining these) projected preferences based on similar readers. Download frequency (the number of times an article is downloaded) promises to be an especially powerful organizing statistic. However, it is open to distortion: articles may be downloaded without being read, e.g. by web "spiders". This envisioned "meta-journal" presentation flows seamlessly into overall literature search services, and some aspects of what is to come are already available from the search services in Medline or Amazon. (For instance, the Amazon on-line bookstore constantly prompts one with "readers who liked this book also liked...".)
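To make the idea of a download-driven table of contents more concrete, here is a minimal hypothetical sketch; the editorial itself proposes no specific algorithm. It assumes a download log of (reader, article, user-agent) records, discards records whose user agent contains a crawler keyword (a deliberately crude spider heuristic, assumed for illustration), ranks articles by the remaining download counts, and produces a "readers who downloaded this also downloaded..." list by simple co-occurrence counting, one of many possible stand-ins for preferences "based on similar readers". All names and the sample log are invented.

```python
from collections import Counter

# Hypothetical heuristic: user agents containing these words are treated as spiders.
SPIDER_MARKERS = ("bot", "spider", "crawler")


def is_spider(user_agent):
    """Crude check: flag a user agent that contains any crawler keyword."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SPIDER_MARKERS)


def human_downloads(log):
    """Keep (reader, article) pairs from records that do not look like spiders."""
    return [(reader, article) for reader, article, ua in log if not is_spider(ua)]


def rank_by_downloads(log):
    """Order articles by filtered download count, most downloaded first."""
    counts = Counter(article for _, article in human_downloads(log))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts, key=lambda a: (-counts[a], a))


def also_downloaded(log, article, top_n=3):
    """Articles most often downloaded by the same readers who downloaded `article`."""
    by_reader = {}
    for reader, art in human_downloads(log):
        by_reader.setdefault(reader, set()).add(art)
    co_counts = Counter()
    for articles in by_reader.values():
        if article in articles:
            co_counts.update(a for a in articles if a != article)
    return sorted(co_counts, key=lambda a: (-co_counts[a], a))[:top_n]


# Invented sample log: three human readers and one spider.
sample_log = [
    ("r1", "A", "Mozilla/4.0"), ("r1", "B", "Mozilla/4.0"),
    ("r2", "A", "Mozilla/4.0"), ("r2", "C", "Mozilla/4.0"),
    ("r3", "A", "Mozilla/4.0"), ("r3", "B", "Mozilla/4.0"),
    ("s1", "C", "ExampleBot/1.0"), ("s1", "D", "ExampleBot/1.0"),
]

print(rank_by_downloads(sample_log))       # ['A', 'B', 'C']
print(also_downloaded(sample_log, "A"))    # ['B', 'C']
```

Without the spider filter, article C would appear to have as many downloads as B, and article D, which no human requested, would appear at all. This illustrates how easily raw download counts can be distorted.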

DATABASE VS JOURNAL. More generally, web publishing will increasingly blur the distinction between journals and databases. What will be the difference between retrieving an on-line journal article versus getting a free-text "report" from a database? One distinction may be the audience: man or machine. To some degree, database records are more suitable for computer parsing, and articles, for human reading.

On a practical level, how might the blurring of journal and database affect future information resources? We may see database sites organized increasingly like journals, with database curators increasingly performing functions similar to those of journal editors and reviewers. Eventually, large central databases, such as the PDB or SwissProt, may develop into integrated information resources, encompassing both standardized tabular data and free-text articles. As there are considerably fewer crystal structures or complete genome sequences than articles about these rich datasets, the database report on a structure or genome makes a particularly convenient place to "link" the many articles and "boutique" databases annotating and referring to it. In this way, entries in large central databases may become "portals" into biology, in the same sense that Yahoo and other search sites have become portals for the web as a whole.

Economics: The Stumbling Block to Getting Journals onto the Web

When will we get rid of print journals altogether and move completely to the web for scholarly publishing? This will take a while. Some of the reasons are technological -- e.g. waiting for faster networks, "books-on-demand" printers and "e-paper" displays. However, the most substantial barriers are economic.

In the long term, the shift to electronic journals offers great economic efficiencies to scientists, potentially saving them and their patrons money. However, in the short term, it may redistribute funds in the delicate world of academic publishing. Consequently, it is being embraced only marginally by commercial journals, which are still trying to figure out how to make money in the on-line environment.

READERS PAY. How will we collectively finance on-line journals -- and databases? There is a succession of scenarios that distribute the bill over progressively larger groups of readers or users: (1) pay per-byte, where one pays for exactly what is downloaded and read; (2) individual subscriptions, the current system for most glossy magazines; (3) institutional subscriptions, with access granted to whole universities; (4) society dues; and (5) direct government support, which is currently used to give free access to most large public databases, such as the PDB. The choice of scenario is crucial both for the incentives it creates for particular technologies and for issues of fairness. For instance, a disadvantage of large-scale society or government support is that people who never look at a particular journal or database would effectively be asked to cover some of its costs. However, the advantage is that it makes occasional reading or browsing much easier, particularly for those outside a specialist community. This is important because a principal goal of government-funded basic-science research is the free dissemination of ideas to the broad public. Large-scale support is also fairer in the sense of not favoring better-funded labs.

WRITERS PAY. Rather than asking readers to support a journal, one could ask writers, through page charges (and companies as well, through product advertisements). At first glance, page charges may appear rather "tacky" in that they effectively make a research paper into an advertisement. However, they have some compelling advantages. They would simplify copyright and ownership issues, readily allowing authors to own the copyright to their articles and redistribute them from their own websites or via on-line indexes, and thus provide a straightforward approach to disseminating information to the broad public. More practically, page charges would also mean that the presentation, organization, and accessibility of on-line material would not be hindered by complex "login" and payment systems. Getting these systems to work seamlessly is, in fact, as complex as implementing many of the ambitious on-line features envisioned here.

Whatever financing scenario is adopted, the transition to on-line publishing will certainly take place. As a concrete demonstration of this reality, it is worth noting that most physics disciplines, with the prominent exception of biophysics, have already moved from traditional print journals to a hybrid system, in which electronic versions of papers are made freely available from an on-line preprint server at Los Alamos as well as from archival journals.

ACKNOWLEDGEMENT. Thanks to Pat Fleming, Werner Krebs, Ronald Jansen, Cyrus Wilson, Adina Spiro, Alessandro Senes, and Matthew Day for helpful suggestions.

Mark Gerstein
Molecular Biophysics & Biochemistry, Yale U,
New Haven, CT 06520
http://bioinfo.mbb.yale.edu