At-home digitization for fun and profit

I spent the second half of December 2017 digitizing family photos. Something something busman’s holiday, but it was my brother’s idea; he likes a project. After a few overwhelming days surrounded by stacks of baby albums, we decided to set ourselves a more manageable task: only the photos taken during our five-year sailboat trip around the world between 1997 and 2002, and of those, only the best ones. Here’s what we started with:

  • Probably somewhere in the neighborhood of 5,000 photos, taken by the five members of my immediate family. (There would have been more, but my mother switched to a digital camera in 2000, and my father usually shot slides. We ignored these formats entirely.)
  • A Canon MX920 photo scanner, which works at a rate of about 3 photos per minute.
  • Several (too many!) different potential sources of metadata, about which more later.

1. Selection

This was messy, because we, like most people, did not store our photos in carefully labeled boxes. We had some in communal best-of-the-best albums, some in personal albums, some in a chest in the living room (still in their original cases or envelopes), and lots more in a box in the garage. (This is a bad idea!) As a first step, we sat on the living room floor conducting triage on our personal collections. This was easy for me, as the youngest child; for at least half the trip my camera and my photography skills were bad enough to render almost any subject worthless. Once we’d each picked our own favorites, we brought everything together for a final coordinated review, which often involved choosing between two people’s almost identical pictures of the same island. Some lessons here:

  • Children aren’t very good at judging which of their photos will be interesting in the long term. My brother’s and my carefully curated albums were full of would-be artistic bits of scenery, most of which have aged badly; our photos of people were mostly in the boxes of rejects. There’s a technology shift involved here, of course; we don’t need our own pictures of the Pyramids anymore, because we can easily find better ones online. I’m tempted to generalize that any selection process is inherently fallible and must consequently be reversible, but the comforting truth is that my older sister seems somehow to have judged everything perfectly. Hire her for your next project!
  • Developing quality matters. We had film developed all over the world, and we never tried particularly hard to find the best place in town for it. This means that for stupid, arbitrary reasons, our pictures of Greece are gorgeous and saturated and crisp, while our pictures of Sri Lanka are not. We thought this was OK at the time, because we still had the negatives. Theoretically we could still try to redevelop the negatives, but I don’t think we’re likely to.

At the end of the selection process, we had about 900 photos to scan.

2. Scanning

This took place more or less simultaneously with the final selection stage, actually. I had three albums and two boxes spread out in front of me, and I picked photos out and handed them to my brother for scanning. He or my sister made a note of the location and date of capture (insofar as we knew or could guess it) in a spreadsheet as we went. Once a photo had been scanned, I replaced it in the box or album it came from. My guess is that at least 5% of the time I got this part wrong.

Side note: these photos were all 4×6. We’d previously done a bit of scanning of older family photos, which were more irregularly sized, and I spent a long time with ImageMagick one day trying to crop off the patchy gray borders. Mostly it worked, but in some cases it didn’t. I don’t know what to tell you about that except that it still makes me angry whenever I think about it.

3. Metadata

Oh God.

The main thing we wanted was to be able to sort our photos by capture date rather than scan or upload date. The second thing we wanted was to be able to search for particular countries or cities. I could have told myself from the start that the metadata stage would be extremely time-consuming, but even by my standards it was ridiculous.

We ruled out the GUI option right away, because several of us had already run into problems editing date and location information in the Google Photos interface. (If you change the date, for example, that change isn’t preserved if you download and then reupload the photo–and there’s no way to change location at all.) So we turned to Exiftool, a command-line tool I’d read about but never used before. We built a big metadata spreadsheet and then used a script to turn the spreadsheet data into Exiftool commands. I’m used to doing this at work by concatenating an even-bigger spreadsheet into a shell script, because I’m very bad at writing Python scripts. In this case, however, my brother wrote a Python script for us. I felt temporarily dejected about this but I admit it was for the best.
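For anyone curious what the spreadsheet-to-commands step can look like, here is a minimal sketch. It is an illustration rather than the script described above: the CSV layout, the file names and the sample values are all invented, and it assumes every column other than "filename" is named after the Exiftool tag it should fill in.

```python
import csv
import io
import shlex

# Hypothetical spreadsheet export: one row per scan. Every column other than
# "filename" is assumed to be named after the Exiftool tag it populates.
# The values are made up for illustration.
SAMPLE_CSV = """filename,DateTimeOriginal,GPSLatitude,GPSLatitudeRef
scan_001.jpg,1999:07:14 12:00:00,16.5,S
scan_002.jpg,,,
"""


def exiftool_args(row):
    """Build the argument list for one photo, skipping empty cells."""
    args = ["exiftool"]
    for tag, value in row.items():
        if tag == "filename" or not value:
            continue
        args.append(f"-{tag}={value}")
    args.append(row["filename"])
    return args


def commands_from_csv(text):
    """Return one shell-quoted command per row that has something to write."""
    commands = []
    for row in csv.DictReader(io.StringIO(text)):
        args = exiftool_args(row)
        if len(args) > 2:  # more than just "exiftool <file>"
            commands.append(shlex.join(args))
    return commands


for line in commands_from_csv(SAMPLE_CSV):
    print(line)
```

Quoting each argument with shlex.join matters because the date value contains a space; rows with no metadata at all are skipped rather than turned into no-op commands.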

What we found is that once you start adding metadata, it’s hard to stop. For one thing, the technology we were using favored more precise data. For instance, we wanted to sort things by country, but after a bit of experimentation we found that the only meaningful places to store location information were GPSLongitude and GPSLatitude, which had to be city- or at least island-specific. Also, the EXIF Create Date tag requires the format YYYY:MM:DD HH:MM:SS, and while we rounded to the nearest day, it seemed wrong to round further than that, especially when we knew we could get the day right if we tried hard enough.
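To make those two constraints concrete, here is a small sketch of the formatting involved. The function names are hypothetical, the noon default is just one arbitrary way of padding a day-precision date out to a full timestamp, and the coordinates are only roughly Bora Bora. The split into an unsigned value plus an N/S or E/W reference reflects how EXIF stores GPS positions.

```python
from datetime import date


def exif_datetime(day, hour=12):
    """Format a calendar date as 'YYYY:MM:DD HH:MM:SS'.

    The hour is a placeholder (an assumption here), since only the day is known.
    """
    return f"{day.year:04d}:{day.month:02d}:{day.day:02d} {hour:02d}:00:00"


def gps_tags(lat, lon):
    """Split signed decimal degrees into unsigned values plus hemisphere refs."""
    return {
        "GPSLatitude": abs(lat),
        "GPSLatitudeRef": "N" if lat >= 0 else "S",
        "GPSLongitude": abs(lon),
        "GPSLongitudeRef": "E" if lon >= 0 else "W",
    }


print(exif_datetime(date(1999, 7, 14)))
print(gps_tags(-16.5, -151.75))  # approximate, illustrative coordinates
```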

Which brings us to the main reason why we couldn’t stop adding metadata, which has to do with deep-seated personality flaws. Over the course of the boat trip we created a lot of data about it: log books, journals, chronicles of my sister’s days written minutely into a series of wall calendars (black pen for port stops, blue for passages). In the intervening fifteen years we did not miraculously become the sort of people who didn’t mind about months and days as long as we got the year right.

If you’re interested, these are the metadata sources we had at our disposal:

  • captions we’d written on the backs of the photos. We didn’t have many of these, and only a few of them included dates more specific than e.g. “late October”.
  • dates printed on the photos themselves. Do you remember this feature? About a third of our photos had printed dates, and about half of them were obviously wrong.
  • our handwritten log book, which my father had scanned a few years ago and turned into a set of PDFs. This contained dates of arrival and departure and GPS coordinates for every port we visited, as well as details of any in-port boat maintenance.
  • our journals, although really only my sister’s journal, because she was the only one keeping strict and extremely detailed records of everything we did, almost every day, for the entire trip. She often got very behind in this, however, and when she was writing about things that had happened several months before, she occasionally got a few dates or names wrong. Also, she mainly wrote in cursive, and at great length, so she was the only one who could efficiently scan her journals for the details we needed.
  • my sister’s calendars. Starting in 1999, she documented each day in minuscule writing on a wall calendar. Blue writing for passages, black for ports. Every place name is there, as is everything we ate for dinner, every game we played, every time my sister went sailing or practiced the clarinet. Every day’s weather, too, although most are described as simply “beautiful”. “Beautiful. Emma sick.” “Beautiful. Up early. Burn trash with Jeff.” (I too kept a calendar, as did my brother, but mostly what I did was get terribly behind on my calendar and then copy the information I’d forgotten from my sister’s. She seemed to find this both pointless and annoying; little did either of us know it was proto-LOCKSS.)

Because the pictures were all still in order, and because of my siblings’ impressive memories, we could mostly tell where each one had been taken, and we could figure out the precise location and date fairly well from there. (The exceptions here were the boat-interior and nondescript-tropical-island photos, which were often hard to date precisely.) We didn’t end up doing exactly this–we tended to cluster photos around certain dates, without caring that a particular photo might have been taken on the last date of our visit to Bora Bora rather than the first–but we got close.

Once we had the locations, I halfheartedly investigated pulling in GPS coordinates from somewhere clever, like GeoNames, but I soon decided it would be easier just to Google them. There’s not much to say about this; it was boring. The only mildly interesting things were a) the lack of a good online tool for converting place names to GPS coordinates and b) the revelation that my siblings and I were surprisingly bad at spelling place names as children, especially considering that so many of the names were phonetic transliterations of foreign alphabets.
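One low-tech way to cope with both problems is a small hand-built lookup table plus fuzzy matching on the spelling. The sketch below is hypothetical and is not what we actually did: the table holds two entries with approximate coordinates, and the 0.8 similarity cutoff is an arbitrary choice that would need tuning against real captions.

```python
import difflib

# Hypothetical hand-built gazetteer: lowercase place name -> approximate
# (latitude, longitude) in signed decimal degrees.
GAZETTEER = {
    "bora bora": (-16.5, -151.74),
    "bangkok": (13.75, 100.5),
}


def lookup(name, cutoff=0.8):
    """Return (canonical_name, (lat, lon)) for the closest spelling, or None."""
    key = name.strip().lower()
    matches = difflib.get_close_matches(key, GAZETTEER, n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], GAZETTEER[matches[0]]


print(lookup("Borra Borra"))
print(lookup("Bankok"))
print(lookup("Atlantis"))
```

A lookup that returns nothing is more useful than a confident wrong match, so unknown names come back as None and can be fixed by hand.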

4. The finished product

Throughout the metadata phase of this project, my mother kept asking us what the end product was going to be. I felt very proud when my brother explained to her that we were creating metadata for the interface we might one day have, rather than the one we had now (which is what I mutter to myself every time I accidentally look at Digital.Bodleian). Even so, I have to admit that the end product was a little disheartening. Google Photos can’t make you a handy interactive map with all your photos in it. Neither can Apple Photos. What we ended up with is photos that appear in the right chronological order, and places that show up correctly in the metadata sidebar, but that’s it. Part of me was slightly pleased that commercial photo organizing tools are as bad as digital library platforms at their job; most of me was not.

(After the holiday my coworker told me about Google Fusion Tables, so I tried uploading our metadata spreadsheet there, and it created a heatmap for me based on the place names. It did a bad job, however, because Google tends to assume that when you say “Bangkok” you mean “Bangkok, Indiana”, or whatever. You can avoid this by telling it in advance where your locations are, broadly speaking (e.g., Asia), but that doesn’t help if your locations are everywhere. I would have tried harder with this–maybe tried to get it to index the GPS coordinates instead of the place names–but I couldn’t be bothered, because Google Fusion Tables is also not very nicely integrable into your regular photo-viewing interface.)


So what did we learn in the end? That the technology for easily editing and displaying photo metadata isn’t quite there yet; that right now in the year 2018 a project like this pretty quickly hits the point of diminishing returns; and that processing your own metadata for digitization is a good way to get up close and personal with aspects of yourself you would perhaps prefer to forget. In conclusion, here is a photo of me looking for all the world like an eleven-year-old computer savant, and not at all like the sort of person who will one day leave the script-writing to her brother.


It’s not just because we want cookies

Yesterday, if you didn’t notice, the rare books & archives Twitter account for McMaster University in Canada got unusually venomous (for a rare books Twitter account) in response to a historian whose tweet about “unearthing” a letter by Bertrand Russell had gone viral. The letter is one in which Russell denounced fascism; the historian was circulating it, arguably somewhat reductively, as “a model of how to say no to someone you loathe.” He had not credited the archive or given any information about where, exactly, he had “unearthed” the letter. He still has not.

I always get upset about unattributed cultural heritage images (I suspect the Toast editors knew me primarily as “the person who keeps bugging Mallory about putting citations in her art history posts” and only secondarily as “the Mountain Goats weirdo”), but usually I feel a little silly for getting upset. After all, what does it matter, really, if somebody is out there circulating unattributed pictures of Bodleian marginalia? Yes, attribution means recognition, and yes, recognition means funding, and yes, if we let people take our work for granted they might end up doing something like, I don’t know, closing thousands of public libraries and staffing the remainder with volunteers. But even so, the stakes in my particular line of work aren’t really all that high. Suppose the entire high-end digitization enterprise gets shut down; suppose a few paleographers have to spend a bit more on plane travel; suppose we never get around to digitally mapping medieval scriptoria; so what?

In the case of the Bertrand Russell letter, though, the stakes are very high. The stakes, in fact, are exactly why the tweet went viral in the first place, in a bizarre sort of self-righteousness ouroboros that has been distracting me from my work all day.

The work of archives and libraries is to preserve information and make it accessible. People usually forget about the second part, because the cliché of a medieval chained library dies hard, but making information accessible—by cataloguing it, creating and publicizing finding aids, and teaching information discovery and analysis skills—is the only way to make it worth preserving. But because people forget it, libraries struggle to fund it. The Bodleian has millions of uncatalogued holdings. Many more are catalogued only in print, or in handwriting, or in Latin, or the catalogue entries are so brief as to be almost useless unless you already know exactly what you’re looking for. The whole top-down system of archival cataloguing, in fact, seems to be based on the principle that you’ll never have enough money to catalogue everything at item level. I doubt many people realize that even a relatively wealthy library like the Bodleian has to apply to external donors for cataloguing funding–and even then, the money is hard to get, because people simply don’t realize cataloguing needs to be done. People think that acquisition is the end of the story, rather than the beginning.

The thing is, though, the Russell letter is catalogued. That single letter has its own online catalogue entry, as part of a larger (and truly very impressive) publicly searchable Bertrand Russell Archives database. Which isn’t a coincidence; McMaster University Library produced this database because they recognized the value of Bertrand Russell’s archive and wanted to make it available to everyone. That’s why we have the Mosley letter, a record of the moral clarity and courage of one of the greatest minds of the modern era. The argument here isn’t some fuzzy “Well, but if you think about it, isn’t preserving 15th-century Latin glossaries as good a way as any to defend democracy?” It hits much closer to home than that.

Fascism is on the rise now in large part because of mis- or disinformation. We have leaders peddling lies and voters being persuaded by Russian Twitter bots. We’re in the age of the decontextualized quote, the decontextualized image, the Flat-Earther Facebook group; an age where the necessary invention of the term “fake news” was followed almost immediately by its willful misappropriation and misapplication (by the American president, no less). Fascists are trying to undermine and discredit what we know about the world we live in, exactly as they have always done. And we know that’s what they’ve always done because libraries and archives have preserved their history. We know Bertrand Russell resisted fascism because an archive bought, preserved and catalogued his letters and then went to the trouble of making them available to us. If they hadn’t done so, we wouldn’t be able to post pictures of his resistance on Twitter; we wouldn’t be able to prove–or even to know–he’d resisted at all.

Legal deposit, then and now

The Bodleian Libraries, where I work, acquire 80,000 monographs each year through legal deposit. This presents a significant logistical challenge–my colleagues have stories about cataloguing children’s chemistry sets–so I was interested to read, in The Stationers’ Company: A History of the Later Years (Robin Myers, 2001), about the different challenges faced by libraries and publishers in the early years of legal deposit. In the early 19th century, there were 11 legal deposit libraries (today we only have six). In order to obtain copyright protection for their books, publishers had to enter the titles in the Stationers’ Register and send copies to all 11 libraries. Apparently they often didn’t; Myers quotes John Oates saying that publishers “‘commonly neglected to enter large and learned works, such as the universities in particular wanted, and entered only such potential bestsellers as might attract piratical publication, these being for the most part books of dubious use to academics’” (p. 60). In the later 19th century, legal deposit libraries began hiring “agents” to enforce the Copyright Acts of 1836 and 1842. This was essentially a circumvention of Stationers’ Hall; under the Imperial Copyright Act of 1842, works still had to be registered with the Stationers’ Company in order for the owners to sue pirates or plagiarists, but–for reasons The Stationers’ Company doesn’t really explain, except by saying that the Company was rather lax in enforcing or verifying registration–registration often didn’t occur, so the libraries took matters into their own hands. When the commissioners of a royal report on the situation asked the libraries if they were aware that “‘not one in 20 of the books published is registered’”, the Bodleian replied coolly that “‘The books not sent to us are comparatively few’” (p. 70). The Stationers’ Register was finally discontinued in 1911, when a new Copyright Act came into force.

Flash forward 105 years and, chemistry sets aside, we seem to have the printed-work side of legal deposit pretty much under control. So what’s next? The Legal Deposit Libraries Act of 2003 left the question of electronic legal deposit more or less open for future legislation. In 2013, new legislation, the Legal Deposit Libraries (Non-Print Works) Regulations, gave the legal deposit libraries access to electronic materials, but with a number of restrictions. For example, provision 23 states that “A deposit library must ensure that only one computer terminal is available to readers to access the same relevant material at any one time.” This is a particularly frustrating limitation. Regulations that restrict the use of electronic materials to a specific location ignore the defining virtue of electronic materials, which is that they can be–and are–accessed anywhere. Who is going to go to a library to read an e-book on a public computer? I’m not, and I work in a library.

Knowing the historical context of legal deposit legislation, it seems more obvious that we are currently in another phase of growing pains, where publishers resent having to give things away for free and libraries struggle to keep their mountains of new acquisitions safe and accessible.

In which I don’t like a book

A panopticon! Image via Wikimedia Commons.

In Delete: The Virtue of Forgetting in the Digital Age (Princeton University Press, 2009), Viktor Mayer-Schönberger argues that the ease of storing and retrieving information in the digital age is having—forgive me—deleterious effects. Specifically, he posits that when information is indiscriminately archived and then retrieved without context, we are unable to adequately weigh or filter it in order to make good decisions. He also suggests that the indefinite archiving of personal information online—whether social media posts or search histories—creates a sort of ultra-panopticon, in which “our words and deeds may be judged not only by our present peers, but also by all our future ones” (p. 11).

To combat these problems, Mayer-Schönberger weighs a few possibilities, including a) more stringent legal protection of information privacy rights and b) “cognitive adjustment”, whereby paradigms of memory shift and we all get used to digital remembering. But the proposal he puts forward is that we attach expiration dates to all the information we create. Photos on our computers, files uploaded to the cloud, Amazon purchases, Google search queries—we specify an expiration date for each of them, and on that date they are permanently deleted.

Mayer-Schönberger devotes a few pages to trying to iron out the logistical problems with this proposal: how to make the process of expiration date creation user-friendly, for example, and how to ensure that expiration dates are adhered to. He admits that an expiration date does not very well mirror the process of actual human forgetting; we don’t remember things completely for a specified period of time and then completely forget them, and the length of time for which we will remember a given thing is not determined at the moment of memory creation, but rather constantly redefined and extended whenever we make use of that memory. To better approximate human forgetting, Mayer-Schönberger suggests that we impose a “rusting” process on our digital information: “We could envision…that older information would take longer to be retrieved from digital memory, much like how our brain sometimes requires extra time to retrieve events from the distant past…. A piece of digital information—say a document—could also become partly obfuscated and erased over time” (p. 193).

I don’t think so. I don’t think the eventual solution to the problem of the digital panopticon will involve waiting five minutes for my computer to open a photo. I don’t think it makes sense to “reviv[e] forgetting” by storing emails in a hard drive that you actually physically keep in an attic (p. 209). As a librarian, I definitely am not on board with systematically degrading documents. Mayer-Schönberger’s proposal strikes me as skeuomorphism taken to an absurd height: just as early printed books resembled manuscripts, and early app icons resembled actual buttons (in each case at the expense of visual clarity), digital storage must resemble an amalgamation of the worst attributes of a human brain and a filing cabinet.

It’s not that Mayer-Schönberger doesn’t raise interesting questions; he does. How do we preserve context and background for digitally retrieved information? Should we impose more stringent information privacy laws? How do we do that, if users don’t actually seem to care that much about protecting their personal information? If the reason they don’t care is that they don’t understand how much they’re sharing or what impact it might have on their lives, how do we educate them? How do we encourage prudent social media use, and how much should we judge our friends and colleagues and employees based on their online presences? Given the virtually infinite supply of digital information, how do we help information seekers to focus on what they need, and how do we ensure that analog materials aren’t lost in the flood? Can we algorithmically emulate the ways in which human memory chooses what information to preserve? If we can, do we want to?

Call me a “teenage Internet nerd” (p. 4), but I don’t think the answers to these questions involve intentionally crippling digital information storage and retrieval. Or rather, it’s possible that one answer to one question does—although I can’t imagine which—but Mayer-Schönberger’s mistake is to conflate all types of information, and all types of information anxiety, into a single entity: personal banking details = browser cookies = unprofessional Facebook posts = files from ten years ago that you don’t need anymore but can’t be bothered to delete. They’re all the same, and we have to get rid of all of them. When he notes that the concept of expiration dates is already in use, one of the examples he cites is a cloud storage service that allowed users to specify expiration dates for files they uploaded. Well, for one thing, that service doesn’t seem to exist anymore, but for another, I’m willing to bet that most people who used the expiration date function weren’t interested in preserving the art of forgetting; rather, they had limited storage on the site, and when they uploaded a file to share with someone else, for example, they wanted the file to be automatically deleted after the other person had retrieved it. If it had been a service that gradually corrupted users’ only copies of files, I suspect it would have been sued. There are compelling reasons to delete data—information privacy being most likely foremost among them—but the desire to approximate the foibles of human memory should not, I think, be one of them.

TL;DR: the whole history of information technology has been one long crusade to overcome the weaknesses of human memory. The fact that we are now closer to that goal than we have been before should not immediately frighten us into hobbling the technology that got us here.

Open Access in the humanities

I am enjoying this book so much:

How has the popular reputation of the humanities – a frequent topic of lament – suffered from an inability of the public easily to read research work (in both the sense of impeded access and the sense of the unreadable complexity of the language of research)? [p. 22]

…the idea of creating a chain of verification whereby the claims upon which the new work rests can be checked…is potentially significantly enhanced in an open-access world. Although checking others’ use of sources is currently a far less common practice than might be hoped, if all research were open access and the necessary technological infrastructure was put into place, an environment could exist in which this kind of checking could be instantaneous: a linked click. Of course, much humanities writing requires a more totalised understanding of the work than just a link to a single paragraph – it requires the argument, the aesthetic and the context – but this does not impinge upon the potential supplementary benefits of such a system. This could be available not only to those established within universities, though, but rather to anybody with access to the internet. This could range from independent researchers through to those fresh out of their degree. In much the same way as it becomes easy to spend hours following links that look interesting on Wikipedia, a world could be possible where the same is true of an interlinked network of high-quality scholarly papers. Of course, just because OA might offer the possibility of such a system existing does not mean that it would spontaneously burst into existence; new publisher labour would be required to implement the linking, format migration and any supplemental technologies that might facilitate this. [p. 29]

Reading about the book industry


I’ve read two things so far this evening: the final chapter of James Raven’s The Business of Books, and an interview with Martin Paul Eve on open access in the humanities. I have not yet read the rest of Raven’s book, or any of Eve’s book, so stay tuned for a corrections & omissions post, but in the meantime, I was intrigued by how both authors address the economic and commercial aspects of publishing.

Raven traces the impact on publishing of a number of economic factors: the strategy of cross-subsidy, which allowed publishers to increase the range of their offerings; the decline of transport costs in the 18th and 19th centuries, which allowed books to be more widely distributed; and the importance of external or familial wealth in financing publishing ventures. He portrays bookselling as a business for entrepreneurs and cutthroat businesspeople:

Booksellers had to identify profitable texts, undertake alarming financial risks that required large capital investments, and be shrewd in judging the market. [The Business of Books, 2007, p. 357]

And although Raven juxtaposes the idea of the book as a commercial consumer object–a subject of taxation and a potential source of wealth–with the idea of the book as a political or ideological entity (“Books and pamphlets are not neutral objects” (p. 360)), he goes on to say that “In the eighteenth century in particular, valuable intellectual properties resided in the hands of people guided not by ideas but by profits” (p. 365). In fact, he quotes an 18th-century writer on this subject, who compares booksellers to cooks who “provoke the appetite of their customers, without troubling their heads about the effects that these may afterwards have upon their constitutions” (p. 365).

By way of contrast, here is a quote from an interview with Martin Paul Eve (“Open access in humanities and social sciences: Visions for the future of publishing”, in College & Research Libraries News, 76(2), February 2015):

…my vision for open access in the humanities is best summed up in the philosophy of OLH: cooperation. Much of our practice in the contemporary university is predicated on competition. We compete for students, for grant funds, and for faculty, among other areas. We know, though, that in the publishing world, the system of markets and competition has failed us. This is because it is not really a market: there is no substitute product for a book or article when a researcher needs it. There is, though, a great deal of competition among academics to publish in high-prestige venues, which means these often-commercial entities have a high level of market power.

Eve confirms Raven’s characterization of publishing as a capitalist endeavor, but he argues that this is a flawed model. He does so, however, only within the scope of academic publishing, which is not something Raven specifically addresses. Raven’s main concern is trade publishing, or publishing for recreational reading, which Eve discusses only in terms of its potential as an income source for academic authors.

I am firmly on board with open access for research, and I’m looking forward to reading Eve’s book (which, wonderfully, is itself available open-access). But I’m interested in whether the principles of open access can be applied to trade publishing. In the world of academic research, Eve is very likely right that there is no substitute for the correct book or article. Research needs are specific and urgent enough for this to be the case. In the world of recreational reading, however–which is mainly what Raven is talking about–things are not so clear-cut. Recreational reading is still about the exchange of knowledge and the fulfillment of information needs, but those needs are nebulous. The perception that a book is right for a specific reader’s needs is largely based on how that book is and was marketed: the cover art, the media buzz, the review excerpts on the flyleaves. Books are expensive to produce–or at least they used to be–and we have learned to view them as luxury goods and to base our investment in them on perceived market value.

In practice, of course, this means that we only ever interact with the narrow, shallow subsection of literature that has been deemed commercially viable. Consequently, while there may be a perfect book out there for my needs right now, I will probably never cross paths with it. Electronic self-publishing may be changing this a little, but not a lot, or at least not yet; I don’t want to read a bad book, so even though I know that the publishing industry is nepotistic and whitewashed and hostile to anything that can’t be neatly categorized, I’m probably not going to bother reading something it rejected. There is also the question of author livelihood; academic researchers profit indirectly from their books and articles, which increase their prestige and thereby their salary, but novelists and trade authors are more likely to depend on direct income from their work.

I’ll get back to you once I’ve read more, or alternatively once I’ve learned to write shorter blog posts…

Authorship, Jonathan Franzen, and “Signing in with Google”

I wanted to illustrate this post with a misattributed quote gif, but I couldn’t find any that were labeled for reuse, so here is Marie Antoinette, who did not say “Let them eat cake.” Image via Wikimedia Commons.

“The author is the principle of thrift in the proliferation of meaning,” says Foucault: a social construct whereby a modern society regulates meaning and punishes transgression. Authors are held accountable for their works, and associating a work with a given author affects our interpretation and valuation of that work.

In a sense, authorship is now more specific and explicit than it ever was. Social media allows authors to document their identities so thoroughly online that we need never read anything without having already formed a judgment of its author. This can backfire, as with Jonathan Franzen, who is, I suspect, more read about than read; in general, however, the author’s identity is an effective marketing tool, allowing readers overwhelmed by choice to entrust their time and money to the authors whose online voices appeal to them. In this way–and to Foucault’s disappointment, no doubt–authorship has grown if anything increasingly political, as we are more likely now than ever before to know the ideologies and even the presidential candidates espoused by the authors of the books we choose to read.

Even in Internet years, however, this is a relatively recent phenomenon. The early years of social media were more like the early years of printing, when a book’s title page–if there was a title page at all–was more likely to include the name of the bookseller than the author, when authorial misnomers and pseudonyms (for instance, a book on change-ringing attributed only to “A Lover of that Art”) ran rampant, and when the sheer expense of getting something printed meant that a sixteenth-century poet might circulate his works only in manuscript form for most of his life. In the first decades of the Internet, authorial identity was similarly diffuse. I believe we can all still remember a time when no one’s email address was (or the even more modern, and when anyone who linked their blog to their Twitter account was considered to have an insufficient regard for their own safety. These days we are encouraged to consolidate our personal brand by using the same username everywhere–which is easy to do, as so many sites allow us to log in via Google or Twitter or Facebook. A single identity is convenient, and it may also be a savvy marketing tool, but equally importantly, it ensures accountability. As Foucault says, authors are “subject to punishment.” I don’t know if Google considered the punitive nature of authorship when it started encouraging YouTube users to sign in with their G+ accounts, but it’s a modern truism that there is no community more vicious than an anonymous comment thread.

Of course, authorship in the Internet age has also become more slippery. The ease of copying and pasting a piece of text brings with it the risk of misattribution and plagiarism (to the point that the misattribution of famous quotes has become a meme), and the potential for a piece of misattributed content to go viral means that it is difficult for an author to retain control over their work. This rightly causes anxiety among authors and publishers and librarians alike, but there is also great value in relaxing authorial control; open-source software, for example, allows developers to hugely increase the potential reach and functionality of their work. Depending on the context, authors may seek anything from total anonymity to total control over their work, and information professionals must find ways of balancing data protection, information access and authorial intent in a wide variety of situations.

I will leave you with another Foucault quote, which, amid the proliferation of born-digital data, should strike fear into the hearts of librarians and archivists everywhere:

How can one define a work amid the millions of traces left by someone after his death? A theory of the work does not exist, and the empirical task of those who naively undertake the editing of works often suffers in the absence of such a theory. (“What Is an Author,” in Aesthetics, Method, and Epistemology, pp. 207-208)