I spent the second half of December 2017 digitizing family photos. Something something busman’s holiday, but it was my brother’s idea; he likes a project. After a few overwhelming days surrounded by stacks of baby albums, we decided to set ourselves a more manageable task: only the photos taken during our five-year sailboat trip around the world between 1997 and 2002, and of those, only the best ones. Here’s what we started with:
- Probably somewhere in the neighborhood of 5,000 photos, taken by the five members of my immediate family. (There would have been more, but my mother switched to a digital camera in 2000, and my father usually shot slides. We ignored these formats entirely.)
- A Canon MX920 photo scanner, which works at a rate of about 3 photos per minute.
- Several (too many!) different potential sources of metadata, about which more later.
Step 1: Selection
This was messy, because we, like most people, did not store our photos in carefully labeled boxes. We had some in communal best-of-the-best albums, some in personal albums, some in a chest in the living room (still in their original cases or envelopes), and lots more in a box in the garage. (This is a bad idea!) As a first step, we sat on the living room floor conducting triage on our personal collections. This was easy for me, as the youngest child; for at least half the trip my camera and my photography skills were bad enough to render almost any subject worthless. Once we’d each picked our own favorites, we brought everything together for a final coordinated review, which often involved choosing between two people’s almost identical pictures of the same island. Some lessons here:
- Children aren’t very good at judging which of their photos will be interesting in the long term. My brother’s and my carefully curated albums were full of would-be artistic bits of scenery, most of which have aged badly; our photos of people were mostly in the boxes of rejects. There’s a technology shift involved here, of course; we don’t need our own pictures of the Pyramids anymore, because we can easily find better ones online. I’m tempted to generalize that any selection process is inherently fallible and must consequently be reversible, but the comforting truth is that my older sister seems somehow to have judged everything perfectly. Hire her for your next project!
- Developing quality matters. We had film developed all over the world, and we never tried particularly hard to find the best place in town for it. This means that for stupid, arbitrary reasons, our pictures of Greece are gorgeous and saturated and crisp, while our pictures of Sri Lanka are not. We thought this was OK at the time, because we still had the negatives. In theory we could still have new prints made from them, but I don’t think we’re likely to.
At the end of the selection process, we had about 900 photos to scan.
Step 2: Scanning
This took place more or less simultaneously with the final selection stage, actually. I had three albums and two boxes spread out in front of me, and I picked photos out and handed them to my brother for scanning. He or my sister made a note of the location and date of capture (insofar as we knew or could guess it) in a spreadsheet as we went. Once a photo had been scanned, I replaced it in the box or album it came from. My guess is that at least 5% of the time I got this part wrong.
Side note: these photos were all 4×6. We’d previously done a bit of scanning of older family photos, which were more irregularly sized, and I spent a long time with ImageMagick one day trying to crop off the patchy gray borders. Mostly it worked, but in some cases it didn’t. I don’t know what to tell you about that except that it still makes me angry whenever I think about it.
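For borders that are a roughly uniform color, ImageMagick's -trim option, loosened with -fuzz, is the usual starting point. The Python sketch below just builds (and optionally runs) that command; the 15% fuzz value and the file names are assumptions to tune by eye. Patchy borders can still defeat it, because -trim only removes edges that match the corner pixels' color within the fuzz tolerance.

```python
from __future__ import annotations

import subprocess


def trim_command(src: str, dst: str, fuzz_percent: int = 15) -> list[str]:
    """Build an ImageMagick 7 command that trims near-uniform borders.

    -fuzz widens the color match so slightly uneven gray still counts as border,
    -trim removes edges matching the corner color, and +repage resets the
    virtual canvas so the saved file carries no leftover offset.
    (ImageMagick 6 installs use "convert" instead of "magick".)
    The default fuzz of 15% is an assumption, not a recommended value.
    """
    return ["magick", src, "-fuzz", f"{fuzz_percent}%", "-trim", "+repage", dst]


def trim(src: str, dst: str, fuzz_percent: int = 15) -> None:
    # Raises subprocess.CalledProcessError if ImageMagick reports a failure.
    subprocess.run(trim_command(src, dst, fuzz_percent), check=True)
```

Running it over a whole folder and then checking the results by eye is still necessary, since the failures are exactly the photos whose borders are not uniform.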
Step 3: Metadata
The main thing we wanted was to be able to sort our photos by capture date rather than scan or upload date. The second thing we wanted was to be able to search for particular countries or cities. I could have told myself from the start that the metadata stage would be extremely time-consuming, but even by my standards it was ridiculous.
We ruled out the GUI option right away, because several of us had already run into problems editing date and location information in the Google Photos interface. (If you change the date, for example, that change isn’t preserved if you download and then reupload the photo–and there’s no way to change location at all.) So we turned to ExifTool, a command-line tool I’d read about but never used before. We built a big metadata spreadsheet and then used a script to turn the spreadsheet data into ExifTool commands. At work I usually do this kind of thing by concatenating an even bigger spreadsheet into a shell script, because I’m very bad at writing Python scripts. In this case, however, my brother wrote a Python script for us. I felt temporarily dejected about this, but I admit it was for the best.
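The spreadsheet-to-ExifTool step can be sketched in a few lines of Python. This is a minimal illustration, not the script we actually used: it assumes a CSV export with hypothetical columns filename, date (already in EXIF's YYYY:MM:DD HH:MM:SS form), lat and lon (signed decimal degrees), and it writes one exiftool command per photo into a shell script. Because EXIF stores GPS values without a sign, the hemisphere goes into the separate GPSLatitudeRef and GPSLongitudeRef tags.

```python
import csv
import io
import shlex


def exiftool_args(row: dict) -> list:
    """Turn one spreadsheet row into an exiftool argument list.

    Assumed (hypothetical) columns: filename, date, lat, lon.
    Latitude/longitude are written unsigned, with N/S and E/W in the Ref tags.
    """
    lat, lon = float(row["lat"]), float(row["lon"])
    return [
        "exiftool",
        "-overwrite_original",  # skip the *_original backup copies
        f"-DateTimeOriginal={row['date']}",
        f"-CreateDate={row['date']}",
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        row["filename"],
    ]


def to_shell_script(csv_text: str) -> str:
    """Convert the whole CSV into a shell script with one exiftool call per row."""
    lines = ["#!/bin/sh"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # shlex.join quotes each argument, so the space inside the date is safe.
        lines.append(shlex.join(exiftool_args(row)))
    return "\n".join(lines) + "\n"
```

Generating the script first, rather than calling exiftool directly, leaves a readable file you can check before anything is written to the photos.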
What we found is that once you start adding metadata, it’s hard to stop. For one thing, the technology we were using favored more precise data. For instance, we wanted to sort things by country, but after a bit of experimentation we found that the only meaningful places to store location information were GPSLongitude and GPSLatitude, which had to be city- or at least island-specific. Also, the EXIF Create Date tag requires the format YYYY:MM:DD HH:MM:SS, and while we rounded to the nearest day, it seemed wrong to round further than that, especially when we knew we could get the day right if we tried hard enough.
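The date requirement is easy to meet mechanically once you know the day. This sketch stores day-precision dates as midnight (00:00:00), which is a convention we are choosing here rather than something EXIF requires, and it rejects spreadsheet values that do not match the expected pattern exactly.

```python
from datetime import date, datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def exif_date(day: date) -> str:
    """Format a day-precision date as an EXIF timestamp at midnight (our convention)."""
    return datetime(day.year, day.month, day.day).strftime(EXIF_FORMAT)


def is_valid_exif_date(value: str) -> bool:
    """Check that a value matches YYYY:MM:DD HH:MM:SS exactly and is a real date."""
    try:
        parsed = datetime.strptime(value, EXIF_FORMAT)
    except ValueError:
        return False
    # strptime also accepts unpadded fields like "1998:5:12 0:0:0"; insist on zero padding.
    return parsed.strftime(EXIF_FORMAT) == value
```

Running the validator over the date column before generating any commands catches typos such as hyphens instead of colons, or impossible dates like February 30.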
But the main reason we couldn’t stop adding metadata has to do with deep-seated personality flaws. Over the course of the boat trip we created a lot of data about it: log books, journals, chronicles of my sister’s days written minutely into a series of wall calendars (black pen for port stops, blue for passages). In the intervening fifteen years we did not miraculously become the sort of people who didn’t mind about months and days as long as we got the year right.
If you’re interested, these are the metadata sources we had at our disposal:
- captions we’d written on the backs of the photos. We didn’t have many of these, and only a few of them included dates more specific than e.g. “late October”.
- dates printed on the photos themselves. Do you remember this feature? About a third of our photos had printed dates, and about half of them were obviously wrong.
- our handwritten log book, which my father had scanned a few years ago and turned into a set of PDFs. This contained dates of arrival and departure and GPS coordinates for every port we visited, as well as details of any in-port boat maintenance.
- our journals, although really only my sister’s journal, because she was the only one keeping strict and extremely detailed records of everything we did, almost every day, for the entire trip. She often got very behind in this, however, and when she was writing about things that had happened several months before, she occasionally got a few dates or names wrong. Also, she mainly wrote in cursive, and at great length, so she was the only one who could efficiently scan her journals for the details we needed.
- my sister’s calendars. Starting in 1999, she documented each day in minuscule writing on a wall calendar. Blue writing for passages, black for ports. Every place name is there, as is everything we ate for dinner, every game we played, every time my sister went sailing or practiced the clarinet. Every day’s weather, too, although most are described as simply “beautiful”. “Beautiful. Emma sick.” “Beautiful. Up early. Burn trash with Jeff.” (I too kept a calendar, as did my brother, but mostly what I did was get terribly behind on my calendar and then copy the information I’d forgotten from my sister’s. She seemed to find this both pointless and annoying; little did either of us know it was proto-LOCKSS.)
Because the pictures were all still in order, and because of my siblings’ impressive memories, we could mostly tell where each one had been taken, and we could figure out the precise location and date fairly well from there. (The exceptions here were the boat-interior and nondescript-tropical-island photos, which were often hard to date precisely.) We didn’t end up doing exactly this–we tended to cluster photos around certain dates, without caring that a particular photo might have been taken on the last date of our visit to Bora Bora rather than the first–but we got close.
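The dating logic described here can be sketched as a small rule. It assumes the log book's arrival and departure dates have been typed into a table of hypothetical PortStay records, and that each photo has already been attributed to a port stay from memory. A date printed on the photo is kept only if it falls within that stay; otherwise the photo is clustered on a single date for the stop, here the arrival date, which is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PortStay:
    """One log-book entry (hypothetical structure): where we were, and when."""
    port: str
    arrival: date
    departure: date


def choose_date(stay: PortStay, printed: Optional[date] = None) -> date:
    """Pick a capture date for a photo known to come from this port stay.

    A printed date is trusted only if it falls within the stay; otherwise
    (or if there is none) the photo is clustered on the arrival date.
    """
    if printed is not None and stay.arrival <= printed <= stay.departure:
        return printed
    return stay.arrival
```

A printed date that falls outside the stay is one of the "obviously wrong" ones, so the same comparison doubles as a check on the camera's date stamp.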
Once we had the locations, I halfheartedly investigated pulling in GPS coordinates from somewhere clever, like GeoNames, but I soon decided it would be easier just to Google them. There’s not much to say about this; it was boring. The only mildly interesting things were a) the lack of a good online tool for converting place names to GPS coordinates and b) the revelation that my siblings and I were surprisingly bad at spelling place names as children, especially considering that so many of the names were phonetic transliterations from other alphabets.
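If we had collected the coordinates once into a small gazetteer (a table of canonical place names and coordinates), Python's standard difflib could have absorbed some of our childhood misspellings before the lookup. This is a sketch: the gazetteer entries and their rounded coordinates are illustrative, and the 0.8 similarity cutoff is an assumption that needs tuning, since a cutoff that is too loose will happily match two different islands with similar names.

```python
import difflib
from typing import Optional, Tuple

# Illustrative gazetteer (hypothetical): canonical spelling -> approximate (lat, lon).
GAZETTEER = {
    "Bangkok": (13.76, 100.50),
    "Galle": (6.05, 80.22),
    "Papeete": (-17.53, -149.57),
}


def lookup(name: str, cutoff: float = 0.8) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Return (canonical name, coordinates) for the closest gazetteer entry, or None."""
    matches = difflib.get_close_matches(name.strip(), GAZETTEER.keys(), n=1, cutoff=cutoff)
    if not matches:
        return None  # no entry is similar enough; look this one up by hand
    best = matches[0]
    return best, GAZETTEER[best]
```

Returning the canonical spelling alongside the coordinates also lets the spreadsheet be corrected in the same pass.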
Step 4: The finished product
Throughout the metadata phase of this project, my mother kept asking us what the end product was going to be. I felt very proud when my brother explained to her that we were creating metadata for the interface we might one day have, rather than the one we had now (which is what I mutter to myself every time I accidentally look at Digital.Bodleian). Even so, I have to admit that the end product was a little disheartening. Google Photos can’t make you a handy interactive map with all your photos in it. Neither can Apple Photos. What we ended up with is photos that appear in the right chronological order, and places that show up correctly in the metadata sidebar, but that’s it. Part of me was slightly pleased that commercial photo organizing tools are as bad as digital library platforms at their job; most of me was not.
(After the holiday my coworker told me about Google Fusion Tables, so I tried uploading our metadata spreadsheet there, and it created a heatmap for me based on the place names. It did a bad job, however, because Google tends to assume that when you say “Bangkok” you mean “Bangkok, Indiana”, or whatever. You can avoid this by telling it in advance where your locations are, broadly speaking (e.g., Asia), but that doesn’t help if your locations are everywhere. I would have tried harder with this–maybe tried to get it to index the GPS coordinates instead of the place names–but I couldn’t be bothered, because Google Fusion Tables also doesn’t integrate nicely with your regular photo-viewing interface.)
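Giving a mapping tool coordinates rather than names sidesteps the “Bangkok, Indiana” problem entirely. One tool-neutral way to do that, sketched below, is to write the spreadsheet out as GeoJSON (RFC 7946), a standard format that many map viewers can load. Note that GeoJSON lists longitude before latitude. The column names (filename, place, lat, lon) are hypothetical.

```python
import csv
import io
import json


def rows_to_geojson(csv_text: str) -> dict:
    """Convert spreadsheet rows to a GeoJSON FeatureCollection of points.

    Assumed (hypothetical) columns: filename, place, lat, lon in decimal degrees.
    GeoJSON positions are [longitude, latitude], in that order.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"filename": row["filename"], "place": row["place"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = "filename,place,lat,lon\nscan_0001.jpg,Bangkok,13.76,100.50\n"
    print(json.dumps(rows_to_geojson(sample), indent=2))
```

This still doesn't put the map inside the photo viewer, but the place names ride along only as labels and are never geocoded.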
So what did we learn in the end? That the technology for easily editing and displaying photo metadata isn’t quite there yet; that right now in the year 2018 a project like this pretty quickly hits the point of diminishing returns; and that processing your own metadata for digitization is a good way to get up close and personal with aspects of yourself you would perhaps prefer to forget. In conclusion, here is a photo of me looking for all the world like an eleven-year-old computer savant, and not at all like the sort of person who will one day leave the script-writing to her brother.