Reading design from a distance: Mining environmental design web archives

As important as it will be for future historians to access websites as they appear to us today, web archiving (the process of collecting, preserving, and enabling access to web material) also presents opportunities for new forms of analysis that we can begin to explore and enable for future generations. The following, for instance, summarizes a small project I recently undertook to collect websites important to architecture and to environmental design more generally, and to analyze them more as a collective corpus of data than as discrete artifacts. I hope it sparks a few new ideas among the many, diversely focused collectors who seek to curate and preserve the richest possible stores of information for future research into this field’s growth and change.

This spring, at Natural Connections, the third joint conference of the Art Libraries Society of North America (ARLIS/NA) and the Visual Resources Association (VRA), I convened a panel of librarians and archivists to discuss the present state and possible futures of digital design documentation collections. Stewards of practitioner, teaching, and historical repositories shared their work to keep born-digital and hybrid analog/digital collections of design documents and information resources accessible, in all of their contextual as well as technical complexity. To hear their discussion, you can watch a complete recording of the event here: Terra Fluxus: Surveying the Digital Information Landscape of Environmental Design.

For my own part, I wanted to stoke discussion among presenters and attendees by speculating a little about the design researcher of the near future: how she or he might prefer to explore our contemporary born-digital design records and resources, preserved and served in all of that rich complexity, in order to better understand the state of environmental design today. My curiosity grows out of my own modest experience stewarding the archives and information resources of environmental design firms founded on either side of the digital revolution, and it carries over into my current work as a web archivist.

I work for Archive-It, the Internet Archive’s subscription web archiving service and a partnership between the Archive and hundreds of collecting institutions around the world. Founded in 1996, ‘the Archive’ is in fact a brick-and-mortar public library in a converted Christian Science church in San Francisco’s Inner Richmond district. Stop by on any given Friday and you can get a tour of the building and collections with our founder. In place of book stacks and acid-free boxes, however, you will see racks upon racks of servers (some of them nestled into the church’s original niches) from which we share millions of digital books, videos, and audio recordings.

The Archive is also increasingly collecting software and software-dependent resources. Beyond the Internet Arcade, full of your favorite 1980s and 1990s video games, this work extends to the preservation and emulation of whole operating systems and their constituent applications. I hope this portends a fruitful collaboration and future service model for the computer-aided design realm, but much of the foundational work needed to build fully functional repositories of obsolete and even current CAD software and files is still being done by indispensable volunteers in bodies like the Software Preservation Network and the CAD/BIM Taskforce of the Society of American Archivists’ Architectural Records Roundtable.

Still, the Archive’s defining service may yet be the Wayback Machine, the world’s oldest and largest collection of archived web pages, now nearing 500 billion individual URLs captured as they appeared on the web, going back to the Internet Archive’s founding in 1996. Browse the collection and you’ll find lots of interesting little artifacts from the web of the past, including the public presences of environmental design firms and their observers. Far beyond mere artifacts of Web 1.0 design (glorious as those may be), I argue that these records fill important gaps in the documentation of why, where, and how designers practice: gaps left between and beyond the slick monographs that researchers can pull from design library shelves.

Environmental design’s history on the web is nonetheless quite incomplete. While the Wayback Machine’s automated web crawler casts a very broad net across the web at any given time, that same breadth means that few, if any, design websites can be archived completely or with the temporal coherence that researchers will need to fully review the ideas and online discussions that shaped our environment. It is a broad but often shallow collection, one that may not take the researcher far beyond a given site’s landing page and that offers no wayfinding toward the critical resources buried deep within a trove as vast as the World Wide Web.

To address this, Archive-It enables libraries and archives to use the Internet Archive’s technologies to create and curate focused collections of their own: collections of websites that complement their thematic collecting strengths in traditional media, and that they can preserve and serve to their patrons long after those sites have changed or disappeared. I like to cite the Sterling and Francine Clark Art Institute Library as an illustrative example. Since 2013, the Clark has archived websites of and related to the Venice Biennale in order to serve future historians of the event and its legacy. Thanks to their efforts, researchers 5, 10, and hopefully 50 years from now will enjoy the same unencumbered access to online ephemera as they do to the event’s paper-bound records.

I took inspiration from the Clark’s collecting efforts to build a similar archive and, beyond that, to posit how such a future researcher might prefer to engage with it. Its scope is the Chicago Architecture Biennial, which ran for three months this past fall and winter in my newly adopted hometown. The first such endowed international exhibition of architectural ideas in North America, it was proudly billed by its organizers as exhibiting the state of the art of architecture. Naturally, my immediate response as an archivist was to say, ‘well, if it’s the state of the art, then surely someone ought to capture and preserve it for future reference.’

The Chicago Architecture Biennial 2015 web archive therefore includes all of the online reference points that a future researcher would likely want: the event’s official website as it appeared throughout that timespan; press coverage from the newspapers, magazines, and blogs that covered it extensively; and the web pages built by exhibitors and participants to represent, advertise, and document their contributions.

Critically, I think, this archive also includes the social media that surrounded the event: the tweets, Instagram posts, and Facebook feeds through which official partners as well as attendees shared their observations, hashtagged #ChicagoBiennial.

Individually, these artifacts will be accessible to that researcher of the future in their native, browser-bound format and can be understood and interpreted as such. However, because the archive holds all of the source code, text, and graphic material that make up these websites, pages, and social media feeds, other forms of access begin to emerge, borrowed largely in this case from our colleagues in the digital humanities. Treated as a large corpus of text, for instance, the collection can be analyzed for otherwise obscured patterns that point to overarching themes.

One theme posited by Los Angeles Times architecture critic Christopher Hawthorne, for instance, was that the Biennial’s vision of the state of the art eschewed the work of the well-known ‘starchitects’ of our age in favor of an emerging generation of designers. To test this thesis, I used a suite of services available to Archive-It partners to derive the key data and metadata from the comparatively massive web archive and analyze them right on my laptop. Specifically, I extracted all of the “named entities” (persons, places, and organizations) from the collection into a structured data format that I could query to retrieve the most frequently recurring terms.
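For readers curious about the mechanics, the counting step might look something like the minimal Python sketch below. It assumes, hypothetically, that the extracted entities have been exported as JSON Lines, one record per archived page, each with a `named_entities` object holding `persons`, `organizations`, and `locations` lists. That layout and the `top_entities` function are illustrative assumptions, not the actual format or tooling of the Archive-It services mentioned above.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_entities(lines: Iterable[str], kind: str = "persons", n: int = 25) -> List[Tuple[str, int]]:
    """Tally one kind of named entity across a JSON Lines export; return the n most frequent.

    Assumed (hypothetical) record layout, one JSON object per line:
        {"url": "...", "named_entities": {"persons": [...], "organizations": [...], "locations": [...]}}
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        # Missing keys are treated as "no entities of this kind on this page".
        entities = record.get("named_entities", {}).get(kind, [])
        # Every listed mention counts, so a name repeated on one page is tallied repeatedly.
        counts.update(name.strip() for name in entities if name.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy records invented for illustration only; they are not data from the Biennial archive.
    sample = [
        '{"url": "http://example.org/a", "named_entities": {"persons": ["Jane Doe", "John Roe"]}}',
        '{"url": "http://example.org/b", "named_entities": {"persons": ["Jane Doe"]}}',
    ]
    print(top_entities(sample, n=2))  # [('Jane Doe', 2), ('John Roe', 1)]
```

Because the function accepts any iterable of strings, an open file object (which yields one line at a time) could be passed in directly, with `kind="locations"` or `kind="organizations"` selecting the other entity types.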

The results of this little test bear out some aspects of Hawthorne’s thesis while complicating others. Certainly names like Gang, Gramazio & Kohler, Pezo & von Ellrichshausen, and Gil were new to many in the contemporary audience, though they differ entirely from those that Hawthorne specifically observes were “given pride of place” in the exhibition hall: Andres Jaque, Tatiana Bilbao, Bjarke Ingels, Junya Ishigami, and Sou Fujimoto. And while Hawthorne rightly observes that archetypal starchitect Frank Gehry was given no place at all there, his enduring presence on the list above indicates just how difficult it may be to conduct these conversations online without invoking him. At the other end of the spectrum, Hawthorne contends that another architect not exhibiting in the official program “hovers above it as a kind of glimmering presence.” No, it’s not Mies, but rather David Adjaye, whose name ranks 24th in the index. Adjaye’s concurrent exhibit at the Art Institute of Chicago might herald a vitality in the state of the art that the Biennial’s curators missed, but this archive of their exhibition instead strongly suggests that to know what they considered the state of the art in 2015, a future architecture scholar would do well to brush up on more modernist and distinctly midwestern icons.
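Two practical wrinkles sit behind a figure like “ranks 24th in the index”: the same person may surface under several spellings, and a rank depends on how ties are broken. The minimal Python sketch below shows one hypothetical way to handle both. The hand-curated alias table and the `merge_aliases` and `rank_of` helpers are illustrative assumptions; the post does not say whether or how name variants were merged in the original analysis.

```python
from collections import Counter
from typing import Dict, Optional


def merge_aliases(counts: Counter, aliases: Dict[str, str]) -> Counter:
    """Fold spelling variants into one canonical key via a hand-built (hypothetical) alias table."""
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n  # names without an alias pass through unchanged
    return merged


def rank_of(counts: Counter, name: str) -> Optional[int]:
    """Return the 1-based position of `name` in descending frequency order, or None if absent.

    Counter.most_common orders equal counts by first insertion, so entities with
    equal counts get distinct but somewhat arbitrary ranks.
    """
    for position, (entity, _) in enumerate(counts.most_common(), start=1):
        if entity == name:
            return position
    return None


if __name__ == "__main__":
    # Toy counts, invented for illustration.
    raw = Counter({"Jane Doe": 3, "J. Doe": 2, "John Roe": 4})
    merged = merge_aliases(raw, {"Jane Doe": "Doe", "J. Doe": "Doe"})
    print(merged.most_common())  # [('Doe', 5), ('John Roe', 4)]
    print(rank_of(merged, "John Roe"))  # 2
```

Note how merging changes the picture: unmerged, “John Roe” ranks first; once the two spellings of Doe are combined, it drops to second. Any published rank is therefore only as meaningful as the name normalization behind it.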

Of course, there’s a logical extension to this very focused kind of collecting and data mining. If we can do this at the scale of a discrete event, what about the rest of the field around it? Could we collect, archive, and analyze the broader practice of environmental design as it is represented on the web in order to learn something about its language, personalities, and priorities at a given moment or over a given span of time? To test the limits of this idea, I’ve begun building the Environmental Design Practices web archive. Far from being a comprehensive repository of the field yet, it is a modest collection of the official websites and social media feeds of roughly 150 firms that practiced architecture, landscape architecture, civil engineering, or similarly affiliated professions at the end of 2015. For the moment it includes a lot of big, prestigious firms, most likely American in disproportion to the true demographics of the field, but it provides a suitable sample on which to experiment.

Using the same kind of named-entities data that I extracted before, this time I focused on querying and ranking the geographic locations (urban areas, at least) mentioned across the entire collection. If we assume that these data can tell us not just where these firms’ offices are located but also where to find their projects and the works they discuss or cite online as precedents, then we can begin to chart a novel geography of professional environmental design work up to the present. The weighted heat map above, for instance, plots just the top 300 named urban areas in the collection, each intensified relative to its frequency. While I suspect very few future researchers would be surprised to see New York City at the center of this design universe, some other patterns surface that may merit closer reading: northern California’s intensity eclipsing southern California’s; the distribution of Chinese coastal cities; the emptiness of portfolios in the global south. Above all, however, I’m eager to see how maps like this one shift and change as the collection is periodically enriched and expanded over time.
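Turning a ranked list of place names into heat-map input can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: place-name counts are already tallied in a `Counter`, coordinates come from some gazetteer or geocoding step that is represented here by a hypothetical lookup table, and intensity is scaled linearly to the most frequent place. The post doesn’t say exactly how its map was weighted, so `heatmap_points` and its scaling are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (latitude, longitude, weight)


def heatmap_points(
    counts: Counter,
    coords: Dict[str, Tuple[float, float]],
    top_n: int = 300,
) -> List[Point]:
    """Convert the top_n most frequent place names into weighted (lat, lon, weight) points.

    Weights are scaled linearly against the most frequent place among the top_n.
    Names missing from the (hypothetical) `coords` gazetteer are skipped, not guessed.
    """
    top = counts.most_common(top_n)
    if not top:
        return []
    max_count = top[0][1]
    points: List[Point] = []
    for place, n in top:
        if place not in coords:
            continue  # unresolved or ambiguous name; would need manual geocoding
        lat, lon = coords[place]
        points.append((lat, lon, n / max_count))
    return points


if __name__ == "__main__":
    # Toy counts and coordinates, invented for illustration.
    counts = Counter({"City A": 10, "City B": 5, "City C": 1})
    coords = {"City A": (40.0, -74.0), "City B": (37.5, -122.5)}
    print(heatmap_points(counts, coords))  # [(40.0, -74.0, 1.0), (37.5, -122.5, 0.5)]
```

Leaflet-style heat-map layers, such as folium’s `HeatMap` plugin, take points in this latitude, longitude, weight form. A linear scale lets a dominant center like New York swamp everything else; taking the logarithm of the counts first would bring out weaker regions, at the cost of flattening real differences between them.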

These are early and admittedly crude experiments, but I hope that they express the potential historical value of the many diverse and widely distributed environmental design web archives that we can begin building today. I’ll certainly continue to collect and to tinker, but a truly rich foundation for future research will depend upon the efforts and imaginations of design librarians and archivists everywhere.