Day of Digital Archives

I’m at MSA 2011 in Buffalo and participated in a great seminar today on Modernist Soundscapes. The sound resources that many of the participants mentioned were PennSound and UbuWeb. Participants criticized PennSound for the lack of context on sound files that seemed, to some, to be “slapped” up on the web. The downside is that PennSound seems to have very little apparatus; the upside is that it has made a lot of resources available. In contrast, what about sound files at the Library of Congress or the National Archives? These artifacts are more “officially” archived, but they are more difficult to access online.

Of course, access to large-scale repositories of text opens larger questions about how literary scholars can use such repositories in their research. What are they accessing? For that matter, what are we accessing when we access sound? Just the context? John F. Sowa writes in his seminal book on computational foundations that theories of knowledge representation are particularly useful “for anyone whose job is to analyze knowledge about the real world and map it to a computable form” (xi). Sowa also notes that knowledge representation is unproductive if the logic and ontology that shape its application in a certain domain are unclear: “without logic, knowledge representation is vague,” he writes, “with no criteria for determining whether statements are redundant or contradictory,” and “without ontology, the terms and symbols are ill-defined, confused, and confusing” (xii). Knowledge representation is the work of all scholars in digital humanities, and these scholars must help determine the logics and ontologies that shape how we access this data.

Charles Bernstein has written that “[t]he relation of sound to meaning is something like the relation of the soul (or mind) to the body. They are aspects of each other, neither prior, neither independent” (17). Scholars have not had the ability to analyze the features of text that correspond to aurality (their phonemes and prosodic elements), much less compare these features with similar features across collections.

What would I do if I could “access” sound in an archive of digital texts? Many scholars and poets have written about the remarkable experience of hearing Gertrude Stein’s texts read aloud. “Language poets” who emerged in the 1960s and 1970s and who form important scholarly communities today have adopted Stein as an early influence and a model. In part, the nature of this relationship has been ascribed to the indeterminacy and the manner of language play that Marjorie Perloff and others see evinced in Stein’s writing, but the extent to which prosody and rhythm have also influenced these artists goes undocumented. Further, very few scholars have had the means to investigate the speech patterns (whether African American or German or French) that may have influenced a work such as Stein’s Tender Buttons.

I am using data mining to examine clusters of patterns in Stein’s poetry and prose compared to those in non-fiction narratives and oral histories as well as those present in contemporary poetry. The work takes advantage of pre-existing research and development with the Mellon-funded SEASR (Software Environment for the Advancement of Scholarly Research) application. So far it has involved three steps: identifying OpenMary XML output as a base analytic (OpenMary is a text-to-speech system that uses an internal XML-based representation language called MaryXML); producing a tabular representation of the data for clustering and predictive modeling that includes phonemic and syntactic elements; and creating a routine in MEANDRE (a semantic-web-driven, data-intensive flow execution environment) that produces this data and allows future users to produce similar results.
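To give a rough sense of what the tabular step might look like, here is a minimal Python sketch. It is not the SEASR or MEANDRE routine itself. The XML sample is written by hand and only modeled on MaryXML, and the element and attribute names it relies on (`t` for a token, `pos` for part of speech, `ph` for the phoneme string) are assumptions made for illustration, as are the function names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on MaryXML. The element and attribute names
# (t, pos, ph) and the phoneme symbols are assumptions for illustration,
# not verified output from the text-to-speech system.
SAMPLE = """<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" version="0.5" xml:lang="en-US">
  <p><s><phrase>
    <t pos="DT" ph="' EI">A</t>
    <t pos="NN" ph="' r @U s">rose</t>
    <t pos="VBZ" ph="' I z">is</t>
    <t pos="DT" ph="' EI">a</t>
    <t pos="NN" ph="' r @U s">rose</t>
  </phrase></s></p>
</maryxml>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def tokens_to_rows(xml_text):
    """Return one row per token: word, part of speech, and phoneme list."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "t":
            continue
        # Keep phoneme symbols; skip stress (') and syllable (-) markers.
        phones = [p for p in elem.get("ph", "").split() if p not in ("'", "-")]
        rows.append({
            "word": (elem.text or "").strip(),
            "pos": elem.get("pos", ""),
            "phonemes": phones,
        })
    return rows


rows = tokens_to_rows(SAMPLE)
for row in rows:
    # Tab-separated output: one token per line, ready for a spreadsheet.
    print(row["word"], row["pos"], " ".join(row["phonemes"]), sep="\t")
```

Each row pairs a syntactic feature (the part-of-speech tag) with a phonemic one (the phoneme sequence), which is the kind of table a clustering or predictive model could take as input.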

In the future, I want to do the same kind of analysis on sound files. I have to do the work to determine what it is I’m accessing when I’m accessing sound. What does this knowledge look like?

The salient point for me in the seminar today was the idea that it is not necessarily that interesting to discuss how much is out there that we could archive, that we could access, that we could use in our research. We have always made choices about what to archive and how. The same is true today. But today we have more ways of thinking about or modeling what we want to access. Maybe we can finally do something with sound. Maybe once we figure that out, we will know better what to archive.
