Dan Cohen

The Digital Commons

Air Date: March 18, 2016

Digital Public Library of America director Dan Cohen talks about democratizing access to information.


HEFFNER: I’m Alexander Heffner, your host on The Open Mind. Today we resume a vital conversation launched with John Palfrey, inaugural chair of the Digital Public Library of America, now with its founding executive director, Dan Cohen, formerly the leader of George Mason University’s Center for History and New Media. A one-stop portal for primary sources from the nation’s archives, libraries, and museums, DPLA is a public option to access the full breadth of human expression. Just this year, the New York Public Library released more than 180,000 photographs, postcards, maps, and other public domain items from its special collections, adding to the DPLA’s 10 million aggregated items. Nevertheless, the challenge of establishing rules for orphan documents, whose rights are untraceable, is a barrier to full-scale digitization of the American cultural heritage.

“The incredible public commons from the past several millennia is weakened,” the library contends, “by a lack of common agreement over rights statements on these items.” It continues, “…Because of inconsistent international copyright law, risk aversion among many non-profit institutions, and the gray area of unclear ownership that many scanned materials fall into… these collections have too wide a variety of rights assigned to them and no clear pathway toward maximal openness…” This may answer a question I’ve long pondered: why Wikipedia, the free, open-source encyclopedia, has not, at least not yet, converted each article into a more millennial-friendly, visually guided exhibition derived from its sources. And I thought, Dan, we would start on that note.

COHEN: Such a great, uh, prompt to talk about Wikipedia first, because really what the Digital Public Library of America wants to do is to be a Wikipedia, a free resource, uh, at scale, but drawn from America’s libraries, archives, and museums, these institutions that have been around for hundreds of years and that really are in the business, like Wikipedia, of making materials available to everyone for free. And so we’re really a perfect partner, if you think about it, to make available, in a sense, the primary resources, the materials that are in these great cultural heritage institutions, uh, and so much of it is indeed visual, right, and can complement the text that is on Wikipedia. It includes photographs, a million and a half photographs from our nation’s libraries and archives. It includes artworks from our museums. It includes primary sources that include video and audio. We just brought online materials around the civil rights movement and some of the video from the public television station in Atlanta, Georgia, during the sixties. So all this is incredibly rich, and we think an incredible complement to what, uh, the Wikipedians are doing in terms of making, uh, the past understandable and, uh, freely accessible to everyone.

HEFFNER: We had Sue Gardner here, who told us about the long journey to being the verifiable and verified source of information…

COHEN: Yeah. Right.

HEFFNER: That Wikipedia has, has become.


HEFFNER: Our mutual friend John Palfrey has spoken, ad nauseam, about this,

COHEN: Yeah.

HEFFNER: …About how Wikipedia evolved into a reliable, the most reliable source of information on the web.

COHEN: Yeah.

HEFFNER: So to go back to that opening prompt, Dan, what do you think explains…

COHEN: Yeah.

HEFFNER: Um, in terms of rights and the legality around rights,

COHEN: Yeah.

HEFFNER: The unwillingness, um, or at least the ina—the inability to transform Wikipedia into what you’re trying to create now,

COHEN: Sure.

HEFFNER: Which is a document primary source-based, um,

COHEN: Right.

HEFFNER: Operation.

COHEN: Yeah, so you mentioned rights, and we did an assessment of the rights statements that are assigned to materials that are digitized, uh, at our contributing institutions. So as you mentioned, we have well over 10 million items from 1,800 different, uh, institutions in the United States. Uh, we bring it all together at DP.LA. And we’ve done that, uh, on our website and also through our platform, our digital platform, in a way that makes it broadly accessible. But when we actually looked at the rights statements that were assigned to those items, what we found is that there were tens of thousands of different statements, and that brings out the kind of friction, or problems, with making these items in a sense reusable in contexts like Wikipedia. The problem is that the general public, Wikipedians, uh, everyone really is quite unsure what they can do with these items. They’re thrilled that our institutions, our libraries, archives, and museums, are in fact spending time and, and funds to bring this material online, but because of the complexity of rights around these items, they’re really not sure what they can do. And so we are doing an international project, um, in fact with partners in Europe, uh, and other, uh, countries around the world, uh, to in fact streamline these rights. Uh, the Knight Foundation has funded a project where we are in fact coming up with a much smaller set of rights statements than tens of thousands. We’re gonna streamline it down to about twenty, and those twenty rights statements will in fact for the first time declare internationally how everyone can use every item in the DPLA. And it means that once we have that in place, I think everyone will be able to much more securely understand, um, you know, what they can take and reuse in a, in a context like Wikipedia, use it in a school report, use it in a book. Um, this clarification of rights is really essential, I think, going forward to the overall landscape for learning and research.

HEFFNER: Will corporate America and the international community respect the convention that you establish?

COHEN: Mm. Right. Well, we hope so, and, and I should point out that these are rights statements, not legal documents, right? And I think you mentioned orphan works, which I think comes up a lot in this context. We have materials that are very clearly public domain, so all books published before 1923 in the United States are in the public domain. We provide access to a couple million of them through our site, as well as about a half a million books from the past century, um, that are properly tagged so you can understand what you can do with them. But there’s a lot of material, actually, between, say, 1923 and the present where the rights are just simply unclear. We have lost, uh, contact with the creator of the, the resource, um, and the institution that is maybe holding it and wants to digitize it is unclear, uh, about what to do with it and whether they’ll be sued. And so there’s a great, I think, as you mentioned, risk aversion out there in institutions like libraries, archives, and museums, and we want to work with them. I mean, we are a national non-profit that really is working to maximize access, and a real part of that is to clarify what best practices would be around these kinds of materials that are in a gray area where we’re just unsure. Um, and so the rights statements will include, uh, in a sense, clear statements about materials in this gray area, to say that we think there are no known rights, or we think that the rights have likely expired, um, but you may wish to contact the institution, or you should only use this in educational contexts or non-commercial contexts. So there’ll be very clear sort of tags for each item, and I think that will make it a lot easier for everyone. We’re working, uh, as well with Creative Commons, which is part of this rights work, and I think Creative Commons has done a great job on this very topic for the, the modern day, right?
I’m an author; I’ve written a few books, um, and one of them is actually published under a CC BY license, a Creative Commons Attribution license, which means people can reproduce it as long as they credit me as the author. And I think that, um, for contemporary materials, Creative Commons is a great way forward to clarify rights. What we’re trying to do, and I think a good way to understand it, is in a sense Creative Commons-like rights statements for everything else, going back hundreds and indeed thousands of years, to specify the way in which people can use these wonderful, rich materials.

HEFFNER: I’d encourage our viewers to go to The Open Mind archive…

COHEN: Yeah.

HEFFNER: And view a program with Robert Darnton, the Harvard librarian emeritus. Google passed on this concept, right? The history of the Digital Public Library is that, uh, the resources were going to be made available to Google, but Google wanted a set of stipulations…

COHEN: Mm-hmm.

HEFFNER: That were not consistent with the values that the DPLA,

COHEN: Mm-hmm.

HEFFNER: Wants to perpetuate as it relates to these rights and, and to access.

COHEN: Right.

HEFFNER: How has, how has your aspiration for libraries…

COHEN: Yeah.

HEFFNER: How, how has that objective, libraries for all,

COHEN: Sure.

HEFFNER: Been stymied?

COHEN: Yeah. Right, so you know, I should say that we have materials that were digitized by Google, for instance books that were digitized at Harvard, um, 400 thousand of them, um, that are made available through the DPLA so you know, we view commercial partners like Google as, as potentially part of the solution. I, I think as an evangelist for libraries, my key point here is that I feel this is, this is our shared culture and I feel that libraries and uh, uh, non-profit institutions like the Digital Public Library of America really need to be the stewards for this material over the coming decades and centuries. I don’t think that it’s healthy for our society to cede that to companies that are truly innovative, I mean Google is incredible, Amazon is amazing and uh, my hat is off to them for everything they do in terms of uh, immense digital infrastructure and their scanning projects and the way they enable people to read on, on new devices, um, that’s all great but I think at the end of the day really, it, it really is incumbent upon us I think to have this material in public institutions and to have public institutions work together because really that’s the only way that we can ensure that this material is around for the long run. I mean libraries are in the long run business, the forever business. They’re one of the few institutions we have in our society that um, you know, we dedicate funding to, to be around for everyone. And a word that I like to use a lot is they provide democratic access and I think that is really key. They provide access to all. Uh, Digital Public Library of America’s central office is actually housed within the Boston Public Library. 
Um, every morning when I go in, it says “Free to All”… uh, right above the door where I walk in. And I think, you know, free to all is really critical, and I think devoting resources to something like the Digital Public Library of America and our many library, archive, and museum partners, uh, is really essential.

HEFFNER: What about those communities, scholarly and otherwise, that are not your partners yet?

COHEN: Yeah sure.

HEFFNER: How are you going to get them to become your partners?

COHEN: Right, well, yes.

HEFFNER: And what’s really the obstacle preventing them from becoming your partners?

COHEN: Yeah, yeah. Well, we are, we have grown extremely, uh, quickly, so, uh, you know, when, when John and I, um, you know, worked to launch this in 2013, I mean it hasn’t even been three years, um, we only had 500 contributing institutions. We’ve almost quadrupled that number in just two and a half years at this point, um, so we’re, we’re growing like kudzu, which is great, um, but there’s a long way to go. But the way that we’re doing it, which I, I think is, um… and, and John deserves a lot of credit, and, uh, Maura Marx, um, and others who worked early on to sort of plan the DPLA, I think, came up with a really great model. And that is that the central office is quite small; we’re only 15 people, librarians and software developers, uh, working at national scale, uh, with a very sophisticated infrastructure. But we rely on a very webby model, a very 2016 model, which is that it’s actually very easy to join DPLA. We have hubs, what we call hubs, across the country, uh, that help to bring small and mid-size institutions online. It works very much like the web, if you think about how the web is connected together, first at a local level, then at a kind of regional level, and then through, uh, backbones, uh, in a sense, across the United States and indeed around the world. So there are many ways to hook into DPLA and to join up, and we are in the process still, uh, here in our third year, of in a sense expanding that network across the United States. It’s a very exciting time; we can see really, um, that we will have a national network, a true digital library of America, um, in the next few years that covers every state, that allows every institution that wants to contribute to join in. And that includes a very wide variety of institutions: independent libraries, corporate archives, publishers, encyclopedias. I mean, there are so many different kinds of institutions that really compose the DPLA.

HEFFNER: Considering that Wikipedia is the only non-profit website to be ranked among the most trafficked web destinations…

COHEN: Yeah, yes.

HEFFNER: Would you imagine, ultimately, as the success story, a… you used to call them a companion website.


HEFFNER: The resources, um, that you provide being affixed to what has become the world’s most vibrant encyclopedia today?

COHEN: I think even more expansively than that. I would love to see that, and we’ve spoken to Wikipedians, and, and, um, uh, several times, and we’re working toward that. Um, and we know, for instance, that Wikipedians are in fact starting to use, uh, DPLA material. There’s actually a little widget for Wikipedia editors, um, that, uh, you can install, uh, in your web browser. Um, it’s available on our site, and a lot of Wikipedians have taken advantage of this. When you’re editing an article, it actually shows you, up at the top, items from DPLA that are related to the article you are currently editing. So we have this sort of cyclical process in place where, uh, Wikipedians can add in DPLA material. But again, I think even more expansively than that. You know, we have a website at DP.LA, and people can access the entire collection there, but we’re completely okay; in fact, I think it would be a great win for us to be everywhere, in a sense. Um, the website’s important, but about three quarters of the use of Digital Public Library of America materials actually happens outside of our website, and I think that’s really unique. So we provide a digital platform where we can serve these items up through third-party sites, uh, very seamlessly, and, uh, we have something called an API, or application programming interface, which is just a fancy technical way of saying that developers of other websites, developers of apps, uh, can in fact integrate DPLA at a core level into, uh, their applications or their educational websites or other library websites, and people are in fact doing that. And it is not only okay, it’s more than okay with us if in fact we have most of these coming, in a sense, through integration rather than by directing people to our website. I think it would be a great outcome if we weren’t. I’d love to be a top ten website; that would be a great outcome as well.
But it’s, it’s perfectly okay with us and in fact part of our mission that we wish to distribute this as widely as possible through these kinds of technical integrations that really provide access right where people in fact are looking for the material.

HEFFNER: What is the metric,

COHEN: Yeah. Yeah.

HEFFNER: Um, through which you’re analyzing DPLA’s long-term success?

COHEN: Sure, yeah. Well, look, we want to have impact in, in many realms. I mean K-12, uh, college, especially under-resourced colleges like community colleges, is, is a particular focus of ours. Um, lifelong learning, genealogists, amateur enthusiasts: we want to reach a lot of different audiences. But I think this is a critical question. Um, we have spent a lot of time, obviously, in the last couple years, in a sense building up the supply of DPLA, building up a supply of rich materials from our nation’s, uh, cultural heritage institutions. What we’re focusing on now, and it’s directly related to your question, is in a sense curating these materials for kids and many others. Um, we know that, um, uh, fewer and fewer, um, students, for instance, are going, let’s say, uh, to a Google-style search engine that shows you ten blue links to find what they’re looking for, for a report or to study, um, and they need materials, again, in the shape that they’re looking for. So we’ve actually recently launched a part of our site dedicated to education and curation of these materials, and this includes at this point 60 primary source sets, sort of grab-and-go, in a sense, uh, off-the-rack sets that take the best materials from those 1,800 institutions and put them all in one place. So you don’t need to look up a particular topic and sift through the giant sea of materials; you can kind of pluck the fish out of the sea that you, uh, wish to, to get. And we’re continuing to, in fact, uh, expand that part of our site; um, by the summer there’ll be 100 of these primary source sets. And what we’d like to see as a kind of sign that this is working is, um, really seeing things like that, those curated sets, actually on syllabi in the classroom, being used. And in fact, as soon as we launched the first 30 of those, we saw just a spike in interest in, uh, DPLA.
We saw, uh, all this chatter on teacher institute networks about, oh, this is this great resource, so we understand that we need to do more of that, and indeed we will be doing more of that in the near future.

HEFFNER: I’m smiling listening to you because one of the lessons that I’ll always carry forward from those beginning days of Google was when a librarian suggested that you type in your search, whatever you’re putting in the search engine, and then you insert “site:.edu.”

COHEN: Ah yes.

HEFFNER: That trick,

COHEN: Right.

HEFFNER: Which I hope is still being taught in, in high schools, in middle schools and even elementary schools,

COHEN: Yeah. Yeah.

HEFFNER: I mean, for me, the introduction to Google was in elementary school.

COHEN: Yeah.

HEFFNER: And uh, it, it seems to me,

COHEN: Yeah.

HEFFNER: That DPLA is really trying to provide a fertile home,


HEFFNER: So that if you’re not going to Wikipedia you’re going to DPLA,


HEFFNER: And you’re inserting a, a search,


HEFFNER: And you’re finding raw materials.

COHEN: Yeah.

HEFFNER: For document-based learning.

COHEN: Right. Right. You know, that was such a great hack, you know, I taught history for 15 years and I in fact taught the same thing to my students, to restrict, restrict their search to the .edu domain, uh, in the US. Um, DPLA is that writ large, right? You know that when you come to DP.LA you are getting materials that are trusted, that are vetted, um, that are from trustworthy institutions, again um, who, who wouldn’t want to get materials that, that they know are verified, that have been scanned and described by librarians, by archivists, by museum curators? Um, so it is one-stop shopping in that way and um, we’re really excited I think in effect that we can provide so much material, um, that is trustworthy. I think that’s essential and I think that um, kids, um, are looking for this material. I think in the early days of the web, you know, you mentioned Wikipedia, I remember ten years ago my fellow historians were complaining it’s unverified and you know, who knows what this is? Um, we’re trying to direct, uh, students elsewhere. Um, you know, I think a decade later obviously it’s gotten so much better, um, it’s been fine-tuned, there are historians who help write Wikipedia articles, uh, now but I, I still think that act of trying to find trustworthy materials is essential, I think kids are taught digital literacy now and uh, where to look, and we want to be one of those key trusted resources.

HEFFNER: In the few minutes we have remaining Dan, let me just go back to that question,

COHEN: Sure.

HEFFNER: Let me try again. In terms of obstacles,


HEFFNER: I think your viewers, your benefactors, the people who are interested in furthering your long-term objective want to know, not to shame anybody but in terms of this rights-based convention that you’re setting up in this 2016 year,

COHEN: Yeah.

HEFFNER: Um, what, what is the best possible outcome,

COHEN: Sure.

HEFFNER: In your mind that the corporate interests and the scholarly interests,

COHEN: Yeah. Yeah.

HEFFNER: Can find a uniform consensus around…

COHEN: Sure.

HEFFNER: These orphaned materials and the rights at large?

COHEN: Yeah. Right. Um, so you know, my feeling on this is that copyright has always been a balance, right? And I think there’s a lot of people like me, right? I’m, I’m an author with books and I, you know, I see it from both sides. I want corporate interests, publishers, authors, et cetera to feel comfortable about the state of copyright. I also want students and teachers and the general public to get maximal access. And I think there we just have to think about working together. I think on both sides of that aisle, um, toward a kind of collaborative position to maximize access. I think clarifying rights and, and just again, return full circle. I think just that act alone of in fact clarifying what the rights are around an object and then to push for at least the most liberal version, the most liberal reading I think is the best way to go, right? If there’s an item—

HEFFNER: What do you mean the most liberal?

COHEN: Yeah, so I think,

HEFFNER: The, the greatest movement towards democratization?

COHEN: Right, I, I think we all want, right, democratized access. I think it’s a critical part of being a citizen in America that we have 16,000 libraries, public libraries, in the US that we get access to, and so I think as we move into a digital age we also want maximal access, obviously within the bounds. We don’t want to give away material that’s very clearly under copyright, but I think there’s a lot of material in that gray area that no longer has commercial life, that, um, is in kind of a murky zone, and what I would say is I think for those materials it makes sense to at least be liberal to a point at which, um, we can say we’re gonna provide as much access as we can to this. I think if someone steps forward and says, hey, wait a second, that’s, uh, I wrote that, please take that offline, we can do that very easily. It’s actually very easy with digital materials, unlike printed materials, to in a sense remove it from the public sphere. And so I would hope to just push against that boundary a little bit, because I think if we get too restrictive, right, if we’re a little too worried about, um, one side of the balance, right? Then we end up being, again, risk-averse, and we end up being a little bit too tight-fisted with what is in fact our shared culture, and we want to provide access to that.

HEFFNER: When you see pushback, Dan,

COHEN: Yeah.

HEFFNER: Is it pushback as a function of institutions wanting to make money from visitors and those billing hours going to the museums and archives?

COHEN: [LAUGHS] Yeah. Right. Um, so there’s maybe a little bit of that I think for institutions that may not want to join DPLA, they see licensing dollar signs I suppose. But I would say that’s relatively minimal. I think a lot of institutions, and you can see that by our growth, understand there’s, there’s maybe not a huge amount of revenue there but we’re perfectly okay with institutions, you know, retaining that right. I think you can provide access. We in fact have copyrighted material on our website. Um, so people can view it online but the institution retains the right, in fact, to make money from that.

HEFFNER: Are there any institutions that say to you we don’t even want to put something in, in full view?

COHEN: Right. We have institutions that have, have not joined. Um, but to be honest it’s a, it’s a very small minority and I think once people grasp what the Digital Public Library of America is doing, what our overall mission is, they understand the ethical mission that we’re involved in, um, I think that the ice melts very quickly. And we get a lot of people on board understanding that they can both provide access and also retain sustainable models of funding as well.

HEFFNER: When you look at the Boston Public Library or the New York Public Library, their headquarters,

COHEN: Yeah.

HEFFNER: Uh, you see the embodiment of what you’re trying to admirably create online.


HEFFNER: Finally do, do you think that there is a wrong perception that um, public libraries are somehow inferior to those of, of private institutions?

COHEN: I’ve been so lucky to be at very privileged institutions that have huge libraries and um, you know, this in a sense career change for me to, to lead the Digital Public Library of America I think has been um, very important and very eye-opening for me to understand how big a gap there is and how important the notion of a public library is and how important it is to maintain that notion into a digital age.

I think we have uh, very quickly crept into an area where um, you know, e-reading is going on, Amazon has 65 percent of the e-book market, um, we’re very quickly moving into a phase where um, things like public libraries, um, are being challenged, and I think that we need to think very carefully about how we maintain that maximal access. I think public libraries are a critical institution and we need to think about really through institutions like the Digital Public Library of America how we maintain, uh, that mission.

HEFFNER: Dan, thank you for joining me today.

COHEN: Great to be here.

HEFFNER: And, thanks to you in the audience. I hope you join us again next time for a thoughtful excursion into the world of ideas. Until then, keep an open mind. Please visit The Open Mind website at Thirteen.org/openmind to view this program online or to access over 1,500 other interviews. And do check us out on Twitter and Facebook @Open Mind TV for updates on future programming.