Kara Van Malssen (NYU) and Karen Cariani (WGBH) represent the PDPTV project at IASA 2009

40th Annual Conference of the International Association of Sound and Audiovisual Archives (IASA)
20-25 September 2009 in Athens, Greece

by Kara Van Malssen

the author (left) and Karen Cariani (WGBH Media Library and Archives) in front of our poster on sustainability issues for public broadcasting preservation

From 20-25 September 2009, the 40th annual conference of the International Association of Sound and Audiovisual Archives was held at the Megaro Mousikis (Athens Concert Hall) in Athens, Greece. The theme: “Towards a new kind of archive? The digital philosophy in audiovisual archives.” After a full week of committee meetings, presentations, and tutorials that provided perspectives from all corners of the globe, the conference really did begin to arrive at a shape for the new kind of archive: one with good digital collection management, a clear sense of ethical responsibility, and wide accessibility, and one that puts users at the top of the priority list. In a shift from other conferences in this field in recent years, where the focus has been on the overwhelming need to migrate physical media to digital and on the various issues that come along with that, the focus now seemed to be on how best to take advantage of the opportunities that digital archiving offers. The future is perhaps not so bleak as we might have thought a few years back, but it shouldn’t come as a surprise that succeeding in the digital era will require a few changes.

The keynote talk, delivered by Edwin van Huis (Cultural Entrepreneur, former Director of the Netherlands Institute for Sound and Vision, and former president of FIAT/IFTA), began on this note with a question to the audience: “What is a good archive?” And the answers, both from the attendees and the speaker himself, weren’t just about having good storage and good cataloging. These things are not necessarily the number one concern in an age where users are able to access (seemingly) all the content they want, on the web, at any time. The key is for archives to maintain relevance. A good archive is one that is connected to users, providing many points of access, and allowing for different relations and meanings to become attached to content. A good archive is one that knows and works with its users, and changes according to their expectations. Because if you don’t keep up with them, they’ll go somewhere else.

To me, there were three overarching themes throughout the conference: the convergence of traditional domains, aggregation of content, and collaborations; new approaches to metadata; and ethical, legal, and moral issues of online access.

Convergence, Aggregation, Collaboration

Convergence is an important topic for heritage institutions in the digital age. This issue was the subject of a panel early in the week, and the questions and concerns raised there continued to resound throughout the conference. Many speakers noted that users don’t necessarily care whether something is a video, film, photo, or text; they just care that it relates to a subject. Where then does that leave museums, libraries, and archives? What unique function do they have? And what role, then, does the audiovisual archivist play?

In the digital world, the distinction between libraries, archives, and museums as sources of original content is disappearing. Furthermore, the traditional hierarchy that ranks special collections like original print manuscripts over television and other mass media doesn’t really hold anymore. The newly launched uber-content aggregation network, Europeana, provides a case in point. Europeana offers a federated search across digital collections from Europe’s archives, libraries, and museums. Although Europeana allows you to refine your search by language, country, date, provider, and media type, a simple search will give you a potentially enormous range of results from all over the continent, in all media types. At first glance, you can’t even tell where the content came from, and it doesn’t really matter. Suddenly a scan of someone’s diary and a video of that person being interviewed on a newscast do all feel in a way the same: it’s all content about that person, it’s all digital. New connections can be made between different pieces of content from different places as never before, allowing revolutions in research. It is certain to change the way people find and use material from the traditional keepers of heritage.

Other content aggregation projects were presented, including the European Film Gateway, another EU-funded initiative that aims to develop an online portal to nearly 800,000 films and related objects from across Europe. Similarly, the recently completed VideoActive project (yes, also EU-funded) collects television programs from broadcast archives across Europe. Both are powerful tools for finding (in these cases, media-specific) content and information from varied and disparate institutions.

These concerns and celebrations of convergence as a result of content aggregation portals were focused on access and user interaction. What about preservation? Is there a role for a networked preservation effort in Europe as well? The answer is a loud and clear yes, and the new PrestoPRIME project is a consortium that will serve that precise function. Initiated in 2009, PrestoPRIME is the successor of the important PrestoSpace project that ended in 2008. This group of Europe’s leading AV archives will research and develop solutions for long-term digital preservation of AV media, and will deliver a range of tools and services. They are also creating a networked competence center, which will be a vendor-neutral information, resource, and advisory organization.

Indeed, Europe is leading audiovisual archives into new digital territory, one that fully exploits the advantages of digital platforms and networked solutions. The rest of the world should keep a close eye on these ambitious projects, especially Europeana and PrestoPRIME, as they develop in the coming months and years. While other countries and regions might not have the support that will allow them to replicate these mammoth efforts, there will certainly be outcomes and lessons from the Europeans that we can all use.

There are interesting (albeit smaller) projects happening outside of Europe that are using collaborative models to build great archives. One of these is the Alan Lomax Archive / Association for Cultural Equity, which is now a completely digital archive. Users are provided with a few different entry points for searching and browsing recordings and associated metadata, including a GeoArchive. The Association for Cultural Equity continues to foster its mission through collaborations with the archives in the regions where Lomax made recordings of the local musicians. By repatriating recordings to the original communities, the ACE builds and strengthens its network and expands its reach.

The Naad Media Collective is another fascinating cooperative endeavor to collect and archive sounds and images that are becoming extinct in a rapidly developing India. Members of the collective record endangered sounds and images, and share them through peer-to-peer networks. The recordings are available through their website. While not a preservation project per se, the group is collecting some wonderful sounds, such as the crackling of bamboo trees bending against each other in the wind, and a snake breathing.

New approaches to metadata

I heard a few very provocative, related questions about metadata during the five-day event:
Are metadata standards and structures like FRBR a thing of the past?
Will Google enable free text searching of everything?
Should we be looking at linked data instead?

There were a few presentations that seemed to address these issues, in particular the talk given by Sam Coppens called “Semantic Bricks for Performing Arts Archiving and Dissemination.” This was a report on the PokuMOn (Performing Arts Multimedia Dissemination) project, which seeks to create a decentralized archive for performing arts organizations in Belgium. The goal was to find a common metadata model, but combine it with each organization’s own model. The problem was that so many metadata models were being used that mapping them all was not practical. Their solution was to store the metadata records as data, and use a descriptive layer to search over all the records and display limited results in Dublin Core. Users were then able to link to the original record and see the detailed records. Their other strategy was to use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocol for the description and exchange of aggregations of web resources in RDF. The records can then be published as linked open data, allowing machine-readable interpretations of metadata. By automatically generating RDF and linking data to DBpedia and GeoNames they were able to enhance the datasets using these sources. Mr. Coppens recommended the OpenCalais toolkit to automatically create rich semantic metadata. I’m looking into it now.
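The descriptive-layer idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PokuMOn implementation: the organization keys, native field names, sample records, and the `CROSSWALKS` table are all hypothetical. Only the Dublin Core element names (`title`, `creator`, `date`) are real.

```python
# Hypothetical per-organization crosswalks: native field name -> Dublin Core element.
CROSSWALKS = {
    "theatre_a": {"titel": "title", "regisseur": "creator", "datum": "date"},
    "dance_b": {"name": "title", "choreographer": "creator", "premiere": "date"},
}


def to_dublin_core(org, record_id, native_record):
    """Build a limited Dublin Core view of a native record.

    The native record itself is stored untouched; the view keeps a pointer
    (org, record_id) back to it so a user can open the full detail.
    """
    crosswalk = CROSSWALKS[org]
    view = {"source": (org, record_id)}
    for native_field, value in native_record.items():
        dc_element = crosswalk.get(native_field)
        if dc_element is not None:  # unmapped fields stay only in the original
            view[dc_element] = value
    return view


def search(index, term):
    """Case-insensitive search over the Dublin Core titles only."""
    return [v for v in index if term.lower() in v.get("title", "").lower()]


# Two records kept in their different native models (made-up sample data).
store = {
    ("theatre_a", 1): {"titel": "Hamlet", "regisseur": "J. Peeters", "zaal": "Grote Zaal"},
    ("dance_b", 7): {"name": "Hamlet Variations", "choreographer": "A. Maes"},
}
index = [to_dublin_core(org, rid, rec) for (org, rid), rec in store.items()]

hits = search(index, "hamlet")
for hit in hits:
    # Show the limited Dublin Core title, then follow the link to the full record.
    print(hit["title"], "->", store[hit["source"]])
```

The point of the design is that nothing is lost in the mapping: a field such as the hypothetical `zaal` never appears in the Dublin Core view, but it is still there when the user follows the link back to the original record.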

Another speaker on the same panel, Michael Fingerhut, from Ircam - Centre Pompidou in France pointed out that while most people reach their Musique Contemporaine website in the middle of a catalog record via Google search results, and they rely on this for traffic, Google isn’t actually very good at indexing large databases. Sounds to me like another reason to publish catalog records as linked data, and allow semantic search engines to help bring users to archive websites.

I heard at least three institutions report that traffic to their websites went up by a large percentage once they put links to their sites on relevant Wikipedia pages, and that for the National Film and Sound Archive of Australia, Wikipedia is now the #1 source of traffic.

So I suppose the answer to the above questions so far is — there’s an important role for all three.

Library of Congress’s Carl Fleischhauer also gave an interesting presentation on the US Federal Agencies AV Digitization Working Group, a consortium of government agencies working to create standard guidelines for digitization. One of the important differences between their work and existing guidelines, such as IASA TC-04, is that they are emphasizing the need for embedded metadata, that is, certain elements of descriptive information encoded into the digital files themselves. This is important in a digital world, where a piece of content can easily become separated from its source, context, and catalog. If supported by software and web tools, embedded metadata will enable users to know some basic information about the content, like where it came from and what uses they are allowed. This is already happening in the world of still images, with standards such as EXIF, IPTC, and XMP, and software manufacturers are supporting this shift by making it easy for users to both read and write metadata. But uptake is slow for digital moving images and sound. Perhaps with the US Government pushing on vendors, we might see some changes in this area.

Ethical, legal and moral issues for online access

Quite a large portion of the conference was devoted to issues of ethics and intellectual property rights. And rightfully so: now that archives have so much digital content that they can make available online, how should they best protect the rights of the creators while at the same time providing access to users? A number of presentations offered guidelines and best practices for online archival content.

In her presentation “Guidelines for the Reproduction and Sale of Digital Heritage” during a panel on Ethics and Archival Practice, Diane Thram, Director of the International Library of African Music (South Africa), reported on guidelines that were developed during a seminar recently held at her institution. The group’s conclusions included: applying ethics to the use of digital heritage (and looking at relevant ethics statements from professional associations and organizations), respecting substantiated objections to online access, providing open access to low-resolution watermarked excerpts so that they cannot be abused, and always respecting the rights of the performer.

As another example, both Brigitte Vézina from WIPO, and Janet Topp Fargion of the British Library Sound Archive, discussed the Legal and Ethical Usage disclaimer on the British Library’s website that warns users not to infringe on the rights of indigenous and local communities. Much of WIPO’s effort to protect Traditional Cultural Expressions (TCEs) arose after the case of the music group Deep Forest, who obtained recordings from an ethnomusicology archive and remixed them without acknowledgement of the source or compensation to communities that were the original performers and creators.

In their tutorial on “Online Audiovisual Collections: Legal vs Moral Rights” (created by Shubha Chaudhuri and Anthony Seeger (UCLA), and delivered by Mr. Seeger with the help of a fine troupe of archivist-actors role-playing scenarios), the authors again touched on the issue of TCEs and the rights of the creators of folklore. This is a big concern because while folklore often doesn’t fall under copyright, the creators have rights that fall into murky territory. They noted that under new WIPO agreements, indigenous peoples may be getting more rights to their intangible heritage. The material, the creators, and the rights must all be carefully considered before posting things online. In some places, such as Australia and the South Pacific, communities have very strict rules about who can have access to certain types of knowledge: some songs are only for men, some ceremonies are only for women, etc. Consult as many people as possible before posting AV media online, and be prepared for the possibility of removing it if contested. The Documentation and Archiving the Performing Arts website of the American Institute of Indian Studies, Archives and Research Centre for Ethnomusicology has some good information on this topic, including forms for performers to help them understand their rights.

A few other things worth checking out:

2nd Edition of IASA-TC04 Guidelines on the Production and Preservation of Digital Audio Objects has been published. This is THE definitive guide on digitization and preservation of audio, now much improved.

VIDI-Video: European research consortium, developing a semantic search engine for video and a “1000 element thesaurus for automatically detecting instances of semantic concepts in the audio-visual content.” Aimed at improving indexing and retrieval practices of broadcast archives. Very cool.

Spectaclesdumonde.fr - Portal of traditional and world music from France, with nice geo-interface.

We Know It Project - from Athens and their Visual Image Retrieval and Localization Tool

And of course there is the lovely poster presentation we gave on Strategies for Sustainable Preservation of Born Digital Public Television (30″ W x 45″ H).

Analyzing Sustainability

One of the final projects of Preserving Digital Public Television is an assessment of sustainability for television archives in the public broadcasting system.

While our final report is still in process, we’ve found a number of issues related to sustainability that are not unique to moving images and television, but which seem likely to reflect much broader concerns over time. Among them:

  • Rights management. Television and moving images involve more rights holders than most other types of digital material, and the looming issue is the enormous cost of locating, negotiating, and paying for huge collections of underlying rights materials incorporated into thousands of local and national productions. Even the Library of Congress “identified copyright as a potentially serious impediment to the preservation of important digital collections and recognized that solving certain copyright issues was crucial to achieving long-term preservation of important digital content.”
  • Economics. The tendency in television archives is to see the collection primarily as a potential source of income. Yet there is both monetary and non-monetary value to our collections, especially when measured against our mission of promoting education. With no existing commitment within public broadcasting to fund preservation (at least right now), we are scrutinizing our existing funding streams and operating models for potential new models of generating financial support.
  • Metadata. The possibilities for describing the contents of television broadcasts are still evolving, and questions remain about how best to do so.
  • Preservation quality files. Format complexity, lossy compression, and a wide gap between preservation and access copies all raise quality concerns. Future migration of archived works will involve not only moving from tape to disk to other physical media, but also from one image format to another. The preference to preserve the highest quality image, the potential for loss, the relatively smaller size and costs of storing compressed files vs. uncompressed, and the need to make works available in many different viewing formats are difficult issues for archivists to resolve.
  • Scale. Even moderately sized collections of moving images require petabytes of storage.  Even so, we are projecting that over time, while costs for storing collections will continue to drop, long-term operating costs will rise, based on the need to maintain personnel, refresh the holdings, and keep the lights on.

In early 2009, we’ll be publishing our full report on sustainability. In the meantime, you can get a sense of what we are thinking from the resources page. And for more background, the Interim Report of the Blue Ribbon Task Force is available for your reading pleasure.

Web Crawl Update

As part of our digital preservation initiative, we have saved copies of the majority of websites related to the public television system in 2007 - more than 300 websites of stations, program productions, and related organizations. 

Working with the Internet Archive, we will be transferring our 5 terabytes of data to the Library of Congress in early 2009. You can read more about this work on the project page.

Marcia Brooks on PBCore

Current, “the newspaper about public TV and radio,” has a very nice two-page article about PBCore by Marcia Brooks, who helped develop the proposal for CPB funding of the PBCore project and directed the project at WGBH for most of the last six years. Some of the main points:

  • Almost six years ago CPB had the foresight to fund the development of a metadata standard for the multimedia, multiplatform present and future of public broadcasting.
  • Frontline and The NewsHour with Jim Lehrer are using PBCore as the basis of the Frontline/NewsHour video database and are using a modified version to let web users click for “related video.”
  • If you go to the National Educational Telecommunications Association Conference next month, check out the PBCore session Thursday morning, Jan. 24, about PBCore in three stations’ real-world workflows. There are many more uses of PBCore in the field, including more documented case examples on PBCore.org.

The Dutch national audiovisual archive Beeld en Geluid, and its partners in the Images for the Future project are in the process of digitizing 22,500 hours of film, 137,000 hours of television, 124,000 hours of radio recordings, and 2.9 million photos. They have received an investment of 154 million euros from the Dutch government that will be spread out over 7 years.

There are a number of very interesting aspects to this project that others, especially in the U.S., could learn from.

The new Beeld en Geluid building (which is beautiful and dramatic) has apparently inspired some donors of archival material to trust them and to want to contribute. Its museum had 250,000 visitors in its first year, which in a country of 16 million people is huge.

Visitors typically stay for four and a half hours; most people get really involved in watching news and entertainment footage from their youth. Visitors are invited to assemble and read the news, or decide on the best programming for Saturday night. They have a great promotional film that they show in the museum.

The back story on their funding is instructive. The Images for the Future project was able to obtain funding on a large scale by applying the same style of economic analysis to the archive as is applied to other government-funded infrastructure. A study conducted on their behalf by a management consultancy concluded that “The present value of the total user benefits of the plan is approx. €176 million. Compared to these benefits, the costs drawn up in euros in 2006 amount to €148 million…” This kind of thorough, sober cost-benefit analysis of educational infrastructure is something U.S. organizations could emulate. It also forces the project to generate benefits: as an outcome of the cost-benefit analysis, €19 million has to be earned by the project partners.
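To make the quoted figures concrete, here is a small back-of-the-envelope calculation. It uses only the three numbers given above; comparing the earned-income target against total costs is my own framing, not something taken from the study.

```python
# Figures quoted in the post, in millions of euros.
benefits = 176  # present value of total user benefits
costs = 148     # costs, drawn up in 2006 euros
earned = 19     # amount the project partners must earn themselves

net_benefit = benefits - costs         # benefits minus costs
benefit_cost_ratio = benefits / costs  # above 1.0 means benefits exceed costs
earned_share = earned / costs          # assumption: compare earned income to total costs

print(f"Net benefit: {net_benefit} million euros")
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")
print(f"Earned income as share of costs: {earned_share:.0%}")
```

On these numbers the plan clears its costs by 28 million euros, a ratio of roughly 1.19 to 1.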

They are also engaging rights holders (and thus mass clearance) in a very interesting way. First, they are committed to ensuring that all rights are completely protected. But they are making their catalog available online, and for rights holders that want it, they are linking catalog entries to stills, low-res, or hi-res moving images.

This seems like an excellent way to satisfy rights holders, and to gradually make most of their collection accessible online. It seems a foregone conclusion that sales of the material by rights holders will increase if people can browse it first.

Like everyone, they are grappling with questions related to metadata, from the lack of metadata on most tape cases, to the habit of some producers of cramming tapes full of random segments, to harmonizing information from 150 (!) different databases that have been built over the years.

They are going with MXF & MPEG-2 at 50 Mbps for most television material, and MXF & MPEG-2 at 30 Mbps for news.
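For a rough sense of what those bitrates mean for storage, here is a quick estimate. It rests on simplifying assumptions of mine: it counts the video bitrate only (no audio tracks, MXF wrapper overhead, or backup copies), uses decimal units, and treats all 137,000 television hours mentioned above as 50 Mbps material, which makes it an upper bound since news is stored at 30 Mbps.

```python
def gigabytes_per_hour(mbps):
    """Decimal gigabytes for one hour of video at a constant bitrate in megabits/s."""
    bits = mbps * 1_000_000 * 3600  # bits in one hour
    return bits / 8 / 1_000_000_000  # bits -> bytes -> gigabytes


tv_hours = 137_000  # hours of television cited for the project

per_hour_50 = gigabytes_per_hour(50)  # general television material
per_hour_30 = gigabytes_per_hour(30)  # news material

# Assumption: every television hour is stored at 50 Mbps (upper bound).
total_pb = tv_hours * per_hour_50 / 1_000_000  # gigabytes -> petabytes

print(f"50 Mbps: {per_hour_50} GB per hour")
print(f"30 Mbps: {per_hour_30} GB per hour")
print(f"Television at 50 Mbps: about {total_pb:.2f} PB")
```

That works out to 22.5 GB per hour at 50 Mbps and 13.5 GB per hour at 30 Mbps, or on the order of 3 PB for the television material alone, before any redundancy.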

They are launching a project to link their materials to other “trusted” repositories, e.g., maps and newspapers, for educational purposes. On top of these repositories, students will get tools to work with the materials and teachers will get tools to develop lessons. Sort of like Teachers’ Domain.

As with the PTV Digital Archive, they are going to start working on capturing web sites associated with programs and films, but there are challenges related to capturing streams and some AV files.

For other archives grappling with questions of file formats, metadata, selection, and budgeting, the approaches taken in the Netherlands can serve as a useful model. There is more background from the New York Times in a recent article, “Heaven, Hell and Purgatory, Encased in Glass.”

At long last! Dave MacCarn’s outstanding survey of file formats and practices in public television production and distribution is now available.

Survey of Digital Formatting Practices in Public Television

We hope this will be useful to organizations that are facing the sometimes daunting prospect of making choices about how to archive video files for preservation.

Please contact us with questions or comments about this or any of our other materials.

Last week, I participated in an all-day seminar hosted by the Museum of the Moving Image in Queens. It was sponsored in part by a grant from the Institute of Museum and Library Services (IMLS), which provides federal funding to libraries and museums.

IMLS has been supporting an impressive range of innovative projects to help their constituents embrace the challenges of the digital world. This seminar was called: Open Collections: Exploring Online Cultural Resources, and the program focused on managing collections on-line.

A key goal was to promote the Museum’s free open source collections management application, ‘Open Collection.’ IMLS supported the development of this software, and to my untutored eye, it appeared many institutions might find it a useful tool, although it is not suitable for us. (The folks from the Museum of Natural History were quite enthusiastic about it.)

Even so, I was very happy to be reminded that there is a creative and functional world of digital catalogs and collections outside our rather narrow broadcast focus.

The CORSAIR system at the Morgan Library was nothing short of elegant, and the Virtual New York City project, underway at the New Media Lab at the City University of New York (CUNY) Graduate Center, was just one example of an on-line collection as a work in progress.

All the examples demonstrated that large, complex aggregations of digitized resources can be easily used — if they have well-organized databases/content management systems running behind them. It made me long for a few million dollars so we could produce something similar.

The seminar also emphasized the recent and growing shift away from local computing over to ‘superior’ web-based applications. OK, I got that message. These tools are coming up fast, and they offer flexibility and functionality, plus the opportunity for sharing tasks much more broadly.

But the shift is hardly seamless. How do our disparate databases, running on all manner of systems both local and web-based, communicate with each other, not to mention maintain file integrity and security? (Hey, isn’t that C3PO’s job???) And how do we pay for it? Clearly they have a way to go yet, and no one told us what was under the hood.

Also, I couldn’t help thinking about broadband access, and how this is based on the naive assumption that everyone has super-fast downloading capabilities. (Not so.) Or that small, sophisticated, hand-held devices will continue to roll out and alter how we experience the internet.

Mostly, though, the seminar reinforced my conviction that public broadcasting should mobilize our army of volunteers to help describe and catalog our video collections.

To me, descriptive cataloging remains one of the biggest hurdles we have to clear in order to make our materials findable, searchable, and truly accessible.

Alas, we will never have a crew of professional librarians and catalogers paid to tackle this problem. But wait — we already DO have thousands of well-educated volunteers and supporters at local stations around the country!

What with so much ‘tagging’ already going on, it’s not much of a stretch to develop a participatory plan. And with a web-based application, folks doing the work could reside anywhere.

To develop a credible and useful cataloging system, of course we’ll need standards. We’ll also need common database templates (based on PBCore), and we’ll have to create a structured process based on professional criteria.

Then, voila! If we provide training, review, and oversight, we can let folks go at it. I imagine many volunteers would be delighted to participate in such a collaborative, substantive effort. And no doubt, some of them would be excellent.

I left the seminar filled with ideas about how to get started — what could go into designing a pilot project and who might help us.

Much more exciting than free software - thank you IMLS and Museum of the Moving Image!

For project team members out on the road and in need of a project summary on paper, we now have a spiffy new Project Summary.

Everything Old Can Be New Again, Nan Rubin’s article in Current, the newspaper about public TV and radio, offers an overview of PBS archives, plans for the future, and advice for station managers concerned about preservation and access. Here’s an excerpt:

It’s time to get over our wasteful habit of letting our programs vanish forever. We’ve got decades of national and local productions sitting in storage, and the public is hungry for them. Making programs accessible will generate great goodwill, new audiences and new funding.

Thirteen/WNET New York invites you to check out the finding aid for our newly remastered landmark public affairs series, The 51st State.
http://www.thirteen.org/the51ststate/

On the air from 1972-1976, The 51st State began as a nightly news program with a mission to present in-depth and thoughtful reporting of regional issues.  During this period, New York City was struggling with the national traumas brought on by the civil rights movement, women’s liberation, and the Vietnam War, as well as facing a rising crime rate and heading towards the largest financial crisis of its history (“Ford to City: Drop Dead”).

 

The program was noted for an unorthodox journalistic style and covered a wide range of subjects, from a town hall meeting of youth gangs in the Bronx and the pollution of the Hudson River to statewide hearings on abortion legislation and the New York City take on such national issues as pornography and the war in Vietnam. 

 

Unconventional from the start, Jack Willis, the series’ Executive Producer, hired a combination of experienced print and television reporters along with a selection of completely inexperienced but eager young journalists.  He gave The 51st State added credibility with the hiring of Host and Editor, Patrick Watson, who had long been regarded as the foremost television interviewer and public affairs program producer at the Canadian Broadcasting Corporation.

 

The program was given an unprecedented amount of editorial freedom and jumped right in to exploring contemporary urban concerns. This resulted in fresh and creative coverage of the people and issues that made up New York City, affectionately known as The 51st State. Nat Hentoff stated, “This provocatively unpredictable nightly news show [The 51st State] is beginning to present a formidable challenge to print journalists while leaving the other local television news operations a light-year behind” (The New York Times, April 2, 1972).

 

This project was made possible thanks to a grant from the NATIONAL HISTORICAL PUBLICATIONS AND RECORDS COMMISSION (NHPRC).  It is the first in an ongoing initiative at Thirteen to preserve important programs from our library of 30,000 videotapes. 

 

Stay tuned for announcements about other collections!
http://www.thirteen.org/the51ststate/
 

                                                                         
For more information about viewing the programs, please contact us:   archives@thirteen.org

Winter Shanck
Archival Media Librarian
Thirteen/WNET
450 West  33d St.
New York, NY 10001
212-560-3067