Report from the Hydra Project Partners Meeting, Dec. 7-9, 2011 at Stanford University
Ithaca, NY - Earlier this month, 20 current and prospective Hydra Project partners (http://projecthydra.org) met in Palo Alto to share demonstrations and institutional updates and to discuss ongoing development issues and priorities.
In 2008 the University of Virginia, the University of Hull and Stanford University got together to talk about two shared assumptions: that no single application would meet the full range of digital asset management needs, and that no single institution or provider could resource the development or maintenance of a full set of solutions for those needs.
The Hydra multi-institutional collaboration emerged from those early discussions. Hydra partners have developed a common, open source framework for multi-function, multi-purpose, repository-powered applications that is currently in use at several institutions. Hydra's architecture reflects a "one body, many heads" design: a robust digital repository (the body) anchors feature-rich applications (the heads), tailored to content-, domain- and institution-specific needs and workflows.
Hydra's technical framework features the Fedora Repository on the back end, with a front end comprising Ruby on Rails, Blacklight, Solr, and a suite of web services. Hydra software is free and open source and is available under an Apache 2 license.
Work across the partners in the last quarter has focused on making the code Rails 3 compatible and HTML5 compliant.
University of Virginia, Libra (http://libra.virginia.edu/)
An open access deposit & access application
Libra, UVa’s open access Hydra head, is built on Blacklight and provides the UVa community with an online archive of University of Virginia scholarship. In-process upgrades to the Hydra stack, including support for electronic thesis and dissertation submissions and data set management capabilities, will be completed before faculty are engaged in submitting content.
Data modeling and policy issues around managing very large data sets remain a significant area of ongoing investigation.
Recent improvements include a combined catalog and article search in Virgo, UVa’s Blacklight-based catalog.
Northwestern Digital Image Library–DIL (https://images.northwestern.edu/)
An image database that houses the Visual Media Collection digitization project and the Saskia Art Images collection.
For several years, Northwestern University Library (NUL) has developed repository applications around Fedora that support delivery of a variety of archival materials. The Digital Image Library (DIL) code at Northwestern University is being migrated from Hydrangea to a Rails 3 Hydra head (about 75% migrated). The basic functionality is designed for multi-resolution images. With DIL, users can create new collections via drag and drop and rearrange images in collections.
The crop tool zooms in and out, rotates, and creates new cropped images, which in turn create new Fedora objects. The image upload feature connects to the image processing workflow.
In other work, Northwestern also referenced findingaids.library.northwestern.edu, a Blacklight implementation displaying various indexed facets with a Fedora dissemination for each component in the EAD. Archon is the collection management system. EADs of finding aids are exported from it, then ingested into Fedora. Northwestern also has another system for scan and production management (PSDS) that will be migrating to Hydra, possibly with code shared with Stanford’s Argo system (see below). A long list of improvements is on the horizon.
Stanford University, Hypatia
An archival arrangement, description and access tool for born digital archival materials
There are currently 14 Hypatia collection records, and 7 of the collections contain born-digital materials. For the Stephen Jay Gould materials, for example, the Library received a stack of floppy disks that required the use of the Forensic Toolkit (FTK) to extract the contents.
The workflow has been changed so that EAD accessioning into Hypatia starts with Hypatia, allowing for enhancements. The team hopes to be able to export an EAD file from the application.
Dushay demonstrated creating a new asset via Hypatia, dragging and dropping to edit relationships and add new sets. The CSS indicates when there are unsaved changes. Most of what was shown is the result of two months of heavy development. The code is frozen pending discussions about what happens next. Stanford may bring Hypatia into Stanford, load in all the objects for archivists, and make it look more like browsing a finding aid.
Stanford University, Argo
A repository administrator's dashboard for workflow management and reporting
The demonstration of Argo showed how it provides a window on the administrative and in-process side of the digital library and on management of the DOR (digital object registry). Plans are to add preservation management functions as well (from SDR, the Stanford Digital Repository). Argo is currently primarily a Blacklight application, leveraging Fedora, Solr, and ActiveFedora, and integrating with Stanford’s “workDo” workflow system.
The first step in digitization, or in adding anything to the system, is object registration: enter a project name, select a type (which assigns an administrative policy), select a workflow, select a metadata source, and select an editor form that can assign tags. The user then sees a spreadsheet-like display (metadata ID, SourceID, assigns DRUID) that can generate tracking sheets for managing physical items and that front-ends API calls to register things programmatically.
Developers on specific projects can bypass this form and do things programmatically. In the admin interface, tags are organized into a facet (that is registered by AdminPolicy, etc.), and there are other facets for object type, content type, owning collection, and workflows. Facets can be rotated so that, for example, status is the top level and process is the second level.
There is a Fedora datastreams view (which pops up styled XML) and a life cycle view, a “light box” showing the details of all the steps in the workflow, including errors in context. Future work will include a graphical view of the workflow with prerequisites.
Indiana University, Variations on Video (VoV)
This is a new, jointly developed Indiana and Northwestern project that was recently funded. There is interest in building a digital media management system for diverse types of collections with broad access needs, from archival video/film collections to teaching collections, public access, restricted access, etc.
Dunn demonstrated the Opencast Matterhorn component, which is used primarily for lecture capture but includes a video processing workflow based on ffmpeg wrapped in OSGi services. The services define a video processing pipeline (validation, transcoding, speech-to-text) that is fully configurable and scalable across multiple servers.
Chris Colvard has been experimenting with basic integration between Hydra and Matterhorn (code-named Hydrant), based on 3.14 of Hydra. The demonstration used the default metadata interface to upload a brief 4-minute video, attached to a basic workflow that takes files and converts them to Ogg Theora as a test. The example showed a file being handed back from Matterhorn to Hydra. The goal is to deliver to a streaming server or delivery platform rather than streaming from Fedora, which is what was demonstrated.
Indiana has been working with Fedora for a long time and has built out some of the things being discussed here but not in a Hydra environment–finding aids, born-digital archival material, and digitized GPO content.
Indiana is also doing a number of projects with Blacklight, including a cross-collection digital search (in beta) that integrates content from image and music collections, combined as MODS derived via XSLT from the native format (such as EAD). Indiana is interested in exploring how to integrate with the Hydra community.
Rockhall: Rock and Roll Hall of Fame and Museum (http://rockhall.com/)
Video cataloging system
Adam has been at the Rock and Roll Hall of Fame and Museum for two years and was tasked with creating an asset manager for all the libraries. Rockhall is now an active Hydra site and things have been moving quickly. The site is now in production and ingesting video: about 10TB worth so far, with more coming in every week.
For ingestion, Rockhall has partnered with George Blood in Philadelphia to package content into SIPs and ingest them into Hydra, which then creates objects in Fedora. The initial metadata comes from the vendor, is then enhanced with additional information, and is rigged up to a Wowza streaming server. It is all managed in PBCore: reviewer information and two digital video data streams with accompanying technical information. The technical information comes from the vendor and from media information, about 80 fields in all. Plans are to automate this generation as much as possible.
Rockhall needed a process for reviewing content as it arrives from the vendor, since many of the works did not have complete metadata. Decisions were also needed about who has access to content. Logging in as a member of a reviewer group (specified in the role mapper file) provides the same search interface but adds a review column, which offers a license selection field. This is in addition to the permissions scheme, so there are two separate methods for specifying access. Eventually license selection will be linked to a Fedora mechanism.
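The reviewer behaviour described above (same search interface, plus a review column for members of a reviewer group) can be sketched roughly as follows. This is only an illustration in Python, not Rockhall's code: the dictionary stands in for the role mapper file, and the group name, logins and column names are all hypothetical.

```python
# Hypothetical stand-in for the role mapper file: group name -> member logins.
ROLE_MAP = {
    "reviewer": ["reviewer1@example.org"],
    "archivist": ["archivist1@example.org"],
}

# Hypothetical columns shown to every user in the search results.
BASE_COLUMNS = ["title", "date", "format"]


def groups_for(user, role_map=ROLE_MAP):
    """Return the set of groups the given login belongs to."""
    return {group for group, members in role_map.items() if user in members}


def result_columns(user, role_map=ROLE_MAP):
    """Same search columns for everyone; reviewers also get a review column."""
    columns = list(BASE_COLUMNS)
    if "reviewer" in groups_for(user, role_map):
        columns.append("review")  # would hold the license selection field
    return columns
```

Keeping the group lookup separate from the column logic mirrors the point made in the report: the review column sits alongside the permissions scheme rather than replacing it.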
Rockhall is also building a Blacklight interface to present EAD and MARC data. Video will be restricted to the museum and library, but the public will be able to see that the videos exist. Rockhall expects to do a soft launch in January 2012 with a grand opening in April to correspond with induction ceremonies.
MediaShelf
MediaShelf has been hard at work on basic infrastructure: getting Hydra into Rails 3, stabilizing it, and then stabilizing the release cycle along with documentation and HydraCamp. Much of the work has gone into laying the groundwork for community participation and collaboration.
Other development includes horizontal scaling: bulk-processing hundreds of thousands of objects, resulting in multiple writes back to Solr. Some development addresses the Fedora core, helping to stabilize and scale Fedora. Future work will involve indexing hundreds of millions of objects into Solr in a scalable way, with the goal of being able to support many simultaneous uses, users, processes and updates.
There is a lot of interest in Hydra in the communities that MediaShelf serves. Hydra heads are strong, compelling solutions in the specific spaces for various users. In addition, the fact that Fedora underlies the stack makes it a stronger choice.
University of Hull, Hydra in Hull (hydra.hull.ac.uk)
Hydra in Hull is a fully featured institutional repository solution aimed at making available the full range of Hull’s digital output. As such, it can handle content ranging from journal articles and electronic theses through past examination papers and committee minutes to learning materials and much more. Multiple levels of security are enforced such that the public can see some of the content, students rather more and staff more still. Ingest is via a three-stage workflow. Creators can put together a repository object containing metadata and a file or files of content. This is then transferred to a quality assurance (QA) queue where it will be checked over by library staff who, when they are happy with it, transfer it to the repository proper. Hydra is not the only feed to the repository and other University systems are capable of depositing objects into the QA queue. The repository supports display sets (to provide context to groupings of objects) and structural sets (a management function allowing groups of similar objects to be dealt with simultaneously).
In time, each different type of content will have customized create/edit/display pages appropriate to its needs; at present many of them are dealt with via a generic approach until such time as specialization can be developed. Further work in the immediate future will focus on enhanced provision for datasets and images.
Hydra in Hull has been the University repository since September, taking over from a Muradora-based system. All pre-existing content has now been transferred.
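The three-stage ingest workflow and tiered visibility described for Hydra in Hull can be sketched as a small state machine. This is a minimal illustration in Python under stated assumptions, not Hull's implementation: the stage names, the `RepoObject` class and the three audience labels are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"            # creator is assembling metadata and files
    QA_QUEUE = "qa_queue"      # waiting to be checked by library staff
    REPOSITORY = "repository"  # accepted into the repository proper


# Hypothetical audiences, ordered so each sees at least what the previous one sees.
ACCESS_ORDER = ["public", "student", "staff"]


@dataclass
class RepoObject:
    title: str
    files: list = field(default_factory=list)
    min_audience: str = "public"  # least-privileged audience allowed to view
    stage: Stage = Stage.DRAFT

    def submit_for_qa(self):
        """Stage 1 -> 2: the creator hands the object to the QA queue."""
        if self.stage is not Stage.DRAFT:
            raise ValueError("only drafts can be submitted for QA")
        if not self.title or not self.files:
            raise ValueError("an object needs metadata and at least one file")
        self.stage = Stage.QA_QUEUE

    def approve(self):
        """Stage 2 -> 3: library staff transfer the object to the repository."""
        if self.stage is not Stage.QA_QUEUE:
            raise ValueError("only queued objects can be approved")
        self.stage = Stage.REPOSITORY

    def visible_to(self, audience):
        """Objects are visible only once approved, and only to sufficient audiences."""
        if self.stage is not Stage.REPOSITORY:
            return False
        return ACCESS_ORDER.index(audience) >= ACCESS_ORDER.index(self.min_audience)
```

In this sketch, a deposit from another University system would simply create an object and call `submit_for_qa()` itself, which matches the report's note that Hydra is not the only feed into the QA queue.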
University of Notre Dame, Seaside Research Portal
Rick Johnson first demonstrated the Seaside Research Portal to provide context for the digital exhibits work going into Atrium. The Seaside Research Portal is a Blacklight application pointing to a Solr index, which is also connected to a separate Hydra head that manages the collection in Fedora. All of the content appearing in the portal comes from Solr, except for content such as images, which comes from Fedora. The Seaside Research Portal is a view into the archive housed at Notre Dame for Seaside, FL, the world's first New Urbanist community. The portal mixes content backed by both VRA Core 4 and EAD metadata and allows exploring the town via essays about architects, searching via a map, and animating over multiple overlays of the town plan to see how it developed over time.
The site was soft launched at the end of Sept. 2011 and will launch officially in January, including many of the features now going into Atrium.
University of Notre Dame, Atrium
A Blacklight Rails 3 gem (plugin) for building digital exhibits on top of any content in a Blacklight Solr index
Atrium is a Blacklight plugin, something like Omeka for Blacklight. Like Hydra, it is a gem on top of Blacklight. It is based on the Digital Exhibits framework built by Notre Dame using Hydra, but the dependency on Fedora has been removed.
It supports creating multiple collections quickly for any content in Solr. Within each collection you can add multiple exhibits. Each exhibit supports a dynamically generated browse menu through selected browsable facets, with the ability to add essays and feature sources at each level of the browse navigation. Some of the features:
- Using CKEditor for adding and editing essays
- Can associate facet filters to indicate a collection's scope
- Create ad hoc collections; select from available facets for a collection to show in the search page
- Can have multiple exhibits within a collection; can add exhibit facet filters, with each exhibit inheriting its collection's filters as well
- Currently have 1 default layout; Plan to have multiple themes to choose from for collections and exhibits
- Will migrate "Seaside" to use Atrium fully.
- Supports hierarchical facets; Can reorder facet hierarchy for a specific exhibit
- "Customize this page" feature available on any page in the tree
- Authorization using CanCan
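The filter scoping in the feature list (a collection defined by facet filters, with each exhibit adding its own filters on top of those it inherits) can be sketched as Solr filter queries. This is a rough Python illustration, not Atrium's code: the function names and the facet field names in the usage note are hypothetical, and the `field:"value"` form is ordinary Solr query syntax rather than whatever Atrium emits.

```python
from urllib.parse import urlencode


def facet_filters_to_fq(filters):
    """Turn {"field": ["value", ...]} into a list of Solr fq clauses."""
    clauses = []
    for field_name, values in sorted(filters.items()):
        for value in values:
            # Escape backslashes and double quotes inside the quoted phrase.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            clauses.append(f'{field_name}:"{escaped}"')
    return clauses


def exhibit_query_params(collection_filters, exhibit_filters, q="*:*"):
    """An exhibit inherits its collection's filters and adds its own."""
    fq = facet_filters_to_fq(collection_filters) + facet_filters_to_fq(exhibit_filters)
    return [("q", q)] + [("fq", clause) for clause in fq]


def to_query_string(params):
    """Encode the parameter pairs for a Solr select request."""
    return urlencode(params)
```

For example, `exhibit_query_params({"collection_facet": ["Seaside"]}, {"format_facet": ["Essay"]})` yields one `fq` per filter, so Solr intersects the collection scope with the exhibit scope.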
Next steps are:
- Multiple Style/Layout Templates to choose from
- Index essay full-text content in Solr
- Define a collection from a static list as opposed to a Solr filter
- As Atrium is a Gem over Blacklight like Hydra, it will need some work to coexist with Hydra.
With thanks to Tom Cramer, Richard Green and Rick Johnson for help in preparing this report.