
Executive Summary

Curation Architecture Prototype Services (CAPS) is a prototype service-oriented architecture consisting of a web application for ingest and management of digital objects and a service platform providing atomistic curation functions (or curation microservices). The CAPS project was chartered by the Content Stewardship Council in November of 2010 and given an aggressive deadline of March 2011. November and December were spent writing a project charter, deciding on initial scope, setting up IT infrastructure and storage, configuring a development environment, rapidly proofing concepts to vet technologies, and engaging potential stakeholders. Stakeholders were added to the team in early January, and iterative development of software began in earnest. The development team consists of a project manager, a software developer, a curator, a metadata librarian, an archivist, and a technical architect. The project team is made up of the development team, stakeholders from the University Libraries, and a graphic designer on contract. Stakeholders are faculty and staff with curatorial responsibilities, including representation from Special Collections, the Maps Library, Digitization & Preservation, and the Arts & Architecture Library.

The goals of the CAPS project were to develop a back-end architecture for management of digital objects, to engage library curators, to address extant and emerging curatorial needs, to apply an agile development methodology (wherein stakeholders drive development priorities), and to assess the costs and benefits of the curation microservices model.

Development focused on a curation tool for ingest and management of digital objects, initially supporting identification, description, versioning, auditing, and storage. The ingest and management tool is a web application that allows curators to upload digital objects and metadata into a curation environment -- where a "digital object" is defined as one or more files of any type, the idea being that curators are free to define what constitutes a digital object for their needs. Uploading an object kicks off the ingest process:

  1. A unique, durable, cross-application identifier is minted and bound to the object. Unique identifiers provide curators with a durable way to cite their objects.
  2. The object's files are stored on the filesystem and are available to applications via a storage service API. Storing objects on vanilla filesystems enables curators to do digital forensics on objects.
  3. A checksum, or fixity value, is then computed for each file within the object and stored in an object manifest. Checksums offer curators a way to check that their objects have not been altered.
  4. The object is placed under version control. Version control gives curators a safeguard against inadvertent (and intentional) modifications to their objects, as well as a widely used method for tracking, comparing, and reverting changes over time.
  5. Metadata, if provided by the curator, is attached to the object, serialized to disk, and stored in a metadata registry. Metadata enhances the discoverability of digital objects for curators and provides context for objects.
  6. Events are logged for all of the above operations. An event log tying particular actions to a curator's objects over time enables the curator to audit the provenance of those objects.
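
To make the ingest flow concrete, here is a minimal sketch of such a pipeline in Python. It is illustrative only: the function names, the JSON manifest filename, and the simplified identifier minting are hypothetical stand-ins for the actual CAPS services.

    # Hypothetical sketch of the six ingest steps above; not the CAPS code.
    import hashlib
    import json
    import os
    import shutil
    import subprocess
    import uuid
    from datetime import datetime, timezone

    def mint_identifier():
        # Stand-in for a real identifier minter.
        return "ark:/00000/" + uuid.uuid4().hex[:10]

    def ingest(files, metadata, storage_root="objects"):
        events = []
        ark = mint_identifier()                  # mint and bind an identifier
        obj_dir = os.path.join(storage_root, ark.split("/")[-1])
        os.makedirs(obj_dir, exist_ok=True)
        events.append("identified as " + ark)

        manifest = {}
        for path in files:                       # store files on a plain filesystem
            name = os.path.basename(path)
            shutil.copy(path, os.path.join(obj_dir, name))
            with open(path, "rb") as f:          # compute and record fixity values
                manifest[name] = hashlib.md5(f.read()).hexdigest()
        with open(os.path.join(obj_dir, "manifest-md5.json"), "w") as f:
            json.dump(manifest, f, indent=2)
        events.append("stored %d file(s) with fixity values" % len(manifest))

        with open(os.path.join(obj_dir, "metadata.nt"), "w") as f:
            for predicate, value in metadata.items():   # attach metadata
                f.write('<%s> <%s> "%s" .\n' % (ark, predicate, value))
        events.append("metadata attached")

        # Place the object under version control: each object is its own Git
        # repository (assumes git is installed and user.name/email configured).
        subprocess.run(["git", "init", "-q", obj_dir], check=True)
        subprocess.run(["git", "-C", obj_dir, "add", "-A"], check=True)
        subprocess.run(["git", "-C", obj_dir, "commit", "-qm", "ingest"], check=True)
        events.append("initial version committed")

        for event in events:                     # log events for auditing
            print(datetime.now(timezone.utc).isoformat(), ark, event)
        return ark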

The curation microservices we developed are Python modules, but we are also looking at service frameworks, such as HTTP REST and OpenSRF/XMPP, for scalability and for separation of the application and service layers. We made heavy use of open-source software and open standards to leverage existing work and to align our tool chain and specifications with the broader digital curation and technology communities.

The services architecture developed during the CAPS project is built as a platform on which multiple applications may be built, one example of which is the prototype ingest and management tool. Other applications might include public search and browse of curated collections, an electronic records archive, research data pilots, and scholarly publishing services such as the management and display of electronic theses and dissertations. The rationale for separating the curation services platform from the application layer is to enable flexibility for applications rather than force a one-size-fits-all approach; an application may connect to the curation services most appropriate for its requirements, its data, and its users.

Because the focus for the prototype phase was on delivering a functional and visually compelling tool for curators, decisions about scope and resources were made with a bias toward the application layer rather than the platform; although the architecture provides all curation functionality to the application, it is not directly visible to curators. Therefore, the back-end architecture is only somewhat reusable and significant work will be required to make the platform scalable, secure, and production-ready.

Background

The E-Content Delivery Platform Review chartered by the Content Stewardship Council revealed a number of inefficiencies in how the University Libraries manage digital content. Each of the applications used to deliver digital content is a silo unto itself, requiring different workflows, different training, and different back-end technologies. Because each application focuses on content delivery and not management, we lack a centralized or unified architecture to manage digital content, making systematic management and preservation of such content extraordinarily difficult. Management of digital content is now manually coordinated across heterogeneous work environments, with some bits managed in our applications, other bits managed in assorted filesystems, and yet other bits in personal spreadsheets and databases. Managing content in this way is burdensome, inefficient, and unsustainable should we wish to continue scaling our efforts up and out. More worrisome was the discovery that three of the four delivery applications are moribund; not only are these applications not being upgraded or patched by their developers, but there is either no access to the source code or no ownership of the source code at Penn State. The upshot is that these applications have stopped evolving and are no longer able to keep pace with the changing needs of our curators.

On the other hand, having widely used legacy applications has proved valuable. The advantage of having legacy applications such as these in place is twofold: 1) they present us with an opportunity to analyze the gap between curators' needs and the functionality provided by the applications; and 2) they suggest a base of potential users for next-generation curation systems, namely those curators whose needs are not currently being met by legacy applications.

After finishing the platform review project, the Content Stewardship Council chartered the Curation Microservices Proof-of-Concept project to explore the curation microservices model. The proof-of-concept was intended to test whether microservices tools and specs could help Penn State curators manage digital content destined for the four digital content delivery applications, thereby addressing our need to manage content in a more systematic and less "siloized" way. Before we could test microservices tools, we needed to understand better how curators at Penn State do their work; that is, the practice of digital curation needed to be situated within the Penn State University Libraries context. To this end, we gathered a set of curatorial use cases which paint a clear, if necessarily limited, picture of curatorial practice and needs at Penn State.

Toward the end of the proof-of-concept project cycle, the Council charged us to expand the scope of the project to include building a curation services platform prototype (or, as the project was named, Curation Architecture Prototype Services (CAPS)), with requirements driven by the curatorial use cases gathered in the proof-of-concept project. The prototype would include building an architecture for curation microservices (or a "platform"), and a web-based curatorial tool for ingest and management of digital objects, supporting such operations as identification, description, versioning, auditing, and storage. 

Development

Microservices

The microservices model is a decoupling of common curation services, user applications, and IT infrastructure. The separation of these functions lends itself to code reuse as well as flexibility at the application layer; an application may invoke the curation services that are appropriate to the user needs driving its development. An advantage of the curation microservices model is that code and specifications already exist for many of the functions we set out to build, such as managing file fixities and generating durable identifiers. Additionally, these loosely joined components allowed us to leverage existing code from beyond the library community, such as version control of ingested objects, which relies on Git, a widely used distributed version control system. The ability to reuse code providing functionality necessary for a curation platform gave us an opportunity to concentrate on user needs within the application, rather than on building yet another version control system.

One of the strengths of this approach is that the applications we build can be tailored to our users, whereas the behind-the-scenes functionality can be based on open code and specifications developed by the curation community. Storing content in a way that is aligned with other members of the curation community supports the notion of relay-supporting archives, allowing successor archives to make sense of content ingested into CAPS.

The CAPS project could be viewed as needlessly building yet another repository system from scratch, yet this perspective fundamentally misses the point, since it fails to take into account our layered approach. While CAPS may not be built atop widely deployed repository system software, it is built atop widely deployed components -- open-source code and open-community specifications -- each of which may be abstracted in the code and thus replaced with similarly lightweight components in the future. The ability to mix, match, and swap smallish components reflects a major benefit of the microservices model.

While we were successful in separating the levels of our architecture, fully decoupling our curation services remains a challenge. Currently, most service functionality falls within the storage service, which, in addition to offering file storage, also handles all of the versioning and fixity functionality -- both of which seem inextricably linked to storage. This is an area where open specifications, such as the California Digital Library's storage and fixity service specifications, may be consulted for inspiration, although implementation differences will likely remain due to differing technical inclinations. Such differences will likely prevent wholesale reuse of a service implementation, though implementations can still share existing libraries that encode standard practices.

Service Frameworks

Although one of the project's goals was to build a prototype service framework, the aggressive deadline of the CAPS project and the learning curve imposed by formal service frameworks left us insufficient time to prototype one. During the first two months of the project, when we were proofing the concept of service frameworks before stakeholders were engaged, we devoted a significant period of time to learning the Open Scalable Request Framework (OpenSRF), an open-source message-passing architecture built upon the open Extensible Messaging and Presence Protocol (XMPP). OpenSRF service libraries are available in the C, Perl, and Python programming languages, and we chose to write Python-based OpenSRF services, given that the DLT Application Team is moving in the direction of Python as a lingua franca.

One difficulty we encountered is a dearth of documentation on the Python OpenSRF bindings, which did little to flatten the learning curve of the technology stack required by OpenSRF. The most severe setback was the discovery that the Python bindings are somewhat less mature than the C and Perl bindings; in fact, the OpenSRF development team was aware of no institution that had yet deployed the Python bindings in production. We encountered a series of bugs that we reported to the OpenSRF development team, but ultimately we lacked the time to pursue them seriously and to test OpenSRF as a framework. It remains a promising technology and may yet prove to be the service framework we wind up deploying -- its scalability, security, and performance characteristics are robust enough for eventual deployment into production.

In addition to OpenSRF, we spent less than a week vetting the open-source Celery library, an implementation of distributed task queues, based on the open Advanced Message Queuing Protocol (AMQP) standard, that integrates with our Python/Django application stack. Here, too, the learning curve proved too steep for less than one FTE to vet in under a week. The notion of service frameworks is novel in the DLT/UL context, and we will need to invest more time and people in developing these in the future.
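
For illustration, exposing a single curation function as a Celery task might look like the following sketch. The broker URL, module layout, and task itself are assumptions for this example, not part of the CAPS codebase.

    # Hypothetical sketch: one curation function exposed as a Celery task.
    import hashlib

    from celery import Celery

    # Celery brokers messages over AMQP; the broker URL here is assumed.
    app = Celery("caps", broker="amqp://guest@localhost//")

    @app.task
    def fixity(path):
        """Compute an MD5 fixity value for a single file."""
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    # A caller enqueues work with fixity.delay("/path/to/file.tif");
    # a worker started via `celery -A caps worker` performs the task.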

A deeper technical discussion of the merits of OpenSRF and other service frameworks in this context is considered out of scope for this document.

Open Source

All of the software libraries and tools that power CAPS are released as open source, such as Python, Git, Django, jQuery, and MySQL. Building our stack upon prominent open-source projects has the benefit of aligning our development efforts with the broader technology community. Their large developer bases allowed us to find answers to our more difficult questions; because these tools are used broadly both within and without the library community, the issues we encountered had already been solved by other developers.

In addition to these widely deployed tools, we made use of open-source software libraries written primarily around curation microservices by fellow members of the digital curation community, including libraries for the BagIt package specification, the Pairtree storage specification, and the Archival Resource Key (ARK) identifier specification. Every digital object is assigned an ARK persistent identifier, which is laid out on the filesystem via the Pairtree specification in order to provide reasonable object fanout on disk. Objects stored in CAPS are serialized per the BagIt specification, which packages files along with file fixity values -- so file fixities for later integrity checking are a core part of the digital object. Each digital object is also a Git repository, and the storage microservice knows how to talk to Git, so all changes to the object are tracked in a widely used version control system. Again, since code had already been written for these specifications, we were able to focus more on accommodating our users' needs than on writing basic functionality from scratch, allowing us to work much more rapidly on our own use cases.
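
As an illustration of the Pairtree layout, mapping an identifier to a filesystem path works roughly as in the sketch below. This is a simplified rendering of the specification: the real cleaning rules also hex-encode certain characters before the substitutions shown here.

    # Simplified sketch of the Pairtree path mapping.
    def pairtree_path(identifier, root="pairtree_root"):
        # Clean the identifier per the spec's substitution rules (abridged).
        cleaned = (identifier.replace("/", "=")
                             .replace(":", "+")
                             .replace(".", ","))
        # Split the cleaned identifier into two-character directory names.
        pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
        return "/".join([root] + pairs)

    # pairtree_path("ark:/13030/xt12t3")
    #   -> "pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3"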

Another benefit of working with open source components, especially those from the curation microservices community, is rapid identification and resolution of software deficiencies. In one instance, while integrating with the Python-based BagIt library, we discovered a bug that prevented CAPS from completing file fixity verifications. We reported the bug to the library's developer, and within 48 hours the code had been patched, tested, and released for the benefit of all developers using it, at which point we upgraded to the working version on our side. Although this is but one anecdote, it validated our decision to work with open source technologies and open specifications.

Not only has the CAPS project allowed us to utilize more open source software, but it has also given us experience collaborating with peer institutions on such software as the BagIt library. In addition, it has presented us with the opportunity to contribute directly to open-source software development: coding and testing of CAPS has occurred entirely out in the open on GitHub.

Metadata

In addition to its role in describing the resources in a system, metadata plays a crucial role in supporting many of the individual services in the microservices model. The CAPS project therefore needed a metadata framework that allows metadata to be added throughout the lifecycle of a digital object, through both system-generated and user-generated processes, and that supports the various curation services within the microservices model. Allowing users to attach new descriptive metadata to an object, to describe preservation events associated with it, and to uniquely identify it and all of its changes during its lifecycle is essential to the success of the CAPS system.

The Phase One objectives in metadata development for the CAPS project were twofold: to survey stakeholders in order to determine their description needs for the CAPS system, and to derive from those needs a simple, extensible metadata standard to underlie the system's search functions. As a foundation for future metadata services, the CAPS system supports Dublin Core. Because Dublin Core is also used in other platforms at the Libraries, such as CONTENTdm, it is already familiar to many CAPS stakeholders, and it is generic enough to be suitable for many different content types.

CAPS metadata is modeled using the Resource Description Framework (RDF), allowing for interoperability with systems beyond CAPS. Currently, it is expressed in the plain-text N-triples format, although future iterations of the project will allow for serialization through other formats such as Turtle / Notation3, JSON, comma-separated text, and so forth. This will allow for many different methods of both displaying and exporting the metadata in the system. Versioning of previous metadata for each object is supported, in order to track digital provenance and enable restoration of previous versions of the digital object in the system.
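
As a sketch of what this looks like in practice, a Dublin Core record can be serialized to N-Triples along the following lines; the ARK and field values here are invented for illustration, and the serializer mirrors the manual approach we used rather than any particular library.

    # Illustrative manual N-Triples serialization of Dublin Core fields.
    DC = "http://purl.org/dc/elements/1.1/"

    def to_ntriples(ark, record):
        lines = []
        for field, value in record.items():
            escaped = value.replace('"', '\\"')  # escape embedded quotes
            lines.append('<%s> <%s%s> "%s" .' % (ark, DC, field, escaped))
        return "\n".join(lines)

    print(to_ntriples("ark:/00000/abc123", {
        "title": "Campus aerial photograph, 1955",
        "creator": "University Photo Services",
    }))
    # <ark:/00000/abc123> <http://purl.org/dc/elements/1.1/title> "Campus aerial photograph, 1955" .
    # <ark:/00000/abc123> <http://purl.org/dc/elements/1.1/creator> "University Photo Services" .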

We developed a data dictionary to outline the fields currently in the system and the intentions for each, in order to demonstrate to stakeholders what sorts of metadata were being collected and to get input on what was still needed. The dictionary currently draws mainly from Dublin Core but will grow to include necessary preservation, technical, and administrative metadata fields, as the processes for collecting them become more specific in future phases.

Agile Project Management

Agile methodology was used for project management and development in order to accomplish the objectives of this project within the short timeframe. Using the agile approach, we involved stakeholders throughout the process, rather than just at the beginning and end as typically happens in traditional project management. The development goals and scope continue to evolve with stakeholder input week after week.

The agile process workflow may be characterized as follows:

  1. Development
  2. Share progress (development team, daily)
  3. Share deliverables with stakeholders (project team, weekly)
  4. Stakeholder feedback (project team, weekly)
  5. Define deliverables for following week (project team, weekly)
  6. Repeat

Critical success factors that allowed agile methodology to work for this project were the following:

  • All of the skills necessary to create the product were represented on the team.
  • The team was empowered to make the necessary decisions in order to get the work done.
  • Administrative support allowed team members the necessary time to do the work without competing priorities.
  • A strong communication plan was in place within the development team and with the stakeholders.
  • There was a strong group commitment to agile methodology.

The development team spent a significant amount of time between weekly meetings coding for the project. This was sustainable in the short term, but it may not be in the long term for the particular people involved. The development team may need to expand if the project moves beyond the prototype phase.

Structured stakeholder involvement was key. Initially, the development team came up with a series of basic screenshots showing minimal functionality. These screenshots were done essentially to start the conversation with users, as this seemed more provocative than a blank screen saying "tell us what you want." From these screens, discussions developed around user requirements, desired functionality, and potential changes to current workflows.

The potential scope of the project was huge. Agile methodology was a good choice for CAPS, because the stakeholders set priorities for development in real time and had a good sense of what was possible to accomplish week to week. Decisions on what to develop were based on issues that were the most important to stakeholders at each step in the process. This approach could have led to an unmanageable task list, but since everyone was involved in the process, there was a shared understanding of what was reasonable to accomplish in the time available.

Assessment

The five overarching goals of the CAPS project were to:

  1. develop a curation services architecture, assessing benefits and costs to determine sustainability of the microservices approach at Penn State;
  2. engage library faculty and staff in the development of new curatorial and delivery applications atop the architecture;
  3. begin to address the programmatic curatorial needs of electronic records, research data, and digital library collections;
  4. apply user-driven agile development practices; and finally
  5. test the curation microservices model.

The most obvious cost of this project is that of staff time, both for programmers to build the architecture, and for curators to engage in ongoing review and feedback during the development process. The CAPS team believes this cost is more than outweighed by the benefits: speed, flexibility, and community.

Speed and flexibility were achieved through the combined use of the microservices approach, in which small pieces of code could be developed and edited separately (or else imported from sources outside of Penn State and incorporated into the architecture) and of agile development practices, which required daily meetings for the project team, weekly meetings with stakeholders, and constant generation and regeneration of short-term and long-term goals. The agile development approach also required the active engagement of stakeholders, which in turn built a community around digital curation that bridges IT staff, librarians, archivists, and digitization professionals at Penn State.

Over the span of just three months, the CAPS team was able to build a back-end architecture and a generic ingest and management application, incorporating ongoing changes to functionality as well as look and feel based on curator input. This architecture will provide the basis to build more specific ingest and management applications, for materials such as electronic university records and research data, and will support the future public interface for searching and retrieving those digital objects that are available to the public.

The curation microservices model looks to be a huge success so far, with positive feedback from the team, stakeholders, and library administration. At the conclusion of development of the prototype, stakeholders in particular were surveyed for their feedback; four out of six completed the survey. To the statement, "The CAPS project team did a good job of listening to my concerns as a stakeholder," all respondents said "Strongly agree." Most respondents were also pleased with the frequency and substance of communication (including the weekly stakeholder meetings and getting questions answered by the development team clearly and in a timely fashion). Half strongly agreed, and half agreed, that they were pleased with the prioritization of stakeholder requirements. Likewise, regarding whether the mock-ups of the project deliverables reflected the prioritized stakeholder requirements, half strongly agreed and half agreed. (Complete survey results may be viewed here.)

As the survey results suggest, the agile development method has proven extremely fruitful. When stakeholders can offer feedback and see that feedback put into action one week later, not only does the usefulness of the final product increase, but so does the investment of library staff. And when administration can commit a team to a project and have that project realized within the time frame of a semester, future support for iterations of the project is also strengthened.

Building Community

One promising outcome of the CAPS project has been the start of community building around curatorial practices. Our project goals arguably reflect multiple levels of collaborative participation, in particular the collaboration of the development team (Belden, Clair, Coughlin, Giarlo, Hswe, and Klimczyk) and the collaboration of our stakeholders (Bidney, Dyke, Esposito, Jakle Movahedi-Lankarani, Pisciotta, and Rozo). Through these collaborations, a community of practice around digital curation has taken shape. Our curator colleagues are more informed about curation microservices and increasingly have a vested interest in continuing to participate in their development at Penn State, and the development team has a deeper understanding of the curatorial needs of our colleagues. Because our stakeholders have been closely involved in the development of the prototype, they are also instant test users of the resulting platform. Debugging and feedback of the CAPS prototype continue, but initial responses may be found here.

More critically, by building a community of curatorial practice, we are beginning to work outside system-specific boundaries, which bodes well for both curators and developers of microservices. Curatorial workflows have the potential to be more uniform, or more closely aligned, when not bound to a monolithic repository or delivery system. The inherent flexibility and interoperability of a microservices environment mean that curators do not have to be prisoners of a particular platform -- and neither do developers of curation microservices, who are not only creating microservices based on the curatorial requirements of their users but also drawing on code that is openly available for distribution, sharing, and re-purposing.

Also, because curation microservices are only now emerging as an alternative to repository software applications such as DSpace, EPrints, and Fedora, there has not been much reporting, or documentation, on the experience of curators implementing these tools. Departing from this custom, the CAPS project has been particularly attentive to capturing what our curator-users think of the iteratively developed prototype, an approach we intend to continue in future phases and hope to make more widely known by reporting on our experiences at conference venues and via other community channels. Documenting the process of stakeholder engagement, such as user testing and discussions with developers, and then sharing reports about these activities with the larger digital library community could serve as a model for how to "do" user engagement in application prototyping and development.

It's also key to remember that the activity of engaging internal users such as curators has much in common with the activity of engaging end users, such as faculty and students. Many of the ideas for additional pilot projects and for additional features, listed below in the section "Next Steps," are based on use cases that have been documented and gathered from consultations with users outside the project and from discussions with our stakeholders, who obviously interact heavily with faculty and students. It is our intention, should there be future phases for further development of both microservices and the public interface, to understand the needs and requirements of end users by engaging them regularly as well.

Use Case Analysis

Services that are done, though ongoing refinement is likely:

  • IDENTIFY -- Generate an identifier for an object.
  • ANNOTATE -- Describe an object (add metadata to an object).
  • VERIFY -- Ensure an object's files have not inadvertently changed (a sketch of a verification pass follows this list).
  • VERSION -- Generate a new version number when the object is manipulated within the system. Differentiating versions of an object ingested at different times will be addressed in the future.
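
A verification pass over a stored object might look like the sketch below, which assumes the hypothetical JSON manifest used in the ingest sketch earlier in this document; CAPS itself records fixity values in BagIt manifests.

    # Hypothetical VERIFY pass: recompute each file's checksum and
    # compare it against the manifest recorded at ingest.
    import hashlib
    import json
    import os

    def verify(obj_dir):
        with open(os.path.join(obj_dir, "manifest-md5.json")) as f:
            manifest = json.load(f)
        failures = []
        for name, expected in manifest.items():
            with open(os.path.join(obj_dir, name), "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != expected:
                    failures.append(name)
        return failures  # an empty list means all fixity checks passed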

Applications that are done, though ongoing refinement is likely:

  • Upload of single objects, and of batches via pre-formatted spreadsheets
  • View a dashboard of your objects
  • View an audit trail for an object
  • Search for an object
  • Tree view of associated objects and files

Services/Applications out of scope for CAPS, but on the list for future development:

  • Generate administrative metadata, including technical and preservation
  • Authenticate a user
  • Authorize a user (to take a particular action on a particular object)
  • View orphaned objects
  • View unpublished objects
  • Publish an object to an exhibit
  • Tag someone else's object
  • Share an object with another steward
  • E-records: validate format, assign to records group, set a retention policy
  • Replicate an object
  • View report on all formats used (e.g., for migration prep)
  • Migrate an object or a set of objects from one format to another
  • Provide API access to a subset of the above actions

Public Interface Design

About two-thirds of the way into prototype development, the team conferred with Bien Concepcion of Concepcion Design, LLC, to mock up a public interface for CAPS. With most of the focus up to that point on curator-facing, or object-management, functionality, we thought it would serve our stakeholders and team members well to start envisioning the public-services, or object-delivery, layer of the prototype -- that is, what sits at the top of the stack and would enable end-user interaction with CAPS. The mock-up of the public interface is based largely on the user interface for the University of North Texas Digital Library, which bills itself as "a centralized repository for the rich collections held by the libraries, colleges, schools, and departments at the University of North Texas" and is thus not unlike what we foresee in a production instance of CAPS. (The UNT Libraries have also built their Digital Library using a curation microservices approach.) Brief feedback on the public interface may be found here; in stakeholder discussions, most folks seemed pleased with the direction of the public interface -- meaning, more ways to get into a collection (via subject access, time periods, geographic references, etc.) and more ways for users to do things with collections. Going forward, the team will need to consider how to integrate end-user testing and feedback in further development of the public interface, but we anticipate that our stakeholders will be able to point us to sets of end users who could test and respond to the interface.

Next Steps

The prototype phase of development on CAPS has ended, and judging by our assessment measures, it has been a success in the eyes of its developers and its stakeholders. The team has discussed a number of potential follow-up pilot projects, a list of features that CAPS ought to provide, changes to how we handle metadata, and the challenges standing between the CAPS prototype and a production-ready curation services platform.

Pilots

Should the project's sponsors choose to continue with development of a curation services platform, there are a number of pilot projects that may be developed atop the platform:

  • Electronic records ingest & management
  • Electronic thesis & dissertation workflow
  • Research data deposit & archiving
  • Historic newspaper search & display
  • Image search & display
  • Generic public browse/exhibit/search of curated collections
  • Reporting/statistical output/delivery showing user access of collections

Features

We have also gathered a list of features that stakeholders and project team members identified as important, but that were either out of scope for the prototype phase or could not be accommodated in the given timeline:

  • Retention periods
  • Event logging service
  • Notification service
  • Object replication service
  • Routine fixity checks
  • Object versioning/difference views
  • Exposure of objects and metadata via the linked data pattern
  • Format migration tools 
  • Digital object & collection usage statistics and reporting for curators
  • Controlled vocabulary/ontology management
  • Support for creation and management of object collections
  • Rights, access control, user accounts, and authorization for digital objects
  • Publication of digital objects to display/exhibit applications
  • Integration with other systems such as legacy delivery applications and Summon

Metadata

In future phases of metadata development for the CAPS system, we hope to accomplish the following tasks:

  • Automatic extraction of technical metadata. At the moment the system supports mostly descriptive metadata, along with some technical metadata (dates of creation and last modification, file name and size, etc.) that is generated automatically by the system. Further extensions to technical metadata, e.g., EXIF or IPTC metadata derived from the image itself, will be made available in future phases of the project.
  • Standardized recording of preservation events. Through the versioning system, some metadata provenance is already recorded in CAPS. In future phases where more information about the curation needs of digital content is known, we hope to develop a local controlled vocabulary of specific preservation events and their associated metadata within CAPS, and to link this vocabulary with specific ontologies for preservation such as the PREMIS standard.
  • Support for linked data vocabularies. Due to time constraints associated with phase one, the metadata values associated with each object are currently stored within the system as literal strings, rather than as unique identifiers as recommended by linked data best practices. Future phases of the project will involve better integration with linked data vocabularies such as DBpedia and id.loc.gov (for, e.g., subject headings), GeoNames (for geospatial information), and so on. This will be done through both API interfaces to the data stores and local harvesting of vocabularies to allow for quicker and easier access.
  • Usage of the rdflib library. Due again to time constraints and learning curves, we were not able to use the standard Python library for dealing with RDF data. Rather, we stored all metadata in the ingest & management application's relational database and manually constructed the N-Triples files for serialization into object packages. We would like to align with the broader RDF community and use its common tools and approaches, such as rdflib among Python developers; a sketch of that approach follows this list.
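
For reference, the intended rdflib-based approach might look like the following sketch; the usage shown is standard rdflib, but the identifier and values are invented.

    # Assumed rdflib usage for building and serializing object metadata.
    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    g = Graph()
    obj = URIRef("ark:/00000/abc123")
    g.add((obj, DC.title, Literal("Campus aerial photograph, 1955")))
    g.add((obj, DC.creator, Literal("University Photo Services")))

    # Serialize to N-Triples today; Turtle, JSON, etc. in later phases.
    print(g.serialize(format="nt"))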

OpenCASA

Before any follow-up pilots may be investigated, the platform must provide a reliable, scalable, and performant foundation on which to develop curation applications. We therefore recommend the next phase of the project be dedicated to planning and executing the development of a production-ready, enterprise-quality services architecture, which we have begun referring to as the Open Curatorial and Archival Services Architecture, or OpenCASA.

Project planning for OpenCASA will involve work at a few different levels, including development, infrastructure/deployment, and project management.

Development

We recommend prioritizing the following activities from the software development standpoint:

  • Audit the codebase for Python best practices (PEP 8), for documentation that supports sustainable development by a team, and for security
  • Build service architecture (OpenSRF/XMPP or REST/HTTP)
  • Decouple existing services from prototype ingest app, continuing to use the prototype app as a testbed
  • Integrate with WebAccess/CoSign authentication
  • Develop authorization service
  • Develop a rights/data classification/access control model
  • Develop a service to scan ingested objects for viruses and other malware (possibly using ClamAV)
  • Utilize weekly code reviews to reduce learning curves among developers and to build teamwork

Infrastructure and Deployment

We recommend prioritizing the following activities from the infrastructure and deployment standpoint:

  • Integrate representatives from DLT Storage, Infrastructure, and Security team with OpenCASA development team
  • Build out storage architecture for production
  • Configure distributed development environment
  • Configure industry-standard development/shared/test/staging/production infrastructure for team development
  • Develop deployment procedures that ensure libraries and tools across different development environments are consistent
  • Build expertise in distributed version control systems among development team
  • Set up local DVCS server(s) to be master for source code

Project Management

In order to scale up our ability to develop enterprise-quality software, we may need release management practices that work over the long haul. Work on OpenCASA will likely require:

  • New functional areas such as more formal quality assurance (QA) and testing
  • Multiple developers working simultaneously on the same codebase, and also working with teams in parallel in different functional areas
  • Multiple milestones and overlapping iterations/release cycles between functional areas
  • Multiple linked code repositories
  • Interlinking between feature requests, bug fixes, source code commits, milestones, and documentation/wiki pages for ease of tracking, reporting, and project management

We currently lack a sufficiently integrated issue tracking/project management/source control system for accommodating such an effort, and if we are to continue momentum gained during the CAPS prototype phase, putting the ITS-wide change management system on our critical path may be detrimental to our ability to remain agile and engaged. We therefore recommend bringing up a lightweight, open source project/development management system as a stopgap for the first phase of OpenCASA development, until such time as the ITS-wide change management system is available and we have the cycles to integrate with it.

More generally, we recommend continuing to utilize agile project management and development methodologies and engaging the stakeholders who have been participating thus far.

Project Team

  • Marcy Bidney
  • Michelle Belden
  • Kevin Clair
  • Bien Concepcion
  • Dan Coughlin
  • Robyn Dyke
  • Jackie Esposito
  • Mike Giarlo
  • Patricia Hswe
  • Linda Klimczyk
  • Stephanie Jakle Movahedi-Lankarani
  • Henry Pisciotta
  • Albert Rozo