A-1.5 What Are The Barriers To Sharing, Re-using And Aggregating Metadata?


In analysing the barriers to metadata aggregation it has been necessary to look at the literature in other areas, such as sharing learning resources and repositories, as there are currently fewer aggregation examples to draw from. The literature documents many of the barriers experienced by institutions, communities and individuals with regard to sharing; most studies highlight legal and cultural issues (e.g. McGill et al, 2008), and some comment on metadata generation. Several studies point to the notion of perceived barriers: anticipated barriers that are not as real as imagined, or that are minimised as new developments come along, such as the introduction and wide-scale adoption of Creative Commons [i] licences, or the increasing publishing choices offered by Web 2.0 applications.
Sarah Shreeves (2007), who has extensive experience with metadata aggregation for the Open Archives Initiative, presents a list of the attributes of shareable metadata, which:
  • Is quality metadata.
  • Promotes search interoperability.
  • Is human understandable outside of its local context.
  • Must be useful outside its local context – an aggregator can actually build services based on the data in the records provided, e.g. geographic data that can be used to put the items on a map.
  • Is preferably machine processable.
  • Provides enough contextual information; as a counter-example, the Theodore Roosevelt collection had no Roosevelt subject term because the title of the collection was assumed to be enough.
  • Is consistent across a single collection – i.e. same date field, same controlled vocabulary.
  • Is coherent.
  • Is true to its content but also its potential audience.
  • Conforms to standards – descriptive, technical, etc.
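Some of the attributes above, such as consistency across a collection and sufficient context, lend themselves to simple automated checks before metadata is exposed. The following is a minimal sketch only, assuming records are held as Python dictionaries; the field names (`date`, `subject`), the function name and the sample vocabulary are illustrative assumptions, not part of any standard or of Shreeves's list.

```python
def check_collection_consistency(records, date_field="date",
                                 subject_field="subject", vocabulary=None):
    """Return a list of human-readable problems found in a collection.

    Assumptions (hypothetical): each record is a dict, dates live in a
    single agreed field, and subjects are a list of vocabulary terms.
    """
    problems = []
    vocabulary = set(vocabulary or [])
    for index, record in enumerate(records):
        # Consistency: every record should use the same date field.
        if not record.get(date_field):
            problems.append(f"record {index}: missing '{date_field}'")
        # Context: a record with no subject terms relies on local context.
        subjects = record.get(subject_field, [])
        if not subjects:
            problems.append(f"record {index}: no subject terms")
        # Consistency: subject terms should come from one vocabulary.
        for term in subjects:
            if vocabulary and term not in vocabulary:
                problems.append(
                    f"record {index}: '{term}' not in controlled vocabulary")
    return problems


sample_records = [
    {"date": "1901", "subject": ["Roosevelt, Theodore"]},
    {"title": "Portrait", "subject": ["TR"]},  # no date, local abbreviation
]
sample_problems = check_collection_consistency(
    sample_records, vocabulary={"Roosevelt, Theodore"})
```

A check of this kind only flags mechanical inconsistencies; whether the metadata is coherent and true to its content still needs human judgement.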
An Images Case Study (Rogers and Barker 2007) for Engineering Collections, part of a set of case studies following on from a report on image collections [ii], found that the main issues arising were cultural rather than technical or legal. The collections interviewed indicated that they were willing to share metadata and images; however, a lack of understanding about what that required technically was apparent. Most of the collections approached shared images by placing them on their own website, which meant sharing was only possible if a user happened to find the website and could ascertain from the terms and conditions whether they could legally use the image. A related Archaeology Collections Case Study (Romer and MacMahon 2007) cited the most common barrier as a lack of time to share images (68%), with 57% of respondents being concerned about legal and copyright issues. Eighteen per cent indicated that a lack of technical skills in accessing the images (including downloading) was a barrier.
The following sections outline the potential main barriers that an aggregator may encounter, and evidence from the literature for their existence.

A-1.5.1 Organisational

Organisational issues are likely to encompass securing cooperation to participate in an aggregation, and deciding who will be involved. Collaborative projects, or collections which are made up of submissions from many parties, often have to reach agreement among all partners to expose metadata. This can be time-consuming, particularly if the participants, either within the organisation or externally, are less familiar with certain standards, or have fears about exposing their metadata to a wider audience.
Further, if a collection is not yet public, or has not been fully digitised, then it makes sense that museums, libraries or archives will not want to share their metadata widely until their own collection and web site are launched and ready to use.
There are differences between the attitudes and readiness to share research as compared to learning materials. Trust is a key issue for both, as are attribution, incentive, reputation and approbation. In both areas, and so probably for metadata also, authorship or lineage of a resource is used as an important guide to quality: ‘by reviewing the authorship of a resource they could make decisions on the resource’s authority and legitimacy’ (PROWE Metadata report, Whitelaw 2007).
However, in focus groups and written evaluations carried out for the Open Archives Initiative Metadata Harvesting Project [iii], there was consensus that the name of a holding institution did not influence which search results the test subjects chose to view. The participants reported that they assumed the nature of the University of Illinois portal assured that all data providers could be considered credible and authoritative providers of primary source materials. This fact has implications for how a metadata aggregator organises and presents the collections.

A-1.5.2 Legal

Any aggregator needs to be workable within legal, intellectual property (IPR) and copyright bounds. Although metadata itself is unlikely to raise as many rights issues, there are likely to be restrictions on the full content to which end-users are led through the aggregator service.
The Images Case Study for Engineering Collections (2007) found that one of the main barriers to sharing any type of resource is the issue of who owns the IPR to the resource and the terms of use that they enforce. Each collection approached during the study had different issues in relation to IPR. Some collections gave assurances that they owned the images, as these had been digitised from still images in their own collection; many of the collections comprised historical images; and other galleries hosted images created by their own staff, so had no issue over copyright clearance. However, at least one collection could not offer any assurance about IPR status as the images had been donated from a number of sources which could no longer be traced.
Each collection approached in the Images Case Study for Engineering Collections (2007) was happy to license the images under the following conditions: attribution, non-commercial, for educational purposes only. Many had heard of Creative Commons licences and were willing to adopt such licences if they met their requirements.
The Linking UK Repositories [iv] study reported that copyright and IPR issues have presented some projects with serious challenges. Authors are generally unaware of, and consequently wary about, the legal aspects and requirements for depositing their work in repositories, which constitutes one of the biggest barriers to gaining a critical mass of content in Open Access repositories. A study by Charlesworth (2005) made some recommendations for repositories on how to proceed on a national scale.
The JISC-funded L2L project revealed how difficult it can be to obtain copyright clearance, particularly from public organisations (Brosnan, 2005). One outcome of that project was a publication (Casey 2004) introducing IPR issues for people producing e-learning materials. Digital image collections are also severely affected by rights issues, summed up in the report on The Digital Image project (Pringle, 2005): ‘IPR in the digital image world is a confused and confusing picture, with far-reaching consequences for getting it wrong.’
However, it is worth noting here that it is common commercial practice to allow contributors to release their content under Creative Commons licences, giving users clear guidelines for reuse. This type of licence is used by popular services such as Flickr, Scribd, SlideShare, and Zoho. JISC Legal provides advice and guidance on legal matters to do with digital rights and associated issues.

A-1.5.3 Financial

The Images Case Study for Engineering Collections (2007) found that many collections created as projects, and funded originally either by JISC or by other grants, are obliged to share their resources as widely as possible. Some of the collections examined charge for the use of images in order to fund the digitisation of new images; others may have a funding strategy involving membership (e.g. BUFVC). Most collections which started life as projects will have no or very limited funding, so the costs of maintaining access are often absorbed by host institutions. As the Images Case Studies (2007) [v] summary report outlined, few community image collections have the technical capacity to engage with the JISC Information Environment. They are often run by a fragmented group which may include a champion academic and remote, best-efforts IT support from computing services.
The general consensus of the collection owners interviewed by the Images Case Study for Engineering Collections (Rogers and Barker 2007) was that they would be willing to implement an RSS feed, subject to the following conditions:
  • It did not involve a lot of resource and working hours.
  • They were given some guidance on how to implement this (e.g. a step-by-step guide).
  • It did not involve repetition for each image.
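The third condition, avoiding repeated manual work for each image, suggests generating the feed from the metadata a collection already holds. The following is a minimal sketch under stated assumptions: image records are available as dictionaries with hypothetical `title`, `url` and optional `description` keys, and the channel details and example.org addresses are placeholders. It uses only the Python standard library to produce an RSS 2.0 document.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, images):
    """Build an RSS 2.0 feed string from a list of image records.

    `images` is assumed to be a list of dicts with 'title' and 'url'
    keys and an optional 'description' (hypothetical field names).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description for the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    # One <item> per image, produced in a single pass over the records.
    for image in images:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = image["title"]
        ET.SubElement(item, "link").text = image["url"]
        if image.get("description"):
            ET.SubElement(item, "description").text = image["description"]
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example engineering images",          # placeholder channel details
    "http://collection.example.org/",
    "Recently added images",
    [
        {"title": "Suspension bridge", "url": "http://collection.example.org/1"},
        {"title": "Steam turbine", "url": "http://collection.example.org/2",
         "description": "Scanned from a slide collection."},
    ],
)
```

Because the feed is derived from existing records, adding a new image to the collection would not require any extra per-image effort beyond the metadata already created.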

A-1.5.4 Technical

The collections approached by the set of Images Case Studies (Rogers and Barker 2007) contained images in a variety of formats, mostly JPEG or GIF files. Some of the images were born-digital but most were scanned from historical slide collections or from portraits. The resources were generally well described, with most of the collections using their own metadata schema to describe the images.
The technical expertise available to each collection was often limited, some more so than others; some could only maintain the current collection and did not have the staff time to add any new technical functionality.

A-1.5.5 Preservation

The first hurdle faced in preserving AV files is to know about, understand, fund and use the existing digital library tools that can change a heap of files into a managed collection.
The second hurdle is recognising that digital library tools provide management (so files can be accessed and do not get lost) but do not cover preservation. Files face a range of obsolescence issues, addressed by digital preservation technology – methods for ensuring that obsolete files can migrate to new standards and formats, methods for emulating old IT environments to extend the lifetime of obsolete formats, criteria for evaluating the reliability of a digital repository, and finally an overall methodology: OAIS. AV collections have difficulty finding anyone on their IT staff who has even heard of OAIS, which rather limits support for funding and implementation. Fortunately, the EC project MEMORIES [vi] is developing OAIS and related procedures specifically for audio and video collections.
The third hurdle is that the specific needs of AV files are not fully supported by digital library and digital preservation technology.

A-1.5.6 Language

There is a difference between academic and non-academic views of digital resources. The Digital Preservation Europe report describes this as: ‘Two worlds: Digital library technology comes from the academic library world. AV collections are largely outside that world. The biggest holders of content are broadcasters, and other major holdings are in film museums and other cultural and heritage institutions (one of the largest film collections in the UK is at the Imperial War Museum). Broadcasters vary, but it is common for the computer and technical staff of a broadcaster, and the management who decide and fund technology issues to know absolutely nothing of academic libraries and digital library technology.’ [vii]
This work will cover a number of different stakeholders, and individual interviews may be needed to address the translation of concepts between these two worlds.

A-1.5.7 Standards

There are many published examples of where compliance with a given standard has created issues for interoperability between collections. For example, while the OAI protocol (see section A-1.7.7.1 OAI-PMH) has been promoted as a ‘low barrier’ means to share metadata, it is not without technical hurdles (Shreeves 2005). A lack of technical resources is a fundamental barrier to the implementation of OAI metadata provider services, particularly for those institutions which do not have a digital content management system with a built-in OAI metadata provider. In order for any standard to be used, all the metadata needs to be in a shareable state, meaning that the metadata should be of good quality, provide an appropriate context, and be consistent and coherent across collections.
OAI is not the only way that institutions can share metadata or federate access to their content. Communities with specialized content or resources, such as some scientific communities, have developed other means and standards for sharing metadata and resources within their community. It is questionable whether or not using a particular standard is really a barrier, in that institutions may be sharing the metadata via another means. Shreeves (2005) also comments that collection owners may feel that they will need to ‘dumb down’ their metadata to meet the simple DC requirement for OAI-PMH and may not wish to do this. Some domains, such as museums, archives and some scientific communities, often create metadata which is used for far more than simple discovery; the metadata may provide contextual and historical information and record relationships between items. Simple DC does not have the semantic complexity and richness to express much of what these communities need to express. However, the protocol does allow multiple metadata formats to be exposed for each item; thus an item could be exposed in both DC and MODS.
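To illustrate how one item can be exposed in more than one format, the sketch below builds OAI-PMH request URLs for the same repository using different `metadataPrefix` values. The `verb` and `metadataPrefix` arguments, the `ListMetadataFormats` and `ListRecords` verbs, and the `oai_dc` prefix are defined by the protocol; the base URL is a placeholder, the helper function is hypothetical, and the `mods` prefix is an assumption, since prefixes other than `oai_dc` are chosen by each repository. No network request is made.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder endpoint; a real harvester would use the repository's base URL.
BASE_URL = "http://repository.example.org/oai"

# Ask the repository which metadata formats it can supply.
formats_url = oai_request(BASE_URL, "ListMetadataFormats")

# Harvest the same records as simple Dublin Core (mandatory in OAI-PMH)...
dc_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# ...and in a richer format, if the repository advertises one.
# "mods" is an assumed prefix; check the ListMetadataFormats response first.
mods_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="mods")
```

In practice an aggregator would probably request the richest format a provider advertises and fall back to simple DC only where nothing else is available.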
Shreeves (2005) concludes that encouraging use of standard, community specific metadata formats with published XML schemas could be crucial to increasing the usefulness of the services built on aggregated metadata.

A-1.5.8 Quality

Most image and time-based media resources will, by their nature, have little intrinsic semantic content, and so will require metadata to be added to them. Since manually adding metadata is expensive in terms of time and money, it is only feasible for a restricted number of resources; otherwise, alternatives such as automated enrichment (for example, text mining the existing metadata) should be explored. Targeting key fields such as description and keywords for image, video and audio resources would be more cost-effective, but would still involve effort (Charlesworth et al, 2007). An alternative is to seek input from users through services such as tagging, or by using commercial sites such as Flickr to enable this ‘crowdsourcing’. Richer metadata should be able to supply the searcher with further information and sticky links, such as ‘more like this’.
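As a very simple illustration of mining existing metadata, the sketch below suggests candidate keywords from a free-text description by counting word frequencies. It is a deliberately naive stand-in for real text-mining tools; the function name, the stopword list and the length threshold are all assumptions made for the example, and suggested terms would still need human review.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real service would use a fuller one.
STOPWORDS = {"the", "and", "was", "were", "with", "from", "this", "that",
             "built", "taken", "shows", "into", "near"}


def suggest_keywords(description, limit=5):
    """Suggest up to `limit` keywords from a description by term frequency."""
    words = re.findall(r"[a-z]+", description.lower())
    counts = Counter(
        word for word in words
        if word not in STOPWORDS and len(word) > 3  # skip very short words
    )
    return [word for word, _ in counts.most_common(limit)]


suggested = suggest_keywords(
    "Steam locomotive at the station. The locomotive was built in 1901."
)
```

Here the repeated word "locomotive" is ranked first, with "steam" and "station" also suggested as candidate keywords for an editor to accept or reject.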
Where metadata is not adequate, its quality can be improved and the metadata enriched (Groat 2009).

A-1.5.9 Access and use – discovery

Many web resources have limitations on their discoverability and reuse. For example, Financial Times articles carry the message: ‘Copyright The Financial Times Limited 2010. You may share using our article tools. Please don’t cut articles from FT.com and redistribute by email or post to the web.’ [viii]
As many websites rely on Google for a significant amount of traffic, the ability of a given resource to be discovered by Google is important. Google is known for changing its algorithm frequently, which can change the ranking of a site or group of sites and may significantly affect the traffic to them. This implies that for an aggregation of metadata to be discoverable it must be available to Google and be returned with a reasonable ranking. However, Google is rumoured to return lower ranks for aggregators, so consideration of this would be useful for any aggregation of multimedia resources.
[i] http://creativecommons.org/
[ii] CLiC Report http://www.jisc.ac.uk/uploaded_documents/CLIC_Report.pdf
[iii] http://www.openarchives.org/
[iv] Swan, A. and Awre, C. (2006) LINKING UK REPOSITORIES
[v] http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/imagecasestudies.aspx
[vi] http://www.memories-project.eu/
[vii] Digital Preservation Europe (DPE), Briefing Paper Preservation of Digital Audiovisual Content (2007), http://www.digitalpreservationeurope.eu/publications/briefs/audiovisual_v3.pdf
[viii] Financial Times copyright statement; an example is at http://www.ft.com/cms/s/0/1a5596c2-8d0f-11df-bad7-00144feab49a.html?ftcamp=rss
