6.4 What are the metadata considerations?

Most metadata-related issues identified in this study are not specific to images and time-based media. They are nevertheless relevant because they apply to all aggregations of metadata, including those for images and time-based media.

6.4.1  Should metadata be normalised into a common standard? If so, who should do this?

The reasons for having a common standard, and the issues associated with normalising (standardising) metadata, were explored. The responses were split: many assumed that an aggregator would introduce a common schema, while others cautioned against it on grounds of practicality, among other reasons.
More online respondents indicated that metadata should be normalised (8) than not (5), but the largest number of respondents selected ‘other’ (14) in response to this question. Of the comments from those selecting ‘other’, more supported the view that metadata should be normalised (7) than not (2); the remainder proposed caveats to normalisation or alternatives. The alternatives were:
  • “Both the arguments are valid, surely there must exist some sort of interface which could accommodate both.”
  • “If you go to semantics, you can keep the original metadata based on cataloguing rules and standards and enrich them with alias to the semantic model.”
From this small sample there is little agreement about whether metadata should be normalised and who should do this. Of interest from the online survey is that the majority of respondents who were collection owners thought that metadata should not be normalised, and the majority of those who thought that the collection owner should normalise metadata were users.
Several interviewees had experience of agreeing a common schema for normalising metadata, and a common view was that it would be useful for consistency and cataloguing but would also be difficult to put into practice. One noted that smaller archives are concerned that larger ones would take over in setting the standard and that getting agreement between different partners would be challenging.
Several interviewees raised concerns that metadata quality would be lost through aggregation if trying to “shoehorn things into Dublin Core”. For example, a large collection owner said that it uses specialised fields for video and that information would be lost if the metadata had to adhere to a standard schema. However, this problem can be mitigated by providing, within the aggregated metadata record, a link back to the full record on the collection owner’s site.
One interviewee from a large aggregation said that metadata does not necessarily need to be normalised – that normalising is an impossible task – but that as long as it is possible to translate between the different schemas, the commonality can be extracted when needed. An existing aggregator has found this to be particularly true of image collections and metadata sets. S/he cautioned that trying to normalise too much would be difficult, since different organisations in different subject areas use different schemas.
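The idea of translating between schemas, rather than normalising everything, can be pictured with a small sketch. The Python below is illustrative only: the schema names, field names, sample record and the `CROSSWALKS` table are all hypothetical and are not taken from any aggregator described in this study. It extracts a common view on demand while keeping the original record intact.

```python
# Hypothetical crosswalks: for each source schema, map its own field
# names onto a small set of common field names. All names are invented
# for illustration.
CROSSWALKS = {
    "museum_schema": {"object_name": "title", "maker": "creator", "made": "date"},
    "video_schema": {"programme_title": "title", "director": "creator", "tx_date": "date"},
}


def common_view(record, schema):
    """Return the common fields of a record without altering the original."""
    mapping = CROSSWALKS[schema]
    view = {common: record[local] for local, common in mapping.items() if local in record}
    # Keep the full original record alongside, so specialised fields survive.
    return {"common": view, "original": record, "schema": schema}


# Invented sample record with one specialised field (aspect_ratio).
video = {"programme_title": "Harbour Lives", "director": "A. Smith", "aspect_ratio": "16:9"}
result = common_view(video, "video_schema")
print(result["common"])
```

In this sketch the specialised `aspect_ratio` field has no common equivalent, so it is absent from the common view but remains available in the original record.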

6.4.2  What is the minimal profile for metadata?

From the high number of relevant responses giving a wide variety of proposed schemas, it seems that gaining agreement on a single metadata schema would be a significant challenge if the common metadata schema model were to be followed. One respondent neatly summed this up as: “Hard to find one that would please all parties”. Respondents also proposed additional fields that they considered desirable for the various schemas suggested.
Several respondents specified a list of fields that would constitute their minimum schema rather than naming an existing schema, and no two of these were the same. A selection of the suggested sets of minimum fields follows:
  • Title, description, unique identifier, copyright.
  • Title, description/abstract, keywords, file format.
  • Title, creator, summary/description, file format, original production date.
  • Title, related words, dates, file type, file format.
  • How the item has been labelled (e.g. ‘INT.27.rtf’ for the 27th interview in a collection); duration; description; theme; and format.
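How little these lists overlap can be checked with a short sketch. The field sets below come from the lists above, but grouping similar labels under one name (for example, treating “description/abstract” and “summary/description” as “description”) is an assumption made here for illustration, as is the function name `shared_fields`.

```python
# Field sets taken from the respondents' lists above. Grouping similar
# labels under one name (e.g. "description/abstract" as "description")
# is an assumption made for this sketch.
proposals = [
    {"title", "description", "unique identifier", "copyright"},
    {"title", "description", "keywords", "file format"},
    {"title", "creator", "description", "file format", "original production date"},
    {"title", "related words", "dates", "file type", "file format"},
    {"label", "duration", "description", "theme", "format"},
]


def shared_fields(field_sets):
    """Return the fields present in every one of the given sets."""
    return set.intersection(*field_sets)


print(shared_fields(proposals[:4]))  # fields common to the first four lists
print(shared_fields(proposals))      # fields common to all five lists
```

Under that grouping, only the title is common to the first four lists, and no field is common to all five.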
The following profiles were suggested as suitable schemas for a minimal profile for metadata for images and time-based media:
  • Dublin Core (DC) (although each respondent suggesting DC also gave a different enhancement as none thought simple DC would be sufficient – these included geospatial data, basic keywords, RDF triples, qualifiers not used for one’s own collection and a number of other fields).
  • PBCore (Public Broadcasting Core)[i].
  • EBUCore (European Broadcasting Union Core)[ii].
  • DICOM (Digital Imaging and Communications in Medicine) based with extension to include metadata useful to research (this was in relation to medical images).
  • The schema used by the European Collected Library of Artistic Performance (ECLAP).
  • People’s Network Discovery Service DC Application Profile[iii] but with no “complicated FRBR based stuff though!”.
  • METS (Metadata Encoding and Transmission Standard, a Library of Congress metadata packaging format using XML[iv]), MODS and RightsMD.
  • Images Application Profile[v] and Time-based Media Application Profile[vi].
  • A simplified version of the schema used by the Siobhan Davies Replay Archive[vii].
  • PADS (Performance Art Data Structure)[viii].
  • STARS (Semantic Tools for Screen Arts Research)[ix].
  • BBC Class Clips Learning Zone[x].
  • Web standards such as RSS.
Two interviewees said that consideration should be given to what people will be doing with the collection before a minimum profile is decided upon. For example, subject- or discipline-specific searching may be really useful, but this must then be accommodated in the metadata structure.
One interviewee from a specialist HE museum agreed, as the museum holds a variety of schemas for different subjects (e.g. entomology, zoology, palaeontology) and has not standardised these. S/he ‘counsels strongly against trying to attempt to fit into the [single schema] and the time spent agonising [or you will] invent 10,000 fields to accommodate everyone’. Furthermore, s/he believes that standardisation would ‘degrade the information available’.
Interestingly, nobody suggested the use of the W3C ‘Ontology for Media Resources 1.0’[xi], which spans multiple media types and provides mappings to a number of common vocabularies and formats such as DC, METS, OGG and YouTube. This may be because it is relatively new and therefore has not yet been tested and adopted.
From the high number of relevant responses and the variety of proposed schemas it seems that gaining agreement on a single metadata schema would be a significant challenge. Clearly, the specifics of any schema would relate to the envisaged use. This would explain why different survey respondents and interviewees suggested different fields – there is probably an implicit use case behind many responses. However, the idea behind the aggregation is that it be made available to service providers who would create services that meet the user needs of which they are aware or as they become aware of them. This adds weight to the argument that a minimal profile is inappropriate as any loss of metadata would restrict the potential use and application before the specific functionality needed by users is known. It may also stymie the capacity of the aggregation to meet emergent user needs.

6.4.3  Should metadata be enriched?

All but one of the respondents to the online survey believed that metadata should be enriched, but there was no conclusive outcome regarding who should be responsible for enrichment. Ten respondents indicated that the collection owner should be given technical support to enrich the metadata as they know their metadata best. Nine indicated that the aggregator should enrich the metadata as they can act consistently for all collections. Five indicated that the service provider should enrich the metadata depending on the demands of users for the service. A further four responses suggested that it would depend on a number of different factors, such as the resources of the collection owners, the use of standardised labels (e.g. top-level Library of Congress categories) or niche discipline labels, and the target audience.
The online survey respondent who did not believe that metadata should be enriched had experience of this with an HE repository, where metadata were automatically uploaded from another database, which did not work effectively. However, s/he would not limit reuse of metadata from the repository, as it can be searched and re-presented online, and s/he would be willing for machines to enrich it, whether to make it into linked data or for any other purpose.
Among the interviewees, those supporting enrichment by collection owners tended to acknowledge that in many instances collection owners will lack the resource and/or expertise to enrich metadata and thus would require support from the aggregator to do this. Those supporting enrichment by aggregators and service providers suggested that text mining may be used to automatically enhance the metadata with references to items such as places, names, dates and languages. However, if the common schema metadata aggregation model were used with a minimal schema, the metadata available could be sparse, so automatic enhancement would be limited. If the multiple schemas metadata aggregation model were used, then there may be more metadata available, which may give more opportunity for automatic enhancement.
Many interviewees thought that contributions from users should be taken into account: they may have the knowledge to enrich the metadata, but care should be taken as ‘incorrect’ metadata may hinder searching; therefore the weight given to user-generated metadata in a search would be important. Three examples were suggested:
  • Using search behaviour of users to enhance the discovery path of subsequent users.
  • Crowdsourcing: asking users to add metadata such as tags, categories or comments.
  • Asking subject area experts to add metadata tags for particular subjects; for some interviewees metadata provenance was important.
One interviewee said that if the opportunity to expose enriched metadata through RDFa or Linked Data format were taken, then it is likely this would still be useful in 5-10 years’ time (Google, Twitter and Facebook appear to be thinking along these lines[xii]). Further, the NSDL (National Science Digital Library[xiii]) in the US has been looking at metadata recombination[xiv], i.e. getting metadata from different places and combining it into a single better record.
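As a rough illustration of what recombination could involve, the sketch below merges records about the same item from different sources. It is not the NSDL's method: the merge rule (first non-empty value wins), the function name `recombine`, the source labels and the sample records are all assumptions made for this example. Recording which source supplied each field keeps provenance visible.

```python
def recombine(records):
    """Merge (source, record) pairs describing the same item.

    For each field, the first non-empty value wins, and the source that
    supplied it is noted so that provenance is kept.
    """
    merged, provenance = {}, {}
    for source, record in records:
        for field_name, value in record.items():
            if value and field_name not in merged:
                merged[field_name] = value
                provenance[field_name] = source
    return merged, provenance


# Invented sample records for one item held by two hypothetical sources.
records = [
    ("collection_owner", {"title": "Untitled", "description": "", "creator": "Unknown"}),
    ("aggregator", {"description": "Oil on canvas, harbour scene", "place": "Leith"}),
]
merged, provenance = recombine(records)
print(merged)
print(provenance)
```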
Several interviewees commented that none of the existing technical protocols were good at giving metadata back to the collection owners, and that if metadata owners were to check the metadata that were added to their original records this would be ‘a project in its own right’.
The increasing use of digital cameras to take digital images and film means that a large quantity of technical metadata can be captured automatically (e.g. date, time, focal length, F-number, exposure time and in some cases geo-location) from the digital device. It is not clear whether such metadata that is stored within the asset should be extracted by the collection owner, aggregator or service provider; this was not explored further.
Perhaps most appropriate would be that the aggregator undertakes some enrichment of metadata and that it provides support for enrichment to those collection owners who have the resource and the will to enrich their own metadata. Enrichment by users through services could also be incorporated through the addition of user tags that are clearly differentiated from ‘official’ metadata.
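One way to picture user tags that are kept apart from ‘official’ metadata, and given less weight in a search, is sketched below. The weights, field names and sample record are hypothetical choices for illustration, not values proposed by any respondent.

```python
# Assumed weights: official metadata counts fully, user tags count less.
OFFICIAL_WEIGHT = 1.0
USER_TAG_WEIGHT = 0.3


def score(record, term):
    """Score a record for one search term, weighting the two tag sources."""
    term = term.lower()
    total = 0.0
    if term in (k.lower() for k in record["official_keywords"]):
        total += OFFICIAL_WEIGHT
    if term in (t.lower() for t in record["user_tags"]):
        total += USER_TAG_WEIGHT
    return total


# Invented sample record; user tags are stored separately from official keywords.
record = {
    "title": "Harbour at dusk",
    "official_keywords": ["harbour", "painting"],
    "user_tags": ["Leith", "harbour"],
}
print(score(record, "painting"), score(record, "leith"))
```

Because the two sources are stored in separate fields, a service could also display them differently or ignore user tags altogether.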

6.4.4  Other metadata issues

A number of non-technical challenges are known to EDINA through experience of developing the Visual and Sound Materials (VSM) Portal Demonstrator. These include:
  • Difficulty identifying the appropriate person for metadata within the collection-owning organisation, and the possibility that they have little time or budget to devote to this initiative.
  • Lack of understanding, within the collection-owning organisation, of what was being requested by the aggregator.
  • A mismatch between the agenda of the portal and the internal agenda of the collection-owning organisation, so that a lower priority was given to the job of contributing metadata.
One interviewee said that there was, perhaps surprisingly, a poor awareness of metadata among collection owners, and some were not aware of the potential value to the community of the collections that they held, or of their associated metadata.
A number of interviews touched briefly on the challenges associated with aggregations of aggregations such as differences in metadata in aggregations or services due to metadata enrichment or model used, duplication of entries, and potentially increased effort from collection owners if supplying multiple aggregators. It would be useful to investigate these challenges further, with the caveat that the value of this work may be transitory if linked data can be implemented effectively by collection owners.
Metadata quality

A number of the respondents were concerned about the content and quality of metadata provided for aggregation, given the highly specific historical and disciplinary nature of much of the metadata in the catalogues of collections, and because metadata is often created for particular uses under specific circumstances (and is thereby less useful for broader use).
Two interviewees stated that the “description” field, especially for images, may not contribute any useful information: it may simply repeat the title, or be blank. Another said that frequently the subject of the image is not contained in the metadata. Keywords may be more useful, and easier to keep in a common, searchable format; however, they may be inconsistent.
One interviewee explained that for art images there is often no description or title as, in the arts, many things are untitled as well as anonymous. A challenge shared with scholarly works is that there may be many creators rather than just one.
One interviewee involved with normalising metadata for images said that the metadata which commercial agencies compile for images tends to be oriented around specific keywords of interest to the advertising community, which may not be helpful or relevant for educational use. This is less so for organisations such as galleries, whose metadata are more suitable for academic use. In another instance, the required information was contained in the full metadata record but was not included among the metadata fields in the specified schema.
Samples of images, films and sounds

As text-based documents are commonly ‘full-text searchable’, it is possible to locate the search string within the document and present it in context, i.e. to present the extract of the document (the snippet) that contains that search term with the search term highlighted. For images and time-based media, the equivalent ‘snippet’ to aid discovery would be a thumbnail for images or films, or a short clip for films or sounds. For those searching for such resources, such a sample is essential to determine whether the resource is the right one: one interviewee said that if an aggregation did not provide the facility for users to view the resources, for example as thumbnails, it would be a ‘critical failure’; another said ‘a thumbnail still has to be there’. If such a sample is not available, then those searching for resources are reliant on the metadata, which may include the metadata embedded in the object if it has been extracted.
Complexity in metadata for images and time-based media

The JISC-funded application profiles developed for images[xv] and time-based media[xvi] reflect a systematic understanding of the complexities involved in metadata for images and time-based media. Both application profiles were based on the Functional Requirements for Bibliographic Records (FRBR)[xvii] entity-relationship model, which, although developed for bibliographic records, identifies and considers many of the issues that are relevant for images and time-based media. The scenarios for use of aggregated metadata given by respondents were in line with those identified as use cases for these two application profiles.
Complexity in metadata about images and time-based media reflects complexity in the resources themselves, and differences between media types. The media types are used to capture a range of different resources, e.g. an image may represent a photograph of a painting, a medical problem or the pages of an original manuscript. Users searching for a photograph of a painting may want information about both the original painting and the photograph (e.g. Who was the painter? Who was painted? Where is the painting currently stored or displayed? Who was the photographer? When was it painted? When was the photograph taken?). Those searching for a medical image will want information about the diagnosis but, in this instance, the anonymity of the subject is essential and access to view may be restricted to specific audiences. Effective use of page images from a manuscript may require that the metadata include the specific page order and, if available, that individual pages be linked to a transcript of the full text. The grouping of images to reflect page flow in a manuscript is important, and hence so is metadata to support this rendering.
A film may have multiple authors and may consist of various elements e.g. film segments, soundtrack, subtitles and a printed cover, each of which has various contributors e.g. the designer, illustrator and editor for the cover. Thus, a film is one resource made up of multiple other resources, with corresponding layers of metadata. The relationships between these different layers of the resource are important to both contributors and users and thus should be possible to reflect in the metadata.
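The layering described above can be sketched as a nested data structure. This is a minimal illustration rather than a rendering of either application profile: the `Resource` class, the role names and the placeholder people are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource that may itself contain other resources."""
    name: str
    contributors: dict = field(default_factory=dict)  # role -> person
    parts: list = field(default_factory=list)         # nested Resource objects


def all_contributors(resource, path=""):
    """Yield (path, role, person) for the resource and every nested part."""
    here = f"{path}/{resource.name}" if path else resource.name
    for role, person in resource.contributors.items():
        yield here, role, person
    for part in resource.parts:
        yield from all_contributors(part, here)


# Placeholder film with two component resources, each with its own contributors.
film = Resource("film", {"director": "Person A"}, [
    Resource("soundtrack", {"composer": "Person B"}),
    Resource("cover", {"designer": "Person C", "illustrator": "Person D"}),
])
rows = list(all_contributors(film))
for row in rows:
    print(row)
```

Keeping the path to each part alongside each contributor preserves the relationship between a contribution and the layer of the resource it belongs to.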
‘Time’ and ‘date’ mean various different things

As indicated above, the field label ‘date’ is not straightforward as, often, many dates are relevant to a resource and different users look for different dates. A documentary produced in 2001 may describe an event from 1600 using some footage produced in 1970. If it is made available in a repository, the date of ingest will also be recorded. Similarly, a digital image of an analogue photograph of a painting of a famous battle has several important dates: the date of the battle, the date of the painting, the date on which the analogue photograph was taken, the date on which the digital version was created, and the date of ingest into a repository. To an art-history student one date is most important, and to a student of photography another.
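A simple way to avoid a single ambiguous ‘date’ field is to record each date with an explicit role. The sketch below uses the documentary example; the role names and the `years_for` helper are illustrative and not drawn from a published schema.

```python
# The documentary example above, with each date given an explicit role.
# The role names are illustrative, not drawn from a published schema.
documentary_dates = [
    {"role": "production", "year": 2001},
    {"role": "event depicted", "year": 1600},
    {"role": "source footage", "year": 1970},
]


def years_for(dates, role):
    """Return the years recorded under one role."""
    return [d["year"] for d in dates if d["role"] == role]


print(years_for(documentary_dates, "event depicted"))
```

A search service could then let a history student filter on the date of the event depicted and a film student on the production date.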
‘Time’ may also refer to different things, e.g. the time of day at which a recording was made or the duration of the work.
[i] http://pbcore.org/2.0/ (Accessed 31/08/10; PB is Public Broadcasting)
[ii] http://tech.ebu.ch/lang/en/MetadataEbuCore (Accessed 31/08/10; EBU is the collective organisation of Europe’s 75 national broadcasters)
[iii] http://www.ukoln.ac.uk/metadata/pns/pndsdcap/ (Accessed 31/08/10)
[iv] Metadata Encoding and Transmission Standard is the librarians’ XML standard, contrasted with, for example, the multimedia standard DIDL (Digital Item Declaration Language, also known as part 2 of the MPEG-21 standard). See Richard Gartner, http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf
[v] http://www.ukoln.ac.uk/repositories/digirep/index/Images_Application_Profile (Accessed 15/09/10)
[vi] http://wiki.manchester.ac.uk/tbmap/index.php/Project_Outputs (Accessed 15/09/10)
[vii] http://www.siobhandaviesreplay.com (Accessed 31/08/10)
[viii] Referenced at http://www.jiscdigitalmedia.ac.uk/training/digital-performance-seminars/ with example at http://www.jiscdigitalmedia.ac.uk/seminars/elements/ (Both accessed 31/08/10)
[ix] http://www.dshed.net/stars/preview (Accessed 31/08/10)
[x] http://www.bbc.co.uk/learningzone/clips/ (Accessed 31/08/10)
[xi] http://www.w3.org/TR/mediaont-10/
[xii] http://www.newscientist.com/article/mg20727715.400-google-twitter-and-facebook-build-the-semantic-web.html (Article from New Scientist online 02/08/10 by Jim Giles, accessed 30/08/10)
[xiii] http://nsdl.org/ (Accessed 06/09/10)
[xiv] http://ecommons.cornell.edu/bitstream/1813/7897/1/Paper_21.pdf (Accessed 06/09/10)
[xv] http://www.ukoln.ac.uk/repositories/digirep/index/The_Images_Application_Profile
[xvi] http://wiki.manchester.ac.uk/tbmap/index.php/Project_Outputs
[xvii] http://www.ifla.org/en/publications/functional-requirements-for-bibliographic-records

One Response to “6.4 What are the metadata considerations?”

Phill Purdy says:

Note: a discussion paper has recently been published outlining a number of possible options for future revision of People’s Network Discover Service DC Application Profile which was adopted by the Culture Grid, UK aggregator service. Please see http://museum-api.pbworks.com/w/page/Culture-Grid-Profile to access and respond to the paper