CAPACOA, Culture Creates and other partners have wrapped up Phase 2 of the Linked Digital Future initiative. This post offers an overview of our key activities and lessons learnt, as well as links to full reports.
Towards Conceptual Clarity
Have you ever found yourself in a discussion with someone, thinking you are talking about the same thing, then realizing midway that you were in fact talking about two different things? That you were using the same word, but were ascribing different meanings to it?
This happened over and over during Phase 2 of the Linked Digital Future initiative as we continued our data modelling activities in Wikidata. Words can be ambiguous. A single word can convey multiple meanings (polysemy), while different words can denote the same thing or concept (synonymy). Take, for example, the term “conductor”. Is it an occupation, a position within an organization or a role in a production? It turns out it can be all three. When representing this work-related concept as an RDF triple (subject-predicate-object), you may need distinct predicates, or properties, to represent the occupation, the position and the role.
Over the last year, we worked with Conseil québécois du théâtre, Culture Creates and others to define good practices for representing performing arts information as linked open data. This required us to set words aside and focus on the simple, high-level concepts they represent. Then we had to describe these concepts in clear, domain-neutral terms, so that they would convey the same meaning to other people across languages and domains. In one case, for example, we borrowed the concept of ‘group’ from the CIDOC-CRM ontology to represent all performing arts groups, ensembles, troupes and organizations, regardless of their legal form (see the resulting Wikidata item and the discussion).
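To make the “conductor” distinction concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `http://example.org/` namespace and made-up property names (`occupation`, `positionHeld`, `conductedBy`); these are illustrative placeholders, not Wikidata's actual properties. Each triple involves “conductor” in some way, but a different predicate carries each meaning.

```python
# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/"

# Each RDF triple is (subject, predicate, object). "Conductor" is involved
# in all three statements, but a different predicate carries each meaning,
# so a machine can tell them apart.
triples = [
    # Occupation: the kind of work the person does.
    (EX + "JaneDoe", EX + "occupation", EX + "Conductor"),
    # Position: a post held within a specific organization.
    (EX + "JaneDoe", EX + "positionHeld", EX + "PrincipalConductorOfOrchestraX"),
    # Role: the function performed in one specific production.
    (EX + "ProductionY", EX + "conductedBy", EX + "JaneDoe"),
]


def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Asking for Jane Doe's occupation returns only the occupation statement.
print(objects(triples, EX + "JaneDoe", EX + "occupation"))
```

Because the meaning lives in the predicate rather than in the word, a query for positions held will never accidentally return production roles.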
By the end of Phase 2, we had achieved significant progress towards a harmonized performing arts ontology – the semantic layer in the vision for a linked open data ecosystem for the performing arts. This progress was documented in the WikiProject Performing arts and in the Artsdata.ca documentation. But much still needs to be done. Modelling efforts continue in collaboration with the Performing Arts Information Representation Community Group and the LODEPA community (Linked Open Data Ecosystem for the Performing Arts).
The Resource Description Framework (RDF) is a set of W3C specifications for representing information over the Web. The RDF triple is the fundamental building block for the entire web of linked open data.
The CIDOC Conceptual Reference Model (CRM) is a long-established ontology developed by the cultural heritage sector, and it is available in RDF. Alignment with CIDOC-CRM enables information reuse across cultural sectors.
Reaching a critical mass of named entities
Achieving conceptual clarity is also essential when dealing with named entities.
For example, does a given database record describe an organization or a venue? Or both?
For the database manager, this difference may not be relevant. However, when you are trying to synchronize different datasets, ambiguity is a pain – and a source of headaches.
Over the last year, we worked with Culture Creates, LaCogency, Conseil québécois du théâtre and several other partners to integrate named entities about persons, organizations, buildings and places into Artsdata.ca and Wikidata. These named entities are part of the data layer in our vision for linked open data. They form the foundation of the data ecosystem and are a prerequisite for reusing performing arts information.
A named entity is a ‘thing’ with a human-language name that distinguishes it from other things of the same type. Persons, organizations and works are examples of named entities. In authority files, this name is matched to a unique, persistent identifier. In linked data, this identifier, which also serves as a locator, is called a Uniform Resource Identifier (URI).
Is this foundation in place yet? With the help of new prototype software for Named Entity Recognition (NER), the number of persons, organizations and places in Artsdata grew more than tenfold, from 1,040 to 11,100!
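The name-to-identifier matching described above can be sketched with a toy in-memory authority file. This is a minimal illustration under loud assumptions: the entity types, names and URIs are hypothetical placeholders, and matching is exact after simple normalization, whereas real reconciliation against Wikidata or Artsdata involves fuzzier matching and human review.

```python
import unicodedata
from typing import Optional

# A toy authority file: (entity type, normalized name) -> persistent URI.
# Every URI here is a hypothetical placeholder, not a real identifier.
AUTHORITY = {
    ("Person", "jane doe"): "http://example.org/id/P0001",
    ("Organization", "theatre x"): "http://example.org/id/O0001",
    ("Place", "theatre x"): "http://example.org/id/L0001",
}


def normalize(name: str) -> str:
    """Lowercase, drop accents and collapse whitespace so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def resolve(entity_type: str, name: str) -> Optional[str]:
    """Return the URI for a named entity of a given type, or None if unknown."""
    return AUTHORITY.get((entity_type, normalize(name)))


# The same name resolves to different URIs depending on the entity type,
# which is why an organization and its venue need distinct records.
print(resolve("Organization", "Théâtre X"))
print(resolve("Place", "Théâtre  X"))
```

Keying on the entity type as well as the name is the design choice that keeps an organization and the building it operates from collapsing into a single record.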
According to benchmark data from Statistics Canada, we are now about halfway to completion for persons and places, and at around 20% for organizations (see exact figures in this presentation).
More effort will be invested in 2021-2022 to reach out to associations and unions that hold datasets about persons and organizations. If you have datasets to contribute to Wikidata and Artsdata, please get in touch with Bridget MacIntosh at firstname.lastname@example.org.
We will also keep disentangling those records that conflate venue and organization information… with lots of patience – and Tylenol.
Accelerating data collection about events
One of the anticipated LDFI proofs of concept is to assemble named entities into rich event calendar entries, with full cast information and details about organizers. So we intensified efforts to build capacity for automatically crawling and scraping event information from websites. In addition to the Footlight technology, Culture Creates started ‘wringing’ Schema.org structured data wherever it was available. They also started experimenting with a new tool called Capacitor to scale up event crawling. All of these technologies have proven effective: the number of events in Artsdata roughly quadrupled, from 1,140 to 4,620, even though this was a pandemic year.
Still, there could be as many as 75,000 performing arts events in Canada per year. That’s a lot of information to codify into data and load into Artsdata. AI-powered tools can do the bulk of the work, but we also need other data provision strategies to achieve comprehensive, high-quality data. We need additional input straight from the performing arts community itself. It’s not all about technology. It’s about community collaboration.
Only then will we be able to offer consistent, high-quality and timely answers to the most important question: “What live performance can I see near me, now?”
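Many event websites embed schema.org data as JSON-LD inside `<script type="application/ld+json">` tags. The sketch below, using only Python's standard library, shows the general idea of harvesting event objects from such markup. It is a hypothetical illustration of the technique, not the Footlight or Capacitor implementation, and it only checks a small, assumed subset of schema.org event types.

```python
import json
from html.parser import HTMLParser

# An assumed subset of schema.org event types; a real crawler would check more.
EVENT_TYPES = {"Event", "TheaterEvent", "MusicEvent", "DanceEvent"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # None while outside a JSON-LD script element

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_events(html):
    """Return schema.org event objects found in a page's JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    events = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        # JSON-LD may hold a single object, a list, or an "@graph" of objects.
        if isinstance(data, dict):
            items = data.get("@graph", [data])
        else:
            items = data if isinstance(data, list) else []
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            elif not isinstance(types, list):
                types = []
            if any(isinstance(t, str) and t in EVENT_TYPES for t in types):
                events.append(item)
    return events


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "TheaterEvent",
 "name": "Sample Play", "startDate": "2021-09-01T19:30"}
</script>
</head><body></body></html>"""
print([e["name"] for e in extract_events(page)])
```

Skipping malformed blocks rather than raising is a deliberate choice here: when crawling many sites, one broken page should not stop the rest of the harvest.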
Full reports and evaluations
We accomplished so much in the last year that it would be impossible to do justice to the efforts of all LDFI team members in a single blog post. Between the amazing work of LaCogency and Conseil québécois du théâtre on Wikidata, the workshops of the LODEPA community and the hundreds of hours of coaching, we have much to celebrate. So here are detailed reports on the LDFI and its various components.