1 Introduction

Digital content and online media have reached an unprecedented level of relevance and importance, especially with regard to commercial, political and societal aspects, debates and collective decisions. One of the many technological challenges related to online content is the need for better support and smarter technologies for data, information and knowledge workers, i.e., persons who work primarily at and with a computer, face an ever-increasing incoming stream of heterogeneous information and create, in a rather general sense, new information, based on the specific requirements, demands, expectations and conventions of the sector they work in as well as on their job profiles and responsibilities. For example, experts in a digital agency build mobile apps or websites for clients who provide the agency with documents, data, pictures, videos and other assets that are processed, sorted, augmented, arranged, packaged and then deployed. Knowledge workers in a library digitise a specific archive, augment it with additional metadata, possibly also with critical edition information, and publish the archive online. Journalists need to stay on top of the news stream, including blogs, microblogs, newswires, websites etc., in order to produce a new article on a breaking topic, based on the information they have collected, processed, sorted, evaluated, verified and synthesised. A multitude of further examples exists across sectors and branches of media (television, radio, blogs, print journalism, investigative journalism etc.). All these professional environments and contexts can benefit immensely from semantic technologies that support knowledge workers, who typically operate under high time pressure, in their respective activities: finding relevant information, highlighting important concepts, sorting incoming documents in multiple different ways, translating articles in foreign languages, suggesting interesting topics. We call these semantic services, which can be applied in different professional environments and which all relate to the processing, analysis, translation, evaluation, contextualisation, verification, synthesis and production of digital information, Curation Technologies.

In the context of our research and technology transfer project Digital Curation Technologies (DKT), the German Research Center for Artificial Intelligence (DFKI) develops a curation platform that offers language- and knowledge-aware services such as semantic analysis, search, analytics, recombination and generation (e.g., thematic, chronological and spatial) for the curation of arbitrary types of digital content. The platform automates specific parts of the curation workflows that knowledge workers or digital curators typically follow. Semantic technologies can be used to assist the experts in data processing, in terms of efficiency, breadth, depth, and scope, ascertaining what is important, relevant, maybe even genuinely new and eye-opening. The common ground for all these tasks and challenges is the curation of digital information.

We mainly work with larger, self-contained document collections – however, our technologies can also be applied to dynamic content such as news, search results, tweets and blog posts. The key objective is to shorten the time it takes knowledge workers to familiarise themselves with a potentially large set of documents by semantically extracting relevant data of various types and presenting the data in a way that allows the knowledge workers to be more efficient, especially when they are not domain experts with regard to the topics of the document collection. In the project, DFKI works with data sets provided by the project partners, four SME companies active in different sectors:

  • ART+COM AG: Museums and museum design, exhibitions, showrooms

  • Condat AG: Television, web TV, radio, media

  • 3pc GmbH: Public archives

  • Kreuzwerker GmbH: Print journalism

DFKI develops the curation technology platform and provides curation services to the SME partners through standard web interfaces. The SME partners integrate these curation services into their own domain- and sector-specific systems and applications. One of the key aspects of our project is to explore how far we can go, and how much we can achieve, with the rather generic curation services in the different domain-specific use case scenarios of the four SME partners, each of which has its own requirements, demands, constraints and peculiarities.

With regard to the technologies, DFKI builds modular semantic Language Technology and Knowledge Technology components that can be arranged in pipelines or workflows. Based on the output of these Natural Language Processing (NLP) and Information Retrieval (IR) components, a semantic layer is generated on top of a document collection. It contains various types of metadata in the form of annotations on the documents that can be used in further processing steps, visualisations or graphical user interfaces.

In this article we concentrate on the collaboration between DFKI and the project partner ART+COM AG, a design studio with an extensive history in creating media-rich exhibition designs in cultural and commercial sectors, e.g., a zoo for micro-organisms (Micropia) in Amsterdam, The Netherlands, an experience centre on Viking history in Jelling, Denmark, or a Product Info Center for the car manufacturer BMW in Munich, Germany. ART+COM employs a number of knowledge workers who can, in a way, be regarded as the prototypical textbook users of digital curation technologies. These knowledge workers need to be flexible enough to familiarise themselves with completely new topics and domains in a very short timespan. To support this highly complex task, ART+COM is experimenting with curation technologies, for example, to process materials in order to produce a project pitch, to plan a new museum from scratch or to plan individual museum exhibits. The goal is to help these digital curators become more efficient and more effective as well as to enable them to produce high-quality content through generic curation technologies.

The remainder of this article is structured as follows. First, we describe the technology platform and some of the curation services in more detail (Sect. 2). After this description of the technical background we concentrate on the state of a user interface currently under development at ART+COM (Sect. 3). A second, more generic, i.e., not particularly domain-specific, user interface is under development at DFKI (Sect. 4). Finally, we take a look at the boundary between DFKI’s generic user interface and ART+COM’s rather domain-specific user interface, and at their differing requirements (Sect. 5).

2 Curation Technologies

In this section we briefly describe the current version of our curation platform [1, 4, 5, 13]. The curation services, exposed through RESTful APIs, comprise modules that either work on their own or can be arranged into workflows. All current use cases at the four SME project partners (see above) revolve around the processing of document collections of almost arbitrary sizes, ranging from a few files to collections comprising millions of documents. The system is still under development. The various NLP modules analyse documents and extract information to be used in several digital curation scenarios. Interoperability between the modules is achieved through the NLP Interchange Format (NIF) [11], i.e., all modules accept NIF as input and return NIF as output. The shared usage of NIF allows for the combination of web services in a decentralised way, without hard-wiring specific workflows or pipelines. In the following we present some of the curation services.
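
To illustrate how such a NIF-based service can be invoked over a RESTful API, the following minimal Python sketch posts plain text to a curation endpoint and receives a NIF document in return. The endpoint URL and parameter names are our own assumptions for illustration, not the platform’s documented API.

```python
import requests

# Hedged sketch: endpoint URL and parameter names below are assumptions,
# not the platform's documented API.
SERVICE_URL = "http://localhost:8080/api/e-nlp/namedEntityRecognition"

def annotate(text: str, language: str = "en") -> str:
    """Send plain text to a curation service and receive a NIF document.

    NIF is RDF-based; Turtle is a common serialisation, hence the
    text/turtle Accept header. The returned NIF document can be passed
    on, unchanged, to the next service in a workflow.
    """
    response = requests.post(
        SERVICE_URL,
        params={"informat": "text", "outformat": "turtle", "language": language},
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
    )
    response.raise_for_status()
    return response.text

print(annotate("Harald Bluetooth was the son of Gorm the Old."))
```

Because every module speaks NIF, chaining services amounts to forwarding the returned document to the next endpoint, which is what makes the decentralised combination of web services possible.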

2.1 Named Entity Recognition and Named Entity Linking

First we convert every document to NIF and then perform Named Entity Recognition (NER). The NER module consists of two different approaches (additional ones are to be added) that allow training the system on annotated data and/or using lexicons and dictionaries. Afterwards the service attempts to look up each named entity on its (language-specific) DBpedia page using DBpedia Spotlight [12] and to extract additional information through a SPARQL query. Similar queries can be used to retrieve additional information for different types of entities; for locations, e.g., we point the service to Geonames in order to retrieve their coordinates.
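
As a sketch of the entity linking step, the following Python fragment calls DBpedia Spotlight’s public demo REST endpoint; the platform’s own deployment and configuration may differ.

```python
import requests

# DBpedia Spotlight's public demo endpoint; a production system would
# typically host its own instance.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def link_entities(text, confidence=0.5):
    """Return (surface form, DBpedia URI) pairs for the entities in text."""
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
    )
    response.raise_for_status()
    resources = response.json().get("Resources", [])
    return [(r["@surfaceForm"], r["@URI"]) for r in resources]

print(link_entities("Harald Bluetooth was King of Denmark."))
```

Each returned DBpedia URI can then serve as the subject of a follow-up SPARQL query that retrieves the additional, type-specific information mentioned above.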

2.2 Temporal Expression Analysis

The temporal expression analyser is based on a grammar of regular expressions that can process German and English natural language text. After their identification, temporal expressions are normalised to a shared machine-readable format and added to the NIF representation. We also add document-level statistics based on the normalised temporal values. This analysis information can be used to position a document on a timeline.
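
As a deliberately tiny illustration of the regular-expression-based approach, the following Python fragment recognises one simple English date pattern and normalises it to ISO 8601; the actual grammar is far more comprehensive and also covers German.

```python
import re
from datetime import date

# Month-name lookup for the normalisation step.
MONTHS = {"january": 1, "february": 2, "march": 3, "april": 4,
          "may": 5, "june": 6, "july": 7, "august": 8, "september": 9,
          "october": 10, "november": 11, "december": 12}

# One simple English pattern: "8 June 1986".
PATTERN = re.compile(
    r"\b(\d{1,2})\s+(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+(\d{4})\b")

def normalise(text):
    """Yield (span, ISO 8601 value) pairs, e.g. '1986-06-08'."""
    for match in PATTERN.finditer(text):
        day, month, year = match.groups()
        yield match.span(), date(int(year), MONTHS[month.lower()], int(day)).isoformat()

print(list(normalise("The museum opened its doors on 8 June 1986 in Munich.")))
```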

2.3 Geographical Localisation Module

The geographical localisation module uses SPARQL to retrieve the latitude and longitude of a location as specified in its DBpedia entry (see above). The module also computes the mean and standard deviation of the latitude and longitude values of all identified locations in a document. This analysis information can be used to position a document on a map visualisation.
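
A minimal sketch of these document-level statistics, assuming the coordinates have already been retrieved via SPARQL:

```python
from statistics import mean, stdev

def location_statistics(coords):
    """Mean and standard deviation per axis for a document's locations."""
    lats = [lat for lat, _ in coords]
    longs = [lon for _, lon in coords]
    return {
        "lat_mean": mean(lats),
        "lat_stdev": stdev(lats) if len(lats) > 1 else 0.0,
        "long_mean": mean(longs),
        "long_stdev": stdev(longs) if len(longs) > 1 else 0.0,
    }

# Approximate coordinates of Berlin, Amsterdam and Jelling.
print(location_statistics([(52.52, 13.40), (52.37, 4.90), (55.76, 9.42)]))
```

The means give the centre point for placing the document on a map; the standard deviations indicate how geographically dispersed its locations are.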

2.4 Text Classification and Document Clustering

For the classification service we make use of the Mallet toolkit to assign topics or domains such as “politics” or “sports” to documents. The respective topic is stored within an RDF element in the NIF representation. Document clustering, or, rather, the clustering of all entities contained in a document or in all documents, is performed with the help of WEKA. The service is called from the interface in order to give users a clarifying view of the groups of entities appearing in a collection, which can be useful as an exploratory tool.
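
The platform itself relies on the Java-based Mallet and WEKA toolkits; purely as an illustration of the underlying idea, the following Python sketch clusters a handful of documents by their TF-IDF vectors using scikit-learn’s k-means instead.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "The parliament passed the new budget law.",
    "The striker scored twice in the final match.",
    "Elections are scheduled for next autumn.",
    "The team won the championship after extra time.",
]

# Represent each document as a TF-IDF vector, then cluster the vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, doc in zip(labels, documents):
    # With suitable training data the clusters should roughly separate
    # the 'politics'-like from the 'sports'-like documents.
    print(label, doc)
```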

3 Curation Technologies for Exhibition Design

The primary technical challenge of the project is the development of smart curation technologies that, already in their generic state, provide good results in terms of recall, precision, quality and coverage, and that can be customised and configured for the four SME partners’ specific use cases through domain adaptation and customisation capabilities. A second challenge relates to the – by definition domain-specific – design of the corresponding user interfaces (UI) under development by the SME partners. These UIs have their own requirements and constraints, depending on the respective domains, sectors, use cases and target users. In this paper we focus on the sector of exhibition design and explain, in detail, the user-centered design process that has been guiding the implementation of the ART+COM user interface.

3.1 Initial User Studies

The UX design team at ART+COM began the user research process with in-depth interviews and surveys to learn about our potential users, the knowledge workers: their behaviour, goals, motivations and needs. The initial user research was carried out in a relatively broad way, involving 12 knowledge workers in academic and commercial fields who produce either scientific or creative work. The interviews were conducted in each interviewee’s typical work environment, ranging from libraries and offices to cafes and homes. The interviews began with general questions about the interviewees’ research process and goals. The interviewees were then asked to show us, on their computers, how they perform their research. This provided us with insights into the kinds of tools and environments each user is familiar with and allowed us to extrapolate their usage patterns. Moreover, the team acquired an intuitive understanding of the different phases of their workflows and of the goals and expectations for each research step.

While the interviews gave us qualitative findings, the surveys provided a quantitative overview of the potential users. The questionnaire was based on the insights the design team had gathered from the interviews, addressing the problem space identified in each key research phase. The 15 questions were split into the phases “gathering information”, “processing information” and “sharing information”. About half of the questions were multiple choice, with the option to elaborate, and the other half free-text. The questions covered the kinds of existing tools and services the participants use and the pain points they experience, as well as the kinds of data and information they gather, how they organise them and how they share their findings. We received responses from 20 participants.

3.2 Personas

The qualitative and quantitative user research formed the basis for creating several personas. Personas can be used to synthesise real users’ needs, behaviours and mental models. Based on the results of the surveys and interviews, a set of behavioural variables was identified, which can be used to create semantic differential scales [6]. During the persona workshops, the team collectively placed each user along the scales based on his or her responses and actions. Eventually, clusters began to emerge on the scales, from which primary and secondary personas were identified. Meet Julia, Kate, and Alex (Fig. 1).

Fig. 1. Primary and secondary personas

Julia is our primary persona. She works at a creative agency as a content researcher. Constrained by time and budget, she is quick on her feet in gathering information for many short-term projects, often at the same time. Her research process is open, branching from one keyword into other potentially interesting topics while formulating and refining the “right” questions. Her goal is to quickly grasp a general impression of a topic so that she can focus on meaningful key points to highlight and share with the team. During her research, she constantly assesses the validity of the sources she finds.

Alex, on the contrary, is an academic researcher working on long-term, rigorous academic research. When he is not writing, he is doing research in online archives or burying his head in books at the library. Alex works mostly alone. His research process is highly structured and iterative, and he takes his time to conduct thorough primary as well as secondary research.

The personas set the foundation for the following steps in the user-centered design process and helped us prioritise core features and communicate them to the design, development and product management teams.

3.3 User Scenarios, Task Analysis, Functional Requirements

Based on the knowledge about the personas, the team formulated user scenarios to understand the context and motivation behind users’ interactions, one of which represents the most common usage pattern of Julia, our primary persona. In this scenario, Julia has received a large amount of material from the client for an exhibition about microbes. These materials include images and scientific papers about different micro-organisms, covering their behaviour, their interactions with their habitat, etc. Julia has a couple of days to get an overview of the received information and to familiarise herself with the topic of microbiology. She then has to effectively share her key findings with the design team in order to kick-start conceptual and creative brainstorming sessions.

A task flow [7] was generated describing the individual steps Julia would take in order to complete this assignment. The tasks are intentionally described at a high level in order to focus on her actions rather than on the specific technology she currently employs. Twelve steps were defined, grouped into the phases “search”, “evaluate” and “organise” of her overall workflow (Fig. 2). With the persona in mind, the team then brainstormed about the user’s functional needs, required knowledge and optimal outcome for each step. For example, in step one Julia studies her notes and identifies keywords. Her functional needs at this stage could be having structured notes, an overview of keywords and the ability to identify keywords. However, since she is new to the topic of microbiology, it may not be easy for her to find relevant information at first.

Fig. 2. User task flow

After having gone through all the key tasks the user performs, the next iteration focused on translating the user’s functional needs into functional requirements of the system. In order to fulfil her needs in the “search” step, the system could allow the user to import her notes and source materials, extract the keywords and display them in a meaningful way, providing the user with an immediate overview and understanding of the information at hand so that she gets a better start for her research.

In this first step, the design team identified the user’s need to acquire the specific knowledge necessary to make the right decisions, in this case by familiarising herself with the topic of microbiology. The key pain point for our personas is the lack of an overview of the information they find and collect. Therefore, the team decided to prioritise the system functionalities that would optimise the user’s task flow, namely those that impart the knowledge the users need and that improve the information overview.

3.4 Minimum Viable Product

Based on the core features identified during the task analysis, we defined the major problem space that we would like to tackle within the scope of the minimum viable product (MVP) [8]. The design of the MVP would focus only on optimising the user’s initial research phase of the entire workflow. The MVP would inform us how the curation services and visualisation techniques could improve the knowledge workers’ understanding of the domain and their ability to discover meaningful insights.

Application Framework. Our prototype is a web application implemented using RESTful APIs. It allows users to begin their research by importing existing documents, such as briefing materials from the client, or by performing an explorative web search with keywords. Whether the content stems from imported documents or from web pages, it is automatically analysed by the semantic curation services provided by our research partner DFKI. The application then performs a batch lookup of the extracted information, for example, named entities, on Wikidata in order to enrich the entities (e.g., “Harald Bluetooth”, “Gorm the Old”, or “Sweden” in the context of the Viking experience centre) with useful additional information. The Wikidata entry of a given entity can be used to enrich the locally presented information with the date and place of a person’s birth and death, family relations, occupation, etc. The entities are further enriched with top-level ontology labels in order to give the user an overview of the distribution of the information across categories, for instance, person, organisation and location. Figure 3 shows the key screens implemented in the current prototype.

Fig. 3. Prototype application for content curators developed by ART+COM
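
To illustrate the batch lookup described above, the following Python sketch uses Wikidata’s public wbsearchentities and wbgetentities APIs; the prototype itself is a JavaScript web application, so this is an illustration of the idea rather than its actual implementation.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def wikidata_ids(labels):
    """Resolve entity labels to Wikidata Q-identifiers via wbsearchentities."""
    ids = {}
    for label in labels:
        hits = requests.get(API, params={
            "action": "wbsearchentities", "search": label,
            "language": "en", "format": "json",
        }).json().get("search", [])
        if hits:
            ids[label] = hits[0]["id"]
    return ids

def batch_lookup(qids):
    """Fetch several entities in one call (wbgetentities accepts up to 50 ids)."""
    return requests.get(API, params={
        "action": "wbgetentities", "ids": "|".join(qids),
        "props": "labels|descriptions|claims", "languages": "en",
        "format": "json",
    }).json()["entities"]

ids = wikidata_ids(["Harald Bluetooth", "Gorm the Old", "Sweden"])
for qid, data in batch_lookup(list(ids.values())).items():
    description = data["descriptions"].get("en", {}).get("value", "")
    # P569/P570 are Wikidata's date-of-birth/date-of-death properties.
    dates = {"P569", "P570"} & set(data.get("claims", {}))
    print(qid, "|", description, "| date properties present:", sorted(dates))
```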

Information Visualisation. Our second focus is the visualisation of the extracted information. We implemented the visualisations in D3, a JavaScript library optimised for visualising data with HTML, SVG and CSS. The challenge is to provide intuitive and effective interaction modalities so that users can gain a quick overview of a specific topic or drill down into the semantic knowledge base to explore deeper patterns or relationships.

We realised several visualisations, including a network overview, semantic clustering, timelining and maps (Fig. 4). While a network is effective for exploring semantic relationships amongst extracted entities, semantic clustering provides a good overview of groups of closely related entities. Timelines and maps, in addition, offer a geo-temporal overview of the extracted entities. Combining all of the above-mentioned visualisations when doing research on Harald Bluetooth, for example, the user can explore the places and people to which he is connected in the network visualisation, view the timespan of his reign and his family lineage along the timeline, and explore notable events highlighted on the map.

Fig. 4. Visualisation types
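
The visualisations themselves are implemented in D3 (JavaScript); as a hedged sketch of the data preparation behind the network view, the following Python fragment assembles the node-link JSON structure that D3 force layouts conventionally consume. All entity names and counts are invented for illustration.

```python
import json

# Invented example entities with co-occurrence counts.
entities = [
    {"id": "Harald Bluetooth", "category": "person", "occurrences": 12},
    {"id": "Gorm the Old", "category": "person", "occurrences": 5},
    {"id": "Jelling", "category": "location", "occurrences": 7},
]
cooccurrences = [
    ("Harald Bluetooth", "Gorm the Old", 4),  # counts weight the edges
    ("Harald Bluetooth", "Jelling", 3),
]

# D3 force layouts conventionally consume a {nodes, links} structure;
# node size can encode 'occurrences' or, alternatively, node degree.
graph = {
    "nodes": entities,
    "links": [{"source": s, "target": t, "value": w} for s, t, w in cooccurrences],
}
print(json.dumps(graph, indent=2))
```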

In terms of interaction modalities, the user can directly manipulate the graph by zooming and panning. When entities are highlighted, tooltips display the entity properties found on Wikidata. Users can also select and focus on the connections between two entities to explore in depth how they are related. Harald Bluetooth’s connection to Gorm the Old, for instance, is encoded with several connection possibilities: Harald is the son of Gorm, and he also succeeded his father as King of Denmark.

Besides the direct interaction with the visualisation, we also implemented interactive filters that let the user get to the specific information they need. For instance, the category filter allows the user to show only entities of the category person, or any combination of person, location and other category labels deemed relevant to the user.
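
A minimal sketch of such a category filter, with invented example entities:

```python
entities = [
    {"label": "Harald Bluetooth", "category": "person"},
    {"label": "Jelling", "category": "location"},
    {"label": "ART+COM", "category": "organisation"},
]

def filter_by_category(entities, selected):
    """Keep only entities whose top-level category the user selected."""
    return [e for e in entities if e["category"] in selected]

print(filter_by_category(entities, {"person"}))              # person only
print(filter_by_category(entities, {"person", "location"}))  # combined filter
```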

3.5 User Evaluation

The MVP is developed following an agile approach, in which the team at ART+COM iterates through design, development and user test cycles. We consistently invited our in-house content curators to participate in the user tests.

Earlier Prototypes. In the first user test the user was asked to interact with a paper prototype (Fig. 5). The aim was to quickly find out whether the overall navigation structure was sound. For instance, we learned that our users need to be able to quickly switch between projects and perform project-based searches. We took this insight and built our application so that the search history is stored as a property of each project. Previously searched items show up in the relevant project locations so that users don’t lose track of their search progress when switching between projects.

This was followed by an interactive prototype implemented in Proto.io. In this prototype, users could perform basic interactions such as entering a keyword, scrolling through a page or dragging and dropping files. The interaction was limited, but the design team was able to conclude that the users understood the overall interaction framework. Furthermore, we were able to gather the users’ impressions of the usefulness of contextual visualisation. All six users we interviewed understood that the visualisations are used to summarise the content of a given page. Some found them “novel”, while others could imagine the visualisations helping them create new content and “make sense of data”. Since the content of the user test was based on an existing project about Vikings in Denmark, one user raised the question of how the visualisations would evolve if the content were scientific rather than historical.

Fig. 5. Earlier low-fi prototypes

Current Prototype. The last user test was conducted using the most recent version of the web application, described in Sect. 3.4. The design team focused on, first, evaluating the usefulness of the semantically extracted entities and the information that enriches them and, second, the user interaction with the visualisations.

We interviewed five knowledge workers; each interview took about one hour. We first informed the users that our design studio had been invited to create a concept proposal for an interactive installation about the Italian artist Caravaggio for an art museum. Given this context, each user had a few minutes to glance through a printed document about the artist. The users were then asked to switch to the web application, in which we presented the printed document digitally with the extracted entities highlighted. We asked them to focus on the highlighted words and what they could mean. All users were able to identify that the highlighted words referred to people and places. One user questioned why only these words were highlighted, while another concluded that the people and places must all have a connection to Caravaggio.

The users were then asked to navigate to the entity list view, in which the entities are listed by number of occurrences, by category type or in alphabetical order. Each entity is assigned a top-level category and enriched with a short descriptive statement. We asked the users to explore the content and interface freely while thinking aloud. Here, the users perceived that the enrichment of the entities seemed to come from the web, like “a condensed version of a browser search”.

Throughout the evaluation, users were asked to keep in mind a few keywords that seemed interesting to them as they transitioned from the document view to the entity list view and finally to the visualisation view, in the hope that they would keep focusing on exploring the content rather than feel overwhelmed by being confronted with a new interface. In the visualisation view, the users could first begin with a free exploration while thinking aloud. All of them started by directly manipulating the graph, i.e., dragging nodes or highlighting nodes to see additional information in the tooltip. We then asked them to perform the actions they had missed during the free exploration phase. For example, one user did not realise that she could zoom into the graph, while another did not know that she could click on the edge between nodes to explore the relation in depth. In summary, all users found the list view most useful, and they found the expanded edge view, which shows all the connections between two entities, intriguing, as it offered further research directions.

One challenge of the user test was that the users had to familiarise themselves with the interface before they felt confident enough to use the tool to help them analyse the content. Secondly, our users are content workers who are not used to interacting with visualisations. Changing the visual encoding of the node size from the number of connections to the number of occurrences in the document was not obvious to them, nor was the concept of cross-filtering to distill the information they were looking for.

The users concluded that the application provides a good overview of the subject and that they would use the tool at the beginning of a project, particularly when confronted with massive amounts of text. However, some relevant information from the original text was missing and some extracted information seemed out of context. Regarding the usefulness of the extracted information, the users confirmed the relevance of the extracted names of people and places and found the enriched “general information” about each entity useful. However, they wished for more relations between entities to be extracted from the document itself so that they could deepen their understanding both through the document and through the more general, encyclopedia-like information.

3.6 Next Steps

ART+COM will continue to improve the visualisation-based user interface, allowing a seamless browsing experience between entities and the content from which they are extracted. We will evaluate the approach with different knowledge domains, as ART+COM’s exhibition topics are quite diverse, ranging from historical to scientific. We have already started exploring image classification, as the knowledge workers often work with images. The web search is, so far, based on search queries to Bing and Wikipedia only. We could, in addition, include sources such as Project Gutenberg, Archive.org or other structured knowledge bases.

Finally, “Queen” can be either an occupation or the name of a rock band. In order to offer our knowledge workers reliable, context-specific enriched information, the team at ART+COM will, in close collaboration with DFKI, continue to improve the graph service so that we can better resolve the intended meaning of ambiguous entities, and provide more flexibility in the user interface so that the system can also learn and adapt based on users’ input and intent.

4 Between Domain-Specificity and General Applicability

So far we have described the digital curation platform (Sect. 2) and one specific use case, the application of the curation services provided by the platform in the domain-specific user interface designed and implemented by ART+COM (Sect. 3). The curation platform itself provides RESTful APIs. Language and Knowledge Technology platforms such as this one do not typically come with a UI because their functionality is usually integrated into larger applications with concrete use cases and established user interfaces.

Nevertheless, there is a certain GUI approach that several Natural Language Processing applications share, especially systems that concentrate on text analytics such as part-of-speech tagging, Named Entity Recognition and relation extraction. These methods analyse text content and annotate information in the processed documents. There are several ways of presenting and visualising the annotated information, e.g., using different text or background colours (for specific text segments), different fonts or font sizes, pop-up menus, or additional graphs embedded into the document display. Additionally, a legend explaining, for example, the different colours is usually shown to the left or right of the document view. To a certain extent this approach can be considered an established interface convention and best practice in research and industry.

Several tools apply this interface metaphor. One of them is the General Architecture for Text Engineering (GATE), an integrated system that provides mechanisms and a graphical interface for putting together NLP pipelines in an easy way. GATE has several technological properties that make its application in a general curation scenario a rather big challenge; for example, the inclusion of external storage, such as Virtuoso or Lucene, is possible but very difficult. GATE provides part of its functionality using the interface metaphor briefly described at the beginning of this section. A second tool that uses this metaphor is Open Calais, a commercial product by Thomson Reuters. Open Calais can be tested through a web interface and also attempts to link entities to knowledge bases. Compared to these typical interfaces, ART+COM’s user interface goes several steps further by providing customised features tailor-made to the requirements of their in-house knowledge workers and content curators, such as different views that show the number of occurrences of entities in a document in order to make it easier to assess their respective relevance.

A type of software that comes close to what a generic curation user interface can or should support is the established category of Content Management Systems (CMS). The primary function and use case of all CMS products is the management of individual content pieces, including creation, editing and publishing. Many CMS tools also provide analytics functions, but these typically refer to key performance indicators such as web access statistics, advertisements and conversion rates. Only very few Content Management Systems provide features that resemble our curation services, i.e., actual semantic content analytics. However, many CMS products allow the integration of plug-ins, making it possible to integrate our curation services into an established CMS.

DFKI has been developing a more generic interface that we call the “Curation Dashboard”. It is meant to provide a GUI and testbed that makes all curation services developed by DFKI available. The interface is used to test the curation platform through an immediately and intuitively usable interface, to evaluate individual curation services, to showcase the system (and the project) to interested colleagues from research and industry, and to experiment with concrete curation scenarios. Among others, we have already used the dashboard to experiment with four use cases not covered by our funded project (digital libraries [9, 10], forensic linguistics [2], digital humanities [3] and investigative journalism).

In contrast to ART+COM’s interface, the DFKI Curation Dashboard (Fig. 6) is not meant for production use but as a tool that can be demonstrated at conferences, at industry exhibitions, in presentations or in project acquisition scenarios. Many different use cases and sectors could be interested in working with curation technologies; in addition to the ones already mentioned, these include, among others, healthcare, finance, trend detection, customer relationship management and research itself (citation analysis etc.).

Fig. 6. Main interface of the DFKI Curation Dashboard

The main interface of the dashboard (Fig. 6) is composed of four parts that show a small preview of the available visualisations as a summary of the document collection: timelining (top left), geolocation (top right), entity clustering (bottom left) and the document list view (bottom right). These previews are limited; visualisations of all documents in a collection can be accessed through their own dedicated views (Figs. 7 and 8), which are bigger and offer better visualisation possibilities for larger numbers of documents.

Fig. 7. Timelining (left) and geolocation (right) interfaces of the Curation Dashboard

The document view (Fig. 8) shows the output of the NLP processes through annotations visualised within the text, using the established best-practice approach described above, i.e., among others, applying colours to mark specific metadata. In our case, the colours represent the type of analysis that created the annotations.

Fig. 8. Documents list interface of the Curation Dashboard

5 Summary and Conclusions

In the project Digital Curation Technologies we develop a platform that provides a set of generic curation services that can be integrated, via RESTful APIs, into sector-specific applications with their own requirements and use cases. The services comprise, among others, several semantic text and document analytics processes as well as knowledge technologies that can be applied to document collections. The goal is to support knowledge workers in their daily work, i.e., to automate or semi-automate routine processes that the human experts normally have to perform intellectually and without tool support. We want to explore whether we can help these digital curators become more efficient and more effective by delegating time-consuming routine tasks to the machine.

In this article we concentrate on one of the user interfaces currently under development in the project. ART+COM’s interface is primarily meant to be used by in-house content curators who develop, among others, concepts and exhibits for museums, showrooms and exhibitions. Based on a user-centered design approach, ART+COM’s curation interface comprises features requested by and tailored to the requirements of their target users.

A more generic, i.e., not domain- or use-case-specific, interface is under development at DFKI. This Curation Dashboard serves a different purpose: it is meant as an environment in which services can be tested and evaluated. The interface is also used to showcase the platform to colleagues from research and industry and to assess the feasibility of other curation scenarios. DFKI recently used the system for experiments in four domains not covered by our project, i.e., digital libraries [9, 10], investigative journalism, forensic linguistics [2] and digital humanities [3]. These preliminary tests have shown that the dashboard has a lot of potential for all four use cases but that it would have to be adapted to the respective use case before being deployed in a production scenario.

While DFKI’s dashboard adopts the approach of showcasing the implemented curation services by exposing all technical features through the UI to users who typically want to demonstrate and test the system, ART+COM’s approach is to focus on the actual requirements of the intended target users in a production environment. The next iteration of DFKI’s Curation Dashboard will incorporate the generic, non-domain-specific requirements and insights acquired by ART+COM during the user-centered design phase of their interface, for example, presenting the respective frequencies of named entities with interactive visualisation and ordering options, because this feature can potentially be helpful in multiple domains and use cases.

Nevertheless, advanced and novel UI features such as the one mentioned above must be considered unusual when compared to established interface types and visual metaphors, such as the typical word processor or spreadsheet, the typical file viewer or the search engine result page. In that regard we are exploring new visual metaphors and interaction concepts that bridge the gap between information and knowledge, whether through smart, context-sensitive annotation features or through non-linear information displays from which patterns and insights can emerge. The boundary between established and experimental interfaces is dynamic and fluid. It is influenced by the experience users gain with popular software tools and mobile apps, which, in turn, shapes their expectations with regard to the responsiveness and behaviour of an application. For example, because online search engines introduced the feature of automatically providing suggestions while a query is being typed, users nowadays expect any search interface to provide such suggestions as well.

Especially in a production environment, it can be a challenge to introduce novel interfaces and new visual metaphors: interfaces that are too novel or too avant-garde may require extensive training sessions, for example. Nevertheless, completely new functionalities require the development of new interfaces and the adoption of new metaphors. In that regard we are currently developing a new type of interface for a novel curation service called “semantic storytelling” that is meant to generate possible storylines for articles that deal with the content of a specific document collection [4]. As semantic technologies based on Artificial Intelligence methods have recently been making tremendous steps forward, the way knowledge workers and content curators experience information should be transformed as well. Technology is slowly but surely becoming a co-curator of automatically processed or generated information, making innovative user interface design all the more relevant and important.