The Discovery Lab seeks to advance the ability to construct, use, and study a large-scale knowledge graph that integrates knowledge across heterogeneous scientific content and data. Such a knowledge graph will allow deeper and richer use of content and data across a wider range of domains than has been possible so far.

Knowledge graph construction and updating from natural language sources is a key capability for any scientific knowledge graph. One of the lab's targets is a knowledge graph updating service that consumes a real-time stream of scientific literature and produces high-quality knowledge graph updates, which in turn enable high-precision alerts.

Reinforcement learning over structured multi‐modal information

When providing recommendations, it is often possible to get feedback on the recommendation that was actually made (‘was it useful?’), while counterfactual information is often not available (‘would another recommendation have been more useful?’). Because the outcome is never observed for the alternatives that were not shown, it can be difficult to train models with standard regression or classification techniques. In the lab we investigate reinforcement learning methods that can take structured multi-modal information, such as user context, knowledge graphs, or text, as input. The resulting system generates structured recommendations, such as proposals for promising lab experiments.
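The partial-feedback problem described above is easiest to see in a contextual bandit, the single-step special case of reinforcement learning. The following is a minimal sketch, not the lab's method: it assumes an epsilon-greedy policy with one linear reward estimate per candidate recommendation, and the class name, the two-feature context, and the simulated "was it useful?" reward are all hypothetical. The key point is in `update`: only the arm that was actually recommended receives a learning signal.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical contextual bandit that learns only from the arm it chose."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.1, seed=0):
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # One linear reward estimate per arm (candidate recommendation).
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, context):
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def best_arm(self, context):
        # Greedy choice under current estimates (ties go to the lowest index).
        scores = [self.predict(a, context) for a in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        return self.best_arm(context)

    def update(self, arm, context, reward):
        # Only the chosen arm is updated: the rewards the other arms would
        # have earned (the counterfactuals) are never observed.
        error = reward - self.predict(arm, context)
        self.weights[arm] = [w + self.lr * error * x
                             for w, x in zip(self.weights[arm], context)]


def simulate(steps=5000, seed=1):
    """Train on simulated binary feedback (hypothetical reward function)."""
    env_rng = random.Random(seed)
    bandit = EpsilonGreedyLinearBandit(n_arms=2, n_features=2)
    for _ in range(steps):
        user_type = env_rng.randrange(2)           # hypothetical user context
        context = [1.0, float(user_type)]          # bias term + one feature
        arm = bandit.choose(context)
        reward = 1.0 if arm == user_type else 0.0  # observed for this arm only
        bandit.update(arm, context, reward)
    return bandit


bandit = simulate()
print(bandit.best_arm([1.0, 0.0]), bandit.best_arm([1.0, 1.0]))
```

Exploration is what lets the untried arm eventually be evaluated at all; with `epsilon=0` the policy could keep recommending the first arm forever and never learn that the other one is better for some users.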

Task‐Based Question Answering

In the lab we want to advance the state of the art in question answering (QA). We focus on challenging scenarios that require the QA model to reason over, gather, and synthesize disjoint pieces of information within the context to generate an answer. Moreover, we do so in the task-based setting of a research platform, combining three perspectives: a global perspective, aimed at generating partial answers from background knowledge; a task-based perspective, aimed at generating actionable answers that fit the research task at hand; and a mixed-initiative perspective, aimed at eliciting information about research tasks and designs from users of the platform so as to optimize the QA responses being generated.

Hypothesis Generation

Many disciplines have amassed such large amounts of data that data-gathering platforms are no longer restricted to passive “query answering”. Instead of expecting researchers to formulate hypotheses (and subsequently test them against the data), we will build the capacity to propose plausible new hypotheses. The challenge will be to extend the prediction of micro-hypotheses to hypotheses of greater generality and complexity. This will require advances in knowledge graph completion (extending current techniques from completion with simple links to completion with subgraphs) and in knowledge graph generalization.
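For orientation, the single-link completion that current techniques perform, which is what "micro-hypotheses" refers to above, can be sketched with a DistMult-style score, one well-known embedding model for link prediction (it is named here only as an example, not as the lab's choice). Everything in this sketch is hypothetical: the entity and relation names, the hand-set three-dimensional embeddings (a real system would learn them from the graph), and the helper functions. Unseen triples are ranked by score and the top ones are proposed as candidate hypotheses.

```python
# Toy, hand-set embeddings (hypothetical; a real system would learn them).
ENTITY_EMB = {
    "compound_a": [1.0, 0.0, 0.5],
    "disease_x": [0.9, 0.1, 0.4],
    "disease_y": [0.2, 0.8, 0.1],
    "disease_z": [0.7, 0.2, 0.6],
}
RELATION_EMB = {"treats": [1.0, 0.5, 1.0]}

# Triples already present in the (hypothetical) knowledge graph.
KNOWN = {("compound_a", "treats", "disease_x")}


def distmult_score(head, relation, tail):
    """DistMult plausibility score: sum_i h_i * r_i * t_i."""
    h, r, t = ENTITY_EMB[head], RELATION_EMB[relation], ENTITY_EMB[tail]
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def propose_tails(head, relation, known=KNOWN):
    """Rank tails not yet linked to `head` as candidate micro-hypotheses."""
    candidates = [e for e in ENTITY_EMB
                  if e != head and (head, relation, e) not in known]
    return sorted(candidates,
                  key=lambda tail: distmult_score(head, relation, tail),
                  reverse=True)


print(propose_tails("compound_a", "treats"))
```

Each proposal here is a single new edge; the subgraph-level completion described in the paragraph would instead have to score a set of interdependent triples jointly, which this sketch does not attempt.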

Knowledge‐driven query construction

Tasks such as constructing meta-reviews in medical research, or writing literature surveys in general, involve finding evidence using manual or semi-automatic methods, which are time-consuming and suffer from low precision. To overcome this difficulty, we will investigate the knowledge-based construction of complex queries, where prior domain-specific knowledge is used to constrain the space of possible queries to be generated. This work will result in a query-answering service that can answer natural language questions from a jointly defined example set, through knowledge-driven query construction and through query answering based on query embedding. This service can also be used to construct “standing queries” that continuously monitor the literature and knowledge graph updates in order to alert scientists based on their personal profiles.
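The standing-query idea mentioned above can be illustrated with a minimal sketch that matches each incoming knowledge graph update against every user's registered patterns. It assumes, purely for illustration, that a personal profile is a list of triple patterns with `None` as a wildcard and that updates arrive as `(subject, predicate, object)` tuples; the `Pattern`, `StandingQuery`, and `alerts` names are hypothetical, and a production service would index patterns rather than scan them all.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Pattern:
    """Triple pattern; None acts as a wildcard (hypothetical format)."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None

    def matches(self, triple: Triple) -> bool:
        s, p, o = triple
        return ((self.subject is None or self.subject == s)
                and (self.predicate is None or self.predicate == p)
                and (self.obj is None or self.obj == o))


@dataclass
class StandingQuery:
    """A user's profile, modelled here as a list of triple patterns."""
    user: str
    patterns: list = field(default_factory=list)


def alerts(queries: Sequence[StandingQuery],
           updates: Iterable[Triple]) -> Iterator[Tuple[str, Triple]]:
    """Yield (user, triple) for every update matching one of a user's patterns."""
    for triple in updates:
        for query in queries:
            if any(pattern.matches(triple) for pattern in query.patterns):
                yield query.user, triple


queries = [
    StandingQuery("alice", [Pattern(predicate="inhibits", obj="protein_p")]),
    StandingQuery("bob", [Pattern(subject="compound_a")]),
]
updates = [
    ("compound_a", "inhibits", "protein_p"),
    ("compound_b", "inhibits", "protein_q"),
    ("compound_c", "inhibits", "protein_p"),
]
for user, triple in alerts(queries, updates):
    print(user, triple)
```

Because `alerts` is a generator, it can sit directly on an unbounded stream of updates and emit alerts as they arrive rather than after a batch completes.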