2022
DOI: 10.1109/tpami.2021.3124805
Bringing Light Into the Dark: A Large-Scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework

Abstract: The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well …

Cited by 58 publications (41 citation statements)
References 32 publications (82 reference statements)
“…Tensor factorization and tensor networks [12] are well-known factorization techniques used for knowledge graph embedding with semantic matching interactions [13]. In a tensor representation, each triplet of the knowledge graph is represented by the value X_{ijk} of a three-dimensional tensor X ∈ ℝ^{n×n×m}; this value is 1 if the triplet (i, j, k) exists, or can be an arbitrary non-negative value that measures the strength of the k-relation between the i- and j-entities.…”
Section: Introduction (mentioning)
confidence: 99%
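
To make the tensor view concrete, the following minimal Python sketch builds the binary adjacency tensor described above. The entity names, relation names, and triplets are hypothetical, chosen purely for illustration.

import numpy as np

# Toy knowledge graph with n = 3 entities and m = 2 relation types.
# All names here are made up for the example.
entities = ["alice", "bob", "carol"]     # indices 0..2
relations = ["knows", "works_with"]      # indices 0..1

# Adjacency tensor X ∈ R^{n×n×m}: X[i, j, k] = 1 iff the triplet
# (entity i, entity j, relation k) exists; a non-negative value could
# instead encode the strength of the k-relation between i and j.
n, m = len(entities), len(relations)
X = np.zeros((n, n, m))

triplets = [(0, 1, 0), (1, 2, 1)]        # (alice, knows, bob), (bob, works_with, carol)
for i, j, k in triplets:
    X[i, j, k] = 1.0

print(X[:, :, 0])                        # the slice for the "knows" relation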
“…On the other hand, QE bypasses the need for a database or query engine and performs reasoning directly in a latent space by computing a similarity score between the query representation and entity representations. A query representation is obtained by processing its equivalent logical formula, where joins become intersections (∧) and variables are existentially quantified (∃).…”
Section: Introduction (mentioning)
confidence: 99%
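
A minimal sketch of the scoring step this statement describes, assuming dot-product similarity and random stand-in embeddings; a real QE system would learn the entity embeddings and the query encoder from data.

import numpy as np

rng = np.random.default_rng(0)
num_entities, dim = 5, 8

# Stand-in entity embeddings living in the shared latent space.
entity_emb = rng.normal(size=(num_entities, dim))

# Assume the logical formula of the query (joins as intersections,
# existential variables) has already been encoded into a single vector.
query_emb = rng.normal(size=dim)

scores = entity_emb @ query_emb     # one similarity score per entity
ranking = np.argsort(-scores)       # candidate answers, best first
print(ranking)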
“…Instead, CQD decomposes a query into a sequence of reasoning steps and performs a beam search in the latent space of KG embedding models pre-trained on a simple 1-hop link prediction task. A particular novelty of this approach is that no end-to-end training on complex queries is required, and any trained embedding model from the existing abundance [1,16] can be employed as is.…”
Section: Introduction (mentioning)
confidence: 99%
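
The beam-search idea can be illustrated with a hypothetical 2-hop conjunctive query ?y : r1(a, x) ∧ r2(x, y). The score_1hop placeholder stands in for any pre-trained 1-hop link predictor, and scores are combined by summation for simplicity (CQD itself uses t-norms); nothing here is trained on complex queries.

import numpy as np

NUM_ENTITIES, BEAM = 6, 2
rng = np.random.default_rng(0)

def score_1hop(head: int, relation: int) -> np.ndarray:
    # Placeholder for a pre-trained link predictor: returns a score
    # for every candidate tail entity of (head, relation, ?).
    return rng.normal(size=NUM_ENTITIES)

# Step 1: score all candidates x for r1(a, x) and keep the top-BEAM.
step1 = score_1hop(head=0, relation=0)
beam = np.argsort(-step1)[:BEAM]

# Step 2: extend each beam entry with r2(x, y), combining step scores.
best = {}
for x in beam:
    step2 = score_1hop(head=int(x), relation=1)
    for y in range(NUM_ENTITIES):
        best[y] = max(best.get(y, float("-inf")), step1[x] + step2[y])

answers = sorted(best, key=best.get, reverse=True)   # ranked answers for y
print(answers[:BEAM])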
“…Existing software packages that provide implementations for different KGEMs usually lack full composability: model architectures (or interaction models), training approaches, loss functions, and the usage of explicit inverse relations cannot be arbitrarily combined. The full composability of KGEMs is fundamental for assessing their performance because it allows measuring the effect of individual components on a model's performance, instead of attributing a performance increase solely to the model architecture, which is misleading (Ruffinelli et al., 2020; Ali et al., 2020). Besides, often only limited functionality is provided, e.g., a small number of KGEMs are supported, or features such as HPO are missing.…”
Section: Introduction (mentioning)
confidence: 99%
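
Since the package under discussion is PyKEEN itself, a small composability example fits here. This is a minimal sketch using PyKEEN's pipeline function; the argument values are illustrative, and the exact options available depend on the installed PyKEEN version.

from pykeen.pipeline import pipeline

# Each component is chosen independently of the others.
result = pipeline(
    dataset="Nations",                                 # small built-in benchmark
    dataset_kwargs=dict(create_inverse_triples=True),  # explicit inverse relations
    model="DistMult",                                  # interaction model
    loss="marginranking",                              # loss function
    training_loop="sLCWA",                             # training approach
    model_kwargs=dict(embedding_dim=64),
    training_kwargs=dict(num_epochs=5),
)
print(result.metric_results.get_metric("hits@10"))

Swapping any single argument (e.g., the loss or the training loop) while holding the others fixed is what lets one attribute a performance difference to that component rather than to the model architecture as a whole.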