TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio

# Project Overview

TIVA-KG is a multimodal commonsense knowledge graph featuring multiple modalities and triplet grounding.

Multiple modalities: Most multimodal KGs focus on the image modality, with only a few exceptions that consider other modalities instead of images. TIVA-KG brings them all together, adding images, videos, and audio clips to a basic topology that already contains text.

Triplet grounding: When aligning data to "run", we tend to get images of humans running, while for "dog" we get images of dogs sitting or standing. How can we get images of dogs running? By aligning data to the triplet consisting of "dog", "run", and the relation from "dog" to "run": "dog CapableOf run". This simple example shows the expressive power granted by triplet grounding.
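To make the idea concrete, here is a minimal Python sketch of a grounding store keyed by whole triplets rather than single entities. All names here (`Triplet`, `MultimodalKG`, `ground`, `lookup`, the file path) are hypothetical illustrations, not TIVA-KG's actual schema or API.

```python
# Hypothetical sketch of triplet grounding: media is attached to a whole
# (head, relation, tail) triplet, not to a single entity.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triplet:
    head: str      # e.g. "dog"
    relation: str  # e.g. "CapableOf"
    tail: str      # e.g. "run"


@dataclass
class MultimodalKG:
    # Maps a triplet to {modality: [media paths]}, so "dog CapableOf run"
    # can point to videos of dogs actually running.
    grounding: dict = field(default_factory=dict)

    def ground(self, triplet: Triplet, modality: str, path: str) -> None:
        self.grounding.setdefault(triplet, {}).setdefault(modality, []).append(path)

    def lookup(self, triplet: Triplet, modality: str) -> list:
        return self.grounding.get(triplet, {}).get(modality, [])


kg = MultimodalKG()
t = Triplet("dog", "CapableOf", "run")
kg.ground(t, "video", "videos/dog_running_001.mp4")  # illustrative path
print(kg.lookup(t, "video"))  # ['videos/dog_running_001.mp4']
```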



# Statistics

TIVA-KG consists of roughly 440K entities and 1.3M triplets (443,580 and 1,382,358, respectively), with every entity reachable from every other entity to ensure good connectivity. We provide detailed statistics below:

Statistics of entities in TIVA-KG:

| Modality | Entities covering the modality | Data samples in the modality |
| --- | --- | --- |
| Audio | 103,580 | 359,465 |
| Image | 340,225 | 1,695,688 |
| Video | 239,566 | 1,112,918 |

Statistics of triplets in TIVA-KG:

| Modality | Triplets covering the modality | Data samples in the modality |
| --- | --- | --- |
| Audio | 30,169 | 93,521 |
| Image | 223,998 | 1,117,389 |
| Video | 194,037 | 927,029 |
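As a rough measure of coverage, the modality shares can be derived directly from the tables above; the short plain-Python sketch below (numbers copied from the tables) computes the fraction of entities and triplets grounded in each modality.

```python
# Coverage rates derived from the statistics tables above.
TOTAL_ENTITIES, TOTAL_TRIPLETS = 443_580, 1_382_358

entity_counts = {"Audio": 103_580, "Image": 340_225, "Video": 239_566}
triplet_counts = {"Audio": 30_169, "Image": 223_998, "Video": 194_037}

for modality in ("Audio", "Image", "Video"):
    e = entity_counts[modality] / TOTAL_ENTITIES
    t = triplet_counts[modality] / TOTAL_TRIPLETS
    print(f"{modality}: {e:.1%} of entities, {t:.1%} of triplets")
# Audio: 23.4% of entities, 2.2% of triplets
# Image: 76.7% of entities, 16.2% of triplets
# Video: 54.0% of entities, 14.0% of triplets
```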

# View (part of) the Graph

Here we provide a small subgraph that you can view online.

# Get the Data

https://pan.baidu.com/s/1ozkvjT8I8LEBd-J71Czc7g?pwd=jpce

# Publications

https://dl.acm.org/doi/abs/10.1145/3581783.3612266

# Support and Discussion

Coming soon…