# Project Overview
TIVA-KG is a multimodal commonsense knowledge graph featuring multiple modalities and triplet grounding.
Multiple modalities: Most multimodal KGs focus on the image modality, with only a few exceptions that consider other modalities instead of images. TIVA-KG brings them all together, adding images, videos, and audio clips to a basic topology that already contains text.
Triplet grounding: When aligning data to "run", we tend to get images of humans running, while for "dog" we get images of dogs sitting or standing. How can we get images of running dogs? By aligning data to the triplet consisting of "dog", "run", and the relation from "dog" to "run": "dog CapableOf run". This simple example shows the expressive power granted by triplet grounding.
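To make the contrast concrete, here is a minimal sketch of entity-level versus triplet-level grounding. The dictionaries and file names are hypothetical illustrations, not TIVA-KG's actual data format.

```python
# Hypothetical sketch: entity grounding vs. triplet grounding.
# All keys and file names below are made up for illustration.

# Entity-level grounding aligns media to single concepts, so "dog"
# tends to collect generic dog images and "run" generic running images.
entity_grounding = {
    "dog": ["dog_sitting.jpg", "dog_standing.jpg"],
    "run": ["human_running.jpg"],
}

# Triplet-level grounding aligns media to a whole (head, relation, tail)
# triplet, so the example can depict the relation itself.
triplet_grounding = {
    ("dog", "CapableOf", "run"): ["dog_running.mp4", "dog_running.jpg"],
}

# Looking up the triplet returns media that actually shows a running dog.
print(triplet_grounding[("dog", "CapableOf", "run")])
```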
# Statistics
TIVA-KG consists of 443,580 entities and 1,382,358 triplets (roughly 440K and 1.3M, respectively), with every entity reachable from every other entity to ensure good connectivity. We provide detailed statistics below:
Modality | Entities covering this modality | Data samples in this modality
---|---|---
Audio | 103,580 | 359,465
Image | 340,225 | 1,695,688
Video | 239,566 | 1,112,918

Modality | Triplets covering this modality | Data samples in this modality
---|---|---
Audio | 30,169 | 93,521
Image | 223,998 | 1,117,389
Video | 194,037 | 927,029
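For a quick sense of coverage, the ratios can be derived directly from the tables above; the short script below (the variable names are ours) prints the fraction of entities and triplets grounded in each modality.

```python
# Coverage ratios computed from the statistics above.
TOTAL_ENTITIES = 443_580
TOTAL_TRIPLETS = 1_382_358

entities_covered = {"Audio": 103_580, "Image": 340_225, "Video": 239_566}
triplets_covered = {"Audio": 30_169, "Image": 223_998, "Video": 194_037}

for modality, count in entities_covered.items():
    print(f"{modality}: {count / TOTAL_ENTITIES:.1%} of entities grounded")
for modality, count in triplets_covered.items():
    print(f"{modality}: {count / TOTAL_TRIPLETS:.1%} of triplets grounded")
```

For example, images ground about 77% of all entities, while audio grounds about 23%.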
# View (part of) the Graph
Here we provide a small subgraph that you can view online.
# Get the Data
https://pan.baidu.com/s/1ozkvjT8I8LEBd-J71Czc7g?pwd=jpce
# Publications
https://dl.acm.org/doi/abs/10.1145/3581783.3612266
# Support and Discussion
Coming soon…