.. _dataset_cn:

AutoGL Datasets
===============

Building on graph learning libraries such as PyTorch Geometric (PyG), Deep Graph Library (DGL), and Open Graph Benchmark (OGB), AutoGL provides a variety of commonly used datasets.
In addition, users can build custom static homogeneous and heterogeneous graphs with ``GeneralStaticGraph``, the unified static graph container provided by AutoGL. For example:

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Create a homogeneous graph: node data first, then a (2, num_edges) edge index.
    custom_static_homogeneous_graph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)},
        torch.randint(0, 1024, (2, 10556))
    )

    # Create a heterogeneous graph: node data per node type,
    # then edges keyed by (source type, relation, target type).
    custom_static_heterogeneous_graph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )

Commonly used datasets
----------------------

AutoGL currently provides the following common benchmark datasets.

Semi-supervised node classification:

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | Default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+

Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, and others.

+--------------+------------+------------+---------------+------------+--------------------+
| Dataset      | PyG        | DGL        | Node features | Label      | Edge features      |
+==============+============+============+===============+============+====================+
| MUTAG        | ✓          | ✓          | ✓             | ✓          | ✓                  |
+--------------+------------+------------+---------------+------------+--------------------+
| IMDB-Binary  | ✓          | ✓          |               | ✓          |                    |
+--------------+------------+------------+---------------+------------+--------------------+
| IMDB-Multi   | ✓          | ✓          |               | ✓          |                    |
+--------------+------------+------------+---------------+------------+--------------------+
| PROTEINS     | ✓          | ✓          | ✓             | ✓          |                    |
+--------------+------------+------------+---------------+------------+--------------------+
| COLLAB       | ✓          | ✓          |               | ✓          |                    |
+--------------+------------+------------+---------------+------------+--------------------+
| ogbg-molhiv  | ✓          | ✓          | ✓             | ✓          | ✓                  |
+--------------+------------+------------+---------------+------------+--------------------+
| ogbg-molpcba | ✓          | ✓          | ✓             | ✓          | ✓                  |
+--------------+------------+------------+---------------+------------+--------------------+
| ogbg-ppa     | ✓          | ✓          |               | ✓          | ✓                  |
+--------------+------------+------------+---------------+------------+--------------------+
| ogbg-code2   | ✓          | ✓          | ✓             | ✓          | ✓                  |
+--------------+------------+------------+---------------+------------+--------------------+

Link prediction: AutoGL can currently run automatic link prediction on the graph datasets provided for node classification.

Building a custom dataset from GeneralStaticGraph instances
----------------------------------------------------------------

The following snippet shows how to build a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

.. code-block:: python

    from autogl.data import InMemoryDataset

    # ``graphs`` is a sequence of GeneralStaticGraph instances.
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)
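
Before wrapping custom graphs into a dataset, it can help to check that every edge refers to a node that actually exists, since randomly generated edge indices are easy to get wrong. The sketch below is illustrative only: it uses plain Python lists in place of tensors, and the helper ``check_edge_index`` and the toy node counts are hypothetical, not part of AutoGL.

.. code-block:: python

    from typing import Sequence, Tuple

    # An edge index is a pair of equally long rows: (source ids, target ids).
    EdgeIndex = Tuple[Sequence[int], Sequence[int]]


    def check_edge_index(edge_index: EdgeIndex, num_src: int, num_dst: int) -> None:
        """Raise ValueError if any edge points at a node that does not exist.

        Hypothetical helper for illustration; not an AutoGL function.
        """
        sources, targets = edge_index
        if len(sources) != len(targets):
            raise ValueError("source and target rows must have the same length")
        for s, t in zip(sources, targets):
            if not (0 <= s < num_src and 0 <= t < num_dst):
                raise ValueError(f"edge ({s}, {t}) is out of range")


    # Toy heterogeneous layout (assumed values): 3 authors and 2 papers.
    num_nodes = {"author": 3, "paper": 2}
    edges = {
        ("author", "writing", "paper"): ([0, 1, 2], [0, 1, 1]),
    }

    # Source ids are checked against the source node type,
    # target ids against the target node type.
    for (src_type, _, dst_type), edge_index in edges.items():
        check_edge_index(edge_index, num_nodes[src_type], num_nodes[dst_type])

With real tensors, the same check could be applied to each row of the ``(2, num_edges)`` edge index before the graph is created.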