
Summary of Dataset Splitting Methods

^ _ ^

Condition1

Split the dataset into train, eval, and test sets ahead of time, save each one to its own file, and load them separately when needed.
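
One way this could look in practice is sketched below, using only the standard library. The helper names `save_splits` and `load_split`, the JSON file format, the 80/10/10 ratios, and the seed are all assumptions made for illustration; they are not from the original post.

```python
import json
import random
import tempfile
from pathlib import Path


def save_splits(records, out_dir, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle once, then write train/eval/test to separate JSON files."""
    records = list(records)
    random.Random(seed).shuffle(records)  # fixed seed keeps the split reproducible
    n_train = int(len(records) * train_ratio)
    n_val = int(len(records) * val_ratio)
    splits = {
        "train": records[:n_train],
        "eval": records[n_train:n_train + n_val],
        "test": records[n_train + n_val:],  # whatever remains
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        (out_dir / f"{name}.json").write_text(json.dumps(rows))


def load_split(out_dir, name):
    """Read back one split by name ("train", "eval" or "test")."""
    return json.loads((Path(out_dir) / f"{name}.json").read_text())


# Usage: 100 toy records split 80/10/10 into a temporary directory.
tmp_dir = tempfile.mkdtemp()
save_splits(range(100), tmp_dir)
train, val, test = (load_split(tmp_dir, n) for n in ("train", "eval", "test"))
print(len(train), len(val), len(test))
```

Because the split is written to disk once, every later run reads exactly the same three sets, which keeps results comparable across experiments.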

Condition2

Load the entire dataset, then split it.

Method1

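# Sequential split by position: first 80% train, next 10% val, remainder test.
# No shuffling happens here, so shuffle `data` first if its rows are ordered.
train_ratio, val_ratio = 0.8, 0.1
train_nums = int(data.shape[0] * train_ratio)
val_nums = int(data.shape[0] * val_ratio)

train_data = data[:train_nums]
val_data = data[train_nums:train_nums + val_nums]
test_data = data[train_nums + val_nums:]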
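
Slicing by position only gives a random split if the rows are already in random order. A minimal sketch of shuffling before slicing follows, assuming `data` is a NumPy array; the toy array and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, chosen only for reproducibility
data = np.arange(100).reshape(50, 2)  # toy data: 50 rows, 2 columns

# Shuffle the row order first so each split is a random sample of the rows.
data = data[rng.permutation(data.shape[0])]

train_ratio, val_ratio = 0.8, 0.1
train_nums = int(data.shape[0] * train_ratio)
val_nums = int(data.shape[0] * val_ratio)

train_data = data[:train_nums]
val_data = data[train_nums:train_nums + val_nums]
test_data = data[train_nums + val_nums:]

print(train_data.shape, val_data.shape, test_data.shape)
```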

Method2

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])
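
For reference, a self-contained sketch of `random_split` on a toy dataset is shown below. The `TensorDataset`, the 80/10/10 sizes, and the seed are assumptions for illustration. The sizes passed to `random_split` have to sum to `len(dataset)`, so the test size is computed as the remainder.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset with 100 samples (illustrative only).
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))

train_size = int(0.8 * len(dataset))
val_size = int(0.1 * len(dataset))
test_size = len(dataset) - train_size - val_size  # remainder, so the sizes sum to len(dataset)

generator = torch.Generator().manual_seed(42)  # fixed seed for a reproducible split
train_dataset, val_dataset, test_dataset = random_split(
    dataset, [train_size, val_size, test_size], generator=generator
)
print(len(train_dataset), len(val_dataset), len(test_dataset))
```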

Method3

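import numpy as np
from torch.utils.data import DataLoader, Subset, SubsetRandomSampler

indices = list(range(dataset_size))
np.random.shuffle(indices)

train_indices, val_indices = indices[:train_num], indices[train_num:]
# RandomSampler/SequentialSampler only yield positions 0..len-1 of what they wrap,
# not the index values themselves, so restrict each loader to its indices explicitly.
train_sampler = SubsetRandomSampler(train_indices)  # random order, train indices only
val_dataset = Subset(dataset, val_indices)          # fixed order, val indices only

train_loader = DataLoader(dataset, batch_size=64, sampler=train_sampler)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)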
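
A quick way to sanity-check an index-based split is to confirm that the two loaders never return the same sample. The sketch below does this on a toy `TensorDataset` whose samples equal their own index. It uses `SubsetRandomSampler` and `Subset`, which draw only from the index lists they are given; the dataset, the sizes, and the seed are assumptions for illustration.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset, SubsetRandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(100))  # toy dataset: sample i holds the value i
dataset_size, train_num = len(dataset), 80

indices = list(range(dataset_size))
np.random.seed(0)  # arbitrary seed for a reproducible shuffle
np.random.shuffle(indices)
train_indices, val_indices = indices[:train_num], indices[train_num:]

train_loader = DataLoader(dataset, batch_size=64, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(Subset(dataset, val_indices), batch_size=64, shuffle=False)

# Collect every sample value each loader yields and compare the two sets.
train_seen = {int(x) for (batch,) in train_loader for x in batch}
val_seen = {int(x) for (batch,) in val_loader for x in batch}
print(len(train_seen), len(val_seen), train_seen.isdisjoint(val_seen))
```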