[Machine Learning] Splitting a Dataset into Training and Validation Sets for Cross-Validation in Three Lines of Code


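Use numpy.random.choice() and set() to quickly split a dataset for cross-validation

Previously, whenever I split data into training and validation sets, I generated the random indices by hand, which was clumsy.

The new method I learned is as follows: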

import numpy as np
# Generate the raw data from a normal distribution (mean 1, std 0.1, 100 samples)
x = np.random.normal(1, 0.1, 100)
# Split the data 8:2
x_train_index = np.random.choice(len(x), round(len(x) * 0.8), replace=False)
x_valid_index = np.array(list(set(range(len(x))) - set(x_train_index)))
x_train = x[x_train_index]
x_valid = x[x_valid_index]

Summary 1: np.random.choice()

Definition : choice(a, size=None, replace=True, p=None)

Type : Function in the numpy.random module

Parameters
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace : boolean, optional
Whether the sample is with or without replacement
In other words, whether the sample may contain repeated elements.
p : 1-D array-like, optional
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
In other words, the probability distribution used to pick elements; the default is uniform.

Returns
samples : 1-D ndarray, shape (size,)
The generated random samples
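
To make the parameters above concrete, here is a minimal sketch of how replace, size, and p behave. The seed, the sizes, and the labels "a", "b", "c" are arbitrary values chosen for illustration and are not part of the original post.

```python
import numpy as np

# Fixed seed so the sketch is repeatable (the seed value is arbitrary).
np.random.seed(0)

# a given as an int: sample from np.arange(10).
# replace=False guarantees that all drawn values are distinct.
no_repeat = np.random.choice(10, size=5, replace=False)

# replace=True (the default): the same value may be drawn more than once.
# Drawing 20 values from only 10 candidates forces at least one repeat.
with_repeat = np.random.choice(10, size=20, replace=True)

# size given as a tuple: 2 * 3 = 6 samples arranged in a (2, 3) array.
grid = np.random.choice(10, size=(2, 3))

# p: one probability per element of a; the values must sum to 1.
# With probability 0.0, "c" can never be drawn.
labels = np.random.choice(["a", "b", "c"], size=8, p=[0.5, 0.5, 0.0])

print(no_repeat)
print(with_repeat)
print(grid.shape)
print(labels)
```

For a train/validation split, replace=False is the setting that matters: it ensures no sample index is selected twice for the training set.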

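Summary 2: set()

Python's set, like sets in other languages, is an unordered collection of unique elements. Its basic uses include relation testing (such as difference and intersection) and removing duplicates.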
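
The split relies on set difference to find the indices that were not chosen for training. A small sketch, assuming a hypothetical dataset of 10 samples; the sorted() call is an addition here, because a set does not guarantee any order:

```python
import numpy as np

n = 10  # hypothetical dataset size, chosen only for illustration
np.random.seed(0)  # arbitrary seed for repeatability

# Pick 80% of the indices for training, without repeats.
train_index = np.random.choice(n, round(n * 0.8), replace=False)

# Set difference: every index in 0..n-1 that was not chosen for training.
valid_set = set(range(n)) - set(train_index.tolist())

# A set has no defined order, so sort before building the index array.
valid_index = np.array(sorted(valid_set))

print(sorted(train_index.tolist()))
print(valid_index)
```

Because the two index sets are disjoint and together cover range(n), every sample lands in exactly one of the training or validation sets.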

Summary 3: batch training

The same method can be used to select data for batch training.

batch_size = 25
for epoch in range(100):
    rand_index = np.random.choice(len(x_train), size = batch_size)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    ...
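
The batch loop can be tried end to end with the sketch below. Note the assumptions: the original post never defines y_train, so a made-up noisy linear target is used here, the loop runs 3 iterations instead of 100, and the training step itself is left as a comment because the original elides it.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability

# Hypothetical data: y_train is not defined in the original post,
# so a noisy linear target is invented here purely for illustration.
x_train = np.random.normal(1, 0.1, 80)
y_train = 2.0 * x_train + np.random.normal(0, 0.01, 80)

batch_size = 25
batch_shapes = []
for epoch in range(3):  # 3 iterations instead of 100 to keep the output short
    # Default replace=True: an index may appear more than once within a batch.
    rand_index = np.random.choice(len(x_train), size=batch_size)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    # The actual training step would go here; this sketch only records shapes.
    batch_shapes.append((rand_x.shape, rand_y.shape))

print(batch_shapes)
```

Using the same rand_index for both arrays keeps each input paired with its target. If a batch should not contain duplicate samples, passing replace=False is one option.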
