[Machine Learning] Splitting training and validation sets for cross-validation in three lines of code
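Quickly splitting a cross-validation dataset with numpy.random.choice() and set()
Previously, whenever I split data into a training set and a validation set, I generated the random indices by hand, which was clumsy.
The method I have since learned is as follows: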
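import numpy as np
# Generate the raw data from a normal distribution (mean 1, std 0.1, 100 samples)
x = np.random.normal(1, 0.1, 100)
# Split the data 8:2
x_train_index = np.random.choice(len(x), round(len(x) * 0.8), replace=False)
# The validation indices are all indices not chosen for training
x_valid_index = np.array(list(set(range(len(x))) - set(x_train_index)))
x_train = x[x_train_index]
x_valid = x[x_valid_index]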
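The split above can also be wrapped in a small helper so that it is repeatable and easy to check. The following is only a sketch: the function name `split_indices`, its `train_fraction` and `seed` parameters, and the use of `np.random.default_rng` and `np.setdiff1d` (in place of the `set()` difference) are assumptions made for illustration, not part of the original method.

```python
import numpy as np

def split_indices(n, train_fraction=0.8, seed=0):
    """Return disjoint (train, valid) index arrays that together cover range(n).

    Hypothetical helper, written only to illustrate the technique in the text.
    """
    rng = np.random.default_rng(seed)
    n_train = round(n * train_fraction)
    # Sampling without replacement guarantees no index is picked twice
    train_idx = rng.choice(n, size=n_train, replace=False)
    # np.setdiff1d plays the role of the set difference used in the text
    valid_idx = np.setdiff1d(np.arange(n), train_idx)
    return train_idx, valid_idx

x = np.random.default_rng(0).normal(1, 0.1, 100)
train_idx, valid_idx = split_indices(len(x))
x_train, x_valid = x[train_idx], x[valid_idx]
print(len(x_train), len(x_valid))  # 80 20
```

Fixing the seed makes the same split come back on every run, which helps when comparing models on the same validation set.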
Summary 1: np.random.choice()
Definition : choice(a, size=None, replace=True, p=None)
Type : Function of numpy.random module
Parameters
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace : boolean, optional
Whether the sample is with or without replacement
In other words, whether the sample may contain repeated elements
p : 1-D array-like, optional
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
In other words, the probability distribution used to pick elements; the default is uniform
Returns
samples : 1-D ndarray, shape (size,)
The generated random samples
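A short sketch of how the `size`, `replace` and `p` parameters behave may help here. The variable names and the fixed seed are assumptions chosen for illustration; no output values are shown because they depend on the random state.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is repeatable

# replace=False: every drawn value is distinct (a random subset of 0..9)
subset = np.random.choice(10, size=5, replace=False)

# replace=True (the default): the same value may be drawn more than once
with_repeats = np.random.choice(10, size=20)

# p: one probability per entry, summing to 1; here index 0 is never picked
weighted = np.random.choice(3, size=1000, p=[0.0, 0.3, 0.7])

# size omitted: a single value is returned instead of an array
single = np.random.choice(10)

print(subset, with_repeats, single)
```

For splitting a dataset, `replace=False` is the setting that matters: with the default `replace=True`, the same sample could land in the training set more than once.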
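Summary 2: set()
A Python set, like sets in other languages, is an unordered collection of unique elements. Its basic uses include membership testing and eliminating duplicate elements.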
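The set operations relied on in the split can be seen in isolation in the sketch below. The index values are made up for illustration. Because a set is unordered, the sketch sorts the difference before using it, which is optional when the result is only used for indexing.

```python
# All candidate indices, and a hypothetical choice of training indices
all_idx = set(range(10))
train_idx = {7, 2, 9, 4, 0, 5, 1, 8}

# Set difference: the indices that were not chosen for training
valid_idx = sorted(all_idx - train_idx)
print(valid_idx)  # [3, 6]

# Building a set from a list drops the duplicates
unique = set([1, 1, 2, 3, 3])
print(len(unique))  # 3

# Membership test
print(3 in unique)  # True
```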
Summary 3: batch training
Batch training can select its data in the same way:
batch_size = 25
for epoch in range(100):
    rand_index = np.random.choice(len(x_train), size=batch_size)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    ...
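A self-contained version of the batch loop might look like the sketch below. The source never defines `y_train`, so the targets here (`2.0 * x_train`) are a made-up assumption, and recording each batch mean is only a stand-in for a real training step.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(1, 0.1, 80)
y_train = 2.0 * x_train  # hypothetical targets, for illustration only

batch_size = 25
batch_means = []
for epoch in range(100):
    # Draw batch_size indices; sampling is with replacement by default,
    # so a batch may contain the same sample more than once
    rand_index = rng.choice(len(x_train), size=batch_size)
    rand_x = x_train[rand_index]
    rand_y = y_train[rand_index]
    # A real training step would go here; the batch mean is a placeholder
    batch_means.append(rand_x.mean())

print(len(batch_means), rand_x.shape)  # 100 (25,)
```

Indexing `x_train` and `y_train` with the same `rand_index` keeps each input paired with its target. Passing `replace=False` would instead give batches of distinct samples.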