nvidia-rapids︱cuML, a GPU-accelerated machine learning library

2023-12-21 03:08


cuML is a suite of machine learning algorithms and mathematical primitives that share compatible APIs with the other RAPIDS projects.

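cuML lets data scientists, researchers, and software engineers run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the scikit-learn API.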
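Because the API mirrors scikit-learn, one possible pattern is to choose the estimator class at import time and leave the rest of the code unchanged. The sketch below is only an illustration: the helper name `load_dbscan` is hypothetical, and it assumes the `cuml.cluster.DBSCAN` and `sklearn.cluster.DBSCAN` classes used later in this article.

```python
import importlib


def load_dbscan():
    """Return (backend_name, DBSCAN class), preferring cuML over scikit-learn."""
    for backend in ("cuml", "sklearn"):
        try:
            module = importlib.import_module(backend + ".cluster")
        except Exception:  # package missing, or cuML installed without a usable GPU
            continue
        return backend, module.DBSCAN
    return None, None  # neither library could be imported


backend, DBSCAN = load_dbscan()
print("DBSCAN backend:", backend)
```

Constructor arguments are not guaranteed to be identical across the two libraries (the DBSCAN example in section 2 passes `algorithm` and `n_jobs` only to scikit-learn), so shared code should stick to the parameters both accept, such as `eps` and `min_samples`.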

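For large datasets, these GPU-based implementations can run 10-50x faster than their CPU equivalents. For details on performance, see the cuML benchmark notebooks.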

Official documentation:
rapidsai/cuml
cuML API Reference

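There are quite a few official examples:

[Image: list of the official example notebooks]

And here are the available models:

[Images: tables of the models supported by cuML]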

Related articles:

nvidia-rapids︱cuDF, a pandas-like DataFrame library
NVIDIA's Python GPU algorithm ecosystem ︱ RAPIDS 0.10
nvidia-rapids︱cuML, a GPU-accelerated machine learning library
nvidia-rapids︱cuGraph (NetworkX-like) graph models


Contents

  • 1 Installation and background
    • 1.1 Installation
    • 1.2 Background
  • 2 DBSCAN
  • 3 Using TSNE on Fashion MNIST
  • 4 XGBoosting
  • 5 Image retrieval with KNN


1 Installation and background

1.1 Installation

Reference: https://github.com/rapidsai/cuml/blob/branch-0.13/BUILD.md

conda env create -n cuml_dev python=3.7 --file=conda/environments/cuml_dev_cuda10.0.yml

For the Docker version, see: https://rapids.ai/start.html#prerequisites

docker pull rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7

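1.2 Background

Training is only part of the picture: to truly scale data science on GPUs, end-to-end applications need to be accelerated as well. cuML 0.9 brings the next step in GPU-based tree model support, including the new Forest Inference Library (FIL). FIL is a lightweight GPU-accelerated engine that runs inference on tree-based models, including gradient-boosted decision trees and random forests. With a single V100 GPU and two lines of Python code, users can load a saved XGBoost or LightGBM model and run inference on new data 36x faster than on a dual 20-core CPU node. Building on the open-source Treelite package, the next version of FIL will also add support for scikit-learn and cuML random forest models.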
Figure 3: Inference speed comparison, XGBoost CPU vs Forest Inference Library (FIL) GPU
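To make concrete what a forest inference engine has to compute, here is a minimal pure-Python sketch of inference over a gradient-boosted binary classifier: each tree is traversed to a leaf, the leaf scores are summed, and the sum is passed through a sigmoid. This is not FIL's API or implementation. The tuple-based tree layout, the `TREES` ensemble, and the function names are all hypothetical, and the base score a real model may add is ignored.

```python
from math import exp

# Hypothetical two-tree ensemble.
# Internal node: (feature_index, threshold, left_child, right_child); leaf: a float score.
TREES = [
    (0, 0.5, -0.4, (1, 2.0, 0.1, 0.6)),
    (1, 1.0, -0.2, 0.3),
]


def predict_tree(node, row):
    """Walk one tree from the root to a leaf and return the leaf score."""
    while not isinstance(node, float):
        feature, threshold, left, right = node
        node = left if row[feature] < threshold else right
    return node


def predict_proba(trees, row):
    """Sum the leaf scores of all trees and squash the margin with a sigmoid."""
    margin = sum(predict_tree(tree, row) for tree in trees)
    return 1.0 / (1.0 + exp(-margin))


print(predict_proba(TREES, [0.2, 0.0]))
print(predict_proba(TREES, [0.9, 3.0]))
```

Every row is scored independently of the others, which is one reason this workload maps well onto a GPU.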


2 DBSCAN

The DBSCAN algorithm is a clustering algorithm that works really well for datasets that have regions of high density.

The model can take array-like objects, either on the host as NumPy arrays or on the device (as Numba or cuda_array_interface-compliant arrays), as well as cuDF DataFrames.
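As a rough illustration of what the algorithm computes, here is a brute-force pure-Python sketch of DBSCAN. It is not how cuML or scikit-learn implement it. The function name `dbscan` is hypothetical; `eps` and `min_samples` play the same role as the parameters of the same names in the code that follows, and a point is counted as its own neighbour.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force neighbourhoods: all points within eps, including the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may turn into a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(points, eps=0.2, min_samples=3))
```

The two tight groups come out as clusters 0 and 1, and the isolated point at (10, 10) is labelled -1, the same noise label the plotting code below draws in black.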

import cudf
import matplotlib.pyplot as plt
import numpy as np
from cuml.datasets import make_blobs
from cuml.cluster import DBSCAN as cuDBSCAN
from sklearn.cluster import DBSCAN as skDBSCAN
from sklearn.metrics import adjusted_rand_score

%matplotlib inline

# Define parameters
n_samples = 10**4
n_features = 2

eps = 0.15
min_samples = 3
random_state = 23

%%time
# Generate data
device_data, device_labels = make_blobs(n_samples=n_samples,
                                        n_features=n_features,
                                        centers=5,
                                        cluster_std=0.1,
                                        random_state=random_state)

device_data = cudf.DataFrame.from_gpu_matrix(device_data)
device_labels = cudf.Series(device_labels)

# Copy dataset from GPU memory to host memory.
# This is done to later compare CPU and GPU results.
host_data = device_data.to_pandas()
host_labels = device_labels.to_pandas()

%%time
# Fit the scikit-learn model
clustering_sk = skDBSCAN(eps=eps,
                         min_samples=min_samples,
                         algorithm="brute",
                         n_jobs=-1)

clustering_sk.fit(host_data)

%%time
# Fit the cuML model
clustering_cuml = cuDBSCAN(eps=eps,
                           min_samples=min_samples,
                           verbose=True,
                           max_mbytes_per_batch=13e3)

clustering_cuml.fit(device_data, out_dtype="int32")

# Visualization
fig = plt.figure(figsize=(16, 10))

X = np.array(host_data)
# Move the labels to the host for plotting (assumes labels_ is a cuDF Series).
labels = clustering_cuml.labels_.to_pandas()

# Black removed and is used for noise instead.
unique_labels = labels.unique()
# Number of clusters, not counting the noise label -1
# (the original used len(labels), which is the number of samples).
n_clusters_ = int((unique_labels != -1).sum())

colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = (labels == k)

    xy = X[class_member_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markersize=5, markeredgecolor=tuple(col))

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

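[Figure: scatter plot of the cuML DBSCAN clusters, titled "Estimated number of clusters"]

Evaluating the results:

%%time
sk_score = adjusted_rand_score(host_labels, clustering_sk.labels_)
cuml_score = adjusted_rand_score(host_labels, clustering_cuml.labels_)
sk_score, cuml_score

>>> (0.9998750031236718, 0.9998750031236718)

The two scores are identical, which means scikit-learn and cuML produce the same clustering here.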
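For readers unfamiliar with the metric, the adjusted Rand index compares two labelings by counting pairs of samples that are grouped together in both, corrected for chance. It depends only on the grouping, not on which integer each cluster happens to get. A minimal pure-Python sketch follows. The function name `adjusted_rand_index` is hypothetical, at least two samples are assumed, and in practice `sklearn.metrics.adjusted_rand_score` should be used as above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same samples (needs >= 2 samples)."""
    n = len(labels_a)
    # Pairs of samples placed together in both labelings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with swapped cluster ids still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A grouping that cuts across the first one scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

This is why the comparison above is meaningful even though scikit-learn and cuML are free to number the clusters differently.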


3 Using TSNE on Fashion MNIST

TSNE (t-Distributed Stochastic Neighbor Embedding) is a fantastic dimensionality reduction algorithm used to visualize large complex datasets including medical scans, neural network weights, gene expressions and much more.

cuML's TSNE algorithm supports both the faster Barnes-Hut $n \log n$ algorithm and the slower exact $n^2$ algorithm.

The model accepts array-like objects as input, either NumPy arrays on the host or cuDF DataFrames.
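To get a feel for the gap between the two growth rates, the sketch below compares $n^2$ with $n \log_2 n$ for the 60,000 images embedded in the example that follows. It only compares the leading terms: constant factors are ignored and the base of the logarithm is an arbitrary choice, so the ratio is an order-of-magnitude indication rather than a predicted speedup. The function names are hypothetical.

```python
from math import log2


def exact_interactions(n):
    """Leading term of the exact method: every point interacts with every other point."""
    return n * n


def barnes_hut_interactions(n):
    """Leading term of Barnes-Hut: each point interacts with about log n tree cells."""
    return n * log2(n)


n = 60_000  # number of Fashion MNIST training images
ratio = exact_interactions(n) / barnes_hut_interactions(n)
print(f"n^2 / (n log2 n) for n = {n}: roughly {ratio:.0f}x")
```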

import gzip
import matplotlib.pyplot as plt
import numpy as np
import os
from cuml.manifold import TSNE

%matplotlib inline

# https://github.com/zalandoresearch/fashion-mnist/blob/master/utils/mnist_reader.py
def load_mnist_train(path):
    """Load MNIST data from path"""
    labels_path = os.path.join(path, 'train-labels-idx1-ubyte.gz')
    images_path = os.path.join(path, 'train-images-idx3-ubyte.gz')

    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,
                               offset=8)

    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)
    return images, labels

# Load the data
images, labels = load_mnist_train("data/fashion")

plt.figure(figsize=(5,5))
plt.imshow(images[100].reshape((28, 28)), cmap = 'gray')

[Figure: sample image images[100] from Fashion MNIST, shown in grayscale]

# Modeling
tsne = TSNE(n_components = 2, method = 'barnes_hut', random_state=23)
%time embedding = tsne.fit_transform(images)

print(embedding[:10], embedding.shape)

CPU times: user 2.41 s, sys: 2.57 s, total: 4.98 s
Wall time: 4.98 s
[[-13.577632    39.87483   ]
 [ 26.136728   -17.68164   ]
 [ 23.164072    22.151243  ]
 [ 28.361032    11.134571  ]
 [ 35.419216     5.6633983 ]
 [ -0.15575314 -11.143476  ]
 [-24.30308     -1.584903  ]
 [ -5.9438944  -27.522072  ]
 [  2.0439444   29.574451  ]
 [ -3.0801039   27.079374  ]] (60000, 2)

Visualizing the embedding:

# Visualize Embedding
classes = [
    'T-shirt/top',
    'Trouser',
    'Pullover',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot'
]

fig, ax = plt.subplots(1, figsize = (14, 10))
plt.scatter(embedding[:,1], embedding[:,0], s = 0.3, c = labels, cmap = 'Spectral')
plt.setp(ax, xticks = [], yticks = [])
cbar = plt.colorbar(boundaries = np.arange(11)-0.5)
cbar.set_ticks(np.arange(10))
cbar.set_ticklabels(classes)
plt.title('Fashion MNIST Embedded via TSNE');

[Figure: scatter plot "Fashion MNIST Embedded via TSNE", colored by class]


4 XGBoosting


import numpy as np; print('numpy Version:', np.__version__)
import pandas as pd; print('pandas Version:', pd.__version__)
import xgboost as xgb; print('XGBoost Version:', xgb.__version__)

# helper function for simulating data
def simulate_data(m, n, k=2, numerical=False):
    if numerical:
        features = np.random.rand(m, n)
    else:
        features = np.random.randint(2, size=(m, n))
    labels = np.random.randint(k, size=m)
    return np.c_[labels, features].astype(np.float32)

# helper function for loading data
def load_data(filename, n_rows):
    if n_rows >= 1e9:
        df = pd.read_csv(filename)
    else:
        df = pd.read_csv(filename, nrows=n_rows)
    return df.values.astype(np.float32)

# settings
LOAD = False
n_rows = int(1e5)
n_columns = int(100)
n_categories = 2

%%time
# Load the data
if LOAD:
    dataset = load_data('/tmp', n_rows)
else:
    dataset = simulate_data(n_rows, n_columns, n_categories)
print(dataset.shape)

# Split into training and validation sets
# identify shape and indices
n_rows, n_columns = dataset.shape
train_size = 0.80
train_index = int(n_rows * train_size)

# split X, y
X, y = dataset[:, 1:], dataset[:, 0]
del dataset

# split train data
X_train, y_train = X[:train_index, :], y[:train_index]

# split validation data
X_validation, y_validation = X[train_index:, :], y[train_index:]

# Sanity checks
# check dimensions
print('X_train: ', X_train.shape, X_train.dtype, 'y_train: ', y_train.shape, y_train.dtype)
print('X_validation', X_validation.shape, X_validation.dtype, 'y_validation: ', y_validation.shape, y_validation.dtype)

# check the proportions
total = X_train.shape[0] + X_validation.shape[0]
print('X_train proportion:', X_train.shape[0] / total)
print('X_validation proportion:', X_validation.shape[0] / total)

%%time
# Convert NumPy data to DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalidation = xgb.DMatrix(X_validation, label=y_validation)

# Set the parameters
# instantiate params
params = {}

# general params
general_params = {'silent': 1}
params.update(general_params)

# booster params
n_gpus = 1
booster_params = {}

if n_gpus != 0:
    booster_params['tree_method'] = 'gpu_hist'
    booster_params['n_gpus'] = n_gpus
params.update(booster_params)

# learning task params
learning_task_params = {'eval_metric': 'auc', 'objective': 'binary:logistic'}
params.update(learning_task_params)
print(params)

# Train the model
# model training settings
evallist = [(dvalidation, 'validation'), (dtrain, 'train')]
num_round = 10

%%time
bst = xgb.train(params, dtrain, num_round, evallist)

Output:

[0]	validation-auc:0.504014	train-auc:0.542211
[1]	validation-auc:0.506166	train-auc:0.559262
[2]	validation-auc:0.501638	train-auc:0.570375
[3]	validation-auc:0.50275	train-auc:0.580726
[4]	validation-auc:0.503445	train-auc:0.589701
[5]	validation-auc:0.503413	train-auc:0.598342
[6]	validation-auc:0.504258	train-auc:0.605253
[7]	validation-auc:0.503157	train-auc:0.611937
[8]	validation-auc:0.502372	train-auc:0.617561
[9]	validation-auc:0.501949	train-auc:0.62333
CPU times: user 1.12 s, sys: 195 ms, total: 1.31 s
Wall time: 360 ms
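The validation AUC staying near 0.5 while the training AUC climbs is what one would expect here: `simulate_data` draws the labels independently of the features, so there is nothing to learn and the model can only memorize the training set. The pure-Python sketch below illustrates why 0.5 is the chance level. It computes AUC as the fraction of (positive, negative) pairs that are ranked correctly. The function name `roc_auc` and the random data are hypothetical and are not part of the XGBoost run above.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Scores that carry no information about the labels give an AUC close to 0.5.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(2000)]
scores = [rng.random() for _ in range(2000)]
print("AUC of uninformative scores:", round(roc_auc(labels, scores), 3))
```

This pairwise form is quadratic in the number of samples and is meant only to show the definition. The `auc` eval metric used above is the practical choice.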

Related references:

  • Open Source Website
  • GitHub
  • Press Release
  • NVIDIA Blog
  • Developer Blog
  • NVIDIA Data Science Webpage

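5 Image retrieval with KNN

Reference: Using RAPIDS to accelerate image search tasks on GPU instances (Alibaba Cloud documentation)

The Alibaba Cloud documentation covers this in detail, so I will not repeat much of it here.
Image features are extracted with the open-source frameworks TensorFlow and Keras, using a ResNet50 (notop) model pretrained on the ImageNet dataset.
The model (about 91 MB) is downloaded over the public network and, once downloaded, is saved to the /root/.keras/models/ directory by default.

Downloading the data: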

import os
import tarfile
import numpy as np
from urllib.request import urlretrieve


def download_and_extract(data_dir):
    """Download the STL-10 archive into data_dir and extract it there."""
    def _progress(count, block_size, total_size):
        print('\r>>> Downloading %s  (total:%.0fM) %.1f%%' % (
            filename, total_size / 1024 / 1024, 100.0 * count * block_size / total_size), end='')

    url = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'
    filename = url.split('/')[-1]
    filepath = os.path.join(data_dir, filename)
    decom_dir = os.path.join(data_dir, filename.split('.')[0])
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)
    if os.path.exists(filepath):
        print('>>> {} already exists in the current directory.'.format(filename))
    else:
        urlretrieve(url, filepath, _progress)
        print("\nSuccessfully downloaded.")
    if not os.path.exists(decom_dir):
        # Decompress
        print(">>> Decompressing from {}....".format(filepath))
        tar = tarfile.open(filepath, 'r')
        tar.extractall(data_dir)
        print("Successfully decompressed")
        tar.close()
    else:
        print('>>> Directory "{}" already exists.'.format(decom_dir))


def read_all_images(path_to_data):
    """get all images from binary path"""
    with open(path_to_data, 'rb') as f:
        everything = np.fromfile(f, dtype=np.uint8)
        images = np.reshape(everything, (-1, 3, 96, 96))
        images = np.transpose(images, (0, 3, 2, 1))
    return images


# the directory to save data
data_dir = './data'
# download and decompression
download_and_extract(data_dir)

# Read the data
# the path of unlabeled data
path_unlabeled = os.path.join(data_dir, 'stl10_binary/unlabeled_X.bin')
# get images from binary
images = read_all_images(path_unlabeled)
print('>>> images shape: ', images.shape)

# Look at an image
import random
import matplotlib.pyplot as plt
%matplotlib inline


def show_image(image):
    """show image"""
    fig = plt.figure(figsize=(3, 3))
    plt.imshow(image)
    plt.show()
    fig.clear()


# show a random image (random.randint includes both endpoints, hence the - 1)
rand_image_index = random.randint(0, images.shape[0] - 1)
show_image(images[rand_image_index])

[Figure: a randomly chosen STL-10 image]

# Split the data
from sklearn.model_selection import train_test_split

train_images, query_images = train_test_split(images, test_size=0.1, random_state=123)
print('train_images shape: ', train_images.shape)
print('query_images shape: ', query_images.shape)

# Image features
# Set TensorFlow params to adjust GPU memory usage. With the default params, TensorFlow would use
# nearly all of the GPU memory; we need to reserve some GPU memory for cuML.
import os
# only use device 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
# method 1: allocate GPU memory based on runtime allocations
# config.gpu_options.allow_growth = True
# method 2: set the fraction of the overall amount of memory
# that each visible GPU should be allocated.
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

# Feature extraction
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input

# download the resnet50 (notop) model (on first run) and load it
model = ResNet50(weights='imagenet', include_top=False, input_shape=(96, 96, 3), pooling='max')
# network summary
model.summary()

%%time
train_features = model.predict(train_images)
print('train features shape: ', train_features.shape)

%%time
query_features = model.predict(query_images)
print('query features shape: ', query_features.shape)

Next comes the KNN stage, with both scikit-learn KNN and cuML KNN:

from cuml.neighbors import NearestNeighbors

%%time
knn_cuml = NearestNeighbors()
knn_cuml.fit(train_features)

%%time
distances_cuml, indices_cuml = knn_cuml.kneighbors(query_features, k=3)

from sklearn.neighbors import NearestNeighbors

%%time
knn_sk = NearestNeighbors(n_neighbors=3, metric='sqeuclidean', n_jobs=-1)
knn_sk.fit(train_features)

%%time
distances_sk, indices_sk = knn_sk.kneighbors(query_features, 3)

# compare the distances obtained with the sklearn and cuml models
(np.abs(distances_cuml - distances_sk) < 1).all()

# Show the results
def show_images(query, sim_images, sim_dists):
    """Plot the query image followed by its most similar images and their distances."""
    simi_num = len(sim_images)
    fig = plt.figure(figsize=(3 * (simi_num + 1), 3))
    axes = fig.subplots(1, simi_num + 1)
    for index, ax in enumerate(axes):
        if index == 0:
            ax.imshow(query)
            ax.set_title('query')
        else:
            ax.imshow(sim_images[index - 1])
            ax.set_title('dist: %.1f' % (sim_dists[index - 1]))
    plt.show()
    fig.clear()

# get random indices
random_show_index = np.random.randint(0, query_images.shape[0], size=5)
random_query = query_images[random_show_index]
# np.int was removed from recent NumPy releases; np.int64 is used instead
random_indices = indices_cuml[random_show_index].astype(np.int64)
random_distances = distances_cuml[random_show_index]

# show result images
for query_image, sim_indices, sim_dists in zip(random_query, random_indices, random_distances):
    sim_images = train_images[sim_indices]
    show_images(query_image, sim_images, sim_dists)

[Figure: query images shown next to their three nearest neighbours and distances]
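Conceptually, the nearest-neighbour search above compares each query feature vector with every training feature vector and keeps the closest few. The scikit-learn model above is configured with `metric='sqeuclidean'` so that its distances are comparable with the cuML ones. A minimal brute-force sketch of that search in pure Python, with squared Euclidean distance, could look like this. The function name `k_nearest` and the toy vectors are hypothetical, and this says nothing about how either library implements the search.

```python
import heapq


def k_nearest(train, query, k):
    """Return (squared_distances, indices) of the k training vectors closest to query."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), idx)
        for idx, vec in enumerate(train)
    )
    best = heapq.nsmallest(k, scored)  # smallest squared distance first
    return [d for d, _ in best], [i for _, i in best]


train_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
sq_dists, neighbour_ids = k_nearest(train_vectors, (0.9, 0.0), k=3)
print(neighbour_ids, sq_dists)
```

The cost of this brute-force scan grows with the number of training vectors times the feature dimension for every query, which is the kind of dense arithmetic a GPU handles well.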


More to be added as I use it.
