The next two sections walk through implementing the Squared Log Error (SLE) objective function:

$$\frac{1}{2}[\log(pred + 1) - \log(label + 1)]^2$$

and its default evaluation metric, the Root Mean Squared Log Error (RMSLE):

$$\sqrt{\frac{1}{N}\sum_{i=1}^{N}[\log(pred_i + 1) - \log(label_i + 1)]^2}$$

The gradient of the objective with respect to the prediction is:

$$g = \frac{\partial{objective}}{\partial{pred}} = \frac{\log(pred + 1) - \log(label + 1)}{pred + 1}$$

and the hessian (the second derivative of the objective) is:

$$h = \frac{\partial^2{objective}}{\partial{pred}^2} = \frac{-\log(pred + 1) + \log(label + 1) + 1}{(pred + 1)^2}$$
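One way to sanity-check the two formulas is to compare them against central finite differences of the objective. The sketch below works on plain NumPy arrays; the helper names (`sle`, `sle_grad`, `sle_hess`) and the sample values are illustrative assumptions and are not part of XGBoost.

```python
import numpy as np


def sle(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Per-sample squared log error: 1/2 * [log(pred + 1) - log(label + 1)]^2."""
    return 0.5 * (np.log1p(pred) - np.log1p(label)) ** 2


def sle_grad(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Analytic first derivative of sle with respect to pred."""
    return (np.log1p(pred) - np.log1p(label)) / (pred + 1)


def sle_hess(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Analytic second derivative of sle with respect to pred."""
    return (-np.log1p(pred) + np.log1p(label) + 1) / (pred + 1) ** 2


# Made-up sample values, chosen only for the check.
pred = np.array([0.5, 2.0, 10.0])
label = np.array([1.0, 2.0, 3.0])
eps = 1e-4

# Central differences: of the objective for g, of the gradient for h.
num_grad = (sle(pred + eps, label) - sle(pred - eps, label)) / (2 * eps)
num_hess = (sle_grad(pred + eps, label) - sle_grad(pred - eps, label)) / (2 * eps)

grad_ok = np.allclose(num_grad, sle_grad(pred, label), atol=1e-6)
hess_ok = np.allclose(num_hess, sle_hess(pred, label), atol=1e-6)
print(grad_ok, hess_ok)
```

If either flag is false, the analytic derivative disagrees with the objective, which is the most common source of a custom objective that fails to converge.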
The objective itself can then be written as:

import numpy as np
import xgboost as xgb
from typing import Tuple


def gradient(predt: np.ndarray, dtrain: xgb.DMatrix) -> np.ndarray:
    '''Compute the gradient squared log error.'''
    y = dtrain.get_label()
    return (np.log1p(predt) - np.log1p(y)) / (predt + 1)


def hessian(predt: np.ndarray, dtrain: xgb.DMatrix) -> np.ndarray:
    '''Compute the hessian for squared log error.'''
    y = dtrain.get_label()
    return ((-np.log1p(predt) + np.log1p(y) + 1) /
            np.power(predt + 1, 2))


def squared_log(predt: np.ndarray,
                dtrain: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:
    '''Squared Log Error objective. A simplified version for RMSLE used as
    objective function.
    '''
    # Keep predictions inside the domain of log1p.
    predt[predt < -1] = -1 + 1e-6
    grad = gradient(predt, dtrain)
    hess = hessian(predt, dtrain)
    return grad, hess
xgb.train({'tree_method': 'hist', 'seed': 1994},  # any other tree method is fine.
          dtrain=dtrain,
          num_boost_round=10,
          obj=squared_log)
As noted above, the default metric for SLE is RMSLE:
def rmsle(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    ''' Root mean squared log error metric.'''
    y = dtrain.get_label()
    predt[predt < -1] = -1 + 1e-6
    elements = np.power(np.log1p(y) - np.log1p(predt), 2)
    return 'PyRMSLE', float(np.sqrt(np.sum(elements) / len(y)))
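To make the arithmetic concrete, here is a small worked example of the same computation on plain arrays. The function name `rmsle_np` and the sample values are assumptions made for illustration; unlike the metric above, this sketch clips a copy rather than modifying its input.

```python
import numpy as np


def rmsle_np(pred: np.ndarray, label: np.ndarray) -> float:
    """RMSLE on plain arrays; the caller's array is left untouched."""
    # Same clipping rule as the metric above, applied to a copy.
    pred = np.where(pred < -1, -1 + 1e-6, pred)
    elements = (np.log1p(label) - np.log1p(pred)) ** 2
    return float(np.sqrt(np.mean(elements)))


# log1p(e - 1) = 1 and log1p(0) = 0, so the squared log differences are
# [1, 0], their mean is 0.5, and the result is sqrt(0.5).
label = np.array([0.0, np.e - 1])
pred = np.array([np.e - 1, np.e - 1])
value = rmsle_np(pred, label)
print(value)  # approximately 0.7071
```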
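Like the objective function, the metric takes `predt` and `dtrain` as inputs, but it returns the name of the metric together with a floating-point value. Pass it to XGBoost through the `custom_metric` argument: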
xgb.train({'tree_method': 'hist', 'seed': 1994,
           'disable_default_eval_metric': 1},
          dtrain=dtrain,
          num_boost_round=10,
          obj=squared_log,
          custom_metric=rmsle,
          evals=[(dtrain, 'dtrain'), (dtest, 'dtest')],
          evals_result=results)
XGBoost then prints output like the following:
[0] dtrain-PyRMSLE:1.37153 dtest-PyRMSLE:1.31487
[1] dtrain-PyRMSLE:1.26619 dtest-PyRMSLE:1.20899
[2] dtrain-PyRMSLE:1.17508 dtest-PyRMSLE:1.11629
[3] dtrain-PyRMSLE:1.09836 dtest-PyRMSLE:1.03871
[4] dtrain-PyRMSLE:1.03557 dtest-PyRMSLE:0.977186
[5] dtrain-PyRMSLE:0.985783 dtest-PyRMSLE:0.93057
Note that the `disable_default_eval_metric` parameter disables the default metric in XGBoost.
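With a built-in objective, the raw prediction is transformed according to the objective function. When a custom objective is supplied, XGBoost does not know its link function, so the user is responsible for applying the transformation in both the objective and the custom evaluation metric. For objectives with an identity link, such as squared error, the transformation is trivial because the raw prediction is already the final prediction.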
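As a rough illustration of what transforming the raw prediction means, the sketch below applies three common inverse link functions to the same raw margins. The margin values are made up and the variable names are illustrative; this mirrors the idea only and is not XGBoost's internal code.

```python
import numpy as np

# Hypothetical raw margins, as a custom objective or metric would receive them.
raw = np.array([-2.0, 0.0, 1.5])

# Identity link (e.g. squared error): the margin is already the prediction.
identity_pred = raw.copy()

# Log link: the prediction is the exponential of the margin.
log_link_pred = np.exp(raw)

# Logit link: the prediction is the sigmoid of the margin.
logit_pred = 1.0 / (1.0 + np.exp(-raw))

print(identity_pred, log_link_pred, logit_pred)
```

Only the identity case leaves the values unchanged, which is why a custom metric written for a non-identity link has to apply the inverse link itself before comparing against labels.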
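In the Python package, this behavior can be controlled with the `output_margin` parameter of the `predict` function. When the `custom_metric` parameter is used without a custom objective, the metric function receives transformed predictions, because the objective is defined by XGBoost. When a custom objective and a custom metric are supplied together, both receive raw predictions. The following example compares the two behaviors with a multi-class classification model. First, we define two Python metric functions that implement the same underlying metric. `merror_with_transform` is used when a custom objective is also supplied; otherwise the simpler `merror` is used, because XGBoost can perform the transformation itself.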
import xgboost as xgb
import numpy as np


def merror_with_transform(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Used when custom objective is supplied."""
    y = dtrain.get_label()
    n_classes = predt.size // y.shape[0]
    # Like custom objective, the predt is untransformed leaf weight when custom objective
    # is provided.

    # With the use of `custom_metric` parameter in train function, custom metric receives
    # raw input only when custom objective is also being used.  Otherwise custom metric
    # will receive transformed prediction.
    assert predt.shape == (dtrain.num_row(), n_classes)
    out = np.zeros(dtrain.num_row())
    for r in range(predt.shape[0]):
        i = np.argmax(predt[r])
        out[r] = i

    assert y.shape == out.shape

    errors = np.zeros(dtrain.num_row())
    errors[y != out] = 1.0
    return 'PyMError', np.sum(errors) / dtrain.num_row()
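The function above is only needed when a custom objective is used and XGBoost does not know how to transform the prediction. The normal implementation of the multi-class error function is: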
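def merror(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Used when there's no custom objective."""
    # No need to do transform, XGBoost handles it internally.
    y = dtrain.get_label()
    # Assumption: with `multi:softmax`, predt already holds one predicted
    # class index per row, so it can be compared with the labels directly.
    errors = np.zeros(dtrain.num_row())
    errors[y != predt] = 1.0
    return 'PyMError', np.sum(errors) / dtrain.num_row()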
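The row-by-row `argmax` used for raw predictions can also be expressed in vectorized form. The sketch below operates on plain arrays rather than a `DMatrix`; `merror_np` and the sample scores are hypothetical and serve only to show the computation.

```python
import numpy as np


def merror_np(raw: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of rows whose highest-scoring class differs from the label."""
    predicted = np.argmax(raw, axis=1)
    return float(np.mean(predicted != labels))


# Four samples, three classes; the values are made-up raw scores.
raw = np.array([[2.0, 0.1, -1.0],
                [0.3, 0.2, 0.9],
                [-0.5, 1.2, 0.0],
                [1.0, 0.0, 0.5]])
labels = np.array([0, 2, 0, 0])

err = merror_np(raw, labels)
print(err)  # 0.25: only the third row is misclassified
```

Because softmax preserves the ordering of scores within a row, taking `argmax` of the raw scores picks the same class as taking it after the softmax transformation, so no explicit transformation is required for this particular metric.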
Next, we need the custom softprob objective:
def softprob_obj(predt: np.ndarray, data: xgb.DMatrix):
    """Loss function.  Computing the gradient and approximated hessian (diagonal).
    Reimplements the `multi:softprob` inside XGBoost.
    """
    # Full implementation is available in the Python demo script linked below
    ...
    return grad, hess
Finally, the model can be trained with the `obj` and `custom_metric` parameters:
Xy = xgb.DMatrix(X, y)
booster = xgb.train(
    {"num_class": kClasses, "disable_default_eval_metric": True},
    Xy,
    num_boost_round=kRounds,
    obj=softprob_obj,
    custom_metric=merror_with_transform,
    evals_result=custom_results,
    evals=[(Xy, "train")],
)
Without a custom objective, the simpler metric can be supplied on its own:

booster = xgb.train(
    {
        "num_class": kClasses,
        "disable_default_eval_metric": True,
        "objective": "multi:softmax",
    },
    Xy,
    num_boost_round=kRounds,
    # Use a simpler metric implementation.
    custom_metric=merror,
    evals_result=custom_results,
    evals=[(Xy, "train")],
)
With `multi:softprob`, the output prediction array has shape (n_samples, n_classes), whereas with `multi:softmax` it has shape (n_samples, ).
Scikit-Learn Interface
In the scikit-learn interface, a scikit-learn cost function can be passed directly as `eval_metric`:

from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error

X, y = load_diabetes(return_X_y=True)
reg = xgb.XGBRegressor(
    tree_method="hist",
    eval_metric=mean_absolute_error,
)
reg.fit(X, y, eval_set=[(X, y)])
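A hand-written metric can follow the same calling convention as `mean_absolute_error`, taking `(y_true, y_pred)` and returning a float. The sketch below writes RMSLE in that form; `rmsle_sk` is a hypothetical name, and the lower clip on predictions is an assumption carried over from the earlier metric.

```python
import numpy as np


def rmsle_sk(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """RMSLE with a scikit-learn style signature: (y_true, y_pred) -> float."""
    # Keep predictions inside the domain of log1p.
    y_pred = np.maximum(y_pred, -1 + 1e-6)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Identical arrays give an error of zero.
score = rmsle_sk(np.array([1.0, 3.0]), np.array([1.0, 3.0]))
print(score)  # 0.0
```

Such a function could presumably be supplied in place of `mean_absolute_error` in the example above, although for the diabetes targets a log-based metric is only a stand-in.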
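A custom objective can likewise be defined without accessing `DMatrix`:

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax. Helper assumed here; the original snippet
    uses `softmax` without defining it."""
    e = np.exp(x - np.max(x))
    return e / np.sum(e)


def softprob_obj(labels: np.ndarray, predt: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    rows = labels.shape[0]
    classes = predt.shape[1]
    grad = np.zeros((rows, classes), dtype=float)
    hess = np.zeros((rows, classes), dtype=float)
    eps = 1e-6
    for r in range(predt.shape[0]):
        target = labels[r]
        p = softmax(predt[r, :])
        for c in range(predt.shape[1]):
            # Gradient of softmax cross-entropy: p - 1 for the true class, p otherwise.
            g = p[c] - 1.0 if c == target else p[c]
            # Diagonal hessian approximation, floored at eps.
            h = max((2.0 * p[c] * (1.0 - p[c])).item(), eps)
            grad[r, c] = g
            hess[r, c] = h

    grad = grad.reshape((rows * classes, 1))
    hess = hess.reshape((rows * classes, 1))
    return grad, hess


clf = xgb.XGBClassifier(tree_method="hist", objective=softprob_obj)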
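The nested loops above can also be written with array operations. The following is a minimal vectorized sketch of the same gradient and hessian formulas on plain arrays; `softmax_rows` and `softprob_grad_hess` are illustrative names, and the result is left in shape (rows, classes) rather than reshaped as in the function above.

```python
import numpy as np


def softmax_rows(raw: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(raw - raw.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def softprob_grad_hess(labels: np.ndarray, raw: np.ndarray, eps: float = 1e-6):
    """Gradient p - onehot(label) and hessian max(2 * p * (1 - p), eps)."""
    p = softmax_rows(raw)
    onehot = np.eye(raw.shape[1])[labels.astype(int)]
    grad = p - onehot
    hess = np.maximum(2.0 * p * (1.0 - p), eps)
    return grad, hess


# Two samples, three classes; the raw scores are made up.
raw = np.array([[1.0, 2.0, 0.5],
                [0.0, 0.0, 0.0]])
labels = np.array([1, 2])
grad, hess = softprob_grad_hess(labels, raw)
print(grad.shape, hess.shape)
```

Each row of `grad` sums to zero, since the probabilities and the one-hot label both sum to one; this is a quick property to check when debugging a multi-class objective.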
