This post walks through the ATM source code, following a single run from the example script down into the worker.
example/example.py
from atm import ATM

atm = ATM()
results = atm.run(train_path="/home/tqc/PycharmProjects/automl/ATM/demos/pollution_1.csv")
results.describe()
atm.worker.Worker#select_hyperpartition
The debug output matches the description in the paper: a hyperpartition is one path from the root to a leaf of the conditional parameter tree (CPT).
>>> pprint(hyperpartitions)
[<dt: [('criterion', 'entropy')]>,
 <dt: [('criterion', 'gini')]>,
 <knn: [('weights', 'uniform'), ('algorithm', 'ball_tree'), ('metric', 'minkowski')]>,
 <knn: [('weights', 'uniform'), ('algorithm', 'ball_tree'), ('metric', 'euclidean')]>,
 ...]
Inspecting the first entry shows its fixed categorical choices and the hyperparameters that remain tunable:
>>> hyperpartitions[0].categoricals
[('criterion', 'entropy')]
>>> pprint(hyperpartitions[0].tunables)
[('max_features',
  <btb.hyper_parameter.FloatHyperParameter object at 0x7fd946ae83c8>),
 ('max_depth',
  <btb.hyper_parameter.IntHyperParameter object at 0x7fd946ae82e8>),
 ('min_samples_split',
  <btb.hyper_parameter.IntHyperParameter object at 0x7fd946ae8e80>),
 ('min_samples_leaf',
  <btb.hyper_parameter.IntHyperParameter object at 0x7fd946ae8f28>)]
The purpose of a hyperpartition is to carve one continuous N-dimensional space out of a fragmented, structured search space, so that a Gaussian process (GP) can operate inside that space.
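To make the "one root-to-leaf path" idea concrete, here is a minimal sketch of how hyperpartitions could be enumerated. It is not ATM's implementation: the `METHODS` dictionary, its value lists and ranges, and the function `enumerate_hyperpartitions` are all hypothetical, and the sketch ignores conditional parameters (a real CPT can add or drop tunables depending on a categorical choice).

```python
from itertools import product

# Hypothetical, simplified description of two methods. The names mirror the
# debug output above, but the value lists and ranges are illustrative only.
METHODS = {
    "dt": {
        "categoricals": {"criterion": ["entropy", "gini"]},
        "tunables": {"max_depth": (2, 10), "min_samples_leaf": (1, 3)},
    },
    "knn": {
        "categoricals": {
            "weights": ["uniform", "distance"],
            "algorithm": ["ball_tree", "kd_tree"],
        },
        "tunables": {"n_neighbors": (1, 20), "leaf_size": (1, 50)},
    },
}


def enumerate_hyperpartitions(methods):
    """Return one entry per combination of categorical values of each method."""
    partitions = []
    for method, spec in methods.items():
        names = list(spec["categoricals"])
        value_lists = [spec["categoricals"][name] for name in names]
        # Each combination of categorical values fixes one path in the tree.
        for combo in product(*value_lists):
            partitions.append({
                "method": method,
                "categoricals": list(zip(names, combo)),
                # What is left over is a plain numeric box for the tuner.
                "tunables": dict(spec["tunables"]),
            })
    return partitions


hyperpartitions = enumerate_hyperpartitions(METHODS)
for hp in hyperpartitions:
    print(hp["method"], hp["categoricals"], sorted(hp["tunables"]))
```

Once the categorical values are fixed, only numeric ranges remain, which is what makes a GP-based tuner applicable within a single hyperpartition.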
btb.selection.uniform.Uniform#select
atm.worker.Worker#select_hyperpartition
atm.worker.Worker#run_classifier
hyperpartition = self.select_hyperpartition()
A hyperpartition is chosen at random. This appears to be the multi-armed bandit (MAB) step.
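The difference between a uniform selector and a score-aware bandit selector can be sketched as follows. This is an illustration rather than BTB's code: `uniform_select`, `ucb1_select`, and the `history` dictionary are hypothetical names, and UCB1 is used here only as one well-known bandit rule.

```python
import math
import random


def uniform_select(choice_scores, rng=random):
    """Pick a hyperpartition id uniformly at random, ignoring all scores."""
    return rng.choice(sorted(choice_scores))


def ucb1_select(choice_scores):
    """Pick the id with the highest UCB1 upper confidence bound."""
    total = sum(len(scores) for scores in choice_scores.values())
    best_id, best_bound = None, float("-inf")
    for arm_id in sorted(choice_scores):
        scores = choice_scores[arm_id]
        if not scores:
            return arm_id  # try every arm once before comparing bounds
        mean = sum(scores) / len(scores)
        bound = mean + math.sqrt(2 * math.log(total) / len(scores))
        if bound > best_bound:
            best_id, best_bound = arm_id, bound
    return best_id


# Hypothetical history: hyperpartition id -> scores of classifiers tried so far.
history = {0: [0.71, 0.74], 1: [0.90], 2: []}
print(uniform_select(history, random.Random(0)))
print(ucb1_select(history))
```

With a uniform selector the score history is ignored entirely, so the "bandit" reduces to random choice among hyperpartitions.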
>>> pprint(params)
{'_scale': True,
 'algorithm': 'kd_tree',
 'leaf_size': 38,
 'metric': 'chebyshev',
 'n_neighbors': 13,
 'weights': 'uniform'}
atm.database.Database#start_classifier
The hyperparameters are instantiated as a classifier object:
classifier = self.Classifier(
    hyperpartition_id=hyperpartition_id,
    datarun_id=datarun_id,
    host=host,
    hyperparameter_values=hyperparameter_values,
    start_time=datetime.now(),
    status=ClassifierStatus.RUNNING,
)
Yet another piece of hard-to-follow code:
atm/database.py:382
At a glance, it is using an ORM to operate on the database.
model, metrics = self.test_classifier(hyperpartition.method, params)
>>> model.pipeline
Pipeline(memory=None,
         steps=[('standard_scale',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('knn',
                 KNeighborsClassifier(algorithm='ball_tree', leaf_size=20,
                                      metric='euclidean', metric_params=None,
                                      n_jobs=None, n_neighbors=16, p=2,
                                      weights='distance'))],
         verbose=False)
>>> metrics
{'cv': [{'accuracy': 1.0, 'cohen_kappa': 1.0, 'f1': 1.0, 'mcc': 1.0, 'roc_auc': 1.0, 'ap': 1.0}, ...
That seems to be about all there is to the overall flow.
Both the selector and the tuner default to uniform, i.e. random search.
selector (str):
    Type of selector to use. Optional. Defaults to ``'uniform'``.
tuner (str):
    Type of tuner to use. Optional. Defaults to ``'uniform'``.
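Putting the steps together, the flow traced above (select a hyperpartition, propose parameters, evaluate them) can be sketched as a plain random search. Everything in this sketch is an assumption made for illustration: `HYPERPARTITIONS`, `propose`, `run_classifier`, and `dummy_evaluate` are hypothetical names, the ranges are made up, and the evaluation step is a placeholder instead of real training and cross-validation.

```python
import random

# Hypothetical hyperpartitions: fixed categoricals plus integer ranges to tune.
HYPERPARTITIONS = [
    {"method": "dt",
     "categoricals": {"criterion": "gini"},
     "tunables": {"max_depth": (2, 10), "min_samples_split": (2, 4)}},
    {"method": "knn",
     "categoricals": {"weights": "uniform", "algorithm": "kd_tree"},
     "tunables": {"n_neighbors": (1, 20), "leaf_size": (1, 50)}},
]


def propose(hyperpartition, rng):
    """Uniform tuner: draw each integer tunable independently from its range."""
    params = dict(hyperpartition["categoricals"])
    for name, (low, high) in hyperpartition["tunables"].items():
        params[name] = rng.randint(low, high)  # both bounds are inclusive
    return params


def dummy_evaluate(method, params):
    """Placeholder for training and cross-validation; returns a fake score."""
    return 0.5


def run_classifier(evaluate, rng):
    """One iteration: select a hyperpartition, propose params, score them."""
    hyperpartition = rng.choice(HYPERPARTITIONS)  # uniform selector
    params = propose(hyperpartition, rng)         # uniform tuner
    score = evaluate(hyperpartition["method"], params)
    return hyperpartition["method"], params, score


rng = random.Random(0)
results = [run_classifier(dummy_evaluate, rng) for _ in range(5)]
best = max(results, key=lambda result: result[2])
print(best)
```

Swapping in a score-aware selector or a GP-based tuner would change only the two lines marked as uniform; the surrounding loop stays the same.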