Introduction
TPOT gives users many options for customizing the search space, from hyperparameter ranges to model selection to pipeline configuration. TPOT can select models, optimize their hyperparameters, and build complex pipeline structures, with several customization options at each level of detail. This tutorial first covers how to set up a hyperparameter search space for a single method. Next, we cover how to set up simultaneous model selection and hyperparameter tuning. Finally, we show how to use these building blocks to configure the search space of a fixed multi-step pipeline, and how to let TPOT optimize the pipeline structure itself.
Hyperparameter search spaces with ConfigSpace
Hyperparameter search spaces are defined using the ConfigSpace package; see its documentation for more information on setting up hyperparameter spaces.
TPOT uses ConfigSpace.ConfigurationSpace objects to define the hyperparameter search space for an individual model. This object keeps track of the desired hyperparameters and provides functions for randomly sampling from the space.
In short, you use ConfigSpace's Integer, Float, and Categorical functions to define the range of values for each parameter. Alternatively, a tuple of (min, max) integers or floats specifies an integer/float search space, and a list specifies a categorical search space. For parameters that are not tuned, you can also supply a fixed value. The space parameter of ConfigurationSpace takes a dictionary mapping parameter names to these ranges.
Note: if you want reproducible results, you need to set a fixed random_state in the search space.
Here is an example of hyperparameter ranges for a RandomForest:
from ConfigSpace import ConfigurationSpace, Integer, Float, Categorical, Normal
from sklearn.ensemble import RandomForestClassifier
import tpot
import numpy as np
import sklearn
import sklearn.datasets
rf_configspace = ConfigurationSpace(
    space = {
        'n_estimators': 128, #as recommended by Oshiro et al. (2012)
        'max_features': Float("max_features", bounds=(0.01, 1), log=True), #log scale like autosklearn?
        'criterion': Categorical("criterion", ['gini', 'entropy']),
        'min_samples_split': Integer("min_samples_split", bounds=(2, 20)),
        'min_samples_leaf': Integer("min_samples_leaf", bounds=(1, 20)),
        'bootstrap': Categorical("bootstrap", [True, False]),
        #'random_state': 1, # If you want results to be reproducible, you can set a fixed random_state.
    }
)
hyperparameters = dict(rf_configspace.sample_configuration())
print("sampled hyperparameters")
print(hyperparameters)
rf = RandomForestClassifier(**hyperparameters)
rf
sampled hyperparameters {'bootstrap': True, 'criterion': 'gini', 'max_features': 0.8874647037836, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 128}
RandomForestClassifier(max_features=0.8874647037836, min_samples_leaf=2, min_samples_split=5, n_estimators=128)
More simply:
rf_configspace = ConfigurationSpace(
    space = {
        'n_estimators': 128, #as recommended by Oshiro et al. (2012)
        'max_features': (0.01, 1), #not log scaled
        'criterion': ['gini', 'entropy'],
        'min_samples_split': (2, 20),
        'min_samples_leaf': (1, 20),
        'bootstrap': [True, False],
        #'random_state': 1, # If you want results to be reproducible, you can set a fixed random_state.
    }
)
hyperparameters = dict(rf_configspace.sample_configuration())
print("sampled hyperparameters")
print(hyperparameters)
rf = RandomForestClassifier(**hyperparameters)
rf
sampled hyperparameters {'bootstrap': False, 'criterion': 'entropy', 'max_features': 0.8418685817308, 'min_samples_leaf': 5, 'min_samples_split': 2, 'n_estimators': 128}
RandomForestClassifier(bootstrap=False, criterion='entropy', max_features=0.8418685817308, min_samples_leaf=5, n_estimators=128)
TPOT search spaces
TPOT lets you create hyperparameter search spaces for individual methods as well as search spaces over pipeline structure. For example, TPOT can create linear pipelines, trees, or graphs.
TPOT search spaces live in the search_spaces module. There are two main kinds: node search spaces and pipeline search spaces. A node search space specifies a search space over a single sklearn BaseEstimator. A pipeline search space defines the possible structures for a set of node search spaces: it takes node search spaces as input and generates pipelines whose nodes are drawn from them. Since sklearn Pipelines are themselves BaseEstimators, pipeline search spaces are technically also node search spaces; this means a pipeline search space can take other pipeline search spaces as input to define more complex structures. The key difference between the two is that a pipeline search space must take another search space as input to populate its individual nodes, so every search space ultimately bottoms out in node search spaces at the lowest level. Note that pipeline search spaces differ in their parameters: some take a single search space, some take a list, and some take several defined parameters.
Node search spaces
Name | Info |
---|---|
EstimatorNode | Takes a ConfigSpace along with the class of the method. This node optimizes the hyperparameters of a single method. |
GeneticFeatureSelectorNode | Uses evolution to optimize a set of features, exporting a simple sklearn selector that selects only the features chosen by the node. |
FSSNode | FSS stands for FeatureSetSelector. This node takes a user-defined list of feature subsets and selects one predefined subset. Note that TPOT does not create new subsets, nor does it select more than one subset per node. If used in a linear pipeline, this node should be set as the first step, and it is recommended that you use only a small number of feature sets there. Consider exploring FSSNode in pipelines that allow TPOT to select several FSSNodes at once; DynamicUnionPipeline and GraphPipeline both pair well with it. Use FSSNode inside a DynamicUnionPipeline at the start of a linear pipeline to explore the best combination of subsets, or set it as the leaf_search_space of a GraphSearchPipeline so that TPOT can use multiple feature sets in different ways, for example applying different transformers to different sets. |
Pipeline search spaces
Located in tpot.search_spaces.pipelines.
Name | Info |
---|---|
ChoicePipeline | Takes a list of search spaces. Selects one node from among them. |
SequentialPipeline | Takes a list of search spaces. Produces a pipeline with one step per provided search space; each step corresponds to the search space at the same index. |
DynamicLinearPipeline | Takes a single search space. Produces a linear pipeline of variable length, with each step drawn from the provided search space. |
UnionPipeline | Takes a list of search spaces. The returned pipeline contains one estimator per search space, joined in a sklearn FeatureUnion. Useful for having multiple steps at the same layer. |
DynamicUnionPipeline | Takes a single search space. Draws 1 to max_estimators estimators from the search space and joins them in a FeatureUnion. |
TreePipeline | Generates a pipeline of variable length with a tree structure similar to TPOT1. |
GraphSearchPipeline | Generates a directed acyclic graph of variable size. Search spaces can be defined separately for root, leaf, and inner nodes if desired. |
WrapperPipeline | This search space wraps a sklearn estimator in a method that takes another estimator and hyperparameters as arguments. For example, it can be used with sklearn.ensemble.BaggingClassifier or sklearn.ensemble.AdaBoostClassifier. |
import tpot
from ConfigSpace import ConfigurationSpace, Integer, Float, Categorical, Normal
from sklearn.neighbors import KNeighborsClassifier
knn_configspace = ConfigurationSpace(
    space = {
        'n_neighbors': Integer("n_neighbors", bounds=(1, 10)),
        'weights': Categorical("weights", ['uniform', 'distance']),
        'p': Integer("p", bounds=(1, 3)),
        'metric': Categorical("metric", ['euclidean', 'minkowski']),
        'n_jobs': 1,
    }
)
knn_node = tpot.search_spaces.nodes.EstimatorNode(
    method = KNeighborsClassifier,
    space = knn_configspace,
)
You can sample an individual with the generate() function. The individual samples from the search space and provides mutation and crossover functions to modify the current sample.
knn_individual = knn_node.generate()
knn_individual
<tpot.search_spaces.nodes.estimator_node.EstimatorNodeIndividual at 0x103cb5a80>
print("sampled hyperparameters")
print(knn_individual.hyperparameters)
sampled hyperparameters {'metric': 'minkowski', 'n_jobs': 1, 'n_neighbors': 4, 'p': 1, 'weights': 'uniform'}
All Individual objects have the mutation and crossover operators that TPOT uses to optimize pipelines.
knn_individual.mutate() # mutate the individual
print("mutated hyperparameters")
print(knn_individual.hyperparameters)
mutated hyperparameters {'metric': 'minkowski', 'n_jobs': 1, 'n_neighbors': 6, 'p': 2, 'weights': 'distance'}
In TPOT, crossover only modifies the individual on which the crossover function is called; the second individual is left unchanged.
knn_individual1 = knn_node.generate()
knn_individual2 = knn_node.generate()
print("original hyperparameters for individual 1")
print(knn_individual1.hyperparameters)
print("original hyperparameters for individual 2")
print(knn_individual2.hyperparameters)
print()
knn_individual1.crossover(knn_individual2) # crossover the individuals
print("post crossover hyperparameters for individual 1")
print(knn_individual1.hyperparameters)
print("post crossover hyperparameters for individual 2")
print(knn_individual2.hyperparameters)
original hyperparameters for individual 1 {'metric': 'euclidean', 'n_jobs': 1, 'n_neighbors': 8, 'p': 3, 'weights': 'uniform'} original hyperparameters for individual 2 {'metric': 'minkowski', 'n_jobs': 1, 'n_neighbors': 3, 'p': 2, 'weights': 'distance'} post crossover hyperparameters for individual 1 {'metric': 'minkowski', 'n_jobs': 1, 'n_neighbors': 3, 'p': 2, 'weights': 'distance'} post crossover hyperparameters for individual 2 {'metric': 'minkowski', 'n_jobs': 1, 'n_neighbors': 3, 'p': 2, 'weights': 'distance'}
All search spaces have an export_pipeline function that returns a sklearn BaseEstimator:
est = knn_individual1.export_pipeline()
est
KNeighborsClassifier(n_jobs=1, n_neighbors=3, weights='distance')
If a dictionary of parameters is passed rather than a ConfigSpace object, the hyperparameters are always fixed and will not be learned.
import tpot
from ConfigSpace import ConfigurationSpace, Integer, Float, Categorical, Normal
from sklearn.neighbors import KNeighborsClassifier
space = {
    'n_neighbors': 10,
}
knn_node = tpot.search_spaces.nodes.EstimatorNode(
    method = KNeighborsClassifier,
    space = space,
)
knn_node.generate().export_pipeline()
KNeighborsClassifier(n_neighbors=10)
FSSNode and GeneticFeatureSelectorNode
Each of these has its own tutorial: see Tutorial 3 for FSSNode and Tutorial 5 for GeneticFeatureSelectorNode.
Pipeline search space examples
Pipeline search spaces define the structure of, and restrictions on, the pipelines TPOT can search over. Unlike node search spaces, all pipeline search spaces take other search spaces as input. Rather than sampling hyperparameters, a pipeline search space selects models from its input search spaces and organizes them into a linear sklearn Pipeline or a TPOT GraphPipeline.
ChoicePipeline
The simplest pipeline search space is ChoicePipeline. It takes a list of search spaces and simply selects and samples from one of them. In this example, we build a search space over several classifier options. The resulting search space first selects a model from KNeighborsClassifier, LogisticRegression, or DecisionTreeClassifier, and then selects the hyperparameters for that model.
import tpot
from ConfigSpace import ConfigurationSpace, Integer, Float, Categorical, Normal
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
knn_configspace = ConfigurationSpace(
    space = {
        'n_neighbors': Integer("n_neighbors", bounds=(1, 10)),
        'weights': Categorical("weights", ['uniform', 'distance']),
        'p': Integer("p", bounds=(1, 3)),
        'metric': Categorical("metric", ['euclidean', 'minkowski']),
        'n_jobs': 1,
    }
)
lr_configspace = ConfigurationSpace(
    space = {
        'solver': Categorical("solver", ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']),
        'penalty': Categorical("penalty", ['l1', 'l2']),
        'dual': Categorical("dual", [True, False]),
        'C': Float("C", bounds=(1e-4, 1e4), log=True),
        'class_weight': Categorical("class_weight", ['balanced']),
        'n_jobs': 1,
        'max_iter': 1000,
    }
)
dt_configspace = ConfigurationSpace(
    space = {
        'criterion': Categorical("criterion", ['gini', 'entropy']),
        'max_depth': Integer("max_depth", bounds=(1, 11)),
        'min_samples_split': Integer("min_samples_split", bounds=(2, 21)),
        'min_samples_leaf': Integer("min_samples_leaf", bounds=(1, 21)),
        'max_features': Categorical("max_features", ['sqrt', 'log2']),
        'min_weight_fraction_leaf': 0.0,
    }
)
knn_node = tpot.search_spaces.nodes.EstimatorNode(
    method = KNeighborsClassifier,
    space = knn_configspace,
)
lr_node = tpot.search_spaces.nodes.EstimatorNode(
    method = LogisticRegression,
    space = lr_configspace,
)
dt_node = tpot.search_spaces.nodes.EstimatorNode(
    method = DecisionTreeClassifier,
    space = dt_configspace,
)
classifier_node = tpot.search_spaces.pipelines.ChoicePipeline(
    search_spaces=[
        knn_node,
        lr_node,
        dt_node,
    ]
)
tpot.search_spaces.pipelines.ChoicePipeline(
    search_spaces = [
        tpot.search_spaces.nodes.EstimatorNode(
            method = KNeighborsClassifier,
            space = knn_configspace,
        ),
        tpot.search_spaces.nodes.EstimatorNode(
            method = LogisticRegression,
            space = lr_configspace,
        ),
        tpot.search_spaces.nodes.EstimatorNode(
            method = DecisionTreeClassifier,
            space = dt_configspace,
        ),
    ]
)
<tpot.search_spaces.pipelines.choice.ChoicePipeline at 0x32f769780>
The search space objects provided by pipeline search spaces work the same way as node search spaces. Note that crossover only works when the two individuals have sampled the same method.
classifier_individual = classifier_node.generate()
print("sampled pipeline")
classifier_individual.export_pipeline()
sampled pipeline
KNeighborsClassifier(n_jobs=1, n_neighbors=3)
print("mutated pipeline")
classifier_individual.mutate()
classifier_individual.export_pipeline()
mutated pipeline
KNeighborsClassifier(metric='euclidean', n_jobs=1, n_neighbors=9)
Built-in search spaces for EstimatorNode and ChoicePipeline
TPOT also provides predefined hyperparameter search spaces. The current search spaces are adapted from the original TPOT package as well as the search spaces used in AutoSklearn. The helper function tpot.config.get_search_space takes a string or a list of strings and returns, respectively, an EstimatorNode or a ChoicePipeline containing all the methods in the list.
String | Corresponding method |
---|---|
SGDClassifier | <class 'sklearn.linear_model._stochastic_gradient.SGDClassifier'> |
RandomForestClassifier | <class 'sklearn.ensemble._forest.RandomForestClassifier'> |
ExtraTreesClassifier | <class 'sklearn.ensemble._forest.ExtraTreesClassifier'> |
GradientBoostingClassifier | <class 'sklearn.ensemble._gb.GradientBoostingClassifier'> |
MLPClassifier | <class 'sklearn.neural_network._multilayer_perceptron.MLPClassifier'> |
DecisionTreeClassifier | <class 'sklearn.tree._classes.DecisionTreeClassifier'> |
XGBClassifier | <class 'xgboost.sklearn.XGBClassifier'> |
KNeighborsClassifier | <class 'sklearn.neighbors._classification.KNeighborsClassifier'> |
SVC | <class 'sklearn.svm._classes.SVC'> |
LogisticRegression | <class 'sklearn.linear_model._logistic.LogisticRegression'> |
LGBMClassifier | <class 'lightgbm.sklearn.LGBMClassifier'> |
LinearSVC | <class 'sklearn.svm._classes.LinearSVC'> |
GaussianNB | <class 'sklearn.naive_bayes.GaussianNB'> |
BernoulliNB | <class 'sklearn.naive_bayes.BernoulliNB'> |
MultinomialNB | <class 'sklearn.naive_bayes.MultinomialNB'> |
ExtraTreesRegressor | <class 'sklearn.ensemble._forest.ExtraTreesRegressor'> |
RandomForestRegressor | <class 'sklearn.ensemble._forest.RandomForestRegressor'> |
GradientBoostingRegressor | <class 'sklearn.ensemble._gb.GradientBoostingRegressor'> |
BaggingRegressor | <class 'sklearn.ensemble._bagging.BaggingRegressor'> |
DecisionTreeRegressor | <class 'sklearn.tree._classes.DecisionTreeRegressor'> |
KNeighborsRegressor | <class 'sklearn.neighbors._regression.KNeighborsRegressor'> |
XGBRegressor | <class 'xgboost.sklearn.XGBRegressor'> |
ZeroCount | <class 'tpot.builtin_modules.zero_count.ZeroCount'> |
ColumnOneHotEncoder | <class 'tpot.builtin_modules.column_one_hot_encoder.ColumnOneHotEncoder'> |
Binarizer | <class 'sklearn.preprocessing._data.Binarizer'> |
FastICA | <class 'sklearn.decomposition._fastica.FastICA'> |
FeatureAgglomeration | <class 'sklearn.cluster._agglomerative.FeatureAgglomeration'> |
MaxAbsScaler | <class 'sklearn.preprocessing._data.MaxAbsScaler'> |
MinMaxScaler | <class 'sklearn.preprocessing._data.MinMaxScaler'> |
Normalizer | <class 'sklearn.preprocessing._data.Normalizer'> |
Nystroem | <class 'sklearn.kernel_approximation.Nystroem'> |
PCA | <class 'sklearn.decomposition._pca.PCA'> |
PolynomialFeatures | <class 'sklearn.preprocessing._polynomial.PolynomialFeatures'> |
RBFSampler | <class 'sklearn.kernel_approximation.RBFSampler'> |
RobustScaler | <class 'sklearn.preprocessing._data.RobustScaler'> |
StandardScaler | <class 'sklearn.preprocessing._data.StandardScaler'> |
SelectFwe | <class 'sklearn.feature_selection._univariate_selection.SelectFwe'> |
SelectPercentile | <class 'sklearn.feature_selection._univariate_selection.SelectPercentile'> |
VarianceThreshold | <class 'sklearn.feature_selection._variance_threshold.VarianceThreshold'> |
SGDRegressor | <class 'sklearn.linear_model._stochastic_gradient.SGDRegressor'> |
Ridge | <class 'sklearn.linear_model._ridge.Ridge'> |
Lasso | <class 'sklearn.linear_model._coordinate_descent.Lasso'> |
ElasticNet | <class 'sklearn.linear_model._coordinate_descent.ElasticNet'> |
Lars | <class 'sklearn.linear_model._least_angle.Lars'> |
LassoLars | <class 'sklearn.linear_model._least_angle.LassoLars'> |
LassoLarsCV | <class 'sklearn.linear_model._least_angle.LassoLarsCV'> |
RidgeCV | <class 'sklearn.linear_model._ridge.RidgeCV'> |
SVR | <class 'sklearn.svm._classes.SVR'> |
LinearSVR | <class 'sklearn.svm._classes.LinearSVR'> |
AdaBoostRegressor | <class 'sklearn.ensemble._weight_boosting.AdaBoostRegressor'> |
ElasticNetCV | <class 'sklearn.linear_model._coordinate_descent.ElasticNetCV'> |
AdaBoostClassifier | <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'> |
MLPRegressor | <class 'sklearn.neural_network._multilayer_perceptron.MLPRegressor'> |
GaussianProcessRegressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
HistGradientBoostingClassifier | <class 'sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier'> |
HistGradientBoostingRegressor | <class 'sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingRegressor'> |
AddTransformer | <class 'tpot.builtin_modules.arithmetictransformer.AddTransformer'> |
mul_neg_1_Transformer | <class 'tpot.builtin_modules.arithmetictransformer.mul_neg_1_Transformer'> |
MulTransformer | <class 'tpot.builtin_modules.arithmetictransformer.MulTransformer'> |
SafeReciprocalTransformer | <class 'tpot.builtin_modules.arithmetictransformer.SafeReciprocalTransformer'> |
EQTransformer | <class 'tpot.builtin_modules.arithmetictransformer.EQTransformer'> |
NETransformer | <class 'tpot.builtin_modules.arithmetictransformer.NETransformer'> |
GETransformer | <class 'tpot.builtin_modules.arithmetictransformer.GETransformer'> |
GTTransformer | <class 'tpot.builtin_modules.arithmetictransformer.GTTransformer'> |
LETransformer | <class 'tpot.builtin_modules.arithmetictransformer.LETransformer'> |
LTTransformer | <class 'tpot.builtin_modules.arithmetictransformer.LTTransformer'> |
MinTransformer | <class 'tpot.builtin_modules.arithmetictransformer.MinTransformer'> |
MaxTransformer | <class 'tpot.builtin_modules.arithmetictransformer.MaxTransformer'> |
ZeroTransformer | <class 'tpot.builtin_modules.arithmetictransformer.ZeroTransformer'> |
OneTransformer | <class 'tpot.builtin_modules.arithmetictransformer.OneTransformer'> |
NTransformer | <class 'tpot.builtin_modules.arithmetictransformer.NTransformer'> |
PowerTransformer | <class 'sklearn.preprocessing._data.PowerTransformer'> |
QuantileTransformer | <class 'sklearn.preprocessing._data.QuantileTransformer'> |
ARDRegression | <class 'sklearn.linear_model._bayes.ARDRegression'> |
QuadraticDiscriminantAnalysis | <class 'sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis'> |
PassiveAggressiveClassifier | <class 'sklearn.linear_model._passive_aggressive.PassiveAggressiveClassifier'> |
LinearDiscriminantAnalysis | <class 'sklearn.discriminant_analysis.LinearDiscriminantAnalysis'> |
DominantEncoder | <class 'tpot.builtin_modules.genetic_encoders.DominantEncoder'> |
RecessiveEncoder | <class 'tpot.builtin_modules.genetic_encoders.RecessiveEncoder'> |
HeterosisEncoder | <class 'tpot.builtin_modules.genetic_encoders.HeterosisEncoder'> |
UnderDominanceEncoder | <class 'tpot.builtin_modules.genetic_encoders.UnderDominanceEncoder'> |
OverDominanceEncoder | <class 'tpot.builtin_modules.genetic_encoders.OverDominanceEncoder'> |
GaussianProcessClassifier | <class 'sklearn.gaussian_process._gpc.GaussianProcessClassifier'> |
BaggingClassifier | <class 'sklearn.ensemble._bagging.BaggingClassifier'> |
LGBMRegressor | <class 'lightgbm.sklearn.LGBMRegressor'> |
Passthrough | <class 'tpot.builtin_modules.passthrough.Passthrough'> |
SkipTransformer | <class 'tpot.builtin_modules.passthrough.SkipTransformer'> |
PassKBinsDiscretizer | <class 'tpot.builtin_modules.passkbinsdiscretizer.PassKBinsDiscretizer'> |
SimpleImputer | <class 'sklearn.impute._base.SimpleImputer'> |
IterativeImputer | <class 'sklearn.impute._iterative.IterativeImputer'> |
KNNImputer | <class 'sklearn.impute._knn.KNNImputer'> |
MDR | <class 'mdr.mdr.MDR'> |
ContinuousMDR | <class 'mdr.continuous_mdr.ContinuousMDR'> |
ReliefF | <class 'skrebate.relieff.ReliefF'> |
SURF | <class 'skrebate.surf.SURF'> |
SURFstar | <class 'skrebate.surfstar.SURFstar'> |
MultiSURF | <class 'skrebate.multisurf.MultiSURF'> |
LinearRegression_sklearnex | <class 'sklearnex.linear_model.linear.LinearRegression'> |
Ridge_sklearnex | <class 'daal4py.sklearn.linear_model._ridge.Ridge'> |
Lasso_sklearnex | <class 'daal4py.sklearn.linear_model._coordinate_descent.Lasso'> |
ElasticNet_sklearnex | <class 'daal4py.sklearn.linear_model._coordinate_descent.ElasticNet'> |
SVR_sklearnex | <class 'sklearnex.svm.svr.SVR'> |
NuSVR_sklearnex | <class 'sklearnex.svm.nusvr.NuSVR'> |
RandomForestRegressor_sklearnex | <class 'sklearnex.ensemble._forest.RandomForestRegressor'> |
KNeighborsRegressor_sklearnex | <class 'sklearnex.neighbors.knn_regression.KNeighborsRegressor'> |
RandomForestClassifier_sklearnex | <class 'sklearnex.ensemble._forest.RandomForestClassifier'> |
KNeighborsClassifier_sklearnex | <class 'sklearnex.neighbors.knn_classification.KNeighborsClassifier'> |
SVC_sklearnex | <class 'sklearnex.svm.svc.SVC'> |
NuSVC_sklearnex | <class 'sklearnex.svm.nusvc.NuSVC'> |
LogisticRegression_sklearnex | <class 'sklearnex.linear_model.logistic_regression.LogisticRegression'> |
Some methods require a wrapped estimator. To cover both regression and classification, these are grouped under their own special strings.
Wrapper special string | Notes |
---|---|
RFE_classification | RFE using a learned ExtraTreesClassifier |
RFE_regression | RFE using a learned ExtraTreesRegressor |
SelectFromModel_classification | SelectFromModel using a learned ExtraTreesClassifier |
SelectFromModel_regression | SelectFromModel using a learned ExtraTreesRegressor |
IterativeImputer_learned_estimators | IterativeImputer using a learned ExtraTreesRegressor |
There are also special strings for predefined lists of methods. These return a ChoicePipeline containing those methods.
List special string | Included methods |
---|---|
"selectors" | ["SelectFwe", "SelectPercentile", "VarianceThreshold",] |
"selectors_classification" | ["SelectFwe", "SelectPercentile", "VarianceThreshold", "RFE_classification", "SelectFromModel_classification"] |
"selectors_regression" | ["SelectFwe", "SelectPercentile", "VarianceThreshold", "RFE_regression", "SelectFromModel_regression"] |
"classifiers" | ["LGBMClassifier", "BaggingClassifier", 'AdaBoostClassifier', 'BernoulliNB', 'DecisionTreeClassifier', 'ExtraTreesClassifier', 'GaussianNB', 'HistGradientBoostingClassifier', 'KNeighborsClassifier','LinearDiscriminantAnalysis', 'LogisticRegression', "LinearSVC", "SVC", 'MLPClassifier', 'MultinomialNB', "QuadraticDiscriminantAnalysis", 'RandomForestClassifier', 'SGDClassifier', 'XGBClassifier'] |
"regressors" | ["LGBMRegressor", 'AdaBoostRegressor', "ARDRegression", 'DecisionTreeRegressor', 'ExtraTreesRegressor', 'HistGradientBoostingRegressor', 'KNeighborsRegressor', 'LinearSVR', "MLPRegressor", 'RandomForestRegressor', 'SGDRegressor', 'SVR', 'XGBRegressor'] |
"transformers" | ["PassKBinsDiscretizer", "Binarizer", "PCA", "ZeroCount", "ColumnOneHotEncoder", "FastICA", "FeatureAgglomeration", "Nystroem", "RBFSampler", "QuantileTransformer", "PowerTransformer"] |
"scalers" | ["MinMaxScaler", "RobustScaler", "StandardScaler", "MaxAbsScaler", "Normalizer", ] |
"all_transformers" | ["transformers", "scalers"] |
"arithmatic" | ["AddTransformer", "mul_neg_1_Transformer", "MulTransformer", "SafeReciprocalTransformer", "EQTransformer", "NETransformer", "GETransformer", "GTTransformer", "LETransformer", "LTTransformer", "MinTransformer", "MaxTransformer"] |
"imputers" | ["SimpleImputer", "IterativeImputer", "KNNImputer"] |
"skrebate" | ["ReliefF", "SURF", "SURFstar", "MultiSURF"] |
"genetic_encoders" | ["DominantEncoder", "RecessiveEncoder", "HeterosisEncoder", "UnderDominanceEncoder", "OverDominanceEncoder"] |
"classifiers_sklearnex" | ["RandomForestClassifier_sklearnex", "LogisticRegression_sklearnex", "KNeighborsClassifier_sklearnex", "SVC_sklearnex","NuSVC_sklearnex"] |
"regressors_sklearnex" | ["LinearRegression_sklearnex", "Ridge_sklearnex", "Lasso_sklearnex", "ElasticNet_sklearnex", "SVR_sklearnex", "NuSVR_sklearnex", "RandomForestRegressor_sklearnex", "KNeighborsRegressor_sklearnex"] |
Here are some examples of getting search spaces with the get_search_space function.
#same pipeline search space as before.
classifier_choice = tpot.config.get_search_space(["KNeighborsClassifier", "LogisticRegression", "DecisionTreeClassifier"])
print("sampled pipeline 1")
classifier_choice.generate().export_pipeline()
sampled pipeline 1
KNeighborsClassifier(n_jobs=1, n_neighbors=15, p=1, weights='distance')
print("sampled pipeline 2")
classifier_choice.generate().export_pipeline()
sampled pipeline 2
LogisticRegression(C=5.9018435257131, max_iter=1000, n_jobs=1, solver='saga')
#search space for all classifiers
classifier_choice = tpot.config.get_search_space("classifiers")
print("sampled pipeline 1")
classifier_choice.generate().export_pipeline()
sampled pipeline 1
SGDClassifier(alpha=0.0007786971309, class_weight='balanced', eta0=0.0209976430718, l1_ratio=0.8571538017043, learning_rate='constant', loss='modified_huber', n_jobs=1, penalty='elasticnet')
print("sampled pipeline 2")
classifier_choice.generate().export_pipeline()
sampled pipeline 2
BernoulliNB(alpha=0.0667141454883, fit_prior=False)
A note on reproducibility
Many sklearn estimators (e.g. RandomForestClassifier) are stochastic and require a random_state parameter for deterministic results. If you want a TPOT run to be reproducible, it is important that the estimators TPOT uses have their random states set; TPOT does not set this value automatically. It can be set manually in each search space, or by passing a random state to the get_search_space function. For example:
reproducible_random_forest = tpot.config.get_search_space("RandomForestClassifier", random_state=1)
reproducible_random_forest.generate().export_pipeline()
RandomForestClassifier(bootstrap=False, max_features=0.0234127070363, min_samples_leaf=3, min_samples_split=8, n_estimators=128, n_jobs=1, random_state=1)
SequentialPipeline
SequentialPipelines have a fixed length and sample each step from a predefined distribution.
selector_choicepipeline = tpot.config.get_search_space("VarianceThreshold")
transformer_choicepipeline = tpot.config.get_search_space("PCA")
classifier_choicepipeline = tpot.config.get_search_space("LogisticRegression")
stc_pipeline = tpot.search_spaces.pipelines.SequentialPipeline([
    selector_choicepipeline,
    transformer_choicepipeline,
    classifier_choicepipeline,
])
print("sampled pipeline")
stc_pipeline.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.00023551581)), ('pca', PCA(n_components=0.9764631370244)), ('logisticregression', LogisticRegression(C=1.9396611393109, max_iter=1000, n_jobs=1, penalty='l1', solver='saga'))])
Below is an example in the Selector-Transformer-Classifier pattern.
Note that this time each step in the sequence is a ChoicePipeline, so the SequentialPipeline samples from the provided search spaces in order.
selector_choicepipeline = tpot.config.get_search_space("selectors")
transformer_choicepipeline = tpot.config.get_search_space("transformers")
classifier_choicepipeline = tpot.config.get_search_space("classifiers")
stc_pipeline = tpot.search_spaces.pipelines.SequentialPipeline([
    selector_choicepipeline,
    transformer_choicepipeline,
    classifier_choicepipeline,
])
print("sampled pipeline")
stc_pipeline.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.0004317798946)), ('kbinsdiscretizer', KBinsDiscretizer(encode='onehot-dense', n_bins=77)), ('lgbmclassifier', LGBMClassifier(boosting_type='dart', max_depth=5, n_estimators=76, n_jobs=1, num_leaves=192, verbose=-1))])
print("sampled pipeline")
stc_pipeline.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('selectpercentile', SelectPercentile(percentile=4.5788544361168)), ('columnonehotencoder', ColumnOneHotEncoder()), ('decisiontreeclassifier', DecisionTreeClassifier(criterion='entropy', max_depth=10, min_samples_split=13))])
DynamicLinearPipeline
DynamicLinearPipeline takes a single search space and randomly samples estimators into a list with no predefined order. It is most often paired with a SequentialPipeline: a common strategy is to use a DynamicLinearPipeline to optimize a series of preprocessing or feature-engineering steps, followed by a final classifier or regressor.
import tpot.config
linear_feature_engineering = tpot.search_spaces.pipelines.DynamicLinearPipeline(
    search_space = tpot.config.get_search_space(["all_transformers", "selectors_classification"]),
    max_length = 10,
)
print("sampled pipeline")
linear_feature_engineering.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('pca-1', PCA(n_components=0.6376571946485)), ('pca-2', PCA(n_components=0.7836827180307)), ('quantiletransformer', QuantileTransformer(n_quantiles=334, output_distribution='normal'))])
print("sampled pipeline")
linear_feature_engineering.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('selectfwe', SelectFwe(alpha=0.0004164619371)), ('binarizer', Binarizer(threshold=0.2392693027442)), ('rbfsampler', RBFSampler(gamma=0.3669672326084, n_components=35))])
full_search_space = tpot.search_spaces.pipelines.SequentialPipeline([
    linear_feature_engineering,
    tpot.config.get_search_space("classifiers"),
])
print("sampled pipeline")
full_search_space.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('pipeline', Pipeline(steps=[('binarizer', Binarizer(threshold=0.2150677779496)), ('maxabsscaler', MaxAbsScaler()), ('columnonehotencoder', ColumnOneHotEncoder())])), ('gaussiannb', GaussianNB())])
print("sampled pipeline")
full_search_space.generate().export_pipeline()
sampled pipeline
Pipeline(steps=[('pipeline', Pipeline(steps=[('zerocount', ZeroCount()), ('selectfrommodel', SelectFromModel(estimator=ExtraTreesClassifier(class_weight='balanced', max_features=0.1619832293406, min_samples_leaf=7, min_samples_split=7, n_jobs=1), threshold=0.6414209870839)), ('variancethreshold', VarianceThreshold(threshold=0.0113542845765))])), ('multinomialnb', MultinomialNB(alpha=0.0815128367119))])
UnionPipeline¶
Union pipelines are useful when you want to apply multiple transformations within a single layer. Another common strategy is to union a transformer with a Passthrough, preserving the original data alongside the transformed data.
transform_and_passthrough = tpot.search_spaces.pipelines.UnionPipeline([
tpot.config.get_search_space("transformers"),
tpot.config.get_search_space("Passthrough"),
])
transform_and_passthrough.generate().export_pipeline()
FeatureUnion(transformer_list=[('pca', PCA(n_components=0.7674007136568)), ('passthrough', Passthrough())])
UnionPipelines are an excellent tool for extending the capabilities of linear search spaces.
stc_pipeline2 = tpot.search_spaces.pipelines.SequentialPipeline([
tpot.config.get_search_space("selectors"),
transform_and_passthrough,
tpot.config.get_search_space("classifiers"),
])
stc_pipeline2.generate().export_pipeline()
Pipeline(steps=[('selectpercentile', SelectPercentile(percentile=29.1049436421441)), ('featureunion', FeatureUnion(transformer_list=[('powertransformer', PowerTransformer()), ('passthrough', Passthrough())])), ('extratreesclassifier', ExtraTreesClassifier(max_features=0.8376611419015, min_samples_leaf=9, min_samples_split=17, n_jobs=1))])
Union pipelines can also be used to create "branches" if you are trying to build a tree-shaped search space. This is particularly useful when paired with FeatureSetSelector nodes (FSSNode), since each branch can learn different feature engineering for different subsets of the features.
st_pipeline = tpot.search_spaces.pipelines.SequentialPipeline([
tpot.config.get_search_space("selectors"),
tpot.config.get_search_space("transformers"),
])
branched_pipeline = tpot.search_spaces.pipelines.SequentialPipeline([
tpot.search_spaces.pipelines.UnionPipeline([
st_pipeline,
st_pipeline,
]),
tpot.config.get_search_space("classifiers"),
])
branched_pipeline.generate().export_pipeline()
Pipeline(steps=[('featureunion', FeatureUnion(transformer_list=[('pipeline-1', Pipeline(steps=[('selectfwe', SelectFwe(alpha=0.0080564930162)), ('quantiletransformer', QuantileTransformer(n_quantiles=450, output_distribution='normal'))])), ('pipeline-2', Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.155443085484)), ('columnonehotencoder... feature_types=None, gamma=14.5866790094856, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=0.2226908938347, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=11, max_leaves=None, min_child_weight=3, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=100, n_jobs=1, nthread=1, num_parallel_tree=None, ...))])
DynamicUnionPipeline¶
DynamicUnionPipeline works similarly to UnionPipeline. Whereas a UnionPipeline has a fixed length, with each index corresponding to one of the search spaces provided in a list, a DynamicUnionPipeline takes a single search space and samples one or more estimators/pipelines from it, concatenating them with a FeatureUnion.
Note that DynamicUnionPipeline checks pipelines for uniqueness, so it will never concatenate two identical pipelines. In other words, all steps in the feature union will be unique.
This is useful when you want multiple transformers (or, in some cases, even pipelines) but are unsure how many, or which ones, you need.
dynamic_transformers = tpot.search_spaces.pipelines.DynamicUnionPipeline(tpot.config.get_search_space("transformers"), max_estimators=4)
dynamic_transformers.generate().export_pipeline()
FeatureUnion(transformer_list=[('fastica', FastICA(n_components=4))])
A good strategy is to pair this with a Passthrough inside a feature union so that you output all the transformations along with the original data.
dynamic_transformers_with_passthrough = tpot.search_spaces.pipelines.UnionPipeline([
dynamic_transformers,
tpot.config.get_search_space("Passthrough")],
)
dynamic_transformers_with_passthrough.generate().export_pipeline()
FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('pca', PCA(n_components=0.9386236966835)), ('zerocount', ZeroCount()), ('featureagglomeration', FeatureAgglomeration(n_clusters=94, pooling_func=<function max at 0x1048f3470>))])), ('passthrough', Passthrough())])
stc_pipeline3 = tpot.search_spaces.pipelines.SequentialPipeline([
tpot.config.get_search_space("selectors"),
dynamic_transformers_with_passthrough,
tpot.config.get_search_space("classifiers"),
])
stc_pipeline3.generate().export_pipeline()
Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.0003352949622)), ('featureunion', FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('featureagglomeration', FeatureAgglomeration(linkage='complete', metric='cosine', n_clusters=25)), ('columnordinalencoder', ColumnOrdinalEncoder())])), ('passthrough', Passthrough())])), ('mlpclassifier', MLPClassifier(activation='identity', alpha=0.000256185492, early_stopping=True, hidden_layer_sizes=[146, 146, 146], learning_rate='invscaling', learning_rate_init=0.0006442167601, n_iter_no_change=32))])
WrapperPipeline¶
Some sklearn estimators take other sklearn estimators as parameters. Wrapper pipelines are used to tune the hyperparameters of the outer estimator and those of the inner estimator simultaneously. In fact, the inner estimator in a WrapperPipeline can be any search space defined by any of the methods described in this tutorial.
`get_search_space` will automatically create the hyperparameter search space (without the inner estimator) for sklearn estimators that require one. For example, "SelectFromModel_classification" returns the following search space:
SelectFromModel_configspace_part = ConfigurationSpace(
space = {
'threshold': Float('threshold', bounds=(1e-4, 1.0), log=True),
}
)
extratrees_estimator_node = tpot.config.get_search_space("ExtraTreesClassifier") #this exports an ExtraTreesClassifier node
extratrees_estimator_node.generate().export_pipeline()
ExtraTreesClassifier(class_weight='balanced', max_features=0.9851993193336, min_samples_leaf=5, min_samples_split=6, n_jobs=1)
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
select_from_model_wrapper_searchspace = tpot.search_spaces.pipelines.WrapperPipeline(
method=SelectFromModel,
space = SelectFromModel_configspace_part,
estimator_search_space= extratrees_estimator_node,
)
select_from_model_wrapper_searchspace.generate().export_pipeline()
SelectFromModel(estimator=ExtraTreesClassifier(max_features=0.277440186742, min_samples_leaf=9, min_samples_split=17, n_jobs=1), threshold=0.0032005860778)
WrapperPipeline strategy for ensembles / inner classifiers and regressors (EstimatorTransformer)¶
Sklearn Pipelines only allow a classifier/regressor as the final step. All other steps are expected to implement a transform function. We can get around this by wrapping the estimator in another transformer class that returns the output of predict or predict_proba inside its transform() function.
To wrap a classifier as a transformer, you can use the class `tpot.builtin_modules.EstimatorTransformer`. Use the `method` parameter to specify whether to pass along the output of predict, predict_proba, or decision_function.
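As an illustration of the idea (a simplified sketch, not TPOT's actual implementation; the `PredictionTransformer` name is hypothetical), a classifier can be exposed mid-pipeline by returning its predictions from transform():

```python
# Hypothetical sketch of the idea behind tpot.builtin_modules.EstimatorTransformer:
# expose a classifier's predictions through transform() so it can be used as an
# intermediate pipeline step. Not TPOT's actual implementation.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression

class PredictionTransformer(BaseEstimator, TransformerMixin):  # hypothetical name
    def __init__(self, estimator, method="predict_proba"):
        self.estimator = estimator
        self.method = method

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        # Return the chosen prediction method's output as a 2D feature array
        preds = getattr(self.estimator, self.method)(X)
        return np.asarray(preds).reshape(len(X), -1)

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)
out = PredictionTransformer(LogisticRegression()).fit(X, y).transform(X)
print(out.shape)  # (100, 2): one column per class probability
```

Because transform() returns an ordinary 2D array, downstream steps can consume the predictions as features just like any other transformer output.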
cross_val_predict_cv¶
Another thing to consider is whether to use `cross_val_predict_cv`. If this parameter is set, then during model training any classifier or regressor that is not the final predictor will use `sklearn.model_selection.cross_val_predict` to pass out-of-sample predictions to the subsequent steps of the pipeline. The model is still fit on the full data, and that fitted model is used for predictions after training. Training downstream models on out-of-sample predictions often helps prevent overfitting and can improve performance, because it lets downstream models estimate how the upstream models behave on unseen data. Otherwise, if an upstream model badly overfits the data, downstream models may simply learn to blindly trust the seemingly well-performing model, propagating the overfitting to the final result.
The downside is that cross_val_predict_cv is more computationally expensive, and may not be necessary for your given dataset.
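The effect can be seen directly with scikit-learn's `cross_val_predict`, the function TPOT uses internally when `cross_val_predict_cv` is set (a small sketch on a toy dataset):

```python
# Out-of-sample predictions via cross_val_predict: each sample is predicted by a
# model fit on folds that did not contain it, which is what downstream pipeline
# steps receive during training when cross_val_predict_cv is set.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

in_sample = clf.fit(X, y).predict(X)                # fit and predict on the same data
out_of_sample = cross_val_predict(clf, X, y, cv=5)  # predictions from held-out folds

# In-sample accuracy is typically optimistic compared to out-of-sample accuracy
print("in-sample:", (in_sample == y).mean())
print("out-of-sample:", (out_of_sample == y).mean())
```

The out-of-sample array has one prediction per sample, so it can stand in for the in-sample predictions anywhere downstream without changing shapes.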
Note: this is not necessary for `GraphSearchPipeline`, since the exported GraphPipeline estimator has built-in support for inner classifiers and regressors. Rather than using the wrapper, you can optionally set the `cross_val_predict_cv` parameter when initializing the `GraphSearchPipeline` object.
classifiers = tpot.config.get_search_space("classifiers")
wrapped_estimators = tpot.search_spaces.pipelines.WrapperPipeline(tpot.builtin_modules.EstimatorTransformer, {}, classifiers)
est = wrapped_estimators.generate().export_pipeline() #returns an estimator with a transform function
est
EstimatorTransformer(estimator=MLPClassifier(alpha=0.000648285661, hidden_layer_sizes=[380], learning_rate='invscaling', learning_rate_init=0.0008851810314, n_iter_no_change=32))
import numpy as np
X, y = np.random.rand(100, 10), np.random.randint(0, 2, 100)
est.fit_transform(X, y)[0:5]
/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/neural_network/_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. warnings.warn(
array([[0.34363566, 0.65636434],
       [0.14785295, 0.85214705],
       [0.45816571, 0.54183429],
       [0.81083741, 0.18916259],
       [0.56944478, 0.43055522]])
You can manually set the parameters of the estimator in the same way as with an EstimatorNode. Here is another example that uses cross_val_predict and method.
classifiers = tpot.config.get_search_space("classifiers")
wrapped_estimators_cv = tpot.search_spaces.pipelines.WrapperPipeline(tpot.builtin_modules.EstimatorTransformer, {'cross_val_predict_cv':10, 'method':'predict'}, classifiers)
est = wrapped_estimators_cv.generate().export_pipeline() #returns an estimator with a transform function
est.fit_transform(X, y)[0:5]
array([[0], [0], [0], [0], [0]])
These can now be used in linear pipelines. This one is very similar to the default linear pipeline search space.
dynamic_wrapped_classifiers_with_passthrough = tpot.search_spaces.pipelines.UnionPipeline([
tpot.search_spaces.pipelines.DynamicUnionPipeline(wrapped_estimators_cv, max_estimators=4),
tpot.config.get_search_space("Passthrough")
])
stc_pipeline4 = tpot.search_spaces.pipelines.SequentialPipeline([
tpot.config.get_search_space("scalers"),
dynamic_transformers_with_passthrough,
dynamic_wrapped_classifiers_with_passthrough,
tpot.config.get_search_space("classifiers"),
])
stc_pipeline4.generate().export_pipeline()
Pipeline(steps=[('robustscaler', RobustScaler(quantile_range=(0.2632669052042, 0.892009308738))), ('featureunion-1', FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('columnonehotencoder', ColumnOneHotEncoder()), ('kbinsdiscretizer', KBinsDiscretizer(encode='onehot-dense', n_bins=58, strategy='kmeans'))])), ('passthrough', Passth... estimator=LogisticRegression(C=334.8557628287718, max_iter=1000, n_jobs=1, solver='saga'), method='predict')), ('estimatortransformer-2', EstimatorTransformer(cross_val_predict_cv=10, estimator=QuadraticDiscriminantAnalysis(reg_param=0.0011738914966), method='predict'))])), ('passthrough', Passthrough())])), ('lineardiscriminantanalysis', LinearDiscriminantAnalysis())])
GraphSearchPipeline¶
GraphSearchPipeline is a flexible search space with no a priori restrictions on pipeline structure. With GraphSearchPipeline, TPOT creates pipelines in the shape of a directed acyclic graph. Throughout optimization, TPOT can add/remove nodes, add/remove edges, and perform model selection and hyperparameter tuning for each node.
The primary parameters of graph_search_space are root_search_space, inner_search_space, and leaf_search_space.
Parameter | Type | Description |
---|---|---|
root_search_space | SklearnIndividualGenerator | The search space for the root node of the graph. This node will be the final estimator in the pipeline. |
inner_search_space | SklearnIndividualGenerator, optional | The search space for the inner nodes of the graph. If not defined, there are no inner nodes. |
leaf_search_space | SklearnIndividualGenerator, optional | The search space for the leaf nodes of the graph. If not defined, leaf nodes are drawn from inner_search_space. |
crossover_same_depth | bool, optional | If True, crossover only happens between nodes at the same depth of the graph. If False, crossover occurs between nodes at any depth. |
cross_val_predict_cv | int, cross-validation generator, or iterable, optional | Determines the cross-validation splitting strategy used for inner classifiers or regressors. |
method | str, optional | The prediction method used for inner classifiers or regressors. If "auto", it will try predict_proba, decision_function, and predict, in that order. |
This search space exports a `tpot.GraphPipeline`. This is similar to a scikit-learn Pipeline, but for pipelines in the shape of a directed acyclic graph. You can learn more about using this module in Tutorial 6.
graph_search_space = tpot.search_spaces.pipelines.GraphSearchPipeline(
root_search_space= tpot.config.get_search_space(["KNeighborsClassifier", "LogisticRegression", "DecisionTreeClassifier"]),
leaf_search_space = tpot.config.get_search_space("selectors"),
inner_search_space = tpot.config.get_search_space(["transformers"]),
max_size = 10,
)
ind = graph_search_space.generate()
est1 = ind.export_pipeline()
est1.plot() #GraphPipelines have a helpful plotting function to visualize the pipeline
Let's apply a few more rounds of mutation and plot the resulting pipelines to illustrate the diversity of pipelines this search space can generate.
for i in range(0,50):
ind.mutate()
if i%5==0:
est = ind.export_pipeline()
est.plot()
TreePipeline¶
TreePipelines work the same way as GraphPipelines, but are restricted to tree structures. This is similar to the search space in the original TPOT.
(This search space is still experimental and is currently built on top of GraphSearchPipeline. It may be rewritten with its own code in the future.)
tree_search_space = tpot.search_spaces.pipelines.TreePipeline(
root_search_space= tpot.config.get_search_space(["KNeighborsClassifier", "LogisticRegression", "DecisionTreeClassifier"]),
leaf_search_space = tpot.config.get_search_space("selectors"),
inner_search_space = tpot.config.get_search_space(["transformers"]),
max_size = 10,
)
ind = graph_search_space.generate()
exp = ind.export_pipeline()
exp.plot()
Tips and Tricks¶
- Two very useful transformers to pair with search spaces are `tpot.builtin_modules.Passthrough` and `tpot.builtin_modules.SkipTransformer`. Passthrough simply passes the exact input it receives on to the next step. This is particularly useful inside a UnionSearchSpace, as it allows the transformed data as well as the original data to be passed into the next step. SkipTransformer always returns an empty output. This is most useful in a union with a Passthrough and an optional second method. For example, if you are unsure whether a transformer is needed, SkipTransformer can be included as one of the options; when selected, the transformation step is skipped.
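This Passthrough/SkipTransformer combination can be sketched with plain scikit-learn pieces: `FunctionTransformer()` defaults to the identity and stands in for Passthrough, while the `SkipAll` class below is a hypothetical stand-in for SkipTransformer:

```python
# Sketch of the SkipTransformer + Passthrough pattern using plain sklearn parts.
# SkipAll is a hypothetical stand-in: it contributes zero columns to the union,
# so when the "skip" branch is chosen the union reduces to the original data.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

class SkipAll(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.empty((np.asarray(X).shape[0], 0))  # no output columns

X = np.random.rand(10, 3)
union = FeatureUnion([("skip", SkipAll()), ("passthrough", FunctionTransformer())])
out = union.fit_transform(X)
print(out.shape)  # (10, 3): only the passthrough contributes columns
```

Swapping the "skip" branch for a real transformer would append its columns to the original features instead.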
In this example, the FeatureUnion layer will always have at least one transformer selected and will always include a passthrough.
from tpot.search_spaces.pipelines import *
from tpot.config import get_search_space
#This FeatureUnion layer will always have at least one transformer selected and will always have one passthrough
transformers_with_passthrough = UnionPipeline([
DynamicUnionPipeline(get_search_space(["transformers"])),
get_search_space("Passthrough")
]
)
transformers_with_passthrough.generate().export_pipeline()
FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('kbinsdiscretizer', KBinsDiscretizer(encode='onehot-dense', n_bins=80, strategy='kmeans')), ('fastica', FastICA(algorithm='deflation', n_components=91))])), ('passthrough', Passthrough())])
In this example, the FeatureUnion layer will always include a passthrough. It may also select one or more transformers, but it may skip transformers entirely and include only the passthrough.
final_transformers_layer =UnionPipeline([
ChoicePipeline([
DynamicUnionPipeline(get_search_space(["transformers"])),
get_search_space("SkipTransformer"),
]),
get_search_space("Passthrough")
]
)
final_transformers_layer.generate().export_pipeline()
FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()), ('passthrough', Passthrough())])
inner_estimators_layer = UnionPipeline([
ChoicePipeline([
DynamicUnionPipeline(wrapped_estimators, max_estimators=4),
get_search_space("SkipTransformer"),
]),
get_search_space("Passthrough")]
)
inner_estimators_layer.generate().export_pipeline()
FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()), ('passthrough', Passthrough())])
final_linear_pipeline = SequentialPipeline([
get_search_space("scalers"),
final_transformers_layer,
inner_estimators_layer,
get_search_space("classifiers"),
])
final_linear_pipeline.generate().export_pipeline()
Pipeline(steps=[('normalizer', Normalizer(norm='l1')), ('featureunion-1', FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()), ('passthrough', Passthrough())])), ('featureunion-2', FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()), ('passthrough', Passthrough())])), ('bernoullinb', BernoulliNB(alpha=5.0573782838899, fit_prior=False))])
Template Search Spaces¶
As described in Tutorial 1, TPOT includes several built-in search spaces. The table below is the same as the one shown there.
String | Description |
---|---|
linear | A linear pipeline with the structure "Selector->(transformers+Passthrough)->(classifiers/regressors+Passthrough)->final classifier/regressor". For both the transformer layer and the inner estimator layer, TPOT may choose one or more transformers/classifiers, or none at all. The inner classifier/regressor layer is optional. |
linear-light | Same search space as linear, but without the inner classifier/regressor layer and with a reduced set of faster-running estimators. |
graph | TPOT will optimize a pipeline in the shape of a directed acyclic graph. The nodes of the graph can include selectors, scalers, transformers, or classifiers/regressors (inner classifiers/regressors can optionally be excluded). This returns a custom GraphPipeline rather than an sklearn Pipeline. See Tutorial 6 for more details. |
graph-light | Same search space as graph, but without inner classifiers/regressors and with a reduced set of faster-running estimators. |
mdr | TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS); see the online documentation here for details. |
Rather than creating your own search space, you can simply pass one of these strings to the `search_space` parameter. Alternatively, you can access `tpot.config.template_search_spaces.get_template_search_spaces` directly, which offers more customizable options for each template, including `cross_val_predict_cv` and whether to allow stacked classifiers/regressors. Or you can copy the code and customize it manually!
`tpot.config.template_search_spaces.get_template_search_spaces`
Returns a search space which can be optimized by TPOT.
Parameters
----------
search_space: str or SearchSpace
The default search space to use. If a string, it should be one of the following:
- 'linear': A search space for linear pipelines
- 'linear-light': A search space for linear pipelines with a smaller, faster search space
- 'graph': A search space for graph pipelines
- 'graph-light': A search space for graph pipelines with a smaller, faster search space
- 'mdr': A search space for MDR pipelines
If a SearchSpace object, it should be a valid search space object for TPOT.
classification: bool, default=True
Whether the problem is a classification problem or a regression problem.
inner_predictors: bool, default=None
Whether to include additional classifiers/regressors before the final classifier/regressor (allowing for ensembles).
Defaults to False for 'linear-light' and 'graph-light' search spaces, and True otherwise. (Not used for 'mdr' search space)
cross_val_predict_cv: int, default=None
The number of folds to use for cross_val_predict.
Defaults to 0 for 'linear-light' and 'graph-light' search spaces, and 5 otherwise. (Not used for 'mdr' search space)
get_search_space_params: dict
Additional parameters to pass to the get_search_space function.
linear_search_space = tpot.config.template_search_spaces.get_template_search_spaces("linear", inner_predictors=True, cross_val_predict_cv=5)
linear_search_space.generate().export_pipeline()
Pipeline(steps=[('standardscaler', StandardScaler()), ('rfe', RFE(estimator=ExtraTreesClassifier(max_features=0.8009842720563, min_samples_leaf=4, min_samples_split=9, n_jobs=1), step=0.4315847507401)), ('featureunion-1', FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()), ('passthrough', Passthrough())])), ('featureunion-2', FeatureUnion(tran... max_features='sqrt', min_samples_leaf=17, min_samples_split=8))), ('estimatortransformer-2', EstimatorTransformer(cross_val_predict_cv=5, estimator=LGBMClassifier(max_depth=3, n_estimators=84, n_jobs=1, num_leaves=244, verbose=-1)))])), ('passthrough', Passthrough())])), ('lineardiscriminantanalysis', LinearDiscriminantAnalysis(shrinkage=0.369619691802, solver='eigen'))])
linear_search_space = tpot.config.template_search_spaces.get_template_search_spaces("linear", inner_predictors=True, cross_val_predict_cv=5)
linear_est = tpot.TPOTEstimator(
search_space = linear_search_space,
scorers=['roc_auc_ovr',tpot.objectives.complexity_scorer],
scorers_weights=[1,-1],
classification=True,
verbose=1,
)
#alternatively, you can use the template search space to generate a pipeline
linear_est = tpot.TPOTEstimator(
search_space = "linear",
scorers=['roc_auc_ovr',tpot.objectives.complexity_scorer],
scorers_weights=[1,-1],
n_jobs=32,
classification=True,
verbose=1,
)
Optimizing a Search Space with TPOTEstimator¶
Once you have constructed a search space, you can use TPOTEstimator to optimize pipelines within it. Simply pass the search space to the `search_space` parameter. In the cell below, you can select among the different search spaces created in this tutorial.
all_search_spaces ={
"classifiers_only" : classifier_choice,
"stc_pipeline" : stc_pipeline,
"stc_pipeline2": stc_pipeline2,
"stc_pipeline3": stc_pipeline3,
"stc_pipeline4": stc_pipeline4,
"final_linear_pipeline": final_linear_pipeline,
"graph_pipeline": graph_search_space,
}
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.5)
selected_search_space = all_search_spaces["stc_pipeline"] #change this to select a different search space
est = tpot.TPOTEstimator(
scorers=["roc_auc_ovr", tpot.objectives.complexity_scorer],
scorers_weights=[1.0, -1.0],
classification = True,
cv = 5,
search_space = selected_search_space,
max_time_mins=10,
max_eval_time_mins = 10,
early_stop = 2,
verbose = 2,
n_jobs=4,
)
est.fit(X_train, y_train)
Generation: : 5it [00:58, 11.77s/it]
TPOTEstimator(classification=True, cv=5, early_stop=2, max_time_mins=10, n_jobs=4, scorers=['roc_auc_ovr', <function complexity_scorer at 0x32f4e0550>], scorers_weights=[1.0, -1.0], search_space=<tpot.search_spaces.pipelines.sequential.SequentialPipeline object at 0x32f7692d0>, verbose=2)
# score the model
auroc_scorer = sklearn.metrics.get_scorer("roc_auc")
auroc_score = auroc_scorer(est, X_test, y_test)
print("auroc score", auroc_score)
auroc score 0.9947351959966638
#plot the best pipeline
if isinstance(est.fitted_pipeline_, tpot.GraphPipeline):
    est.fitted_pipeline_.plot()
est.fitted_pipeline_
Pipeline(steps=[('selectfwe', SelectFwe(alpha=0.0001569023321)), ('powertransformer', PowerTransformer()), ('mlpclassifier', MLPClassifier(activation='identity', alpha=0.0008696190619, hidden_layer_sizes=[203, 203], learning_rate_init=0.0135276110446, n_iter_no_change=32))])
SelectFwe(alpha=0.0001569023321)
PowerTransformer()
MLPClassifier(activation='identity', alpha=0.0008696190619, hidden_layer_sizes=[203, 203], learning_rate_init=0.0135276110446, n_iter_no_change=32)
Transformer-only Pipelines - Imputation Optimization Example¶
Pipelines do not necessarily need to end in a classifier or regressor. Transformer-only pipelines are also possible, as long as you have a corresponding custom objective function.
import sklearn
import sklearn.datasets
import numpy as np
import tpot
#in practice, cross validation is likely better, but this simple example is fine for demonstration purposes
def rmse_obective(est, X, missing_add=.2, rng=1, fitted=False):
    rng = np.random.default_rng(rng)
    X_missing = X.copy()
    missing_idx = rng.random(X.shape) < missing_add
    X_missing[missing_idx] = np.nan
    if not fitted:
        est.fit(X_missing)
    X_filled = est.transform(X_missing)
    return np.sqrt(np.mean((X_filled[missing_idx] - X[missing_idx])**2))
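Before wiring this objective into TPOT, it can help to see the mask-and-score idea in isolation. The following is a minimal NumPy sketch (toy data and a hypothetical column-mean "imputer", not TPOT code) showing how the RMSE is taken only over the artificially removed entries:

```python
import numpy as np

# Toy data: 4 samples, 3 features.
X = np.arange(12, dtype=float).reshape(4, 3)

# Deterministically "remove" two entries (a stand-in for the random
# masking done in the objective above).
missing_idx = np.zeros(X.shape, dtype=bool)
missing_idx[0, 0] = True
missing_idx[2, 1] = True

X_missing = X.copy()
X_missing[missing_idx] = np.nan

# A trivial imputer: fill each NaN with its column's observed mean.
col_means = np.nanmean(X_missing, axis=0)
X_filled = np.where(np.isnan(X_missing), col_means, X_missing)

# RMSE is computed only over the artificially removed entries, so the
# score measures how well the hidden values were reconstructed rather
# than trivially rewarding a copy of the observed values.
rmse = np.sqrt(np.mean((X_filled[missing_idx] - X[missing_idx]) ** 2))
```

Scoring only the masked cells is what lets a transformer-only pipeline be compared against ground truth without a downstream supervised model.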
from sklearn.impute import SimpleImputer
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
imp = SimpleImputer(strategy="mean")
rmse_obective(imp, X)
0.04690299241236334
import tpot.search_spaces
from ConfigSpace import ConfigurationSpace, Integer, Float, Categorical, Normal
#set up an imputation search space that includes simple imputer, knn imputer, and iterative imputer (with an optimized ExtraTreesRegressor)
simple_imputer = tpot.config.get_search_space("SimpleImputer")
knn_imputer = tpot.config.get_search_space("KNNImputer")
space = ConfigurationSpace({
    'initial_strategy': Categorical('initial_strategy', ['mean', 'median', 'most_frequent', 'constant']),
    'n_nearest_features': Integer('n_nearest_features', bounds=(1, X.shape[1])),
    'imputation_order': Categorical('imputation_order', ['ascending', 'descending', 'roman', 'arabic', 'random']),
})
# This optimizes both the iterative imputer parameters and the ExtraTreesRegressor parameters
iterative_imputer_sp = tpot.search_spaces.pipelines.WrapperPipeline(
method = sklearn.impute.IterativeImputer,
space = space,
estimator_search_space = tpot.config.get_search_space("ExtraTreesRegressor"),
)
#this is equivalent to
# iterative_imputer_sp = tpot.config.get_search_space("IterativeImputer_learned_estimators")
imputation_search_space = tpot.search_spaces.pipelines.ChoicePipeline(
search_spaces = [simple_imputer, knn_imputer, iterative_imputer_sp],
)
imputation_search_space.generate().export_pipeline()
SimpleImputer()
from functools import partial
final_objective = partial(rmse_obective, X=X, missing_add=.2)
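Here functools.partial pre-binds the dataset and masking rate, so the resulting callable matches the signature TPOT expects (estimator first), while remaining keyword arguments such as fitted=True still pass through, which is used when scoring the fitted pipeline further below. A minimal sketch with a hypothetical objective:

```python
from functools import partial

# Hypothetical objective with the same calling pattern as rmse_obective:
# data-related arguments get pre-bound, and the caller supplies the
# estimator (plus, optionally, fitted=True when scoring a fitted model).
def objective(est, X, missing_add=0.2, fitted=False):
    return (est, len(X), missing_add, fitted)

bound = partial(objective, X=[1, 2, 3], missing_add=0.2)

# Only the estimator is left to pass...
assert bound("some_estimator") == ("some_estimator", 3, 0.2, False)
# ...and extra keyword arguments still pass through unchanged.
assert bound("some_estimator", fitted=True) == ("some_estimator", 3, 0.2, True)
```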
est = tpot.TPOTEstimator(
scorers = [],
scorers_weights = [],
other_objective_functions = [final_objective],
other_objective_functions_weights = [-1],
objective_function_names = ["rmse"],
classification = True,
search_space = imputation_search_space,
max_time_mins=10,
max_eval_time_mins = 60*5,
verbose = 3,
early_stop = 2,
n_jobs=20,
)
est.fit(X, y=y)
/Users/ketrong/Desktop/tpotvalidation/tpot/tpot/tpot_estimator/estimator.py:535: UserWarning: Labels are not encoded as ints from 0 to N. For compatibility with some classifiers such as sklearn, TPOT has encoded y with the sklearn LabelEncoder. When using pipelines outside the main TPOT estimator class, you can encode the labels with est.label_encoder_
Generation: : 1it [00:19, 19.42s/it]
Generation: 1 Best rmse score: 0.03494378757292814
Generation: : 2it [00:35, 17.45s/it]
Generation: 2 Best rmse score: 0.03494378757292814
Generation: : 3it [00:51, 16.76s/it]
Generation: 3 Best rmse score: 0.034787576318641794
Generation: : 3it [01:10, 23.47s/it]
Generation: 4 Best rmse score: 0.034283600126080886
Early stop
TPOTEstimator(classification=True, early_stop=2, max_eval_time_mins=300, max_time_mins=10, n_jobs=20, objective_function_names=['rmse'], other_objective_functions=[functools.partial(<function rmse_obective at 0x33edfd480>, X=array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226, 0.01990749, -0.01764613], [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338, -... -0.04688253, 0.01549073], [-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962, 0.04452873, -0.02593034], [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338, -0.00422151, 0.00306441]]), missing_add=0.2)], other_objective_functions_weights=[-1], scorers=[], scorers_weights=[], search_space=<tpot.search_spaces.pipelines.choice.ChoicePipeline object at 0x36ff3e770>, verbose=3)
# score the model
rmse_score = final_objective(est, fitted=True)
print("final rmse score", rmse_score)
final rmse score 0.02796745384428642
est.fitted_pipeline_
IterativeImputer(estimator=ExtraTreesRegressor(criterion='friedman_mse', max_features=0.6404215718013, min_samples_leaf=2, min_samples_split=10, n_jobs=1), imputation_order='arabic', n_nearest_features=9)
ExtraTreesRegressor(criterion='friedman_mse', max_features=0.6404215718013, min_samples_leaf=2, min_samples_split=10, n_jobs=1)
Combined Search Space Example¶
from tpot.search_spaces.pipelines import *
from tpot.config import get_search_space
selectors = get_search_space(["selectors_classification", "Passthrough"])
estimators = get_search_space(["classifiers"])
# this allows us to wrap the classifiers in the EstimatorTransformer
# this is necessary so that classifiers can be used inside of sklearn pipelines
wrapped_estimators = WrapperPipeline(tpot.builtin_modules.EstimatorTransformer, {}, estimators)
scalers = get_search_space(["scalers","Passthrough"])
transformers_layer = UnionPipeline([
ChoicePipeline([
DynamicUnionPipeline(get_search_space(["transformers"])),
get_search_space("SkipTransformer"),
]),
get_search_space("Passthrough")
]
)
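The Passthrough branch in each UnionPipeline layer means the layer's output is the transformed (or estimator-derived) features concatenated with the unchanged input, so downstream steps always still see the original columns. Conceptually (a plain-NumPy sketch, not TPOT code):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Stand-in for the transformer branch: any feature-generating step.
transformed = X ** 2

# Stand-in for the Passthrough branch: the input, unchanged.
passthrough = X

# A FeatureUnion-style layer concatenates branches column-wise, so the
# next pipeline step sees both the derived and the original features.
layer_output = np.hstack([transformed, passthrough])
```

This mirrors the fitted pipelines shown in the outputs below, where each FeatureUnion pairs its transformer or EstimatorTransformer branch with a Passthrough.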
inner_estimators_layer = UnionPipeline([
ChoicePipeline([
DynamicUnionPipeline(wrapped_estimators),
get_search_space("SkipTransformer"),
]),
get_search_space("Passthrough")]
)
search_space = SequentialPipeline(search_spaces=[
scalers,
selectors,
transformers_layer,
inner_estimators_layer,
estimators,
])
est = tpot.TPOTEstimator(
scorers = ["roc_auc"],
scorers_weights = [1],
classification = True,
cv = 5,
search_space = search_space,
max_time_mins=10,
max_eval_time_mins = 60*5,
verbose = 2,
n_jobs=20,
)
est.fit(X_train, y_train)
Generation: : 25it [10:00, 24.01s/it]
TPOTEstimator(classification=True, cv=5, max_eval_time_mins=300, max_time_mins=10, n_jobs=20, scorers=['roc_auc'], scorers_weights=[1], search_space=<tpot.search_spaces.pipelines.sequential.SequentialPipeline object at 0x35798c880>, verbose=2)
est.fitted_pipeline_
Pipeline(steps=[('maxabsscaler', MaxAbsScaler()), ('selectfwe', SelectFwe(alpha=0.0004883916878)), ('featureunion-1', FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('powertransformer', PowerTransformer())])), ('passthrough', Passthrough())])), ('featureunion-2', FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('estimatortransformer', EstimatorTransformer(estimator=LinearDiscriminantAnalysis(shrinkage=0.5801392483719, solver='lsqr')))])), ('passthrough', Passthrough())])), ('mlpclassifier', MLPClassifier(activation='identity', alpha=0.0310773820788, hidden_layer_sizes=[54, 54, 54], learning_rate_init=0.0017701050157, n_iter_no_change=32))])
MaxAbsScaler()
SelectFwe(alpha=0.0004883916878)
FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('powertransformer', PowerTransformer())])), ('passthrough', Passthrough())])
PowerTransformer()
Passthrough()
FeatureUnion(transformer_list=[('featureunion', FeatureUnion(transformer_list=[('estimatortransformer', EstimatorTransformer(estimator=LinearDiscriminantAnalysis(shrinkage=0.5801392483719, solver='lsqr')))])), ('passthrough', Passthrough())])
LinearDiscriminantAnalysis(shrinkage=0.5801392483719, solver='lsqr')
Passthrough()
MLPClassifier(activation='identity', alpha=0.0310773820788, hidden_layer_sizes=[54, 54, 54], learning_rate_init=0.0017701050157, n_iter_no_change=32)