Get Configspace

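This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
  • Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
  • Anil Saini (anil.saini@cshs.org)
  • Jose Hernandez (jgh9094@gmail.com)
  • Jay Moran (jay.moran@cshs.org)
  • Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
  • Hyunjun Choi (hyunjun.choi@cshs.org)
  • Gabriel Ketron (gabriel.ketron@cshs.org)
  • Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
  • Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
  • Randal S. Olson (rso@randalolson.com)
  • Weixuan Fu (weixuanf@upenn.edu)
  • Daniel Angell (dpa34@drexel.edu)
  • Jason Moore (moorejh28@gmail.com)
  • and many more generous open-source contributors

TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/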

get_configspace(name, n_classes=3, n_samples=1000, n_features=100, random_state=None, n_jobs=1)

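Returns the ConfigSpace.ConfigurationSpace containing the hyperparameter ranges for the given scikit-learn method. The n_classes, n_samples, n_features, and random_state arguments are used to set the hyperparameters that depend on those values. For methods with no tunable hyperparameters the function returns an empty dict, and for unrecognized names it raises ValueError.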
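A minimal usage sketch, assuming TPOT is installed and that the file path shown on this page (tpot/config/get_configspace.py) is importable as tpot.config.get_configspace. The import is guarded so the snippet does nothing when TPOT is unavailable, and the argument values are arbitrary illustrations.

```python
# Assumption: the file path on this page maps to this import path.
try:
    from tpot.config.get_configspace import get_configspace
except ImportError:  # TPOT (or this module layout) is not available
    get_configspace = None

if get_configspace is not None:
    # Request the hyperparameter ranges for a random forest classifier,
    # describing a small binary-classification dataset.
    space = get_configspace(
        "RandomForestClassifier",
        n_classes=2,
        n_samples=500,
        n_features=20,
        random_state=42,
        n_jobs=1,
    )
    print(space)
```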

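Parameters

Name Type Description Default
name str

The string name of the scikit-learn method for which to create the ConfigurationSpace (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier).

required
n_classes int

The number of classes in the target variable. Default is 3.

3
n_samples int

The number of samples in the dataset. Default is 1000.

1000
n_features int

The number of features in the dataset. Default is 100.

100
random_state int

The random_state to use in the ConfigurationSpace. Default is None. If None, the random_state hyperparameter is not included in the ConfigurationSpace. Use this to set the random state for the individual methods if you want to ensure reproducibility.

None
n_jobs int (default=1)

Sets the n_jobs parameter for estimators that have it. Default is 1.

1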
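To illustrate how a dataset-shape argument and random_state can influence a configuration space, here is a self-contained sketch that uses plain dicts rather than ConfigSpace. The function name, the n_components hyperparameter, and its range are assumptions made for illustration; they are not TPOT's actual ranges.

```python
def sketch_space(n_features=100, random_state=None):
    """Hypothetical stand-in for a configuration-space builder."""
    # The upper bound depends on the dataset: a decomposition cannot
    # produce more components than there are input features.
    space = {"n_components": (1, n_features)}
    # Mirror the documented behavior: random_state is only included
    # in the space when a value is supplied.
    if random_state is not None:
        space["random_state"] = random_state
    return space


unseeded = sketch_space(n_features=10)
seeded = sketch_space(n_features=5, random_state=7)
print(unseeded)
print(seeded)
```

Passing None leaves the key out entirely, so downstream code would fall back to the estimator's own default.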
Source code in tpot/config/get_configspace.py
def get_configspace(name, n_classes=3, n_samples=1000, n_features=100, random_state=None, n_jobs=1):
    """
    This function returns the ConfigSpace.ConfigurationSpace with the hyperparameter ranges for the given
    scikit-learn method. It also uses the n_classes, n_samples, n_features, and random_state to set the
    hyperparameters that depend on these values.

    Parameters
    ----------
    name : str
        The str name of the scikit-learn method for which to create the ConfigurationSpace. (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier)
    n_classes : int
        The number of classes in the target variable. Default is 3.
    n_samples : int
        The number of samples in the dataset. Default is 1000.
    n_features : int
        The number of features in the dataset. Default is 100.
    random_state : int
        The random_state to use in the ConfigurationSpace. Default is None.
        If None, the random_state hyperparameter is not included in the ConfigurationSpace.
        Use this to set the random state for the individual methods if you want to ensure reproducibility.
    n_jobs : int (default=1)
        Sets the n_jobs parameter for estimators that have it. Default is 1.

    """
    match name:

        #autoqtl_builtins.py
        case "FeatureEncodingFrequencySelector":
            return autoqtl_builtins.FeatureEncodingFrequencySelector_ConfigurationSpace
        case "DominantEncoder":
            return {}
        case "RecessiveEncoder":
            return {}
        case "HeterosisEncoder":
            return {}
        case "UnderDominanceEncoder":
            return {}
        case "OverDominanceEncoder":
            return {}

        case "Passthrough":
            return {}
        case "SkipTransformer":
            return {}

        #classifiers.py
        case "LinearDiscriminantAnalysis":
            return classifiers.get_LinearDiscriminantAnalysis_ConfigurationSpace()
        case "AdaBoostClassifier":
            return classifiers.get_AdaBoostClassifier_ConfigurationSpace(random_state=random_state)
        case "LogisticRegression":
            return classifiers.get_LogisticRegression_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "KNeighborsClassifier":
            return classifiers.get_KNeighborsClassifier_ConfigurationSpace(n_samples=n_samples, n_jobs=n_jobs)
        case "DecisionTreeClassifier":
            return classifiers.get_DecisionTreeClassifier_ConfigurationSpace(n_featues=n_features, random_state=random_state)
        case "SVC":
            return classifiers.get_SVC_ConfigurationSpace(random_state=random_state)
        case "LinearSVC":
            return classifiers.get_LinearSVC_ConfigurationSpace(random_state=random_state)
        case "RandomForestClassifier":
            return classifiers.get_RandomForestClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "GradientBoostingClassifier":
            return classifiers.get_GradientBoostingClassifier_ConfigurationSpace(n_classes=n_classes, random_state=random_state)
        case "HistGradientBoostingClassifier":
            return classifiers.get_HistGradientBoostingClassifier_ConfigurationSpace(random_state=random_state)
        case "XGBClassifier":
            return classifiers.get_XGBClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "LGBMClassifier":
            return classifiers.get_LGBMClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "ExtraTreesClassifier":
            return classifiers.get_ExtraTreesClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "SGDClassifier":
            return classifiers.get_SGDClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "MLPClassifier":
            return classifiers.get_MLPClassifier_ConfigurationSpace(random_state=random_state)
        case "BernoulliNB":
            return classifiers.get_BernoulliNB_ConfigurationSpace()
        case "MultinomialNB":
            return classifiers.get_MultinomialNB_ConfigurationSpace()
        case "GaussianNB":
            return {}
        case "LassoLarsCV":
            return {}
        case "ElasticNetCV":
            return regressors.ElasticNetCV_configspace
        case "RidgeCV":
            return {}
        case "PassiveAggressiveClassifier":
            return classifiers.get_PassiveAggressiveClassifier_ConfigurationSpace(random_state=random_state)
        case "QuadraticDiscriminantAnalysis":
            return classifiers.get_QuadraticDiscriminantAnalysis_ConfigurationSpace()
        case "GaussianProcessClassifier":
            return classifiers.get_GaussianProcessClassifier_ConfigurationSpace(n_features=n_features, random_state=random_state)
        case "BaggingClassifier":
            return classifiers.get_BaggingClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)

        #regressors.py
        case "RandomForestRegressor":
            return regressors.get_RandomForestRegressor_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "SGDRegressor":
            return regressors.get_SGDRegressor_ConfigurationSpace(random_state=random_state)
        case "Ridge":
            return regressors.get_Ridge_ConfigurationSpace(random_state=random_state)
        case "Lasso":
            return regressors.get_Lasso_ConfigurationSpace(random_state=random_state)
        case "ElasticNet":
            return regressors.get_ElasticNet_ConfigurationSpace(random_state=random_state)
        case "Lars":
            return regressors.get_Lars_ConfigurationSpace(random_state=random_state)
        # scikit-learn spells the class name OrthogonalMatchingPursuit
        case "OrthogonalMatchingPursuit":
            return regressors.get_OthogonalMatchingPursuit_ConfigurationSpace()
        case "BayesianRidge":
            return regressors.get_BayesianRidge_ConfigurationSpace()
        case "LassoLars":
            return regressors.get_LassoLars_ConfigurationSpace(random_state=random_state)
        case "BaggingRegressor":
            return regressors.get_BaggingRegressor_ConfigurationSpace(random_state=random_state)
        case "ARDRegression":
            return regressors.get_ARDRegression_ConfigurationSpace()
        case "TheilSenRegressor":
            return regressors.get_TheilSenRegressor_ConfigurationSpace(random_state=random_state)
        case "Perceptron":
            return regressors.get_Perceptron_ConfigurationSpace(random_state=random_state)
        case "DecisionTreeRegressor":
            return regressors.get_DecisionTreeRegressor_ConfigurationSpace(random_state=random_state)
        case "LinearSVR":
            return regressors.get_LinearSVR_ConfigurationSpace(random_state=random_state)
        case "SVR":
            return regressors.get_SVR_ConfigurationSpace()
        case "XGBRegressor":
            return regressors.get_XGBRegressor_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "AdaBoostRegressor":
            return regressors.get_AdaBoostRegressor_ConfigurationSpace(random_state=random_state)
        case "ExtraTreesRegressor":
            return regressors.get_ExtraTreesRegressor_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "GradientBoostingRegressor":
            return regressors.get_GradientBoostingRegressor_ConfigurationSpace(random_state=random_state)
        case "HistGradientBoostingRegressor":
            return regressors.get_HistGradientBoostingRegressor_ConfigurationSpace(random_state=random_state)
        case "MLPRegressor":
            return regressors.get_MLPRegressor_ConfigurationSpace(random_state=random_state)
        case "KNeighborsRegressor":
            return regressors.get_KNeighborsRegressor_ConfigurationSpace(n_samples=n_samples, n_jobs=n_jobs)
        case "GaussianProcessRegressor":
            return regressors.get_GaussianProcessRegressor_ConfigurationSpace(n_features=n_features, random_state=random_state)
        case "LGBMRegressor":
            return regressors.get_LGBMRegressor_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        # NOTE: unreachable; the earlier "BaggingRegressor" case matches first.
        case "BaggingRegressor":
            return regressors.get_BaggingRegressor_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)

        #transformers.py
        case "Binarizer":
            return transformers.Binarizer_configspace
        case "Normalizer":
            return transformers.Normalizer_configspace
        case "PCA":
            return transformers.PCA_configspace
        case "ZeroCount":
            return transformers.ZeroCount_configspace
        case "FastICA":
            return transformers.get_FastICA_configspace(n_features=n_features, random_state=random_state)
        case "FeatureAgglomeration":
            return transformers.get_FeatureAgglomeration_configspace(n_features=n_features)
        case "Nystroem":
            return transformers.get_Nystroem_configspace(n_features=n_features, random_state=random_state)
        case "RBFSampler":
            return transformers.get_RBFSampler_configspace(n_features=n_features, random_state=random_state)
        case "MinMaxScaler":
            return {}
        case "PowerTransformer":
            return {}
        case "QuantileTransformer":
            return transformers.get_QuantileTransformer_configspace(n_samples=n_samples, random_state=random_state)
        case "RobustScaler":
            return transformers.RobustScaler_configspace
        case "MaxAbsScaler":
            return {}
        case "PolynomialFeatures":
            return transformers.PolynomialFeatures_configspace
        case "StandardScaler":
            return {}
        case "PassKBinsDiscretizer":
            return transformers.get_passkbinsdiscretizer_configspace(random_state=random_state)
        case "KBinsDiscretizer":
            return transformers.get_passkbinsdiscretizer_configspace(random_state=random_state)
        case "ColumnOneHotEncoder":
            return {}
        case "ColumnOrdinalEncoder":
            return {}

        #selectors.py
        case "SelectFwe":
            return selectors.SelectFwe_configspace 
        case "SelectPercentile":
            return selectors.SelectPercentile_configspace
        case "VarianceThreshold":
            return selectors.VarianceThreshold_configspace
        case "RFE":
            return selectors.RFE_configspace_part
        case "SelectFromModel":
            return selectors.SelectFromModel_configspace_part


        #special_configs.py
        case "AddTransformer":
            return {}
        case "mul_neg_1_Transformer":
            return {}
        case "MulTransformer":
            return {}
        case "SafeReciprocalTransformer":
            return {}
        case "EQTransformer":
            return {}
        case "NETransformer":
            return {}
        case "GETransformer":
            return {}
        case "GTTransformer":
            return {}
        case "LETransformer":
            return {}
        case "LTTransformer":
            return {}        
        case "MinTransformer":
            return {}
        case "MaxTransformer":
            return {}
        case "ZeroTransformer":
            return {}
        case "OneTransformer":
            return {}
        case "NTransformer":
            return ConfigurationSpace(

                space = {

                    'n': Float("n", bounds=(-1e2, 1e2)),
                }
            ) 

        #imputers.py
        case "SimpleImputer":
            return imputers.simple_imputer_cs
        case "IterativeImputer":
            return imputers.get_IterativeImputer_config_space(n_features=n_features, random_state=random_state)
        case "IterativeImputer_no_estimator":
            return imputers.get_IterativeImputer_config_space_no_estimator(n_features=n_features, random_state=random_state)

        case "KNNImputer":
            return imputers.get_KNNImputer_config_space(n_samples=n_samples)

        #mdr_configs.py
        case "MDR":
            return mdr_configs.MDR_configspace
        case "ContinuousMDR":
            return mdr_configs.MDR_configspace
        case "ReliefF":
            return mdr_configs.get_skrebate_ReliefF_config_space(n_features=n_features)
        case "SURF":
            return mdr_configs.get_skrebate_SURF_config_space(n_features=n_features)
        case "SURFstar":
            return mdr_configs.get_skrebate_SURFstar_config_space(n_features=n_features)
        case "MultiSURF":
            return mdr_configs.get_skrebate_MultiSURF_config_space(n_features=n_features)

        #classifiers_sklearnex.py
        case "RandomForestClassifier_sklearnex":
            return classifiers_sklearnex.get_RandomForestClassifier_ConfigurationSpace(random_state=random_state, n_jobs=n_jobs)
        case "LogisticRegression_sklearnex":
            return classifiers_sklearnex.get_LogisticRegression_ConfigurationSpace(random_state=random_state)
        case "KNeighborsClassifier_sklearnex":
            return classifiers_sklearnex.get_KNeighborsClassifier_ConfigurationSpace(n_samples=n_samples)
        case "SVC_sklearnex":
            return classifiers_sklearnex.get_SVC_ConfigurationSpace(random_state=random_state)
        case "NuSVC_sklearnex":
            return classifiers_sklearnex.get_NuSVC_ConfigurationSpace(random_state=random_state)

        #regressors_sklearnex.py
        case "LinearRegression_sklearnex":
            return {}
        case "Ridge_sklearnex":
            return regressors_sklearnex.get_Ridge_ConfigurationSpace(random_state=random_state)
        case "Lasso_sklearnex":
            return regressors_sklearnex.get_Lasso_ConfigurationSpace(random_state=random_state)
        case "ElasticNet_sklearnex":
            return regressors_sklearnex.get_ElasticNet_ConfigurationSpace(random_state=random_state)
        case "SVR_sklearnex":
            return regressors_sklearnex.get_SVR_ConfigurationSpace(random_state=random_state)
        case "NuSVR_sklearnex":
            return regressors_sklearnex.get_NuSVR_ConfigurationSpace(random_state=random_state)
        case "RandomForestRegressor_sklearnex":
            return regressors_sklearnex.get_RandomForestRegressor_ConfigurationSpace(random_state=random_state)
        case "KNeighborsRegressor_sklearnex":
            return regressors_sklearnex.get_KNeighborsRegressor_ConfigurationSpace(n_samples=n_samples)

    #raise error
    raise ValueError(f"Could not find configspace for {name}")

get_node(name, n_classes=3, n_samples=100, n_features=100, random_state=None, base_node=EstimatorNode, n_jobs=1)

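Helper function for get_search_space. Returns a single EstimatorNode for the given scikit-learn method. Also handles special cases: nodes that require custom parsing of their hyperparameters, and methods that wrap other methods.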

Parameters

Name Type Description Default
name str or list

The name of the scikit-learn method or group of methods for which to create the search space. - str: The name of the scikit-learn method (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier). Alternatively, the name of a group of methods (e.g. 'classifiers' for all classifiers). - list: A list of scikit-learn method names (e.g. ['RandomForestClassifier', 'ExtraTreesClassifier']).

required
n_classes int (default=3)

The number of classes in the target variable.

3
n_samples int (default=100)

The number of samples in the dataset.

100
n_features int (default=100)

The number of features in the dataset.

100
random_state int (default=None)

A fixed random_state to pass through to all methods that have a random_state hyperparameter.

None
base_node TPOT.search_spaces.base.SearchSpace

The SearchSpace class to which the configuration space is passed. To experiment with custom mutation/crossover operators, pass a custom SearchSpace node here.

EstimatorNode
n_jobs int (default=1)

Sets the n_jobs parameter for estimators that have it. Default is 1.

1

Returns

Type Description
Returns a SearchSpace object that can be optimized by TPOT.
  • TPOT.search_spaces.nodes.EstimatorNode (or base_node).
  • TPOT.search_spaces.pipelines.WrapperPipeline object if the method requires a wrapped estimator.
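As a rough mental model of the wrapped case, the sketch below pairs an outer method and its own hyperparameter space with an inner estimator node. The class names NodeSketch and WrapperSketch, their fields, and the step range are hypothetical stand-ins, not TPOT's EstimatorNode or WrapperPipeline classes.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a single-method search space."""
    method: str
    space: dict = field(default_factory=dict)


@dataclass
class WrapperSketch:
    """Hypothetical stand-in for a method that wraps another estimator."""
    method: str
    space: dict
    estimator_search_space: NodeSketch


def sketch_get_node(name):
    # Wrapped special case: a feature selector around a tree ensemble.
    if name == "RFE_classification":
        inner = NodeSketch("ExtraTreesClassifier")
        # The step range is illustrative, not TPOT's real RFE space.
        return WrapperSketch("RFE", {"step": (0.0, 1.0)}, inner)
    # Default case: a plain node for the named method.
    return NodeSketch(name)


wrapped = sketch_get_node("RFE_classification")
plain = sketch_get_node("GaussianNB")
print(type(wrapped).__name__, "->", wrapped.estimator_search_space.method)
print(type(plain).__name__, plain.method)
```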
Source code in tpot/config/get_configspace.py
def get_node(name, n_classes=3, n_samples=100, n_features=100, random_state=None, base_node=EstimatorNode, n_jobs=1):
    """
    Helper function for get_search_space. Returns a single EstimatorNode for the given scikit-learn method. Also includes special cases for nodes that require custom parsing of the hyperparameters or methods that wrap other methods.

    Parameters
    ----------

    name : str or list
        The name of the scikit-learn method or group of methods for which to create the search space.
        - str: The name of the scikit-learn method. (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier)
        Alternatively, the name of a group of methods. (e.g. 'classifiers' for all classifiers).
        - list: A list of scikit-learn method names. (e.g. ['RandomForestClassifier', 'ExtraTreesClassifier'])
    n_classes : int (default=3)
        The number of classes in the target variable.
    n_samples : int (default=100)
        The number of samples in the dataset.
    n_features : int (default=100)
        The number of features in the dataset.
    random_state : int (default=None)
        A fixed random_state to pass through to all methods that have a random_state hyperparameter.
    base_node: TPOT.search_spaces.base.SearchSpace (default=TPOT.search_spaces.nodes.EstimatorNode)
        The SearchSpace to pass the configuration space to. If you want to experiment with custom mutation/crossover operators, you can pass a custom SearchSpace node here.
    n_jobs : int (default=1)
        Sets the n_jobs parameter for estimators that have it. Default is 1.

    Returns
    -------
        Returns a SearchSpace object that can be optimized by TPOT.
        - TPOT.search_spaces.nodes.EstimatorNode (or base_node).
        - TPOT.search_spaces.pipelines.WrapperPipeline object if the method requires a wrapped estimator.


    """

    if name == "LinearSVC_wrapped":
        ext = get_node("LinearSVC", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=ext, method=sklearn.calibration.CalibratedClassifierCV, space={})
    if name == "RFE_classification":
        rfe_sp = get_configspace(name="RFE", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        ext = get_node("ExtraTreesClassifier", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=ext, method=RFE, space=rfe_sp)
    if name == "RFE_regression":
        rfe_sp = get_configspace(name="RFE", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        ext = get_node("ExtraTreesRegressor", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=ext, method=RFE, space=rfe_sp)
    if name == "SelectFromModel_classification":
        sfm_sp = get_configspace(name="SelectFromModel", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        ext = get_node("ExtraTreesClassifier", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=ext, method=SelectFromModel, space=sfm_sp)
    if name == "SelectFromModel_regression":
        sfm_sp = get_configspace(name="SelectFromModel", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        ext = get_node("ExtraTreesRegressor", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=ext, method=SelectFromModel, space=sfm_sp)
    # TODO Add IterativeImputer with more estimator methods
    if name == "IterativeImputer_learned_estimators":
        iterative_sp = get_configspace(name="IterativeImputer_no_estimator", n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        regressor_searchspace = get_node("ExtraTreesRegressor", n_classes=n_classes, n_samples=n_samples, random_state=random_state, n_jobs=n_jobs)
        return WrapperPipeline(estimator_search_space=regressor_searchspace, method=IterativeImputer, space=iterative_sp)

    #these are nodes that have special search spaces which require custom parsing of the hyperparameters
    # n_features is forwarded so that spaces depending on it do not silently fall back to the default of 100
    if name == "IterativeImputer":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return EstimatorNode(STRING_TO_CLASS[name], configspace, hyperparameter_parser=imputers.IterativeImputer_hyperparameter_parser)
    if name == "RobustScaler":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=transformers.robust_scaler_hyperparameter_parser)
    if name == "GradientBoostingClassifier":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=classifiers.GradientBoostingClassifier_hyperparameter_parser)
    if name == "HistGradientBoostingClassifier":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=classifiers.HistGradientBoostingClassifier_hyperparameter_parser)
    if name == "GradientBoostingRegressor":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=regressors.GradientBoostingRegressor_hyperparameter_parser)
    if name == "HistGradientBoostingRegressor":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=regressors.HistGradientBoostingRegressor_hyperparameter_parser)
    if name == "MLPClassifier":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=classifiers.MLPClassifier_hyperparameter_parser)
    if name == "MLPRegressor":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=regressors.MLPRegressor_hyperparameter_parser)
    if name == "GaussianProcessRegressor":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=regressors.GaussianProcessRegressor_hyperparameter_parser)
    if name == "GaussianProcessClassifier":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=classifiers.GaussianProcessClassifier_hyperparameter_parser)
    if name == "FeatureAgglomeration":
        configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
        return base_node(STRING_TO_CLASS[name], configspace, hyperparameter_parser=transformers.FeatureAgglomeration_hyperparameter_parser)

    configspace = get_configspace(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, n_jobs=n_jobs)
    if configspace is None:
        #raise warning
        warnings.warn(f"Could not find configspace for {name}")
        return None

    return base_node(STRING_TO_CLASS[name], configspace)
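A hyperparameter parser, as passed in the special cases of the listing above, can be thought of as a function that converts a flat sampled configuration into constructor keyword arguments. The sketch below is an illustration only: the function and the helper keys n_hidden_layers and n_nodes_per_layer are assumptions, not TPOT's actual MLPClassifier_hyperparameter_parser.

```python
def sketch_mlp_parser(params):
    """Hypothetical parser: flat config -> estimator keyword arguments."""
    # Copy everything except the two helper hyperparameters.
    helper_keys = {"n_hidden_layers", "n_nodes_per_layer"}
    kwargs = {k: v for k, v in params.items() if k not in helper_keys}
    # Combine the two flat values into the single tuple-valued argument
    # that scikit-learn's MLP estimators accept as hidden_layer_sizes.
    kwargs["hidden_layer_sizes"] = (
        (params["n_nodes_per_layer"],) * params["n_hidden_layers"]
    )
    return kwargs


sampled = {"n_hidden_layers": 3, "n_nodes_per_layer": 64, "alpha": 1e-4}
parsed = sketch_mlp_parser(sampled)
print(parsed)
```

Keeping the search space flat makes sampling and mutation simple, while the parser handles arguments whose shape depends on other values.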

get_search_space(name, n_classes=3, n_samples=1000, n_features=100, random_state=None, return_choice_pipeline=True, base_node=EstimatorNode, n_jobs=1)

Returns a TPOT search space for a given scikit-learn method or group of methods.

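Parameters

Name Type Description Default
name str or list

The name of the scikit-learn method or group of methods for which to create the search space. - str: The name of the scikit-learn method (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier). Alternatively, the name of a group of methods (e.g. 'classifiers' for all classifiers). - list: A list of scikit-learn method names (e.g. ['RandomForestClassifier', 'ExtraTreesClassifier']).

required
n_classes int (default=3)

The number of classes in the target variable.

3
n_samples int (default=1000)

The number of samples in the dataset.

1000
n_features int (default=100)

The number of features in the dataset.

100
random_state int (default=None)

A fixed random_state to pass through to all methods that have a random_state hyperparameter.

None
return_choice_pipeline bool (default=True)

If False, returns a list of TPOT.search_spaces.nodes.EstimatorNode objects. If True, returns a single TPOT.search_spaces.pipelines.ChoicePipeline that includes and samples from all EstimatorNodes.

True
base_node TPOT.search_spaces.base.SearchSpace

The SearchSpace class to which the configuration space is passed. To experiment with custom mutation/crossover operators, pass a custom SearchSpace node here.

EstimatorNode
n_jobs int (default=1)

Sets the n_jobs parameter for estimators that have it. Default is 1.

1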

Returns

Type Description
Returns a SearchSpace object that can be optimized by TPOT.
  • TPOT.search_spaces.nodes.EstimatorNode (or base_node) if there is only one search space.
  • List of TPOT.search_spaces.nodes.EstimatorNode (or base_node) objects if there are multiple search spaces.
  • TPOT.search_spaces.pipelines.ChoicePipeline object if return_choice_pipeline is True. Note: for some special cases with methods using wrapped estimators, the returned search space is a TPOT.search_spaces.pipelines.WrapperPipeline object.
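A minimal usage sketch, assuming TPOT is installed and that the file path shown on this page is importable as tpot.config.get_configspace. The import is guarded so the snippet does nothing when TPOT is unavailable; the method names and seed are arbitrary illustrations.

```python
# Assumption: the file path on this page maps to this import path.
try:
    from tpot.config.get_configspace import get_search_space
except ImportError:  # TPOT (or this module layout) is not available
    get_search_space = None

if get_search_space is not None:
    # A list of names with return_choice_pipeline left at its default
    # (True) is combined into a single search space over both methods.
    choice = get_search_space(
        ["RandomForestClassifier", "ExtraTreesClassifier"],
        n_classes=2,
        random_state=42,
    )
    print(type(choice).__name__)
```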
Source code in tpot/config/get_configspace.py
def get_search_space(name, n_classes=3, n_samples=1000, n_features=100, random_state=None, return_choice_pipeline=True, base_node=EstimatorNode, n_jobs=1):
    """
    Returns a TPOT search space for a given scikit-learn method or group of methods.

    Parameters
    ----------
    name : str or list
        The name of the scikit-learn method or group of methods for which to create the search space.
        - str: The name of the scikit-learn method. (e.g. 'RandomForestClassifier' for sklearn.ensemble.RandomForestClassifier)
        Alternatively, the name of a group of methods. (e.g. 'classifiers' for all classifiers).
        - list: A list of scikit-learn method names. (e.g. ['RandomForestClassifier', 'ExtraTreesClassifier'])
    n_classes : int (default=3)
        The number of classes in the target variable.
    n_samples : int (default=1000)
        The number of samples in the dataset.
    n_features : int (default=100)
        The number of features in the dataset.
    random_state : int (default=None)
        A fixed random_state to pass through to all methods that have a random_state hyperparameter. 
    return_choice_pipeline : bool (default=True)
        If False, returns a list of TPOT.search_spaces.nodes.EstimatorNode objects.
        If True, returns a single TPOT.search_spaces.pipelines.ChoicePipeline that includes and samples from all EstimatorNodes.
    base_node: TPOT.search_spaces.base.SearchSpace (default=TPOT.search_spaces.nodes.EstimatorNode)
        The SearchSpace to pass the configuration space to. If you want to experiment with custom mutation/crossover operators, you can pass a custom SearchSpace node here.
    n_jobs : int (default=1)
        Sets the n_jobs parameter for estimators that have it. Default is 1.

    Returns
    -------
        Returns a SearchSpace object that can be optimized by TPOT.
        - TPOT.search_spaces.nodes.EstimatorNode (or base_node) if there is only one search space.
        - List of TPOT.search_spaces.nodes.EstimatorNode (or base_node) objects if there are multiple search spaces.
        - TPOT.search_spaces.pipelines.ChoicePipeline object if return_choice_pipeline is True.
        Note: for some special cases with methods using wrapped estimators, the returned search space is a TPOT.search_spaces.pipelines.WrapperPipeline object.

    """
    name = flatten_group_names(name)

    #if list of names, return a list of EstimatorNodes
    if isinstance(name, (list, np.ndarray)):
        search_spaces = [get_search_space(n, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, return_choice_pipeline=False, base_node=base_node, n_jobs=n_jobs) for n in name]
        #remove Nones
        search_spaces = [s for s in search_spaces if s is not None]

        if return_choice_pipeline:
            return ChoicePipeline(search_spaces=np.hstack(search_spaces))
        else:
            return np.hstack(search_spaces)

    # if name in GROUPNAMES:
    #     name_list = GROUPNAMES[name]
    #     return get_search_space(name_list, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, return_choice_pipeline=return_choice_pipeline, base_node=base_node)

    return get_node(name, n_classes=n_classes, n_samples=n_samples, n_features=n_features, random_state=random_state, base_node=base_node, n_jobs=n_jobs)
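The list-handling logic in the listing above can be sketched without TPOT. Everything here is hypothetical: the GROUPS table, sketch_flatten (a guess at what a group-flattening helper might do, since flatten_group_names is not shown on this page), and the use of plain strings and dicts in place of search-space objects.

```python
GROUPS = {"scalers": ["MinMaxScaler", "StandardScaler"]}  # hypothetical


def sketch_flatten(name):
    """Expand group names; leave single method names untouched."""
    if isinstance(name, str):
        return GROUPS[name] if name in GROUPS else name
    flat = []
    for item in name:
        expanded = sketch_flatten(item)
        flat.extend(expanded if isinstance(expanded, list) else [expanded])
    return flat


def sketch_search_space(name, return_choice=True):
    name = sketch_flatten(name)
    if isinstance(name, list):
        # Build one node per name, then either bundle or return the list.
        nodes = [sketch_search_space(n, return_choice=False) for n in name]
        return {"choice": nodes} if return_choice else nodes
    return f"node:{name}"


single = sketch_search_space("PCA")
mixed = sketch_search_space(["PCA", "scalers"])
as_list = sketch_search_space("scalers", return_choice=False)
print(single)
print(mixed)
print(as_list)
```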