EstimatorTransformer

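This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/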

EstimatorTransformer

Bases: BaseEstimator, TransformerMixin

Source code in tpot/builtin_modules/estimatortransformer.py
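import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import cross_val_predict
from sklearn.utils.metaestimators import available_if
from sklearn.utils.validation import check_is_fitted


class EstimatorTransformer(BaseEstimator, TransformerMixin):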
    def __init__(self, estimator, method='auto', passthrough=False, cross_val_predict_cv=None):
        """
        A class for using a sklearn estimator as a transformer. When calling fit_transform, this class returns the output of cross_val_predict
        and trains the estimator on the full dataset. When calling transform, this class uses the estimator fit on the full dataset to transform the data.

        Parameters
        ----------
        estimator : sklearn.base.BaseEstimator
            The estimator to use as a transformer.
        method : str, default='auto'
            The method to use for the transformation. If 'auto', will try to use predict_proba, decision_function, or predict in that order.
            - predict_proba: use the predict_proba method of the estimator.
            - decision_function: use the decision_function method of the estimator.
            - predict: use the predict method of the estimator.
        passthrough : bool, default=False
            Whether to pass the original input through.
        cross_val_predict_cv : int or None, default=None
            Number of folds to use for the cross_val_predict function for inner classifiers and regressors. Estimators will still be fit on the full dataset, but the following node will get the outputs from cross_val_predict.

            - 0-1 : When set to 0 or 1, the cross_val_predict function will not be used. The next layer will get the outputs from fitting and transforming the full dataset.
            - >=2 : When fitting pipelines with inner classifiers or regressors, they will still be fit on the full dataset.
                    However, the output to the next node will come from cross_val_predict with the specified number of folds.

        """
        self.estimator = estimator
        self.method = method
        self.passthrough = passthrough
        self.cross_val_predict_cv = cross_val_predict_cv

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def transform(self, X, y=None):
        # Does not use cross_val_predict; applies the already-fitted estimator to X.
        # This is the transformation used in practice once the model has been trained.
        if self.method == 'auto':
            if hasattr(self.estimator, 'predict_proba'):
                method = 'predict_proba'
            elif hasattr(self.estimator, 'decision_function'):
                method = 'decision_function'
            elif hasattr(self.estimator, 'predict'):
                method = 'predict'
            else:
                raise ValueError('Estimator has no valid method')
        else:
            method = self.method

        output = getattr(self.estimator, method)(X)
        output=np.array(output)

        if len(output.shape) == 1:
            output = output.reshape(-1,1)

        if self.passthrough:
            return np.hstack((output, X))
        else:
            return output



    def fit_transform(self, X, y=None):
        # Uses cross_val_predict when cross_val_predict_cv is not None. This method is only used while training the model.
        self.estimator.fit(X,y)

        if self.method == 'auto':
            if hasattr(self.estimator, 'predict_proba'):
                method = 'predict_proba'
            elif hasattr(self.estimator, 'decision_function'):
                method = 'decision_function'
            elif hasattr(self.estimator, 'predict'):
                method = 'predict'
            else:
                raise ValueError('Estimator has no valid method')
        else:
            method = self.method

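        if self.cross_val_predict_cv is not None:
            # pass the resolved method so the out-of-fold output has the same form as transform()
            output = cross_val_predict(self.estimator, X, y=y, cv=self.cross_val_predict_cv, method=method)
        else:
            output = getattr(self.estimator, method)(X)

        # convert first so that .shape is always available, then reshape if needed
        output = np.array(output)
        if len(output.shape) == 1:
            output = output.reshape(-1,1)
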
        if self.passthrough:
            return np.hstack((output, X))
        else:
            return output

    def _estimator_has(attr):
        '''Check if we can delegate a method to the underlying estimator.
        The method is exposed only when the wrapped estimator is set and
        defines an attribute named `attr`.
        '''
        return  lambda self: (self.estimator is not None and
            hasattr(self.estimator, attr)
        )

    @available_if(_estimator_has('predict'))
    def predict(self, X, **predict_params):
        check_is_fitted(self.estimator)
        #X = check_array(X)

        preds = self.estimator.predict(X,**predict_params)
        return preds

    @available_if(_estimator_has('predict_proba'))
    def predict_proba(self, X, **predict_params):
        check_is_fitted(self.estimator)
        #X = check_array(X)
        return self.estimator.predict_proba(X,**predict_params)

    @available_if(_estimator_has('decision_function'))
    def decision_function(self, X, **predict_params):
        check_is_fitted(self.estimator)
        #X = check_array(X)
        return self.estimator.decision_function(X,**predict_params)

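    def __sklearn_is_fitted__(self):
        """
        Check fitted status and return a Boolean value.
        """
        from sklearn.exceptions import NotFittedError
        try:
            check_is_fitted(self.estimator)  # raises rather than returning a value
        except NotFittedError:
            return False
        return True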


    # @property
    # def _estimator_type(self):
    #     return self.estimator._estimator_type



    @property
    def classes_(self):
        """The class labels. Only exists if the wrapped estimator is a classifier."""
        return self.estimator.classes_
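
The output handling shared by transform and fit_transform in the source above (reshape a 1-D result into one column, then optionally stack the input columns after it) can be sketched without NumPy. This is a minimal illustration using plain lists of rows; `as_columns` and `assemble` are hypothetical helper names, not TPOT functions.

```python
def as_columns(output):
    """Turn a 1-D list of predictions into a one-column 2-D list of rows."""
    if output and not isinstance(output[0], (list, tuple)):
        return [[value] for value in output]
    return [list(row) for row in output]


def assemble(output, X, passthrough=False):
    """Return the estimator output, followed by the input columns if passthrough is on."""
    columns = as_columns(output)
    if not passthrough:
        return columns
    # Same column order as np.hstack((output, X)): estimator output first, then X.
    return [out_row + list(x_row) for out_row, x_row in zip(columns, X)]


X = [[1.0, 2.0], [3.0, 4.0]]
labels = [0, 1]                      # e.g. what a predict call might return
probas = [[0.9, 0.1], [0.2, 0.8]]    # e.g. what a predict_proba call might return

print(assemble(labels, X))                    # one new column
print(assemble(labels, X, passthrough=True))  # new column plus the original features
print(assemble(probas, X))                    # already 2-D, left as is
```

With passthrough enabled, the next node sees both the estimator's output and the original features, so the number of columns grows by the width of the estimator's output.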

classes_ property

The class labels. Only exists if the wrapped estimator is a classifier.

__init__(estimator, method='auto', passthrough=False, cross_val_predict_cv=None)

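A class for using a sklearn estimator as a transformer. When calling fit_transform, this class returns the output of cross_val_predict and trains the estimator on the full dataset. When calling transform, this class uses the estimator fit on the full dataset to transform the data.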
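
To see why the two paths differ, the following sketch hand-rolls out-of-fold prediction in plain Python. It is not scikit-learn's cross_val_predict, only an illustration of the idea under simplifying assumptions (contiguous folds, no shuffling); `MeanRegressor` and `out_of_fold_predict` are hypothetical names introduced for this example.

```python
class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def out_of_fold_predict(make_estimator, X, y, n_folds):
    """Predict every sample with a model that did not see it during fitting."""
    n = len(X)
    preds = [None] * n
    # Contiguous fold boundaries; fold sizes differ by at most one.
    bounds = [i * n // n_folds for i in range(n_folds + 1)]
    for start, stop in zip(bounds[:-1], bounds[1:]):
        train_idx = [i for i in range(n) if not (start <= i < stop)]
        model = make_estimator().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        preds[start:stop] = model.predict(X[start:stop])
    return preds


X = [[0], [1], [2], [3]]
y = [0.0, 2.0, 4.0, 6.0]

oof = out_of_fold_predict(MeanRegressor, X, y, n_folds=2)  # fit_transform-style output
full_model = MeanRegressor().fit(X, y)                     # estimator kept for transform
in_sample = full_model.predict(X)

print(oof)        # [5.0, 5.0, 1.0, 1.0]
print(in_sample)  # [3.0, 3.0, 3.0, 3.0]
```

The out-of-fold values are what a downstream node would train on, which avoids feeding it predictions the estimator made on its own training samples, while the model fit on all of the data is the one used for later transform calls.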

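Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| estimator | BaseEstimator | The estimator to use as a transformer. | required |
| method | str | The method to use for the transformation. If 'auto', tries predict_proba, decision_function, and predict, in that order, and uses the first one the estimator provides. 'predict_proba', 'decision_function', or 'predict' selects that method of the estimator directly. | 'auto' |
| passthrough | bool | Whether to pass the original input through, stacked after the estimator's output columns. | False |
| cross_val_predict_cv | int | Number of folds to use for the cross_val_predict function for inner classifiers and regressors. Estimators will still be fit on the full dataset, but the following node will get the outputs from cross_val_predict. 0 or 1: the cross_val_predict function will not be used, and the next layer gets the outputs from fitting and transforming the full dataset. >= 2: inner classifiers or regressors are still fit on the full dataset, but the output to the next node comes from cross_val_predict with the specified number of folds. | None |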
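
The 'auto' lookup order described for method can be sketched without scikit-learn. In this minimal example, `ToyClassifier`, `ToyRegressor`, and `resolve_method` are hypothetical names used only for illustration; the lookup mirrors the hasattr chain in the class source.

```python
class ToyClassifier:
    """Hypothetical estimator exposing both predict_proba and predict."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]

    def predict(self, X):
        return [0 for _ in X]


class ToyRegressor:
    """Hypothetical estimator exposing only predict."""

    def predict(self, X):
        return [0.0 for _ in X]


def resolve_method(estimator, method="auto"):
    """Return the name of the estimator method used to produce the new features."""
    if method != "auto":
        return method  # an explicit choice is used as given
    for name in ("predict_proba", "decision_function", "predict"):
        if hasattr(estimator, name):
            return name
    raise ValueError("Estimator has no valid method")


print(resolve_method(ToyClassifier()))                    # predict_proba
print(resolve_method(ToyRegressor()))                     # predict
print(resolve_method(ToyClassifier(), method="predict"))  # predict
```

Preferring predict_proba or decision_function over predict passes continuous scores, rather than hard labels, to the next node whenever the estimator can provide them.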
Source code in tpot/builtin_modules/estimatortransformer.py
def __init__(self, estimator, method='auto', passthrough=False, cross_val_predict_cv=None):
    """
    A class for using a sklearn estimator as a transformer. When calling fit_transform, this class returns the output of cross_val_predict
    and trains the estimator on the full dataset. When calling transform, this class uses the estimator fit on the full dataset to transform the data.

    Parameters
    ----------
    estimator : sklearn.base.BaseEstimator
        The estimator to use as a transformer.
    method : str, default='auto'
        The method to use for the transformation. If 'auto', will try to use predict_proba, decision_function, or predict in that order.
        - predict_proba: use the predict_proba method of the estimator.
        - decision_function: use the decision_function method of the estimator.
        - predict: use the predict method of the estimator.
    passthrough : bool, default=False
        Whether to pass the original input through.
    cross_val_predict_cv : int or None, default=None
        Number of folds to use for the cross_val_predict function for inner classifiers and regressors. Estimators will still be fit on the full dataset, but the following node will get the outputs from cross_val_predict.

        - 0-1 : When set to 0 or 1, the cross_val_predict function will not be used. The next layer will get the outputs from fitting and transforming the full dataset.
        - >=2 : When fitting pipelines with inner classifiers or regressors, they will still be fit on the full dataset.
                However, the output to the next node will come from cross_val_predict with the specified number of folds.

    """
    self.estimator = estimator
    self.method = method
    self.passthrough = passthrough
    self.cross_val_predict_cv = cross_val_predict_cv

__sklearn_is_fitted__()

Check fitted status and return a Boolean value.

Source code in tpot/builtin_modules/estimatortransformer.py
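def __sklearn_is_fitted__(self):
    """
    Check fitted status and return a Boolean value.
    """
    from sklearn.exceptions import NotFittedError
    try:
        check_is_fitted(self.estimator)  # raises rather than returning a value
    except NotFittedError:
        return False
    return True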
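
scikit-learn's check_is_fitted signals an unfitted estimator by raising NotFittedError and otherwise returns nothing, so a hook that must return a Boolean needs to translate the exception into True or False. The sketch below shows that pattern with stand-ins only: the `NotFittedError`, `check_is_fitted`, `Inner`, and `Wrapper` defined here are simplified hypothetical versions, not the scikit-learn or TPOT objects.

```python
class NotFittedError(Exception):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted(estimator):
    """Toy check: raise unless the estimator reports that it is fitted."""
    if not estimator.__sklearn_is_fitted__():
        raise NotFittedError(type(estimator).__name__ + " is not fitted yet")


class Inner:
    """Hypothetical wrapped estimator that tracks its own fitted state."""

    def __init__(self):
        self.fitted = False

    def fit(self):
        self.fitted = True
        return self

    def __sklearn_is_fitted__(self):
        return self.fitted


class Wrapper:
    """Hypothetical wrapper that reports the fitted state of its inner estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __sklearn_is_fitted__(self):
        try:
            check_is_fitted(self.estimator)
        except NotFittedError:
            return False
        return True


wrapper = Wrapper(Inner())
before = wrapper.__sklearn_is_fitted__()
wrapper.estimator.fit()
after = wrapper.__sklearn_is_fitted__()
print(before, after)  # False True
```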