Imputers

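This file is part of the TPOT library.

The current version of TPOT was developed at the Department of Computational Biomedicine at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/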

ColumnSimpleImputer

Bases: BaseEstimator, TransformerMixin

Source code in tpot/builtin_modules/imputer.py
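# Imports this listing relies on
import numpy as np
import pandas as pd
import sklearn.impute
from pandas.api.types import is_numeric_dtype
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSimpleImputer(BaseEstimator, TransformerMixin):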
    def __init__(self, columns="all",
                 missing_values=np.nan,
                 strategy="mean",
                 fill_value=None,
                 copy=True,
                 add_indicator=False,
                 keep_empty_features=False):
        """
        A wrapper for SimpleImputer that allows for imputation of specific columns in a DataFrame or np array.
        Passes through columns that are not imputed.

        Parameters
        ----------
        columns : str, list, default='all'
            Determines which columns to impute with sklearn.impute.SimpleImputer.
            - 'categorical' : Automatically select categorical features
            - 'numeric' : Automatically select numeric features
            - 'all' : Select all features
            - list : A list of columns to select

        # See documentation from sklearn.impute.SimpleImputer for the following parameters
        missing_values, strategy, fill_value, copy, add_indicator, keep_empty_features

        """

        self.columns = columns
        self.missing_values = missing_values
        self.strategy = strategy
        self.fill_value = fill_value
        self.copy = copy
        self.add_indicator = add_indicator
        self.keep_empty_features = keep_empty_features


    def fit(self, X, y=None):
        if (self.columns == "categorical" or self.columns == "numeric") and not isinstance(X, pd.DataFrame):
            raise ValueError(f"Invalid value for columns: {self.columns}. "
                             "Only 'all' or <list> is supported for np arrays")

        if self.columns == "categorical":
            self.columns_ = list(X.select_dtypes(exclude='number').columns)
        elif self.columns == "numeric":
            self.columns_ = [col for col in X.columns if is_numeric_dtype(X[col])]
        elif self.columns == "all":
            if isinstance(X, pd.DataFrame):
                self.columns_ = X.columns
            else:
                self.columns_ = list(range(X.shape[1]))
        elif isinstance(self.columns, list):
            self.columns_ = self.columns
        else:
            raise ValueError(f"Invalid value for columns: {self.columns}")

        if len(self.columns_) == 0:
            return self

        self.imputer = sklearn.impute.SimpleImputer(missing_values=self.missing_values,
                                                    strategy=self.strategy,
                                                    fill_value=self.fill_value,
                                                    copy=self.copy,
                                                    add_indicator=self.add_indicator,
                                                    keep_empty_features=self.keep_empty_features)

        if isinstance(X, pd.DataFrame):
            # Keep DataFrame output so column labels survive the transform
            self.imputer.set_output(transform="pandas")
            self.imputer.fit(X[self.columns_], y)
        else:
            self.imputer.fit(X[:, self.columns_], y)

        return self

    def transform(self, X):
        if len(self.columns_) == 0:
            return X

        if isinstance(X, pd.DataFrame):
            X = X.copy()
            X[self.columns_] = self.imputer.transform(X[self.columns_])
            return X
        else:
            X = np.copy(X)
            X[:, self.columns_] = self.imputer.transform(X[:, self.columns_])
            return X

__init__(columns='all', missing_values=np.nan, strategy='mean', fill_value=None, copy=True, add_indicator=False, keep_empty_features=False)

A wrapper for SimpleImputer that allows imputation of specific columns in a DataFrame or NumPy array. Columns that are not imputed are passed through unchanged.

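Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| columns | (str, list) | Determines which columns to impute with sklearn.impute.SimpleImputer. 'categorical': automatically select categorical features; 'numeric': automatically select numeric features; 'all': select all features; list: a list of columns to select. | 'all' |
| missing_values | | See the sklearn.impute.SimpleImputer documentation. | np.nan |
| strategy | | See the sklearn.impute.SimpleImputer documentation. | 'mean' |
| fill_value | | See the sklearn.impute.SimpleImputer documentation. | None |
| copy | | See the sklearn.impute.SimpleImputer documentation. | True |
| add_indicator | | See the sklearn.impute.SimpleImputer documentation. | False |
| keep_empty_features | | See the sklearn.impute.SimpleImputer documentation. | False |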
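
To make the 'categorical' and 'numeric' options more concrete, here is a minimal sketch of the two dtype-based selections that fit performs on a DataFrame. The example DataFrame is hypothetical; the two selection expressions are the ones shown in the class source.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical DataFrame with integer, float, and text columns.
df = pd.DataFrame({
    "age": [20, 35],
    "income": [1.5, 2.5],
    "city": ["Paris", "Lima"],
})

# columns='categorical': every column whose dtype is not a number.
categorical = list(df.select_dtypes(exclude="number").columns)

# columns='numeric': every column that pandas reports as numeric.
numeric = [col for col in df.columns if is_numeric_dtype(df[col])]

print(categorical)
print(numeric)
```

Both options depend on DataFrame dtypes, which is why fit raises a ValueError when they are used with a NumPy array. Because the default strategy 'mean' applies only to numeric data, selecting 'categorical' columns would presumably be paired with strategy='most_frequent' or strategy='constant'.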
Source code in tpot/builtin_modules/imputer.py
def __init__(self, columns="all",
             missing_values=np.nan,
             strategy="mean",
             fill_value=None,
             copy=True,
             add_indicator=False,
             keep_empty_features=False):
    """
    A wrapper for SimpleImputer that allows for imputation of specific columns in a DataFrame or np array.
    Passes through columns that are not imputed.

    Parameters
    ----------
    columns : str, list, default='all'
        Determines which columns to impute with sklearn.impute.SimpleImputer.
        - 'categorical' : Automatically select categorical features
        - 'numeric' : Automatically select numeric features
        - 'all' : Select all features
        - list : A list of columns to select

    # See documentation from sklearn.impute.SimpleImputer for the following parameters
    missing_values, strategy, fill_value, copy, add_indicator, keep_empty_features

    """

    self.columns = columns
    self.missing_values = missing_values
    self.strategy = strategy
    self.fill_value = fill_value
    self.copy = copy
    self.add_indicator = add_indicator
    self.keep_empty_features = keep_empty_features
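
For NumPy input, columns is given as a list of integer positions and the remaining parameters are forwarded to SimpleImputer. The sketch below illustrates that combination with a hypothetical 3x3 array and strategy="median"; it uses SimpleImputer directly and mirrors the array branch of fit and transform, so it should be read as an approximation of the wrapper's behavior, not as its implementation.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical array with one missing value per column.
X = np.array([
    [1.0, np.nan, 10.0],
    [np.nan, 5.0, 20.0],
    [3.0, 7.0, np.nan],
])

# Impute only the first and last columns, addressed by position.
cols = [0, 2]

# strategy is one of the parameters passed through to SimpleImputer.
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
imputer.fit(X[:, cols])

# Copy first so the input array is not modified, then overwrite
# the selected columns with their imputed values.
out = np.copy(X)
out[:, cols] = imputer.transform(X[:, cols])
```

Column 1 is not in cols, so its missing value is left as NaN in out, while columns 0 and 2 are filled with their respective medians.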