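This file is part of the TPOT library.
The current version of TPOT was developed by the following people in the Department of Computational Biomedicine at Cedars-Sinai:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)
The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors
TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/.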
ColumnSimpleImputer
Bases: BaseEstimator, TransformerMixin
Source code in tpot/builtin_modules/imputer.py

    import numpy as np
    import pandas as pd
    import sklearn.impute
    from pandas.api.types import is_numeric_dtype
    from sklearn.base import BaseEstimator, TransformerMixin


    class ColumnSimpleImputer(BaseEstimator, TransformerMixin):
        def __init__(self, columns="all",
                     missing_values=np.nan,
                     strategy="mean",
                     fill_value=None,
                     copy=True,
                     add_indicator=False,
                     keep_empty_features=False):
            """
            A wrapper for SimpleImputer that allows for imputation of specific columns in a DataFrame or np array.
            Passes through columns that are not imputed.

            Parameters
            ----------
            columns : str, list, default='all'
                Determines which columns to impute with sklearn.impute.SimpleImputer.
                - 'categorical' : Automatically select categorical features
                - 'numeric' : Automatically select numeric features
                - 'all' : Select all features
                - list : A list of columns to select

            # See documentation from sklearn.impute.SimpleImputer for the following parameters
            missing_values, strategy, fill_value, copy, add_indicator, keep_empty_features
            """
            self.columns = columns
            self.missing_values = missing_values
            self.strategy = strategy
            self.fill_value = fill_value
            self.copy = copy
            self.add_indicator = add_indicator
            self.keep_empty_features = keep_empty_features

        def fit(self, X, y=None):
            # dtype-based selection needs column dtypes, so it requires a DataFrame
            if (self.columns == "categorical" or self.columns == "numeric") and not isinstance(X, pd.DataFrame):
                raise ValueError(f"Invalid value for columns: {self.columns}. "
                                 "Only 'all' or <list> is supported for np arrays")

            # resolve the `columns` option into the concrete columns to impute
            if self.columns == "categorical":
                self.columns_ = list(X.select_dtypes(exclude='number').columns)
            elif self.columns == "numeric":
                self.columns_ = [col for col in X.columns if is_numeric_dtype(X[col])]
            elif self.columns == "all":
                if isinstance(X, pd.DataFrame):
                    self.columns_ = X.columns
                else:
                    self.columns_ = list(range(X.shape[1]))
            elif isinstance(self.columns, list):
                self.columns_ = self.columns
            else:
                raise ValueError(f"Invalid value for columns: {self.columns}")

            # nothing selected: no imputer is fitted and transform returns X as-is
            if len(self.columns_) == 0:
                return self

            self.imputer = sklearn.impute.SimpleImputer(missing_values=self.missing_values,
                                                        strategy=self.strategy,
                                                        fill_value=self.fill_value,
                                                        copy=self.copy,
                                                        add_indicator=self.add_indicator,
                                                        keep_empty_features=self.keep_empty_features)

            if isinstance(X, pd.DataFrame):
                # keep DataFrame output so the result can be assigned back by column name
                self.imputer.set_output(transform="pandas")
                self.imputer.fit(X[self.columns_], y)
            else:
                self.imputer.fit(X[:, self.columns_], y)

            return self

        def transform(self, X):
            if len(self.columns_) == 0:
                return X

            # work on a copy so the caller's data is not modified
            if isinstance(X, pd.DataFrame):
                X = X.copy()
                X[self.columns_] = self.imputer.transform(X[self.columns_])
                return X
            else:
                X = np.copy(X)
                X[:, self.columns_] = self.imputer.transform(X[:, self.columns_])
                return X
__init__(columns='all', missing_values=np.nan, strategy='mean', fill_value=None, copy=True, add_indicator=False, keep_empty_features=False)
A wrapper for SimpleImputer that allows imputation of specific columns in a DataFrame or NumPy array. Columns that are not imputed are passed through unchanged.
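Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| columns | (str, list) | Determines which columns to impute with sklearn.impute.SimpleImputer. 'categorical': automatically select categorical features; 'numeric': automatically select numeric features; 'all': select all features; list: a list of columns to select. | 'all' |
| missing_values | | See the sklearn.impute.SimpleImputer documentation. | np.nan |
| strategy | | See the sklearn.impute.SimpleImputer documentation. | 'mean' |
| fill_value | | See the sklearn.impute.SimpleImputer documentation. | None |
| copy | | See the sklearn.impute.SimpleImputer documentation. | True |
| add_indicator | | See the sklearn.impute.SimpleImputer documentation. | False |
| keep_empty_features | | See the sklearn.impute.SimpleImputer documentation. | False |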
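The 'categorical' and 'numeric' options of `columns` select columns by dtype, which is why they only apply to DataFrame input. A minimal sketch of that selection, using the same pandas calls as the class source; the example DataFrame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative data: two numeric columns and one string column
df = pd.DataFrame({
    "age": [31.0, np.nan, 45.0],
    "city": ["Paris", None, "Lima"],
    "visits": [3, 7, 2],
})

# columns='categorical': every column whose dtype is not numeric
categorical_cols = list(df.select_dtypes(exclude="number").columns)

# columns='numeric': every column whose dtype is numeric
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]

print(categorical_cols)
print(numeric_cols)
```

For a NumPy array these two options raise a ValueError in `fit`, so pass 'all' or a list of integer column indices instead.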