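This file is part of the TPOT library.
The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)
The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors
TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/.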
FeatureSetSelector
Bases: BaseEstimator, SelectorMixin
Select predefined feature subsets.
Source code in tpot/builtin_modules/feature_set_selector.py
class FeatureSetSelector(BaseEstimator, SelectorMixin):
    """
    Select predefined feature subsets.
    """

    def __init__(self, sel_subset=None, name=None):
        """Create a FeatureSetSelector object.

        Parameters
        ----------
        sel_subset: list or int
            If X is a dataframe, items in sel_subset list must correspond to column names
            If X is a numpy array, items in sel_subset list must correspond to column indexes
            int: index of a single column

        Returns
        -------
        None
        """
        self.name = name
        self.sel_subset = sel_subset

    def fit(self, X, y=None):
        """Fit FeatureSetSelector for feature selection

        Parameters
        ----------
        X: array-like of shape (n_samples, n_features)
            The training input samples.
        y: array-like, shape (n_samples,)
            The target values (integers that correspond to classes in classification, real numbers in regression).

        Returns
        -------
        self: object
            Returns a copy of the estimator
        """
        if isinstance(self.sel_subset, int) or isinstance(self.sel_subset, str):
            self.sel_subset = [self.sel_subset]

        #generate self.feat_list_idx
        if isinstance(X, pd.DataFrame):
            self.feature_names_in_ = X.columns.tolist()
            self.feat_list_idx = sorted([self.feature_names_in_.index(feat) for feat in self.sel_subset])
        elif isinstance(X, np.ndarray):
            self.feature_names_in_ = None#list(range(X.shape[1]))
            self.feat_list_idx = sorted(self.sel_subset)

        n_features = X.shape[1]
        self.mask = np.zeros(n_features, dtype=bool)
        self.mask[np.asarray(self.feat_list_idx)] = True

        return self

    #TODO keep returned as dataframe if input is dataframe? may not be consistent with sklearn
    # def transform(self, X):

    def _get_tags(self):
        tags = {"allow_nan": True, "requires_y": False}
        return tags

    def _get_support_mask(self):
        """
        Get the boolean mask indicating which features are selected

        Returns
        -------
        support : boolean array of shape [# input features]
            An element is True iff its corresponding feature is selected for
            retention.
        """
        return self.mask
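The mask construction in the listing above can be tried in isolation. The following is a minimal sketch that uses only NumPy and does not import TPOT; the array `X` and the indices in `sel_subset` are made-up illustration values. It mirrors the sort-then-mask steps from `fit` and shows one consequence of using a boolean mask: the selected columns come back in their original column order, not in the order they were listed.

```python
import numpy as np

# Hypothetical data: 4 samples, 5 features (values 0..19).
X = np.arange(20).reshape(4, 5)

# Indices as they might be passed via sel_subset (deliberately unsorted).
sel_subset = [3, 0]

# Mirror the steps shown in fit(): sort the indices, then flag them in a mask.
feat_list_idx = sorted(sel_subset)
mask = np.zeros(X.shape[1], dtype=bool)
mask[np.asarray(feat_list_idx)] = True

# Boolean indexing keeps columns in their original order,
# so the result holds columns 0 and 3, not 3 and 0.
X_selected = X[:, mask]

print(mask.tolist())           # [True, False, False, True, False]
print(X_selected[0].tolist())  # [0, 3]
```

If column order matters downstream, this suggests listing the subset in column order to avoid surprises.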
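__init__(sel_subset=None, name=None)
Create a FeatureSetSelector object.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sel_subset | list or int | If X is a DataFrame, the items in the sel_subset list must be column names. If X is a NumPy array, the items must be column indexes. A single int is treated as the index of one column. | None |
Returns: None
Source code in tpot/builtin_modules/feature_set_selector.py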
def __init__(self, sel_subset=None, name=None):
    """Create a FeatureSetSelector object.

    Parameters
    ----------
    sel_subset: list or int
        If X is a dataframe, items in sel_subset list must correspond to column names
        If X is a numpy array, items in sel_subset list must correspond to column indexes
        int: index of a single column

    Returns
    -------
    None
    """
    self.name = name
    self.sel_subset = sel_subset
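Although the docstring documents `sel_subset` as a list or an int, the `fit` source on this page wraps a single int or a single str in a one-element list before use. The sketch below illustrates that normalization step on its own. `normalize_subset` is a hypothetical helper written for this example, not part of TPOT, and unlike the library code it returns a new list instead of overwriting the constructor argument.

```python
def normalize_subset(sel_subset):
    """Hypothetical helper: return sel_subset as a list without mutating the input."""
    # A lone index (int) or a lone column name (str) becomes a one-element list.
    if isinstance(sel_subset, (int, str)):
        return [sel_subset]
    # Anything else is assumed to be an iterable of indexes or names.
    return list(sel_subset)


print(normalize_subset(2))               # [2]
print(normalize_subset("age"))           # ['age']
print(normalize_subset(["age", "bmi"]))  # ['age', 'bmi']
```

Note that the default `sel_subset=None` is not covered by this sketch, and the `fit` listing on this page does not appear to handle it either, so passing an explicit subset seems to be expected.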
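fit(X, y=None)
Fit FeatureSetSelector for feature selection.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like of shape (n_samples, n_features) | The training input samples. | required |
| y | array-like of shape (n_samples,) | The target values (integers that correspond to classes in classification, real numbers in regression). The source shown for fit does not use y. | None |
Returns: self (object), the fitted selector itself rather than a copy.
Source code in tpot/builtin_modules/feature_set_selector.py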
def fit(self, X, y=None):
    """Fit FeatureSetSelector for feature selection

    Parameters
    ----------
    X: array-like of shape (n_samples, n_features)
        The training input samples.
    y: array-like, shape (n_samples,)
        The target values (integers that correspond to classes in classification, real numbers in regression).

    Returns
    -------
    self: object
        Returns a copy of the estimator
    """
    if isinstance(self.sel_subset, int) or isinstance(self.sel_subset, str):
        self.sel_subset = [self.sel_subset]

    #generate self.feat_list_idx
    if isinstance(X, pd.DataFrame):
        self.feature_names_in_ = X.columns.tolist()
        self.feat_list_idx = sorted([self.feature_names_in_.index(feat) for feat in self.sel_subset])
    elif isinstance(X, np.ndarray):
        self.feature_names_in_ = None#list(range(X.shape[1]))
        self.feat_list_idx = sorted(self.sel_subset)

    n_features = X.shape[1]
    self.mask = np.zeros(n_features, dtype=bool)
    self.mask[np.asarray(self.feat_list_idx)] = True

    return self
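For DataFrame input, `fit` resolves each name in `sel_subset` to its column position with `list.index` before building the mask. The sketch below reproduces that lookup without pandas or TPOT: the plain list `feature_names` stands in for `X.columns.tolist()`, and all column names are invented for illustration. It also shows what to expect, under these assumptions, when a requested name is not among the columns.

```python
import numpy as np

# Hypothetical column names standing in for X.columns.tolist().
feature_names = ["age", "bmi", "glucose", "insulin"]
sel_subset = ["insulin", "bmi"]

# Same lookup as the DataFrame branch of fit(): name -> position, then sort.
feat_list_idx = sorted(feature_names.index(feat) for feat in sel_subset)

# Build the boolean support mask from the resolved positions.
mask = np.zeros(len(feature_names), dtype=bool)
mask[np.asarray(feat_list_idx)] = True

selected_names = [name for name, keep in zip(feature_names, mask) if keep]
print(feat_list_idx)   # [1, 3]
print(selected_names)  # ['bmi', 'insulin']

# list.index raises ValueError for a name that is not in the list,
# so a misspelled column name would surface as a ValueError.
try:
    feature_names.index("height")
    missing_raises = False
except ValueError:
    missing_raises = True
print(missing_raises)  # True
```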