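This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/.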
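`CategoricalSelector(threshold=10, minimum_fraction=None)`

Bases: BaseEstimator, TransformerMixin

Meta-transformer for selecting categorical features and transforming them using OneHotEncoder.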
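Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | int | Maximum number of unique values a feature can have for it to be considered categorical. | 10 |
| minimum_fraction | float | Minimum fraction of unique values in a feature to consider the feature to be categorical. Passed to OneHotEncoder as minimum_fraction. | None |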
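The threshold rule above can be illustrated with a short numpy sketch. The helper `categorical_mask` is hypothetical, not TPOT's `auto_select_categorical_features`, and it assumes a column counts as categorical when it has at most `threshold` distinct values; the exact comparison TPOT uses is not shown on this page.

```python
import numpy as np


def categorical_mask(X, threshold=10):
    """Hypothetical helper: flag columns with at most `threshold` distinct values.

    The inclusive (<=) comparison is an assumption of this sketch.
    """
    X = np.asarray(X)
    # len() returns a Python int, so each entry is a plain Python bool
    return [len(np.unique(X[:, j])) <= threshold for j in range(X.shape[1])]


X = np.array([
    [0, 1.5, 3],
    [1, 2.7, 3],
    [0, 3.1, 4],
    [2, 4.8, 4],
])
# Column 0 has 3 distinct values, column 1 has 4, column 2 has 2.
print(categorical_mask(X, threshold=3))  # [True, False, True]
```

Raising `threshold` makes more columns qualify as categorical; with `threshold=10` every column of this small array would be flagged.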
Source code in tpot/builtin_modules/feature_transformers.py
```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array

# TPOT's own OneHotEncoder (it accepts categorical_features and minimum_fraction)
# and its feature-selection helpers; the module path is assumed.
from .one_hot_encoder import OneHotEncoder, auto_select_categorical_features, _X_selected


class CategoricalSelector(BaseEstimator, TransformerMixin):
    """Meta-transformer for selecting categorical features and transforming them using OneHotEncoder.

    Parameters
    ----------
    threshold : int, default=10
        Maximum number of unique values per feature to consider the feature
        to be categorical.
    minimum_fraction : float, default=None
        Minimum fraction of unique values in a feature to consider the feature
        to be categorical.
    """

    def __init__(self, threshold=10, minimum_fraction=None):
        """Create a CategoricalSelector object."""
        self.threshold = threshold
        self.minimum_fraction = minimum_fraction

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.

        This method is just there to implement the usual API and hence
        work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        # Validate the input only; no state is learned.
        X = check_array(X, accept_sparse='csr')
        return self

    def transform(self, X):
        """Select categorical features and transform them using OneHotEncoder.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples and n_components is the number of components.

        Returns
        -------
        array-like, {n_samples, n_components}
        """
        # Identify the columns treated as categorical from their unique-value counts.
        selected = auto_select_categorical_features(X, threshold=self.threshold)
        # X_sel holds the selected (categorical) columns; the second item holds the rest.
        X_sel, _, n_selected, _ = _X_selected(X, selected)

        if n_selected == 0:
            # No features selected.
            raise ValueError('No categorical feature was found!')
        else:
            # A new encoder is fitted on every call to transform.
            ohe = OneHotEncoder(categorical_features='all', sparse=False, minimum_fraction=self.minimum_fraction)
            return ohe.fit_transform(X_sel)
```
`__init__(threshold=10, minimum_fraction=None)`

Create a CategoricalSelector object.
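`fit(X, y=None)`

Do nothing and return the estimator unchanged. This method exists only to implement the standard estimator API, so that the transformer works in pipelines.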
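Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like | Input data; validated with check_array (CSR sparse matrices are accepted). | required |
| y | ignored | Not used; present for API consistency. | None |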
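Since fit only validates X and returns self, the object behaves correctly when a pipeline calls fit before transform, and chaining such as fit(X).transform(X) works. The sketch below mimics that contract with a hypothetical, dependency-free `StatelessTransformer` class operating on lists of rows; it is not part of TPOT or scikit-learn.

```python
class StatelessTransformer:
    """Hypothetical transformer following the same fit/transform contract."""

    def fit(self, X, y=None):
        # Validate only: every row must have the same length. Nothing is learned.
        if len({len(row) for row in X}) > 1:
            raise ValueError("rows must have equal length")
        return self  # returning self is what enables chaining

    def transform(self, X):
        # All real work happens here; doubling each value stands in for it.
        return [[2 * v for v in row] for row in X]

    def fit_transform(self, X, y=None):
        # Same behaviour as scikit-learn's TransformerMixin default: fit, then transform.
        return self.fit(X, y).transform(X)


t = StatelessTransformer()
print(t.fit([[1, 2]]) is t)               # True
print(t.fit_transform([[1, 2], [3, 4]]))  # [[2, 4], [6, 8]]
```

All work is deferred to transform, which mirrors how CategoricalSelector does its selection and encoding there rather than in fit.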
`transform(X)`

Select categorical features and transform them using OneHotEncoder.
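Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples and n_components is the number of components. | required |

Returns

| Type | Description |
| --- | --- |
| array-like, {n_samples, n_components} | The selected categorical features, one-hot encoded. |

Raises ValueError('No categorical feature was found!') if no column is selected as categorical.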
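The following numpy sketch shows the kind of output transform produces and why its width depends on the batch passed in. The helper `one_hot` is a hypothetical stand-in for TPOT's OneHotEncoder with minimum_fraction=None; like CategoricalSelector.transform, which calls fit_transform on a new encoder every time, it learns the categories from whatever data it receives.

```python
import numpy as np


def one_hot(X_sel):
    """Hypothetical helper: one-hot encode every column of X_sel.

    Categories are learned from X_sel itself on each call.
    """
    X_sel = np.asarray(X_sel)
    blocks = []
    for j in range(X_sel.shape[1]):
        categories = np.unique(X_sel[:, j])  # sorted distinct values of column j
        # (n_samples, 1) == (n_categories,) broadcasts to an (n_samples, n_categories) 0/1 block
        blocks.append((X_sel[:, [j]] == categories).astype(float))
    return np.hstack(blocks)


train = np.array([[0, 3], [1, 3], [2, 4]])
test = np.array([[0, 3], [1, 3]])
print(one_hot(train).shape)  # (3, 5)
print(one_hot(test).shape)   # (2, 3)
```

In this sketch the two batches expose different category sets and therefore produce arrays with different numbers of columns, since nothing learned from an earlier call is reused.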
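`ContinuousSelector(threshold=10, svd_solver='randomized', iterated_power='auto', random_state=42)`

Bases: BaseEstimator, TransformerMixin

Meta-transformer for selecting continuous features and transforming them using PCA.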
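Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | int | Maximum number of unique values a feature can have for it to be considered categorical. Features not flagged as categorical are treated as continuous. | 10 |
| svd_solver | str, one of {'auto', 'full', 'arpack', 'randomized'} | auto: the solver is chosen by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, the more efficient 'randomized' method is enabled; otherwise the exact full SVD is computed and optionally truncated afterwards. full: run an exact full SVD by calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing. arpack: run an SVD truncated to n_components by calling the ARPACK solver via scipy.sparse.linalg.svds; it requires strictly 0 < n_components < X.shape[1]. randomized: run a randomized SVD by the method of Halko et al. | 'randomized' |
| iterated_power | int >= 0 or 'auto' | Number of iterations for the power method computed by svd_solver == 'randomized'. | 'auto' |
| random_state | int | Random seed passed to PCA. | 42 |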
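The 'auto' policy described for svd_solver can be written as a small decision function. `choose_svd_solver` is hypothetical; it reads "larger than 500x500" as "the larger dimension exceeds 500", which matches how earlier scikit-learn releases implemented the rule, and newer releases may apply a different rule.

```python
def choose_svd_solver(n_samples, n_features, n_components):
    """Hypothetical helper following the 'auto' wording in the table above."""
    # Small problems always get the exact full SVD.
    if max(n_samples, n_features) <= 500:
        return "full"
    # Few components relative to the smallest dimension: randomized SVD pays off.
    if 1 <= n_components < 0.8 * min(n_samples, n_features):
        return "randomized"
    # Otherwise compute the full SVD and truncate afterwards.
    return "full"


print(choose_svd_solver(1000, 800, 50))   # randomized
print(choose_svd_solver(300, 200, 10))    # full
print(choose_svd_solver(1000, 800, 700))  # full
```

ContinuousSelector defaults to svd_solver='randomized', so a policy like this only comes into play when 'auto' is passed explicitly.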
Source code in tpot/builtin_modules/feature_transformers.py
```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.utils import check_array

# TPOT's helpers for flagging and splitting off categorical columns (module path assumed).
from .one_hot_encoder import auto_select_categorical_features, _X_selected


class ContinuousSelector(BaseEstimator, TransformerMixin):
    """Meta-transformer for selecting continuous features and transforming them using PCA.

    Parameters
    ----------
    threshold : int, default=10
        Maximum number of unique values per feature to consider the feature
        to be categorical.
    svd_solver : string {'auto', 'full', 'arpack', 'randomized'}, default='randomized'
        auto :
            the solver is selected by a default policy based on `X.shape` and
            `n_components`: if the input data is larger than 500x500 and the
            number of components to extract is lower than 80% of the smallest
            dimension of the data, then the more efficient 'randomized'
            method is enabled. Otherwise the exact full SVD is computed and
            optionally truncated afterwards.
        full :
            run exact full SVD calling the standard LAPACK solver via
            `scipy.linalg.svd` and select the components by postprocessing
        arpack :
            run SVD truncated to n_components calling ARPACK solver via
            `scipy.sparse.linalg.svds`. It requires strictly
            0 < n_components < X.shape[1]
        randomized :
            run randomized SVD by the method of Halko et al.
    iterated_power : int >= 0, or 'auto', (default 'auto')
        Number of iterations for the power method computed by
        svd_solver == 'randomized'.
    random_state : int, default=42
        Random seed passed to PCA.
    """

    def __init__(self, threshold=10, svd_solver='randomized', iterated_power='auto', random_state=42):
        """Create a ContinuousSelector object."""
        self.threshold = threshold
        self.svd_solver = svd_solver
        self.iterated_power = iterated_power
        self.random_state = random_state

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.

        This method is just there to implement the usual API and hence
        work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        # Validate the input only; no state is learned.
        X = check_array(X)
        return self

    def transform(self, X):
        """Select continuous features and transform them using PCA.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples and n_components is the number of components.

        Returns
        -------
        array-like, {n_samples, n_components}
        """
        # Identify the columns treated as categorical from their unique-value counts.
        selected = auto_select_categorical_features(X, threshold=self.threshold)
        # The second item holds the columns NOT flagged as categorical, i.e. the continuous ones.
        _, X_sel, _, _ = _X_selected(X, selected)

        # Count the continuous columns directly: the third value returned by
        # _X_selected counts the selected (categorical) columns instead.
        if X_sel.shape[1] == 0:
            # No continuous features left.
            raise ValueError('No continuous feature was found!')
        else:
            # A new PCA is fitted on every call to transform.
            pca = PCA(svd_solver=self.svd_solver, iterated_power=self.iterated_power, random_state=self.random_state)
            return pca.fit_transform(X_sel)
```
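For the 'randomized' solver, iterated_power sets how many power iterations refine the random subspace. The sketch below is a minimal numpy version of the Halko et al. randomized SVD with power iterations; `randomized_svd_sketch`, its oversampling of 10 extra columns, and the QR step after each multiplication are choices of this illustration, not PCA's internals, and it works on the matrix as given (PCA would center the data first).

```python
import numpy as np


def randomized_svd_sketch(A, k, n_iter=4, oversample=10, seed=42):
    """Rank-k randomized SVD in the style of Halko et al. (illustrative only).

    n_iter plays the role of iterated_power.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # A random test matrix; A @ omega samples the range of A.
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations sharpen the captured subspace when singular values decay slowly;
    # re-orthonormalising after each product keeps the iteration numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :k], s[:k], Vt[:k]


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))  # exactly rank 5
U, s, Vt = randomized_svd_sketch(A, k=5)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)[:5]))  # True
```

Because this test matrix has exact rank 5, the random subspace captures its whole range and the leading singular values match the exact SVD; on full-rank data with slowly decaying spectra, more power iterations bring the approximation closer.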
`__init__(threshold=10, svd_solver='randomized', iterated_power='auto', random_state=42)`

Create a ContinuousSelector object.
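`fit(X, y=None)`

Do nothing and return the estimator unchanged. This method exists only to implement the standard estimator API, so that the transformer works in pipelines.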
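Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like | Input data; validated with check_array (sparse input is not accepted). | required |
| y | ignored | Not used; present for API consistency. | None |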
`transform(X)`

Select continuous features and transform them using PCA.
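Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples and n_components is the number of components. | required |

Returns

| Type | Description |
| --- | --- |
| array-like, {n_samples, n_components} | The selected continuous features, projected onto their principal components. |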
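The two steps of transform, keeping the columns not flagged as categorical and projecting them with PCA, can be sketched end to end in numpy. `continuous_pca` is hypothetical; it assumes the same at-most-`threshold` rule for categorical columns and uses an exact SVD of the centered data, which corresponds to svd_solver='full' rather than the class's 'randomized' default.

```python
import numpy as np


def continuous_pca(X, threshold=10):
    """Hypothetical sketch: keep non-categorical columns, then apply PCA via SVD."""
    X = np.asarray(X, dtype=float)
    # Columns with more than `threshold` distinct values are treated as continuous.
    continuous = [j for j in range(X.shape[1]) if len(np.unique(X[:, j])) > threshold]
    if not continuous:
        raise ValueError('No continuous feature was found!')
    X_sel = X[:, continuous]
    X_centered = X_sel - X_sel.mean(axis=0)  # PCA operates on centered data
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt.T  # scores on every component


rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 2, size=50),   # binary column: categorical under threshold=10
    rng.standard_normal((50, 3)),  # three continuous columns
])
Z = continuous_pca(X, threshold=10)
print(Z.shape)  # (50, 3)
```

With all components kept, the output has as many columns as there are continuous features whenever rows outnumber them, and the column variances come out in decreasing order.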