
Feature Set Selector

This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

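TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/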

FeatureSetSelector

Bases: BaseEstimator, SelectorMixin

Select predefined feature subsets.

Source code in tpot/builtin_modules/feature_set_selector.py
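import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin


class FeatureSetSelector(BaseEstimator, SelectorMixin):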
    """
    Select predefined feature subsets.


    """

    def __init__(self, sel_subset=None, name=None):
        """Create a FeatureSetSelector object.

        Parameters
        ----------
        sel_subset: list or int
            If X is a dataframe, items in sel_subset list must correspond to column names
            If X is a numpy array, items in sel_subset list must correspond to column indexes
            int: index of a single column
        Returns
        -------
        None

        """
        self.name = name
        self.sel_subset = sel_subset


    def fit(self, X, y=None):
        """Fit FeatureSetSelector for feature selection

        Parameters
        ----------
        X: array-like of shape (n_samples, n_features)
            The training input samples.
        y: array-like, shape (n_samples,)
            The target values (integers that correspond to classes in classification, real numbers in regression).

        Returns
        -------
        self: object
            Returns the fitted estimator itself (self), not a copy
        """
        if isinstance(self.sel_subset, int) or isinstance(self.sel_subset, str):
            self.sel_subset = [self.sel_subset]

        #generate  self.feat_list_idx
        if isinstance(X, pd.DataFrame):
            self.feature_names_in_ = X.columns.tolist()
            self.feat_list_idx = sorted([self.feature_names_in_.index(feat) for feat in self.sel_subset])


        elif isinstance(X, np.ndarray):
            self.feature_names_in_ = None#list(range(X.shape[1]))

            self.feat_list_idx = sorted(self.sel_subset)

        n_features = X.shape[1]
        self.mask = np.zeros(n_features, dtype=bool)
        self.mask[np.asarray(self.feat_list_idx)] = True

        return self

    #TODO keep returned as dataframe if input is dataframe? may not be consistent with sklearn

    # def transform(self, X):

    def _get_tags(self):
        tags = {"allow_nan": True, "requires_y": False}
        return tags

    def _get_support_mask(self):
        """
        Get the boolean mask indicating which features are selected
        Returns
        -------
        support : boolean array of shape [# input features]
            An element is True iff its corresponding feature is selected for
            retention.
        """
        return self.mask
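
The mask returned by _get_support_mask is what scikit-learn's SelectorMixin uses to decide which columns transform keeps. The following is a minimal NumPy-only sketch of that effect, not TPOT's own code: the array X and the chosen indices 1 and 3 are made up for illustration.

```python
import numpy as np

# Hypothetical data: 3 samples, 4 features.
X = np.arange(12).reshape(3, 4)

# Build a boolean mask the same way fit does: all False,
# then True at the selected column indices.
mask = np.zeros(X.shape[1], dtype=bool)
mask[[1, 3]] = True  # pretend columns 1 and 3 were selected

# Keeping only the columns where the mask is True.
X_selected = X[:, mask]
print(X_selected.tolist())  # [[1, 3], [5, 7], [9, 11]]
```

Because the mask is positional, the selected columns always come out in their original left-to-right order, regardless of the order in which they were listed.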

__init__(sel_subset=None, name=None)

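Create a FeatureSetSelector object.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| sel_subset | list or int | If X is a dataframe, items in the sel_subset list must correspond to column names. If X is a numpy array, items in the sel_subset list must correspond to column indexes. int: index of a single column. | None |

Returns

| Type | Description |
|---|---|
| None | |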
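
The source of fit shows that a single int or str passed as sel_subset is wrapped in a one-element list before use. A small sketch of that normalization follows; as_subset_list is a hypothetical helper written for this illustration and is not part of TPOT.

```python
def as_subset_list(sel_subset):
    """Hypothetical helper: wrap a lone int or str in a list,
    mirroring the isinstance check at the top of fit."""
    if isinstance(sel_subset, (int, str)):
        return [sel_subset]
    # Already a collection of names or indexes; copy it into a list.
    return list(sel_subset)


print(as_subset_list(3))           # [3]
print(as_subset_list("age"))       # ['age']
print(as_subset_list(["a", "b"]))  # ['a', 'b']
```

One difference from the listed source: this helper returns a new list, whereas fit overwrites self.sel_subset on the estimator.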
Source code in tpot/builtin_modules/feature_set_selector.py
def __init__(self, sel_subset=None, name=None):
    """Create a FeatureSetSelector object.

    Parameters
    ----------
    sel_subset: list or int
        If X is a dataframe, items in sel_subset list must correspond to column names
        If X is a numpy array, items in sel_subset list must correspond to column indexes
        int: index of a single column
    Returns
    -------
    None

    """
    self.name = name
    self.sel_subset = sel_subset

fit(X, y=None)

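Fit FeatureSetSelector for feature selection.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | The training input samples. | required |
| y | array-like of shape (n_samples,) | The target values (integers that correspond to classes in classification, real numbers in regression). Not used by this method. | None |

Returns

| Name | Type | Description |
|---|---|---|
| self | object | The fitted estimator itself (fit returns self, not a copy). |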
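
For DataFrame input, fit translates the selected column names into sorted positional indexes and then sets those positions to True in a boolean mask. The sketch below reproduces that step with NumPy only. build_mask and the column names are hypothetical, and a plain list stands in for X.columns.tolist().

```python
import numpy as np


def build_mask(column_names, sel_subset):
    """Hypothetical helper mirroring the DataFrame branch of fit."""
    # Map each selected name to its position, then sort the positions.
    feat_list_idx = sorted(column_names.index(name) for name in sel_subset)
    # Start with every feature deselected, then switch on the chosen ones.
    mask = np.zeros(len(column_names), dtype=bool)
    mask[np.asarray(feat_list_idx)] = True
    return feat_list_idx, mask


columns = ["age", "height", "weight", "income"]  # stand-in for X.columns.tolist()
idx, mask = build_mask(columns, ["weight", "age"])
print(idx)            # [0, 2]
print(mask.tolist())  # [True, False, True, False]
```

As in the listed source, a name that is not among the columns makes list.index raise a ValueError, so a misspelled column name fails at fit time.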

Source code in tpot/builtin_modules/feature_set_selector.py
def fit(self, X, y=None):
    """Fit FeatureSetSelector for feature selection

    Parameters
    ----------
    X: array-like of shape (n_samples, n_features)
        The training input samples.
    y: array-like, shape (n_samples,)
        The target values (integers that correspond to classes in classification, real numbers in regression).

    Returns
    -------
    self: object
        Returns the fitted estimator itself (self), not a copy
    """
    if isinstance(self.sel_subset, int) or isinstance(self.sel_subset, str):
        self.sel_subset = [self.sel_subset]

    #generate  self.feat_list_idx
    if isinstance(X, pd.DataFrame):
        self.feature_names_in_ = X.columns.tolist()
        self.feat_list_idx = sorted([self.feature_names_in_.index(feat) for feat in self.sel_subset])


    elif isinstance(X, np.ndarray):
        self.feature_names_in_ = None#list(range(X.shape[1]))

        self.feat_list_idx = sorted(self.sel_subset)

    n_features = X.shape[1]
    self.mask = np.zeros(n_features, dtype=bool)
    self.mask[np.asarray(self.feat_list_idx)] = True

    return self