
FSS Node

This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

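TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.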

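TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.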

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/

FSSIndividual

Bases: SklearnIndividual

Source code in tpot/search_spaces/nodes/fss_node.py
class FSSIndividual(SklearnIndividual):
    def __init__(   self,
                    subsets,
                    rng=None,
                ):

        """
        An individual representing a specific FeatureSetSelector.
        The FeatureSetSelector selects one feature list from a list of predefined feature subsets.

        This instance selects one subset initially. Mutation and crossover can swap the selected subset for another.

        Parameters
        ----------
        subsets : str, list, np.ndarray, dict, or int
            Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries.
            Features are defined by column names if using a Pandas data frame, or ints corresponding to indexes if using numpy arrays.
            - str : If a string, it is assumed to be a path to a csv file with the subsets.
                The first column is assumed to be the name of the subset and the remaining columns are the features in the subset.
            - list or np.ndarray : If a list or np.ndarray, it is assumed to be a list of subsets (i.e. a list of lists).
            - dict : A dictionary where keys are the names of the subsets and the values are the list of features.
            - int : If an int, it is assumed to be the number of subsets to generate. Each subset will contain one feature.
            - None : Not handled by this constructor; passing None raises a ValueError.
        rng : int, np.random.Generator, optional
            The random number generator. The default is None.
            Only used to select the first subset.

        Returns
        -------
        None
        """

        rng = np.random.default_rng(rng)

        if isinstance(subsets, str):
            df = pd.read_csv(subsets,header=None,index_col=0)
            df['features'] = df.apply(lambda x: list([x[c] for c in df.columns]),axis=1)
            self.subset_dict = {}
            for row in df.index:
                self.subset_dict[row] = df.loc[row]['features']
        elif isinstance(subsets, dict):
            self.subset_dict = subsets
        elif isinstance(subsets, list) or isinstance(subsets, np.ndarray):
            self.subset_dict = {str(i):subsets[i] for i in range(len(subsets))}
        elif isinstance(subsets, int):
            self.subset_dict = {"{0}".format(i):i for i in range(subsets)}
        else:
            raise ValueError("Subsets must be a string, dictionary, list, int, or numpy array")

        self.names_list = list(self.subset_dict.keys())


        self.selected_subset_name = rng.choice(self.names_list)
        self.sel_subset = self.subset_dict[self.selected_subset_name]


    def mutate(self, rng=None):
        rng = np.random.default_rng(rng)
        #get list of names not including the current one
        names = [name for name in self.names_list if name != self.selected_subset_name]
        self.selected_subset_name = rng.choice(names)
        self.sel_subset = self.subset_dict[self.selected_subset_name]


    def crossover(self, other, rng=None):
        self.selected_subset_name = other.selected_subset_name
        self.sel_subset = other.sel_subset

    def export_pipeline(self, **kwargs):
        return FeatureSetSelector(sel_subset=self.sel_subset, name=self.selected_subset_name)


    def unique_id(self):
        id_str = "FeatureSetSelector({0})".format(self.selected_subset_name)
        return id_str

__init__(subsets, rng=None)

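An individual representing a specific FeatureSetSelector. The FeatureSetSelector selects one feature list from a list of predefined feature subsets.

This instance selects one subset initially. Mutation and crossover can swap it for another subset.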
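The selection, mutation and crossover behaviour described above can be sketched without TPOT. The class below is a hypothetical stand-in (`SubsetChoice` is not a TPOT name) that uses Python's `random.Random` in place of `np.random.Generator`; it illustrates the idea under those assumptions and is not TPOT's implementation. One deliberate difference: it returns early when only one subset exists, whereas the listing above would call numpy's `choice` on an empty candidate list, which raises a ValueError.

```python
import random


class SubsetChoice:
    """Hypothetical stand-in for FSSIndividual: holds one chosen subset name."""

    def __init__(self, subset_dict, rng=None):
        self.subset_dict = dict(subset_dict)
        self.names_list = list(self.subset_dict)
        # Accept a seed or an existing Random instance (assumption: stdlib RNG, not numpy)
        rng = rng if isinstance(rng, random.Random) else random.Random(rng)
        # The RNG is only used here to pick the initial subset
        self.selected_subset_name = rng.choice(self.names_list)
        self.sel_subset = self.subset_dict[self.selected_subset_name]

    def mutate(self, rng=None):
        rng = rng if isinstance(rng, random.Random) else random.Random(rng)
        # Exclude the current subset so that a mutation always changes the selection
        names = [n for n in self.names_list if n != self.selected_subset_name]
        if not names:
            return  # only one subset exists; nothing to swap to
        self.selected_subset_name = rng.choice(names)
        self.sel_subset = self.subset_dict[self.selected_subset_name]

    def crossover(self, other, rng=None):
        # Crossover simply adopts the other parent's subset
        self.selected_subset_name = other.selected_subset_name
        self.sel_subset = other.sel_subset

    def unique_id(self):
        return "FeatureSetSelector({0})".format(self.selected_subset_name)
```

Usage: `SubsetChoice({"clinical": ["age", "bmi"], "labs": ["ldl"]}, rng=0)` picks one subset reproducibly from the seed; calling `mutate` then switches to the other subset.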

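Parameters

subsets (str, list, np.ndarray, dict, or int; required)
Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries. Features are defined by column names if using a Pandas data frame, or by integer indexes if using numpy arrays.
- str: a path to a csv file with the subsets. The first column is the name of the subset and the remaining columns are the features in the subset.
- list or np.ndarray: a list of subsets (i.e. a list of lists).
- dict: a dictionary whose keys are the subset names and whose values are the feature lists.
- int: the number of subsets to generate. Each subset contains one feature.
- None: not supported; the constructor raises a ValueError.

rng (int or Generator; default None)
The random number generator. Only used to select the first subset.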

Returns

None
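Every accepted `subsets` format is normalized into a name-to-subset dictionary before a subset is chosen. The helper below is a hypothetical sketch (`normalize_subsets` is not part of TPOT) of the dict, list and int branches of the constructor; it assumes plain Python lists, so the `np.ndarray` case and the CSV-path case are left out.

```python
def normalize_subsets(subsets):
    """Hypothetical helper mirroring the dict/list/int branches of FSSIndividual.__init__."""
    if isinstance(subsets, str):
        # The real constructor reads a CSV file here; not covered by this sketch
        raise NotImplementedError("CSV paths are not handled in this sketch")
    if isinstance(subsets, dict):
        # Names are the keys, feature lists are the values
        return dict(subsets)
    if isinstance(subsets, list):
        # Subsets are named by position: "0", "1", ...
        return {str(i): subset for i, subset in enumerate(subsets)}
    if isinstance(subsets, int):
        # One subset per column index; each value is a single int, not a list
        return {str(i): i for i in range(subsets)}
    raise ValueError("Subsets must be a string, dictionary, list, int, or numpy array")
```

Note that the int form stores a bare index per subset rather than a one-element list, matching the `{"{0}".format(i): i ...}` expression in the source listing.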

FSSNode

Bases: SearchSpace

Source code in tpot/search_spaces/nodes/fss_node.py
class FSSNode(SearchSpace):
    def __init__(self,                     
                    subsets,
                ):
        """
        A search space for a FeatureSetSelector.
        The FeatureSetSelector selects one feature list from a list of predefined feature subsets.

        Parameters
        ----------
        subsets : str, list, np.ndarray, dict, or int
            Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries.
            Features are defined by column names if using a Pandas data frame, or ints corresponding to indexes if using numpy arrays.
            - str : If a string, it is assumed to be a path to a csv file with the subsets.
                The first column is assumed to be the name of the subset and the remaining columns are the features in the subset.
            - list or np.ndarray : If a list or np.ndarray, it is assumed to be a list of subsets (i.e. a list of lists).
            - dict : A dictionary where keys are the names of the subsets and the values are the list of features.
            - int : If an int, it is assumed to be the number of subsets to generate. Each subset will contain one feature.
            - None : Not handled by FSSIndividual; generating an individual from None raises a ValueError.

        Returns
        -------
        None

        """

        self.subsets = subsets

    def generate(self, rng=None) -> SklearnIndividual:
        return FSSIndividual(   
            subsets=self.subsets,
            rng=rng,
            )
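`FSSNode` only stores the `subsets` configuration and defers all work to `generate`, which builds a fresh individual from the RNG it is given. The sketch below imitates that pattern with a hypothetical `SubsetSearchSpace` class (not a TPOT name). It assumes `subsets` is already a dict, uses `random.Random` instead of numpy, and returns a `(name, subset)` pair instead of an individual, to keep it short.

```python
import random


class SubsetSearchSpace:
    """Hypothetical stand-in for FSSNode: stores the subsets, builds candidates on demand."""

    def __init__(self, subsets):
        # Only the configuration is stored; nothing is sampled yet
        self.subsets = subsets

    def generate(self, rng=None):
        # Accept a seed or a shared Random instance, like np.random.default_rng does for Generators
        rng = rng if isinstance(rng, random.Random) else random.Random(rng)
        name = rng.choice(list(self.subsets))
        return name, self.subsets[name]
```

Passing the same seed twice yields the same candidate, while passing one shared `random.Random` across many calls draws a varied initial population.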

__init__(subsets)

A search space for a FeatureSetSelector. The FeatureSetSelector selects one feature list from a list of predefined feature subsets.
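What the exported selector ultimately does is keep only the columns in the chosen subset. The function below is a hypothetical, dependency-free sketch of that effect (`select_feature_set` is not TPOT's `FeatureSetSelector` API). It assumes data given as a list of rows plus a list of column names, treats string entries as column names and int entries as positions, and wraps a bare value (as produced by the int form of `subsets`) into a one-element list.

```python
def select_feature_set(rows, columns, sel_subset):
    """Hypothetical helper: keep only the columns listed in sel_subset.

    rows: list of row lists; columns: list of column names.
    sel_subset: list of names/ints, or a single name/int.
    """
    if not isinstance(sel_subset, (list, tuple)):
        sel_subset = [sel_subset]
    # Resolve names to positions; ints are used as positions directly
    positions = [s if isinstance(s, int) else columns.index(s) for s in sel_subset]
    new_columns = [columns[p] for p in positions]
    new_rows = [[row[p] for p in positions] for row in rows]
    return new_columns, new_rows
```

For example, selecting `["ldl", "age"]` from columns `["age", "bmi", "ldl"]` returns those two columns in the requested order.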

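Parameters

subsets (str, list, np.ndarray, dict, or int; required)
Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries. Features are defined by column names if using a Pandas data frame, or by integer indexes if using numpy arrays.
- str: a path to a csv file with the subsets. The first column is the name of the subset and the remaining columns are the features in the subset.
- list or np.ndarray: a list of subsets (i.e. a list of lists).
- dict: a dictionary whose keys are the subset names and whose values are the feature lists.
- int: the number of subsets to generate. Each subset contains one feature.
- None: not supported; FSSIndividual raises a ValueError when an individual is generated.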

Returns

None
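The CSV form of `subsets` stores one subset per row: the subset name in the first column, its features in the remaining columns. The parser below is a hypothetical stdlib sketch of that layout (`read_subset_csv` is not a TPOT function). It takes CSV text rather than a file path so it is self-contained, keeps every feature as a string, and drops empty cells left by shorter rows; TPOT itself uses `pd.read_csv`, whose type conversion and padding behaviour this sketch does not reproduce.

```python
import csv
import io


def read_subset_csv(text):
    """Hypothetical parser for the subset CSV layout: name, feature, feature, ..."""
    subset_dict = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv.reader yields [] for blank lines
        name, *features = row
        # Drop empty cells left by rows shorter than the longest row
        subset_dict[name] = [f for f in features if f != ""]
    return subset_dict
```

A file with the lines `clinical,age,bmi` and `labs,ldl,,` maps to `{"clinical": ["age", "bmi"], "labs": ["ldl"]}`.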