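This file is part of the TPOT library.

The current version of TPOT was developed at Cedars-Sinai by:
- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Gabriel Ketron (gabriel.ketron@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)

The original version of TPOT was primarily developed at the University of Pennsylvania by:
- Randal S. Olson (rso@randalolson.com)
- Weixuan Fu (weixuanf@upenn.edu)
- Daniel Angell (dpa34@drexel.edu)
- Jason Moore (moorejh28@gmail.com)
- and many more generous open-source contributors

TPOT is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TPOT is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with TPOT. If not, see https://gnu.ac.cn/licenses/.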
FSSIndividual
Bases: SklearnIndividual
Source code in tpot/search_spaces/nodes/fss_node.py

    import numpy as np
    import pandas as pd

    # SklearnIndividual and FeatureSetSelector are defined elsewhere in TPOT.


    class FSSIndividual(SklearnIndividual):
        def __init__(
            self,
            subsets,
            rng=None,
        ):
            """
            An individual for representing a specific FeatureSetSelector.
            The FeatureSetSelector selects one feature list from a list of predefined feature subsets.

            This instance will select one set initially. Mutation and crossover can swap the selected subset with another.

            Parameters
            ----------
            subsets : str or list, default=None
                Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries.
                Features are defined by column names if using a Pandas data frame, or ints corresponding to indexes if using numpy arrays.
                - str : If a string, it is assumed to be a path to a csv file with the subsets.
                    The first column is assumed to be the name of the subset and the remaining columns are the features in the subset.
                - list or np.ndarray : If a list or np.ndarray, it is assumed to be a list of subsets (i.e. a list of lists).
                - dict : A dictionary where keys are the names of the subsets and the values are the list of features.
                - int : If an int, it is assumed to be the number of subsets to generate. Each subset will contain one feature.
                - None : If None, each column will be treated as a subset. One column will be selected per subset.
            rng : int, np.random.Generator, optional
                The random number generator. The default is None.
                Only used to select the first subset.

            Returns
            -------
            None
            """
            rng = np.random.default_rng(rng)

            if isinstance(subsets, str):
                # CSV without a header row: column 0 holds the subset name,
                # the remaining columns hold that subset's features.
                df = pd.read_csv(subsets, header=None, index_col=0)
                df['features'] = df.apply(lambda x: list([x[c] for c in df.columns]), axis=1)
                self.subset_dict = {}
                for row in df.index:
                    self.subset_dict[row] = df.loc[row]['features']
            elif isinstance(subsets, dict):
                self.subset_dict = subsets
            elif isinstance(subsets, list) or isinstance(subsets, np.ndarray):
                # Unnamed subsets are keyed by their position: "0", "1", ...
                self.subset_dict = {str(i): subsets[i] for i in range(len(subsets))}
            elif isinstance(subsets, int):
                # One single-feature subset per index; the value is the bare int i, not a list.
                self.subset_dict = {"{0}".format(i): i for i in range(subsets)}
            else:
                # None (the docstring's stated default) also lands here and raises.
                raise ValueError("Subsets must be a string, dictionary, list, int, or numpy array")

            self.names_list = list(self.subset_dict.keys())

            # Pick the initial subset at random.
            self.selected_subset_name = rng.choice(self.names_list)
            self.sel_subset = self.subset_dict[self.selected_subset_name]

        def mutate(self, rng=None):
            rng = np.random.default_rng(rng)
            # get list of names not including the current one
            # (empty when only one subset exists, in which case rng.choice raises ValueError)
            names = [name for name in self.names_list if name != self.selected_subset_name]
            self.selected_subset_name = rng.choice(names)
            self.sel_subset = self.subset_dict[self.selected_subset_name]

        def crossover(self, other, rng=None):
            # Crossover copies the other individual's selection; rng is unused.
            self.selected_subset_name = other.selected_subset_name
            self.sel_subset = other.sel_subset

        def export_pipeline(self, **kwargs):
            return FeatureSetSelector(sel_subset=self.sel_subset, name=self.selected_subset_name)

        def unique_id(self):
            id_str = "FeatureSetSelector({0})".format(self.selected_subset_name)
            return id_str

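
The mutation and crossover behaviour in the listing above can be tried without TPOT installed. What follows is a minimal sketch, not TPOT's implementation: `ToySubsetIndividual` is a hypothetical stand-in, the standard library's `random.Random` replaces `np.random.default_rng`, and the subset names are made up. It also adds a guard for the single-subset case, which is an assumption about sensible behaviour rather than something the source does.

```python
import random


class ToySubsetIndividual:
    """Hypothetical stand-in for FSSIndividual (illustration only)."""

    def __init__(self, subset_dict, seed=None):
        rng = random.Random(seed)
        self.subset_dict = dict(subset_dict)
        self.names_list = list(self.subset_dict)
        # Pick the initial subset at random, as FSSIndividual does.
        self.selected_subset_name = rng.choice(self.names_list)
        self.sel_subset = self.subset_dict[self.selected_subset_name]

    def mutate(self, seed=None):
        rng = random.Random(seed)
        # Candidates are all subset names except the current one.
        names = [n for n in self.names_list if n != self.selected_subset_name]
        if not names:
            # Assumption: with a single subset there is nothing to swap to.
            return False
        self.selected_subset_name = rng.choice(names)
        self.sel_subset = self.subset_dict[self.selected_subset_name]
        return True

    def crossover(self, other):
        # Crossover copies the other individual's selection.
        self.selected_subset_name = other.selected_subset_name
        self.sel_subset = other.sel_subset

    def unique_id(self):
        return "FeatureSetSelector({0})".format(self.selected_subset_name)


subsets = {
    "demographics": ["age", "sex"],
    "labs": ["glucose", "hdl"],
    "vitals": ["bp", "pulse"],
}
a = ToySubsetIndividual(subsets, seed=0)
b = ToySubsetIndividual(subsets, seed=1)

before = a.selected_subset_name
a.mutate(seed=2)
print(before, "->", a.selected_subset_name)  # the two names always differ

a.crossover(b)
print(a.unique_id() == b.unique_id())  # prints True
```

Because `unique_id` depends only on the selected subset name, two individuals that end up with the same selection are indistinguishable by ID.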
__init__(subsets, rng=None)
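An individual representing a specific FeatureSetSelector. The FeatureSetSelector selects one feature list from a list of predefined feature subsets.
This instance initially selects one subset. Mutation and crossover can swap the selected subset for another.
Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subsets | str, list, np.ndarray, dict, or int | The subsets that the FeatureSetSelector selects from when it is set as an option in one of the configuration dictionaries. Features are identified by column name for a pandas DataFrame, or by integer index for a NumPy array. str: path to a CSV file whose first column is the subset name and whose remaining columns are the features in that subset; list or np.ndarray: a list of subsets (a list of lists); dict: keys are subset names and values are feature lists; int: the number of subsets to generate, each containing one feature; None: documented as treating each column as its own subset, although the constructor shown in the source on this page raises ValueError for None. | required |
| rng | int or np.random.Generator | The random number generator. Only used to select the first subset. | None |

Returns
None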
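
Every accepted form of `subsets` described above reduces to one mapping from subset name to features. The following is a minimal sketch of that normalisation using only the standard library. `normalize_subsets` is a hypothetical helper, the `csv` module stands in for `pd.read_csv` (so every cell stays a string), NumPy arrays are left out, and the file contents are made up.

```python
import csv
import os
import tempfile


def normalize_subsets(subsets):
    """Return a {subset_name: features} dict (hypothetical helper)."""
    if isinstance(subsets, str):
        # Path to a CSV file: the first column is the subset name,
        # the remaining columns are the features in that subset.
        with open(subsets, newline="") as handle:
            return {row[0]: row[1:] for row in csv.reader(handle) if row}
    if isinstance(subsets, dict):
        return dict(subsets)
    if isinstance(subsets, list):
        # A list of lists: subsets are named "0", "1", ... by position.
        return {str(i): list(s) for i, s in enumerate(subsets)}
    if isinstance(subsets, int):
        # n single-feature subsets; as in the source, each value is the bare index.
        return {str(i): i for i in range(subsets)}
    raise ValueError("Subsets must be a string, dictionary, list, or int")


# Write a small example file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "subsets.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows([
            ["labs", "glucose", "hdl"],
            ["vitals", "bp", "pulse"],
        ])
    from_csv = normalize_subsets(path)

print(from_csv)
print(normalize_subsets([[0, 1], [2, 3]]))
print(normalize_subsets(3))
```

Passing a dict is the only form that lets the caller choose meaningful subset names directly; lists and ints get positional names.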
FSSNode
Bases: SearchSpace
Source code in tpot/search_spaces/nodes/fss_node.py

    class FSSNode(SearchSpace):
        def __init__(
            self,
            subsets,
        ):
            """
            A search space for a FeatureSetSelector.
            The FeatureSetSelector selects one feature list from a list of predefined feature subsets.

            Parameters
            ----------
            subsets : str or list, default=None
                Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries.
                Features are defined by column names if using a Pandas data frame, or ints corresponding to indexes if using numpy arrays.
                - str : If a string, it is assumed to be a path to a csv file with the subsets.
                    The first column is assumed to be the name of the subset and the remaining columns are the features in the subset.
                - list or np.ndarray : If a list or np.ndarray, it is assumed to be a list of subsets (i.e. a list of lists).
                - dict : A dictionary where keys are the names of the subsets and the values are the list of features.
                - int : If an int, it is assumed to be the number of subsets to generate. Each subset will contain one feature.
                - None : If None, each column will be treated as a subset. One column will be selected per subset.

            Returns
            -------
            None
            """
            # The subsets are stored as given; parsing happens in FSSIndividual.
            self.subsets = subsets

        def generate(self, rng=None) -> SklearnIndividual:
            # Each call builds a new individual; rng only affects its initial subset choice.
            return FSSIndividual(
                subsets=self.subsets,
                rng=rng,
            )

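
In the listing above, `generate` acts as a factory: the search space stores the subset definitions, and each call returns a fresh individual with its own random initial selection. The following is a minimal sketch of that pattern. `ToySearchSpace` and `ToyIndividual` are hypothetical stand-ins, and `random.Random` replaces the NumPy generator that TPOT uses.

```python
import random


class ToyIndividual:
    """Hypothetical stand-in for FSSIndividual: holds one selected subset."""

    def __init__(self, subsets, rng):
        self.subset_dict = dict(subsets)
        self.selected_subset_name = rng.choice(list(self.subset_dict))
        self.sel_subset = self.subset_dict[self.selected_subset_name]


class ToySearchSpace:
    """Hypothetical stand-in for FSSNode: remembers subsets, builds individuals."""

    def __init__(self, subsets):
        self.subsets = subsets

    def generate(self, rng=None):
        # Accept either a seed or an existing generator, loosely mirroring
        # how np.random.default_rng(rng) treats its argument.
        if not isinstance(rng, random.Random):
            rng = random.Random(rng)
        return ToyIndividual(self.subsets, rng)


space = ToySearchSpace({"labs": ["glucose", "hdl"], "vitals": ["bp", "pulse"]})

# The same seed reproduces the same initial selection.
first = space.generate(rng=42)
second = space.generate(rng=42)
print(first.selected_subset_name == second.selected_subset_name)  # prints True

# Passing one shared generator advances its state between calls,
# so successive individuals are drawn independently.
shared = random.Random(0)
population = [space.generate(rng=shared) for _ in range(20)]
print(sorted({ind.selected_subset_name for ind in population}))
```

Keeping the subsets on the search space and the selection on the individual means one search space can seed an entire population.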
__init__(subsets)
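A search space for a FeatureSetSelector. The FeatureSetSelector selects one feature list from a list of predefined feature subsets.
Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subsets | str, list, np.ndarray, dict, or int | The subsets that the FeatureSetSelector selects from when it is set as an option in one of the configuration dictionaries. Features are identified by column name for a pandas DataFrame, or by integer index for a NumPy array. str: path to a CSV file whose first column is the subset name and whose remaining columns are the features in that subset; list or np.ndarray: a list of subsets (a list of lists); dict: keys are subset names and values are feature lists; int: the number of subsets to generate, each containing one feature; None: documented as treating each column as its own subset, although the FSSIndividual constructor shown on this page raises ValueError for None. | required |

Returns
None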
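
The description above distinguishes features named by column label (data frames) from features given as integer positions (arrays). The following sketch shows what applying a selected subset amounts to in each case, using plain Python containers instead of pandas or NumPy. `select_by_name` and `select_by_index` are hypothetical helpers, not part of TPOT, and the data is made up.

```python
def select_by_name(table, feature_names):
    """Keep only the named columns of a {column_name: values} table."""
    return {name: table[name] for name in feature_names}


def select_by_index(rows, feature_indexes):
    """Keep only the given column positions of a list-of-rows matrix."""
    return [[row[i] for i in feature_indexes] for row in rows]


# Column-labelled data, analogous to a pandas DataFrame.
table = {"age": [34, 51], "glucose": [5.1, 6.3], "hdl": [1.2, 0.9]}
print(select_by_name(table, ["glucose", "hdl"]))

# Positional data, analogous to a NumPy array.
rows = [[34, 5.1, 1.2], [51, 6.3, 0.9]]
print(select_by_index(rows, [1, 2]))
```

A subset defined by names only works on labelled data, so the subset definitions should match the container type that the pipeline will receive.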