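Genetic Encoders

Code from https://github.com/EpistasisLab/autoqtl. This file contains the class definitions for all the genetic encoders. All the genetic encoder classes inherit the Scikit-learn BaseEstimator and TransformerMixin classes to follow the Scikit-learn paradigm.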

DominantEncoder

Bases: BaseEstimator, TransformerMixin

This class contains the function definition for encoding the input features as a Dominant genetic model. The encoding used is AA(0)->1, Aa(1)->1, aa(2)->0.

Source code in tpot/builtin_modules/genetic_encoders.py
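# Imports required by the encoder classes in this module.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class DominantEncoder(BaseEstimator, TransformerMixin):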
    """This class contains the function definition for encoding the input features as a Dominant genetic model.
    The encoding used is AA(0)->1, Aa(1)->1, aa(2)->0. """

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.
        Dummy function to fit in with the sklearn API and hence work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        return self

    def transform(self, X, y=None):
        """Transform the data by applying the Dominant encoding.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples (number of individuals)
            and n_components is the number of components (number of features).
        y : None
            Unused

        Returns
        -------
        X_transformed: numpy ndarray, {n_samples, n_components}
            The encoded feature set
        """
        X = check_array(X)
        map = {0: 1, 1: 1, 2: 0}
        mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

        X_transformed = mapping_function(X)

        return X_transformed

fit(X, y=None)

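Do nothing and return the estimator unchanged. This dummy function exists to fit in with the sklearn API, so the encoder works in pipelines.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | array-like | | required |

Source code in tpot/builtin_modules/genetic_encoders.py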
def fit(self, X, y=None):
    """Do nothing and return the estimator unchanged.
    Dummy function to fit in with the sklearn API and hence work in pipelines.

    Parameters
    ----------
    X : array-like
    """
    return self
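
As a rough illustration of why a no-op fit matters: scikit-learn's Pipeline calls fit on every step, so even a stateless encoder needs the method. The sketch below uses a hypothetical stand-in class, DominantSketch, that copies the Dominant mapping so the example runs without TPOT installed. The toy data and the LogisticRegression step are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class DominantSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in that mirrors DominantEncoder (not the TPOT class itself)."""

    def fit(self, X, y=None):
        # Nothing to learn: the mapping is fixed, so fit only returns self.
        return self

    def transform(self, X, y=None):
        lookup = {0: 1, 1: 1, 2: 0}  # AA(0)->1, Aa(1)->1, aa(2)->0
        return np.vectorize(lambda g: lookup.get(g, g))(np.asarray(X))


# Toy genotype matrix (6 individuals, 2 variants) and made-up labels.
X = np.array([[0, 2], [1, 2], [2, 0], [2, 1], [0, 0], [2, 2]])
y = np.array([1, 1, 0, 0, 1, 0])

pipe = Pipeline([("encode", DominantSketch()), ("clf", LogisticRegression())])
pipe.fit(X, y)                 # calls fit and transform on the encoder, then fits the classifier
predictions = pipe.predict(X)  # the encoder's transform runs again before predict
print(predictions.shape)
```

Because fit returns self, calls can also be chained, for example DominantSketch().fit(X).transform(X).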

transform(X, y=None)

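Transform the data by applying the Dominant encoding.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples (number of individuals) and n_components is the number of components (number of features). | required |
| y | None | Unused | None |

Returns

| Name | Type | Description |
|------|------|-------------|
| X_transformed | numpy ndarray, {n_samples, n_components} | The encoded feature set |

Source code in tpot/builtin_modules/genetic_encoders.py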
def transform(self, X, y=None):
    """Transform the data by applying the Dominant encoding.

    Parameters
    ----------
    X : numpy ndarray, {n_samples, n_components}
        New data, where n_samples is the number of samples (number of individuals)
        and n_components is the number of components (number of features).
    y : None
        Unused

    Returns
    -------
    X_transformed: numpy ndarray, {n_samples, n_components}
        The encoded feature set
    """
    X = check_array(X)
    map = {0: 1, 1: 1, 2: 0}
    mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

    X_transformed = mapping_function(X)

    return X_transformed
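
To make the Dominant mapping concrete, here is a minimal sketch that applies the same dictionary lookup to a small genotype matrix. The function dominant_encode is a hypothetical stand-in for DominantEncoder().transform and skips the check_array validation, so treat it as an illustration rather than the library code.

```python
import numpy as np

# Mapping copied from DominantEncoder.transform: AA(0)->1, Aa(1)->1, aa(2)->0.
DOMINANT_MAP = {0: 1, 1: 1, 2: 0}


def dominant_encode(X):
    """Hypothetical stand-in for DominantEncoder().transform(X), without input validation."""
    encode = np.vectorize(lambda g: DOMINANT_MAP[g] if g in DOMINANT_MAP else g)
    return encode(np.asarray(X))


genotypes = np.array([[0, 1, 2],
                      [2, 2, 0],
                      [1, 0, 3]])  # 3 is not a key of the mapping and is left unchanged
encoded = dominant_encode(genotypes)
print(encoded)
# [[1 1 0]
#  [0 0 1]
#  [1 1 3]]
```

Values outside 0, 1 and 2 pass through untouched, which follows from the `if i in map else i` branch in the source above.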

HeterosisEncoder

Bases: BaseEstimator, TransformerMixin

This class contains the function definition for encoding the input features as a Heterozygote Advantage genetic model. The encoding used is AA(0)->0, Aa(1)->1, aa(2)->0.

Source code in tpot/builtin_modules/genetic_encoders.py
class HeterosisEncoder(BaseEstimator, TransformerMixin):
    """This class contains the function definition for encoding the input features as a Heterozygote Advantage genetic model.
    The encoding used is AA(0)->0, Aa(1)->1, aa(2)->0. """

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.
        Dummy function to fit in with the sklearn API and hence work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        return self

    def transform(self, X, y=None):
        """Transform the data by applying the Heterosis encoding.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples (number of individuals)
            and n_components is the number of components (number of features).
        y : None
            Unused

        Returns
        -------
        X_transformed: numpy ndarray, {n_samples, n_components}
            The encoded feature set
        """
        X = check_array(X)
        map = {0: 0, 1: 1, 2: 0}
        mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

        X_transformed = mapping_function(X)

        return X_transformed

fit(X, y=None)

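Do nothing and return the estimator unchanged. This dummy function exists to fit in with the sklearn API, so the encoder works in pipelines.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | array-like | | required |

Source code in tpot/builtin_modules/genetic_encoders.py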
def fit(self, X, y=None):
    """Do nothing and return the estimator unchanged.
    Dummy function to fit in with the sklearn API and hence work in pipelines.

    Parameters
    ----------
    X : array-like
    """
    return self

transform(X, y=None)

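Transform the data by applying the Heterosis encoding.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples (number of individuals) and n_components is the number of components (number of features). | required |
| y | None | Unused | None |

Returns

| Name | Type | Description |
|------|------|-------------|
| X_transformed | numpy ndarray, {n_samples, n_components} | The encoded feature set |

Source code in tpot/builtin_modules/genetic_encoders.py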
def transform(self, X, y=None):
    """Transform the data by applying the Heterosis encoding.

    Parameters
    ----------
    X : numpy ndarray, {n_samples, n_components}
        New data, where n_samples is the number of samples (number of individuals)
        and n_components is the number of components (number of features).
    y : None
        Unused

    Returns
    -------
    X_transformed: numpy ndarray, {n_samples, n_components}
        The encoded feature set
    """
    X = check_array(X)
    map = {0: 0, 1: 1, 2: 0}
    mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

    X_transformed = mapping_function(X)

    return X_transformed
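
When the input is known to contain only the codes 0, 1 and 2, the Heterosis mapping can also be expressed as a NumPy lookup table indexed by genotype. This is a sketch of an alternative technique, not what the source above does: heterosis_encode_lut is a hypothetical name, and unlike the dictionary version it rejects other values instead of passing them through.

```python
import numpy as np

# Position k holds the code for genotype k: AA(0)->0, Aa(1)->1, aa(2)->0.
HETEROSIS_LUT = np.array([0, 1, 0])


def heterosis_encode_lut(X):
    """Hypothetical lookup-table variant for inputs limited to the codes 0, 1 and 2."""
    X = np.asarray(X)
    if not np.isin(X, (0, 1, 2)).all():
        raise ValueError("expected genotype codes 0, 1 or 2 only")
    # Integer-array indexing replaces every genotype with its table entry in one step.
    return HETEROSIS_LUT[X.astype(np.intp)]


genotypes = np.array([[0, 1, 2],
                      [1, 1, 0]])
encoded = heterosis_encode_lut(genotypes)
print(encoded)
# [[0 1 0]
#  [1 1 0]]
```

Only heterozygous entries (Aa) end up as 1; both homozygous genotypes collapse to 0.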

OverDominanceEncoder

Bases: BaseEstimator, TransformerMixin

This class contains the function definition for encoding the input features as an Over Dominance genetic model. The encoding used is AA(0)->1, Aa(1)->2, aa(2)->0.

Source code in tpot/builtin_modules/genetic_encoders.py
class OverDominanceEncoder(BaseEstimator, TransformerMixin):
    """This class contains the function definition for encoding the input features as an Over Dominance genetic model.
    The encoding used is AA(0)->1, Aa(1)->2, aa(2)->0. """

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.
        Dummy function to fit in with the sklearn API and hence work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        return self

    def transform(self, X, y=None):
        """Transform the data by applying the Over Dominance encoding.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples (number of individuals)
            and n_components is the number of components (number of features).
        y : None
            Unused

        Returns
        -------
        X_transformed: numpy ndarray, {n_samples, n_components}
            The encoded feature set
        """
        X = check_array(X)
        map = {0: 1, 1: 2, 2: 0}
        mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

        X_transformed = mapping_function(X)

        return X_transformed

fit(X, y=None)

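Do nothing and return the estimator unchanged. This dummy function exists to fit in with the sklearn API, so the encoder works in pipelines.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | array-like | | required |

Source code in tpot/builtin_modules/genetic_encoders.py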
def fit(self, X, y=None):
    """Do nothing and return the estimator unchanged.
    Dummy function to fit in with the sklearn API and hence work in pipelines.

    Parameters
    ----------
    X : array-like
    """
    return self

transform(X, y=None)

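Transform the data by applying the Over Dominance encoding.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples (number of individuals) and n_components is the number of components (number of features). | required |
| y | None | Unused | None |

Returns

| Name | Type | Description |
|------|------|-------------|
| X_transformed | numpy ndarray, {n_samples, n_components} | The encoded feature set |

Source code in tpot/builtin_modules/genetic_encoders.py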
def transform(self, X, y=None):
    """Transform the data by applying the Over Dominance encoding.

    Parameters
    ----------
    X : numpy ndarray, {n_samples, n_components}
        New data, where n_samples is the number of samples (number of individuals)
        and n_components is the number of components (number of features).
    y : None
        Unused

    Returns
    -------
    X_transformed: numpy ndarray, {n_samples, n_components}
        The encoded feature set
    """
    X = check_array(X)
    map = {0: 1, 1: 2, 2: 0}
    mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

    X_transformed = mapping_function(X)

    return X_transformed
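
One property that follows directly from the Over Dominance mapping is that it is a permutation of the three codes, so no two genotypes share an encoded value and the original genotypes can be recovered. The sketch below checks this with a round trip. apply_map and INVERSE_MAP are hypothetical names introduced for illustration and are not part of the module.

```python
import numpy as np

OVER_DOMINANCE_MAP = {0: 1, 1: 2, 2: 0}  # AA(0)->1, Aa(1)->2, aa(2)->0
# Swapping keys and values gives the reverse lookup; valid because the codes are all distinct.
INVERSE_MAP = {code: genotype for genotype, code in OVER_DOMINANCE_MAP.items()}


def apply_map(X, mapping):
    """Hypothetical helper: element-wise dictionary lookup, unknown values pass through."""
    return np.vectorize(lambda v: mapping.get(v, v))(np.asarray(X))


genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
encoded = apply_map(genotypes, OVER_DOMINANCE_MAP)
recovered = apply_map(encoded, INVERSE_MAP)
print(encoded)
# [[1 2 0]
#  [0 1 2]]
print(np.array_equal(recovered, genotypes))
# True
```

The Dominant, Recessive and Heterosis mappings do not have this property, because each of them sends two genotypes to the same code.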

RecessiveEncoder

Bases: BaseEstimator, TransformerMixin

This class contains the function definition for encoding the input features as a Recessive genetic model. The encoding used is AA(0)->0, Aa(1)->1, aa(2)->1.

Source code in tpot/builtin_modules/genetic_encoders.py
class RecessiveEncoder(BaseEstimator, TransformerMixin):
    """This class contains the function definition for encoding the input features as a Recessive genetic model.
    The encoding used is AA(0)->0, Aa(1)->1, aa(2)->1. """

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.
        Dummy function to fit in with the sklearn API and hence work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        return self

    def transform(self, X, y=None):
        """Transform the data by applying the Recessive encoding.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples (number of individuals)
            and n_components is the number of components (number of features).
        y : None
            Unused

        Returns
        -------
        X_transformed: numpy ndarray, {n_samples, n_components}
            The encoded feature set
        """
        X = check_array(X)
        map = {0: 0, 1: 1, 2: 1}
        mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

        X_transformed = mapping_function(X)

        return X_transformed

fit(X, y=None)

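Do nothing and return the estimator unchanged. This dummy function exists to fit in with the sklearn API, so the encoder works in pipelines.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | array-like | | required |

Source code in tpot/builtin_modules/genetic_encoders.py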
def fit(self, X, y=None):
    """Do nothing and return the estimator unchanged.
    Dummy function to fit in with the sklearn API and hence work in pipelines.

    Parameters
    ----------
    X : array-like
    """
    return self

transform(X, y=None)

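Transform the data by applying the Recessive encoding.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples (number of individuals) and n_components is the number of components (number of features). | required |
| y | None | Unused | None |

Returns

| Name | Type | Description |
|------|------|-------------|
| X_transformed | numpy ndarray, {n_samples, n_components} | The encoded feature set |

Source code in tpot/builtin_modules/genetic_encoders.py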
def transform(self, X, y=None):
    """Transform the data by applying the Recessive encoding.

    Parameters
    ----------
    X : numpy ndarray, {n_samples, n_components}
        New data, where n_samples is the number of samples (number of individuals)
        and n_components is the number of components (number of features).
    y : None
        Unused

    Returns
    -------
    X_transformed: numpy ndarray, {n_samples, n_components}
        The encoded feature set
    """
    X = check_array(X)
    map = {0: 0, 1: 1, 2: 1}
    mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

    X_transformed = mapping_function(X)

    return X_transformed
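
For inputs restricted to the codes 0, 1 and 2, the Recessive mapping is equivalent to a simple threshold: every genotype with code 1 or 2 becomes 1. The sketch below compares that shortcut against the dictionary lookup. recessive_encode_threshold is a hypothetical helper, and the equivalence is assumed to hold only for those three codes, since the dictionary version passes other values through unchanged.

```python
import numpy as np

RECESSIVE_MAP = {0: 0, 1: 1, 2: 1}  # AA(0)->0, Aa(1)->1, aa(2)->1


def recessive_encode_threshold(X):
    """Hypothetical shortcut, valid only when X holds the codes 0, 1 and 2."""
    return (np.asarray(X) >= 1).astype(int)


genotypes = np.array([[0, 1, 2],
                      [2, 2, 0]])
via_threshold = recessive_encode_threshold(genotypes)
via_dict = np.vectorize(lambda g: RECESSIVE_MAP.get(g, g))(genotypes)
print(via_threshold)
# [[0 1 1]
#  [1 1 0]]
print(np.array_equal(via_threshold, via_dict))
# True
```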

UnderDominanceEncoder

Bases: BaseEstimator, TransformerMixin

This class contains the function definition for encoding the input features as an Under Dominance genetic model. The encoding used is AA(0)->2, Aa(1)->0, aa(2)->1.

Source code in tpot/builtin_modules/genetic_encoders.py
class UnderDominanceEncoder(BaseEstimator, TransformerMixin):
    """This class contains the function definition for encoding the input features as an Under Dominance genetic model.
    The encoding used is AA(0)->2, Aa(1)->0, aa(2)->1. """

    def fit(self, X, y=None):
        """Do nothing and return the estimator unchanged.
        Dummy function to fit in with the sklearn API and hence work in pipelines.

        Parameters
        ----------
        X : array-like
        """
        return self

    def transform(self, X, y=None):
        """Transform the data by applying the Under Dominance encoding.

        Parameters
        ----------
        X : numpy ndarray, {n_samples, n_components}
            New data, where n_samples is the number of samples (number of individuals)
            and n_components is the number of components (number of features).
        y : None
            Unused

        Returns
        -------
        X_transformed: numpy ndarray, {n_samples, n_components}
            The encoded feature set
        """
        X = check_array(X)
        map = {0: 2, 1: 0, 2: 1}
        mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

        X_transformed = mapping_function(X)

        return X_transformed

fit(X, y=None)

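Do nothing and return the estimator unchanged. This dummy function exists to fit in with the sklearn API, so the encoder works in pipelines.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | array-like | | required |

Source code in tpot/builtin_modules/genetic_encoders.py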
def fit(self, X, y=None):
    """Do nothing and return the estimator unchanged.
    Dummy function to fit in with the sklearn API and hence work in pipelines.

    Parameters
    ----------
    X : array-like
    """
    return self

transform(X, y=None)

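Transform the data by applying the Under Dominance encoding.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | numpy ndarray, {n_samples, n_components} | New data, where n_samples is the number of samples (number of individuals) and n_components is the number of components (number of features). | required |
| y | None | Unused | None |

Returns

| Name | Type | Description |
|------|------|-------------|
| X_transformed | numpy ndarray, {n_samples, n_components} | The encoded feature set |

Source code in tpot/builtin_modules/genetic_encoders.py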
def transform(self, X, y=None):
    """Transform the data by applying the Under Dominance encoding.

    Parameters
    ----------
    X : numpy ndarray, {n_samples, n_components}
        New data, where n_samples is the number of samples (number of individuals)
        and n_components is the number of components (number of features).
    y : None
        Unused

    Returns
    -------
    X_transformed: numpy ndarray, {n_samples, n_components}
        The encoded feature set
    """
    X = check_array(X)
    map = {0: 2, 1: 0, 2: 1}
    mapping_function = np.vectorize(lambda i: map[i] if i in map else i)

    X_transformed = mapping_function(X)

    return X_transformed
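
As a side-by-side summary, the sketch below applies the five mappings documented on this page to the three genotype codes. The dictionaries are copied from the transform methods above. The ENCODINGS table and the encode helper are hypothetical names used only for this comparison.

```python
import numpy as np

# Mappings copied from the five transform methods (genotype code -> encoded value).
ENCODINGS = {
    "DominantEncoder":       {0: 1, 1: 1, 2: 0},
    "HeterosisEncoder":      {0: 0, 1: 1, 2: 0},
    "OverDominanceEncoder":  {0: 1, 1: 2, 2: 0},
    "RecessiveEncoder":      {0: 0, 1: 1, 2: 1},
    "UnderDominanceEncoder": {0: 2, 1: 0, 2: 1},
}


def encode(X, mapping):
    """Hypothetical helper: element-wise dictionary lookup, unknown values pass through."""
    return np.vectorize(lambda g: mapping.get(g, g))(np.asarray(X))


genotype_codes = np.array([0, 1, 2])  # AA, Aa, aa
summary = {name: encode(genotype_codes, m).tolist() for name, m in ENCODINGS.items()}
for name, codes in summary.items():
    print(f"{name:<22} AA->{codes[0]}  Aa->{codes[1]}  aa->{codes[2]}")
```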