xenonpy.inverse.iqspr package
Submodules
xenonpy.inverse.iqspr.estimator module
- class xenonpy.inverse.iqspr.estimator.GaussianLogLikelihood(descriptor, *, targets={}, **estimators)[source]
Bases:
BaseLogLikelihood
Gaussian loglikelihood.
- Parameters:
descriptor (BaseFeaturizer or BaseDescriptor) – Descriptor calculator.
estimators (BaseEstimator) – Gaussian estimators that follow the scikit-learn style. Each estimator must provide a method named predict that accepts descriptors as input and returns (mean, std), in that order. By default, BayesianRidge will be used.
targets (dictionary) – Upper and lower bounds for each property, used to calculate the Gaussian CDF probability.
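As a rough illustration of the estimator contract described above, the sketch below defines a toy estimator whose predict returns (mean, std) in that order. The class name MeanStdEstimator and the return_std keyword are assumptions made for this example (the keyword mirrors scikit-learn's BayesianRidge.predict); this is not XenonPy's own code.

```python
class MeanStdEstimator:
    """Toy estimator: predicts the training mean with a constant std."""

    def fit(self, X, y):
        n = len(y)
        self.mean_ = sum(y) / n
        # Population standard deviation of the training targets.
        self.std_ = (sum((v - self.mean_) ** 2 for v in y) / n) ** 0.5
        return self

    def predict(self, X, return_std=True):
        # One prediction per input row; the inputs themselves are ignored here.
        mean = [self.mean_] * len(X)
        if not return_std:
            return mean
        return mean, [self.std_] * len(X)


est = MeanStdEstimator().fit([[0.0], [1.0]], [1.0, 3.0])
mean, std = est.predict([[5.0], [6.0], [7.0]])
```

Any object with a compatible fit and predict could presumably be passed as a keyword argument in place of the default BayesianRidge, one per property name.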
- fit(smiles, y=None, *, X_scaler=None, y_scaler=None, **kwargs)[source]
By default, data rows containing NaN are removed automatically.
- Parameters:
y (pandas.DataFrame) – Target properties for training.
X_scaler (Scaler (optional, not implemented)) – Scaler for transforming X.
y_scaler (Scaler (optional, not implemented)) – Scaler for transforming y.
kwargs (dict) – Parameters passed to the BayesianRidge initialization.
- log_likelihood(smis, *, log_0=-1000.0, **targets)[source]
Log likelihood
- Parameters:
- Returns:
log_likelihood – Estimated log-likelihood of each sample’s property values. Cannot be pd.Series!
- Return type:
pd.DataFrame of float (columns: properties, rows: samples)
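The Gaussian CDF probability mentioned for targets can be sketched as follows: given a predicted mean and standard deviation, the log-likelihood of a property falling between its lower and upper bound is the log of the difference of two normal CDF values. The function name is hypothetical, and treating log_0 as the floor returned when the probability underflows to zero is an assumption about its role, not something the reference states.

```python
import math


def gaussian_interval_log_prob(mean, std, lower, upper, log_0=-1000.0):
    """Log of P(lower <= y <= upper) for y ~ N(mean, std**2)."""

    def cdf(x):
        # Normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

    prob = cdf(upper) - cdf(lower)
    # Assumed role of log_0: stand in for log(0) when the probability vanishes.
    return math.log(prob) if prob > 0.0 else log_0


inside = gaussian_interval_log_prob(mean=5.0, std=1.0, lower=4.0, upper=6.0)
far = gaussian_interval_log_prob(mean=5.0, std=1.0, lower=100.0, upper=200.0)
```

Here inside covers one standard deviation on each side of the mean, while far lies so deep in the tail that the probability evaluates to zero in floating point and the floor is returned.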
- remove_estimator(*properties)[source]
Remove estimators from estimator set.
- Parameters:
properties (str) – Names of the properties whose estimators will be removed from the estimator set.
- property timer
xenonpy.inverse.iqspr.iqspr module
- class xenonpy.inverse.iqspr.iqspr.IQSPR(*, estimator, modifier, r_ESS=1)[source]
Bases:
BaseSMC
SMC iqspr runner (assume data type of samples = list or np.array).
- Parameters:
estimator (BaseLogLikelihood or BaseLogLikelihoodSet) – Log likelihood estimator for given input samples.
modifier (BaseProposal) – Modify given input samples to new ones.
r_ESS (float) – r_ESS*sample_size is the upper threshold of the ESS (effective sample size) used in SMC resampling. Resampling happens only if the calculated ESS is smaller than or equal to this threshold. Since 1 <= ESS <= sample_size, any r_ESS < 1/sample_size means resampling never happens, and any r_ESS >= 1 means resampling always happens. Default is 1, i.e., resample at each step of SMC.
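The r_ESS rule can be made concrete with a small sketch. It assumes the common definition ESS = 1 / sum(w_i^2) over normalized weights, which satisfies 1 <= ESS <= sample_size; the exact formula used inside XenonPy is not given in this reference, and the function names are illustrative.

```python
import math


def effective_sample_size(log_weights):
    # Normalize in a numerically stable way by subtracting the maximum first.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    w = [x / total for x in w]
    return 1.0 / sum(x * x for x in w)


def should_resample(log_weights, r_ess=1.0):
    # Resample only if ESS <= r_ESS * sample_size.
    return effective_sample_size(log_weights) <= r_ess * len(log_weights)


uniform = [0.0, 0.0, 0.0, 0.0]                 # ESS equals the sample size
degenerate = [0.0, -1000.0, -1000.0, -1000.0]  # one sample carries all weight
```

With four samples, r_ess=0.5 puts the threshold at 2, so the degenerate weights trigger resampling and the uniform weights do not. Any r_ess below 0.25 puts the threshold below 1, which no ESS value can reach.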
- resample(sims, freq, size, p)[source]
Re-sample from given samples.
- Parameters:
X (list[object]) – Input samples for likelihood calculation. Can be changed to accept other data types.
size (int) – Resample size.
p (numpy.ndarray[float]) – The probabilities associated with each entry in X. If not given, the sampling assumes a uniform distribution over all entries.
- Returns:
re-sample – Re-sampling result.
- Return type:
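A minimal sketch of weighted resampling with replacement, using the standard library as a stand-in for whatever the method uses internally. The function below is illustrative only: it does not reproduce the documented signature (which also takes freq), and the seed argument is added here for reproducibility.

```python
import random


def resample(samples, size, p=None, seed=None):
    rng = random.Random(seed)
    # weights=None makes random.choices draw uniformly over all entries.
    return rng.choices(samples, weights=p, k=size)


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
picked = resample(smiles, size=5, p=[0.0, 1.0, 0.0], seed=0)
uniform_pick = resample(smiles, size=4, seed=0)
```

Because all the probability mass sits on the second entry, picked contains only that SMILES string, repeated size times.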
- property estimator
- property modifier
- property timer
xenonpy.inverse.iqspr.iqspr4df module
- class xenonpy.inverse.iqspr.iqspr4df.IQSPR4DF(*, estimator, modifier, r_ESS=1, sample_col=None)[source]
Bases:
BaseSMC
SMC iqspr runner (assume data type of samples = pd.DataFrame).
- Parameters:
estimator (BaseLogLikelihood or BaseLogLikelihoodSet) – Log likelihood estimator for given input samples.
modifier (BaseProposal) – Modify given input samples to new ones.
r_ESS (float) – r_ESS*sample_size is the upper threshold of the ESS (effective sample size) used in SMC resampling. Resampling happens only if the calculated ESS is smaller than or equal to this threshold. Since 1 <= ESS <= sample_size, any r_ESS < 1/sample_size means resampling never happens, and any r_ESS >= 1 means resampling always happens. Default is 1, i.e., resample at each step of SMC.
sample_col (list or str) – Name(s) of columns that will be used to extract unique samples in the unique function. Default is None, which means all columns are used.
- resample(sims, freq, size, p)[source]
Re-sample from given samples.
- Parameters:
X (list[object]) – Input samples for likelihood calculation. Can be changed to accept other data types.
size (int) – Resample size.
p (numpy.ndarray[float]) – The probabilities associated with each entry in X. If not given, the sampling assumes a uniform distribution over all entries.
- Returns:
re-sample – Re-sampling result.
- Return type:
- unique(x)[source]
- Parameters:
X (pd.DataFrame) – Input samples.
- Returns:
unique (pd.DataFrame) – The sorted unique samples.
unique_counts (np.ndarray of int) – The number of times each of the unique values comes up in the original array
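The pair of return values (sorted unique samples and their counts) can be illustrated on a plain list of SMILES strings. This is a simplified sketch: the real method operates on a pd.DataFrame and honors sample_col, neither of which is modeled here, and the function name is hypothetical.

```python
from collections import Counter


def unique_with_counts(samples):
    counts = Counter(samples)
    unique = sorted(counts)  # sorted unique samples
    return unique, [counts[s] for s in unique]


unique, unique_counts = unique_with_counts(["CCO", "C", "CCO", "CC"])
```

The counts are aligned with the sorted unique samples, so unique_counts[i] is the number of times unique[i] occurs in the input.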
- property estimator
- property modifier
- property timer
xenonpy.inverse.iqspr.modifier module
- exception xenonpy.inverse.iqspr.modifier.GetProbError(tmp_str, i_b, i_r)[source]
Bases:
ProposalError
- exception xenonpy.inverse.iqspr.modifier.MolConvertError(new_smi)[source]
Bases:
ProposalError
- exception xenonpy.inverse.iqspr.modifier.NGramTrainingError(error, smi)[source]
Bases:
ProposalError
- class xenonpy.inverse.iqspr.modifier.NGram(*, ngram_table=None, sample_order=(1, 10), del_range=(1, 10), min_len=1, max_len=1000, reorder_prob=0)[source]
Bases:
BaseProposal
N-gram
- Parameters:
ngram_table (NGram table) – NGram table for modifying SMILES.
sample_order (tuple[int, int] or int) – Range of orders of the NGram table used during proposal. When given an int, sample_order = (1, int).
del_range (tuple[int, int] or int) – Range of random deletion of the SMILES string during proposal. When given an int, del_range = (1, int).
min_len (int) – Minimum length of the extended SMILES. It should be smaller than the lower bound of sample_order.
max_len (int) – Maximum length of the extended SMILES, beyond which modification is terminated.
reorder_prob (float) – Probability of the SMILES being reordered during proposal.
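To give a feel for how the order, del_range and max_len parameters interact, here is a deliberately simplified character-level sketch: train a table of follower counts, delete a random number of trailing characters, then extend the string by sampling from the table. Everything in it (function names, the "$" end marker, a single fixed order instead of a range) is an assumption for illustration. The actual NGram class tokenizes SMILES and handles rings and branches, which this toy version ignores.

```python
import random
from collections import Counter, defaultdict


def train_table(smiles_list, order):
    """Count which character follows each context of length `order`."""
    table = defaultdict(Counter)
    for smi in smiles_list:
        padded = smi + "$"  # "$" is a hypothetical end-of-string marker
        for i in range(order, len(padded)):
            table[padded[i - order:i]][padded[i]] += 1
    return table


def propose(smi, table, order, del_range=(1, 3), max_len=20, rng=random):
    # Delete a random number of trailing characters, keeping at least `order`.
    n_del = rng.randint(*del_range)
    kept = smi[:max(order, len(smi) - n_del)]
    # Extend one character at a time until the end marker or max_len.
    while len(kept) < max_len:
        followers = table.get(kept[-order:])
        if not followers:
            break  # unseen context: nothing to sample from
        chars, freqs = zip(*followers.items())
        nxt = rng.choices(chars, weights=freqs, k=1)[0]
        if nxt == "$":
            break
        kept += nxt
    return kept


table = train_table(["CCCO", "CCCN"], order=2)
new_smi = propose("CCCO", table, order=2, rng=random.Random(0))
```

The output is not guaranteed to be a valid molecule in general, which is one reason the real implementation does considerably more bookkeeping than this.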
- merge_table(*ngram_tab, weight=1, overwrite=True)[source]
Merge with a given NGram table
- Parameters:
ngram_tab (NGram) – The table(s) in the given NGram object(s) will be merged into the table in self.
weight (int/float or list/tuple/np.array/pd.Series[int/float]) – A scalar or vector used to scale the frequencies in the given NGram table(s) to be merged. A vector must have the same length as ngram_tab.
overwrite (boolean) – Whether to overwrite the original table (self). Setting this to False is not recommended (may cause memory issues).
- Returns:
tmp_n_gram – merged NGram tables
- Return type:
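The role of weight in a merge can be sketched with flat frequency tables: each incoming table's counts are scaled by its weight before being added to the base table. The flat dictionary layout, the key format and the function name are assumptions for this example and do not reflect the internal structure of the NGram table.

```python
from collections import Counter


def merge_tables(base, *others, weight=1):
    # A scalar weight applies to every incoming table; a sequence must match.
    if isinstance(weight, (int, float)):
        weights = [weight] * len(others)
    else:
        weights = list(weight)
    if len(weights) != len(others):
        raise ValueError("weight must be a scalar or match the number of tables")
    merged = Counter(base)  # copy, so the base table is left untouched
    for table, w in zip(others, weights):
        for key, freq in table.items():
            merged[key] += w * freq
    return merged


base = {"CC>C": 2}
extra = {"CC>C": 1, "CC>O": 3}
merged = merge_tables(base, extra, weight=2)
```

This sketch always returns a new table, which corresponds loosely to overwrite=False. Copying a large table is also where the memory cost mentioned for that option would come from.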
- on_errors(error)[source]
- Parameters:
error (ProposalError) – Error object.
- proposal(smiles)[source]
Propose new SMILES based on the given SMILES. Make sure you always check the train_order against sample_order before using the proposal!
- remove_table(max_order=None)[source]
Remove entries from the NGram table above a given order.
- Parameters:
max_order (int) – Maximum order to keep in the table; the rest is removed.
- split_table(cut_order)[source]
Split NGram table into two
- Parameters:
cut_order (int) – split NGram table between cut_order and cut_order+1
- Returns:
n_gram1 (NGram)
n_gram2 (NGram)
- property del_range
- property max_len
- property min_len
- property ngram_table
- property reorder_prob
- property sample_order
- property timer
Module contents
- exception xenonpy.inverse.iqspr.GetProbError(tmp_str, i_b, i_r)[source]
Bases:
ProposalError
- exception xenonpy.inverse.iqspr.MolConvertError(new_smi)[source]
Bases:
ProposalError
- exception xenonpy.inverse.iqspr.NGramTrainingError(error, smi)[source]
Bases:
ProposalError
- class xenonpy.inverse.iqspr.GaussianLogLikelihood(descriptor, *, targets={}, **estimators)[source]
Bases:
BaseLogLikelihood
Gaussian loglikelihood.
- Parameters:
descriptor (BaseFeaturizer or BaseDescriptor) – Descriptor calculator.
estimators (BaseEstimator) – Gaussian estimators that follow the scikit-learn style. Each estimator must provide a method named predict that accepts descriptors as input and returns (mean, std), in that order. By default, BayesianRidge will be used.
targets (dictionary) – Upper and lower bounds for each property, used to calculate the Gaussian CDF probability.
- fit(smiles, y=None, *, X_scaler=None, y_scaler=None, **kwargs)[source]
By default, data rows containing NaN are removed automatically.
- Parameters:
y (pandas.DataFrame) – Target properties for training.
X_scaler (Scaler (optional, not implemented)) – Scaler for transforming X.
y_scaler (Scaler (optional, not implemented)) – Scaler for transforming y.
kwargs (dict) – Parameters passed to the BayesianRidge initialization.
- log_likelihood(smis, *, log_0=-1000.0, **targets)[source]
Log likelihood
- Parameters:
- Returns:
log_likelihood – Estimated log-likelihood of each sample’s property values. Cannot be pd.Series!
- Return type:
pd.DataFrame of float (columns: properties, rows: samples)
- remove_estimator(*properties)[source]
Remove estimators from estimator set.
- Parameters:
properties (str) – Names of the properties whose estimators will be removed from the estimator set.
- property timer
- class xenonpy.inverse.iqspr.IQSPR(*, estimator, modifier, r_ESS=1)[source]
Bases:
BaseSMC
SMC iqspr runner (assume data type of samples = list or np.array).
- Parameters:
estimator (BaseLogLikelihood or BaseLogLikelihoodSet) – Log likelihood estimator for given input samples.
modifier (BaseProposal) – Modify given input samples to new ones.
r_ESS (float) – r_ESS*sample_size is the upper threshold of the ESS (effective sample size) used in SMC resampling. Resampling happens only if the calculated ESS is smaller than or equal to this threshold. Since 1 <= ESS <= sample_size, any r_ESS < 1/sample_size means resampling never happens, and any r_ESS >= 1 means resampling always happens. Default is 1, i.e., resample at each step of SMC.
- resample(sims, freq, size, p)[source]
Re-sample from given samples.
- Parameters:
X (list[object]) – Input samples for likelihood calculation. Can be changed to accept other data types.
size (int) – Resample size.
p (numpy.ndarray[float]) – The probabilities associated with each entry in X. If not given, the sampling assumes a uniform distribution over all entries.
- Returns:
re-sample – Re-sampling result.
- Return type:
- property estimator
- property modifier
- property timer
- class xenonpy.inverse.iqspr.IQSPR4DF(*, estimator, modifier, r_ESS=1, sample_col=None)[source]
Bases:
BaseSMC
SMC iqspr runner (assume data type of samples = pd.DataFrame).
- Parameters:
estimator (BaseLogLikelihood or BaseLogLikelihoodSet) – Log likelihood estimator for given input samples.
modifier (BaseProposal) – Modify given input samples to new ones.
r_ESS (float) – r_ESS*sample_size is the upper threshold of the ESS (effective sample size) used in SMC resampling. Resampling happens only if the calculated ESS is smaller than or equal to this threshold. Since 1 <= ESS <= sample_size, any r_ESS < 1/sample_size means resampling never happens, and any r_ESS >= 1 means resampling always happens. Default is 1, i.e., resample at each step of SMC.
sample_col (list or str) – Name(s) of columns that will be used to extract unique samples in the unique function. Default is None, which means all columns are used.
- resample(sims, freq, size, p)[source]
Re-sample from given samples.
- Parameters:
X (list[object]) – Input samples for likelihood calculation. Can be changed to accept other data types.
size (int) – Resample size.
p (numpy.ndarray[float]) – The probabilities associated with each entry in X. If not given, the sampling assumes a uniform distribution over all entries.
- Returns:
re-sample – Re-sampling result.
- Return type:
- unique(x)[source]
- Parameters:
X (pd.DataFrame) – Input samples.
- Returns:
unique (pd.DataFrame) – The sorted unique samples.
unique_counts (np.ndarray of int) – The number of times each of the unique values comes up in the original array
- property estimator
- property modifier
- property timer
- class xenonpy.inverse.iqspr.NGram(*, ngram_table=None, sample_order=(1, 10), del_range=(1, 10), min_len=1, max_len=1000, reorder_prob=0)[source]
Bases:
BaseProposal
N-gram
- Parameters:
ngram_table (NGram table) – NGram table for modifying SMILES.
sample_order (tuple[int, int] or int) – Range of orders of the NGram table used during proposal. When given an int, sample_order = (1, int).
del_range (tuple[int, int] or int) – Range of random deletion of the SMILES string during proposal. When given an int, del_range = (1, int).
min_len (int) – Minimum length of the extended SMILES. It should be smaller than the lower bound of sample_order.
max_len (int) – Maximum length of the extended SMILES, beyond which modification is terminated.
reorder_prob (float) – Probability of the SMILES being reordered during proposal.
- merge_table(*ngram_tab, weight=1, overwrite=True)[source]
Merge with a given NGram table
- Parameters:
ngram_tab (NGram) – The table(s) in the given NGram object(s) will be merged into the table in self.
weight (int/float or list/tuple/np.array/pd.Series[int/float]) – A scalar or vector used to scale the frequencies in the given NGram table(s) to be merged. A vector must have the same length as ngram_tab.
overwrite (boolean) – Whether to overwrite the original table (self). Setting this to False is not recommended (may cause memory issues).
- Returns:
tmp_n_gram – merged NGram tables
- Return type:
- on_errors(error)[source]
- Parameters:
error (ProposalError) – Error object.
- proposal(smiles)[source]
Propose new SMILES based on the given SMILES. Make sure you always check the train_order against sample_order before using the proposal!
- remove_table(max_order=None)[source]
Remove entries from the NGram table above a given order.
- Parameters:
max_order (int) – Maximum order to keep in the table; the rest is removed.
- split_table(cut_order)[source]
Split NGram table into two
- Parameters:
cut_order (int) – split NGram table between cut_order and cut_order+1
- Returns:
n_gram1 (NGram)
n_gram2 (NGram)
- property del_range
- property max_len
- property min_len
- property ngram_table
- property reorder_prob
- property sample_order
- property timer