xenonpy.inverse package
Subpackages
- xenonpy.inverse.iqspr package
- Submodules
- xenonpy.inverse.iqspr.estimator module
- xenonpy.inverse.iqspr.iqspr module
- xenonpy.inverse.iqspr.iqspr4df module
- xenonpy.inverse.iqspr.modifier module
GetProbError
MolConvertError
NGramTrainingError
NGram
NGram.add_char()
NGram.del_char()
NGram.esmi2smi()
NGram.fit()
NGram.get_prob()
NGram.merge_table()
NGram.modify()
NGram.on_errors()
NGram.proposal()
NGram.remove_table()
NGram.reorder_esmi()
NGram.sample_next_char()
NGram.smi2esmi()
NGram.smi2list()
NGram.split_table()
NGram.validator()
NGram.del_range
NGram.max_len
NGram.min_len
NGram.ngram_table
NGram.reorder_prob
NGram.sample_order
NGram.timer
- Module contents
GetProbError
MolConvertError
NGramTrainingError
GaussianLogLikelihood
IQSPR
IQSPR4DF
NGram
NGram.add_char()
NGram.del_char()
NGram.esmi2smi()
NGram.fit()
NGram.get_prob()
NGram.merge_table()
NGram.modify()
NGram.on_errors()
NGram.proposal()
NGram.remove_table()
NGram.reorder_esmi()
NGram.sample_next_char()
NGram.smi2esmi()
NGram.smi2list()
NGram.split_table()
NGram.validator()
NGram.del_range
NGram.max_len
NGram.min_len
NGram.ngram_table
NGram.reorder_prob
NGram.sample_order
NGram.timer
Submodules
xenonpy.inverse.base module
- exception xenonpy.inverse.base.LogLikelihoodError
Bases: Exception
Base exception for LogLikelihood classes
- exception xenonpy.inverse.base.ProposalError
Bases: Exception
Base exception for Proposal classes
- old_smi = None
- exception xenonpy.inverse.base.ResampleError
Bases: Exception
Base exception for Resample classes
- class xenonpy.inverse.base.BaseLogLikelihood
Bases: BaseEstimator
Abstract class to calculate the likelihood of candidates from the pandas.DataFrame descriptor data of the candidates generated by BaseFeaturizer or BaseDescriptor.
Using a BaseLogLikelihood Class
BaseLogLikelihood() requires the user to define only the log_likelihood function. Using log values is recommended to avoid the underflow that typically occurs with small probability values.
- log_likelihood(X, **targets)
Log likelihood
- Parameters:
- Returns:
log_likelihood – Estimated log-likelihood of each sample’s property values. Cannot be pd.Series!
- Return type:
pd.DataFrame of float (columns: properties, rows: samples)
- property timer
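The recommendation to work with log values can be checked numerically. The sketch below uses only the Python standard library (it is not part of xenonpy) and multiplies 400 illustrative probabilities of 1e-3 directly, then sums their logarithms instead.

```python
import math

# 400 independent probabilities of 1e-3 each (illustrative values only)
probs = [1e-3] * 400

# Direct product: the true value 1e-1200 is far below the smallest
# positive double (about 5e-324), so the running product underflows to 0.0
direct = 1.0
for p in probs:
    direct *= p

# Log space: summing log-probabilities keeps the result representable
log_sum = sum(math.log(p) for p in probs)

print(direct)   # 0.0: the product underflowed
print(log_sum)  # about -2763.1
```

Once the product has underflowed to 0.0, every candidate looks equally impossible, whereas the log-space sums can still be compared and ranked.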
- class xenonpy.inverse.base.BaseLogLikelihoodSet(*, loglikelihoods='all')
Bases: BaseEstimator
Abstract class to organize log-likelihoods.
Examples
class MyLogLikelihood(BaseLogLikelihoodSet):
    def __init__(self):
        super().__init__()
        self.loglike1 = SomeFeature1()
        self.loglike1 = SomeFeature2()
        self.loglike2 = SomeFeature3()
        self.loglike2 = SomeFeature4()
- Parameters:
loglikelihoods (list[str] or 'all') – log-likelihoods that will be used. Default is ‘all’.
- log_likelihood(X, **kwargs)
Log likelihood
- Parameters:
X (list-like[object] or pd.DataFrame) – Input samples for likelihood calculation. For pd.DataFrame, if any column name matches any group name, the matched group(s) will be calculated with corresponding column(s); otherwise, the pd.DataFrame will be passed on as is.
kwargs (list[string]) – specified BaseLogLikelihood.
- Returns:
log_likelihood – Estimated log-likelihood of each sample’s property values.
- Return type:
pd.DataFrame of float (columns: properties, rows: samples)
- property all_loglikelihoods
- property elapsed
- property timer
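The example above assigns self.loglike1 and self.loglike2 twice each, which suggests that repeated assignment adds components to a named group rather than overwriting them. The sketch below is a plain-Python re-creation of that assumed behaviour together with the loglikelihoods='all' versus list selection. ToyLogLikelihoodSet, gaussian and the property names are hypothetical; this is not xenonpy's implementation, which builds on BaseEstimator and returns a pd.DataFrame rather than a dict of columns.

```python
import math
from typing import Callable, Dict, List, Union

# A component maps samples to {property name: [log-likelihood per sample]}
Component = Callable[[List[float]], Dict[str, List[float]]]


class ToyLogLikelihoodSet:
    """Hypothetical stand-in, not xenonpy's BaseLogLikelihoodSet."""

    def __init__(self, *, loglikelihoods: Union[str, List[str]] = "all"):
        # Bypass the custom __setattr__ for internal bookkeeping
        object.__setattr__(self, "_groups", {})
        object.__setattr__(self, "_selected", loglikelihoods)

    def __setattr__(self, name: str, value: Component) -> None:
        # Assumed behaviour: assigning the same attribute again appends to its group
        self._groups.setdefault(name, []).append(value)

    def log_likelihood(self, X: List[float]) -> Dict[str, List[float]]:
        names = list(self._groups) if self._selected == "all" else self._selected
        table: Dict[str, List[float]] = {}
        for name in names:
            for component in self._groups[name]:
                table.update(component(X))  # one column per property
        return table


def gaussian(prop: str, mu: float, sigma: float) -> Component:
    # Log-density of each x under N(mu, sigma^2), reported under column `prop`
    def ll(X: List[float]) -> Dict[str, List[float]]:
        return {prop: [-0.5 * ((x - mu) / sigma) ** 2
                       - math.log(sigma * math.sqrt(2 * math.pi)) for x in X]}
    return ll


class MyLogLikelihood(ToyLogLikelihoodSet):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.loglike1 = gaussian("bandgap", 2.0, 0.5)
        self.loglike1 = gaussian("density", 1.0, 0.1)
        self.loglike2 = gaussian("energy", 0.0, 1.0)
```

With the default, MyLogLikelihood().log_likelihood([2.0, 1.0]) returns columns for all three properties, while MyLogLikelihood(loglikelihoods=["loglike2"]) restricts the result to the "energy" column.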
- class xenonpy.inverse.base.BaseProposal
Bases: BaseEstimator
- property timer
- class xenonpy.inverse.base.BaseResample
Bases: BaseEstimator
Abstract class to resample candidates.
Using a BaseResample Class
BaseResample() requires the user to define only the resample function. This function is used in BaseSMC, but implementing it separately can often be skipped, because BaseSMC may implement resampling directly inside the class.
- resample(X, freq, size, p)
Re-sample from given samples.
- Parameters:
X (list of object) – Input samples to re-sample from.
freq (list or np.array of int) – Frequency of each input sample
size (int) – Resample size.
p (np.ndarray of float) – The probabilities associated with each entry in X. If not given, the sample assumes a uniform distribution over all entries.
- Returns:
new_sample – Re-sampling result.
- Return type:
- property timer
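A minimal sketch of a resample function with the documented (X, freq, size, p) parameters, using only the standard library: it draws size entries from X with replacement, weighted by p, and falls back to uniform weights when p is None. The default values and the extra seed argument are assumptions added for reproducibility, and freq is accepted only to match the signature; this sketch does not use it.

```python
import random
from typing import List, Optional, Sequence


def resample(X: Sequence, freq: Sequence[int], size: int,
             p: Optional[Sequence[float]] = None,
             seed: Optional[int] = None) -> List:
    """Hypothetical sketch: draw `size` items from X with replacement.

    `freq` is accepted to mirror the documented signature but is unused here.
    """
    rng = random.Random(seed)
    if p is not None and len(p) != len(X):
        raise ValueError("p must have one probability per entry in X")
    # random.choices samples with replacement; weights=None means uniform
    return rng.choices(list(X), weights=p, k=size)
```

For example, resample(["a", "b", "c"], [1, 2, 3], 4, p=[0.0, 1.0, 0.0]) can only return ["b", "b", "b", "b"], because every other entry has zero probability.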
- class xenonpy.inverse.base.BaseSMC
Bases: BaseEstimator
Abstract class to iteratively generate and pick up high likelihood candidates based on BaseProposal, BaseLogLikelihood, and/or BaseResample classes.
Using a BaseSMC Class
BaseSMC() provides a basic looping structure for users to implement algorithms in the form of sequential Monte Carlo or genetic algorithms. To avoid repeatedly calculating the log-likelihood of the same candidates, the loop applies a unique function that picks out the unique candidates; this function may need to be adapted to the data type of the candidates. The default unique function of BaseSMC assumes the candidates are a list or np.array. The set of candidates is assumed to be list-like, but other data types are allowed if they are compatible with the other components (modifier, estimator, resample and unique functions).
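The unique step described above can be sketched for hashable candidates such as SMILES strings. This is an illustrative stand-in, not BaseSMC's default unique function (which, per the text, assumes a list or np.array). It returns the unique candidates in first-seen order together with how often each occurs, so the log-likelihood only needs to be computed once per distinct candidate.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def unique(samples: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Return the unique samples (first-seen order) and their frequencies."""
    counts = Counter(samples)  # Counter keeps insertion order (Python 3.7+)
    return list(counts.keys()), list(counts.values())
```

For example, unique(["CCO", "CCC", "CCO"]) gives (["CCO", "CCC"], [2, 1]).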
- __call__(samples, beta, *, size=None, yield_lpf=False)
Run SMC
- Parameters:
samples (list of object) – Initial samples. This variable can take other data type as long as it matches with other components, such as estimator, modifier, resample and unique functions.
beta (list/1D-numpy of float or pd.DataFrame) – Annealing parameters for each step. If pd.DataFrame, column names should follow the keys of mdl in BaseLogLikelihood or BaseLogLikelihoodSet.
size (int) – Sample size for each draw. Default is None, which means the sample size will be the same as the length of samples.
yield_lpf (bool) – Yield the estimated log-likelihood, probability and frequency of each sample. Default is False.
- Yields:
samples (list of object) – New samples in each SMC iteration. This variable can also be another data type, consistent with the input samples.
llh (np.ndarray of float) – Estimated log-likelihood of each sample. Only yielded when yield_lpf=True.
p (np.ndarray of float) – Estimated probability of each sample. Only yielded when yield_lpf=True.
freq (np.ndarray of float) – The number of occurrences of each unique sample in the original samples. Only yielded when yield_lpf=True.
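The loop that __call__ runs can be sketched in plain Python on a toy problem. This is a minimal sketch under stated assumptions, not BaseSMC's implementation: candidates are hashable integers, unique candidates are found with collections.Counter, and each unique candidate is weighted by log w = beta * llh + log(freq), normalized with the log-sum-exp trick. The functions smc, toy_llh and toy_step are hypothetical. In this sketch the yielded llh, p and freq describe the unique candidates scored at that step.

```python
import math
import random
from collections import Counter
from typing import Callable, Iterator, List, Optional, Sequence


def smc(samples: List[int], beta: Sequence[float],
        log_likelihood: Callable[[int], float],
        proposal: Callable[[int, random.Random], int],
        *, size: Optional[int] = None, yield_lpf: bool = False,
        seed: int = 0) -> Iterator:
    """Hypothetical SMC loop: unique -> log-likelihood -> weights -> resample -> propose."""
    rng = random.Random(seed)
    size = len(samples) if size is None else size
    for b in beta:
        counts = Counter(samples)                   # unique candidates and their frequencies
        uniq, freq = list(counts), list(counts.values())
        llh = [log_likelihood(s) for s in uniq]     # one evaluation per unique candidate
        # Assumed weighting: log w = beta * llh + log(freq), normalized with log-sum-exp
        logw = [b * ll + math.log(f) for ll, f in zip(llh, freq)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        p = [v / total for v in w]
        yield (uniq, llh, p, freq) if yield_lpf else uniq
        picked = rng.choices(uniq, weights=p, k=size)  # resample by weight
        samples = [proposal(s, rng) for s in picked]   # propose modified candidates


def toy_llh(x: int) -> float:
    # Toy log-likelihood peaked at x = 10
    return -0.5 * (x - 10) ** 2


def toy_step(x: int, rng: random.Random) -> int:
    # Toy proposal: random walk of -1, 0 or +1
    return x + rng.choice([-1, 0, 1])


for step, (uniq, llh, p, freq) in enumerate(
        smc([0] * 20, beta=[0.1, 0.5] + [1.0] * 10,
            log_likelihood=toy_llh, proposal=toy_step, yield_lpf=True)):
    best = max(range(len(uniq)), key=lambda i: llh[i])
    print(step, uniq[best], round(llh[best], 2))
```

Small early beta values flatten the weights so the population can spread out, while later values of 1.0 apply the full likelihood; this is the role the annealing schedule plays.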
- resample(X, freq, size, p)
Re-sample from given samples.
- Parameters:
X (list[object]) – Input samples to re-sample from. Can be changed to accept other data types.
size (int) – Resample size.
p (numpy.ndarray[float]) – The probabilities associated with each entry in X. If not given, the sample assumes a uniform distribution over all entries.
- Returns:
re-sample – Re-sampling result.
- Return type:
- property timer