MDIntrinsicDimension - Intrinsic dimension estimation for molecular dynamics simulations
Many biological datasets, such as those derived from molecular dynamics (MD) simulations, are described by high-dimensional feature vectors. However, lot of these features are redundant, often due to correlations in the data. In such cases, the actual dynamics of the system can be projected onto a lower-dimensional manifold without significant loss of information.
The therm Intrinsic Dimension (ID) intuitively refers to the minimum number of variables needed to describe the essential structure of such a dataset. Under the manifold hypothesis, which assumes that high-dimensional data lie on a lower-dimensional manifold, ID estimation aims to determine the minimum dimensionality of said manifold.
This package is specifically designed to address the ID estimation of proteins molecular dynamics simulations.
Workflow
Provided a MD trajectory, the MDIntrinsicDimension package performs two things:
Computes the projection
Estimates ID
Projections are included in the ID estimation procedure as the initial step of dimensionality reduction to remove non-interesting movements such as global rigid-body translations and rotations. After this initial step, ID is computed.
Functionalities
It includes three functions and as many analysis modes:
intrinsic_dimension: ID on whole molecule.
section_id: ID on fixed sliding window along protein's sequence.
secondary_structure_id: ID on contiguous secondary structure elements.