pyckmeans.io package
Submodules
pyckmeans.io.c_interop module
- pyckmeans.io.c_interop.encode_nucleotides(alignment: numpy.ndarray) -> numpy.ndarray
Encode a nucleotide alignment in place.
- Parameters
- alignment: numpy.ndarray
n*m numpy alignment, where n is the number of entries and m is the number of sites. Dtype must be ‘U1’ or ‘S’.
- Returns
- numpy.ndarray
The encoded alignment.
- Raises
- Exception
Raised if alignment has invalid dtype.
pyckmeans.io.csv module
csv
Comma Separated Value (CSV) input and output.
- exception pyckmeans.io.csv.IncompatibleNamesError
Bases:
Exception
- exception pyckmeans.io.csv.InvalidMatrixShapeError
Bases:
Exception
- pyckmeans.io.csv.read_csv_distmat(file_path: str, header: Optional[int] = 0, index_col: Optional[int] = 0, sep: str = ',', **kwargs) -> pyckmeans.distance.DistanceMatrix
Read distance matrix from CSV file.
- Parameters
- file_path: str
Path to CSV file.
- header: Optional[int]
Row of the CSV file that contains the sample names. Passed to pandas.read_csv(). By default 0, meaning the first row.
- index_col: Optional[int]
Column of the CSV file that contains the sample names, used as index. Passed to pandas.read_csv(). By default 0, meaning the first column.
- sep: str
Column separator, by default ‘,’. Passed to pandas.read_csv().
- **kwargs
Additional keyword arguments passed to pandas.read_csv().
- Returns
- pyckmeans.distance.DistanceMatrix
DistanceMatrix object.
- Raises
- InvalidMatrixShapeError
Raised if matrix is not square.
- IncompatibleNamesError
Raised if column and row names do not match.
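The two validation errors listed above can be illustrated without pyckmeans or pandas. The following is a minimal sketch using only the standard library. `parse_distmat_csv` is a hypothetical helper, not the library's implementation, and it raises `ValueError` where the library raises its own exception types.

```python
import csv
import io


def parse_distmat_csv(text, sep=","):
    """Parse CSV text with sample names in the first row and first column.

    Hypothetical helper for illustration; not part of pyckmeans.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    col_names = rows[0][1:]               # header row, skipping the corner cell
    row_names = [r[0] for r in rows[1:]]  # index column
    values = [[float(v) for v in r[1:]] for r in rows[1:]]

    # Shape check: as many rows as columns, and every row is complete.
    n = len(col_names)
    if len(values) != n or any(len(r) != n for r in values):
        raise ValueError("matrix is not square")
    # Name check: row and column names must agree.
    if row_names != col_names:
        raise ValueError("row and column names do not match")
    return col_names, values


names, matrix = parse_distmat_csv(",a,b\na,0.0,0.1\nb,0.1,0.0\n")
```

A file with a missing row fails the shape check, and a file whose index column differs from its header row fails the name check.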
- pyckmeans.io.csv.write_csv_distmat(dist: pyckmeans.distance.DistanceMatrix, file_path: str, force: bool = False) -> None
Write DistanceMatrix object to CSV.
- Parameters
- dist: pyckmeans.distance.DistanceMatrix
DistanceMatrix object.
- file_path: str
CSV file path.
- force: bool, optional
Force overwrite if file_path already exists, by default False
- Raises
- FileExistsError
Raised if file at file_path already exists and force is False.
- FileExistsError
Raised if file_path points to an existing directory.
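The overwrite rules described in the Raises section can be sketched with the standard library alone. `check_writable` is a hypothetical helper, not the library's code, and it assumes that a directory is rejected regardless of `force`, which the text above suggests but does not state explicitly.

```python
import os
import tempfile


def check_writable(file_path, force=False):
    """Hypothetical pre-write check mirroring the documented Raises section."""
    if os.path.isdir(file_path):
        raise FileExistsError(f"{file_path} is an existing directory")
    if os.path.exists(file_path) and not force:
        raise FileExistsError(f"{file_path} exists; pass force=True to overwrite")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "dist.csv")
    check_writable(target)               # passes: nothing there yet
    open(target, "w").close()            # create the file

    try:
        check_writable(target)           # existing file, force=False
        blocked = False
    except FileExistsError:
        blocked = True

    check_writable(target, force=True)   # passes: overwriting is allowed

    try:
        check_writable(tmp, force=True)  # assumed: a directory is always rejected
        dir_blocked = False
    except FileExistsError:
        dir_blocked = True
```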
pyckmeans.io.fasta module
fasta
Module for reading and writing FASTA files.
- exception pyckmeans.io.fasta.InvalidFastaAlignmentError
Bases:
Exception
- pyckmeans.io.fasta.read_fasta_alignment(fasta_file: str, dtype: Union[str, numpy.dtype] = 'U') -> Tuple[numpy.ndarray, numpy.ndarray]
Read a FASTA alignment file. The file must be a valid alignment: it must contain at least 2 sequences, all of the same length including gaps.
- Parameters
- fasta_file: str
Path to a fasta file.
- dtype: Union[str, numpy.dtype]
Data type to use for the sequence array.
- Returns
- Tuple[numpy.ndarray, numpy.ndarray]
Tuple of sequences and names, each as numpy array.
- Raises
- InvalidFastaAlignmentError
Raised if fewer than 2 sequences are present in fasta_file.
- InvalidFastaAlignmentError
Raised if the sequences have different lengths.
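What "valid alignment" means here can be shown with a small pure-Python parser. This is a sketch under stated assumptions, not the pyckmeans implementation: `parse_fasta_alignment` is a hypothetical name, it returns plain lists instead of numpy arrays, and it raises `ValueError` in place of `InvalidFastaAlignmentError`.

```python
def parse_fasta_alignment(text):
    """Hypothetical FASTA alignment parser for illustration only."""
    names, seqs = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:])  # header line starts a new entry
            seqs.append("")
        elif seqs:
            seqs[-1] += line        # a sequence may span several lines

    if len(seqs) < 2:
        raise ValueError("fewer than 2 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences have different lengths")
    return seqs, names  # same order as the documented return value


fasta = ">s1\nACGT\nAC-T\n>s2\nACGTAC\nGT\n"
seqs, names = parse_fasta_alignment(fasta)
```

Both entries end up with 8 sites even though they are wrapped differently in the file; the gap in `s1` counts toward its length.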
pyckmeans.io.nucleotide_alignment module
nucleotide_alignment
Module for the representation of nucleotide alignments.
- exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentCharacterError
Bases:
Exception
- exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentFileExtensionError
Bases:
Exception
- exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentFileFormatError
Bases:
Exception
- exception pyckmeans.io.nucleotide_alignment.InvalidSeqIORecordsError
Bases:
Exception
- class pyckmeans.io.nucleotide_alignment.NucleotideAlignment(names: Iterable[str], sequences: numpy.ndarray, copy: bool = False, fast_encoding: bool = False)
Bases:
object
Class for nucleotide alignments.
- Parameters
- names: List[str]
Sequence identifiers/names.
- sequences: numpy.ndarray
n*m alignment matrix, where n is the number of entries and m is the number of sites.
- copy: bool
If True, sequences will be copied. If False, the NucleotideAlignment will use the original sequences, potentially modifying them.
- fast_encoding: bool
If True, a fast nucleotide encoding method without error checking will be used. ATTENTION: This will modify sequences in place.
- Attributes
shape: Alignment shape.
- Methods
copy(): Return a copy of the NucleotideAlignment object.
distance([distance_type, pairwise_deletion]): Calculate genetic distance.
drop_invariant_sites([in_place]): Remove invariant sites from alignment.
from_bp_seqio_records(records[, fast_encoding]): Build NucleotideAlignment from iterable of Bio.SeqRecord.SeqRecord.
from_file(file_path[, file_format, ...]): Read nucleotide alignment from file.
- copy() -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment
Return a copy of the NucleotideAlignment object.
- Returns
- NucleotideAlignment
Copy of self.
- distance(distance_type: str = 'p', pairwise_deletion: bool = True) -> pyckmeans.distance.DistanceMatrix
Calculate genetic distance.
- Parameters
- distance_type: str, optional
Type of genetic distance to calculate, by default ‘p’. Available distance types are p-distances (‘p’), Jukes-Cantor distances (‘jc’), and Kimura 2-parameter distances (‘k2p’).
- pairwise_deletion: bool
How missing data is handled. If True, pairwise deletion is used; if False, complete deletion is applied. Gaps (“-”, “~”, “ ”), “?”, and ambiguous bases are treated as missing data.
- Returns
- pyckmeans.distance.DistanceMatrix
n*n distance matrix.
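The difference between pairwise and complete deletion, and the Jukes-Cantor correction of a p-distance, can be sketched in pure Python. This is an illustration, not the pyckmeans implementation: all function names are hypothetical, and the sketch treats every symbol other than A, C, G, and T as missing, which is an assumption about how ambiguous bases are recognized.

```python
import math

UNAMBIGUOUS = set("ACGT")  # assumption: everything else counts as missing


def usable(x, y):
    """True if both symbols can be compared."""
    return x in UNAMBIGUOUS and y in UNAMBIGUOUS


def p_distance(a, b, sites=None):
    """Proportion of differing sites among the comparable sites of a and b.

    sites optionally restricts the comparison to the given column indices.
    """
    columns = range(len(a)) if sites is None else sites
    pairs = [(a[i], b[i]) for i in columns if usable(a[i], b[i])]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def complete_sites(seqs):
    """Column indices in which no sequence has missing data."""
    return [i for i in range(len(seqs[0]))
            if all(s[i] in UNAMBIGUOUS for s in seqs)]


def jukes_cantor(p):
    """Jukes-Cantor distance from a p-distance: d = -3/4 * ln(1 - 4/3 * p)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seqs = ["ACGTACGT", "ACGTAC-A", "A?GAACGT"]

# Pairwise deletion: only columns missing in this particular pair are skipped.
p_pairwise = p_distance(seqs[0], seqs[1])
# Complete deletion: columns with missing data in any sequence are skipped.
p_complete = p_distance(seqs[0], seqs[1], sites=complete_sites(seqs))
d_jc = jukes_cantor(p_pairwise)
```

For the first two sequences, pairwise deletion compares 7 sites and complete deletion compares 6, because the "?" in the third sequence removes one more column from every comparison.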
- drop_invariant_sites(in_place: bool = False) -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment
Remove invariant sites from alignment. Invariant sites are sites where every entry has the same symbol.
- Parameters
- in_place: bool, optional
Modify self in place, by default False
- Returns
- NucleotideAlignment
NucleotideAlignment without invariant sites. If in_place is set to True, self is returned.
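The definition of an invariant site can be made concrete with a short pure-Python sketch. `drop_invariant_columns` is a hypothetical helper operating on plain strings. It compares raw symbols only; how the library treats gaps or ambiguous bases when deciding invariance is not stated here, so that part is an assumption.

```python
def drop_invariant_columns(seqs):
    """Return the sequences without columns in which all entries are equal."""
    keep = [i for i in range(len(seqs[0]))
            if len({s[i] for s in seqs}) > 1]  # more than one distinct symbol
    return ["".join(s[i] for i in keep) for s in seqs]


seqs = ["ACGT", "ACGA", "AGGT"]
reduced = drop_invariant_columns(seqs)
```

Columns 0 and 2 hold a single symbol each and are removed; the input list is left untouched, which corresponds to the default `in_place=False` behaviour.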
- classmethod from_bp_seqio_records(records: Iterable[Bio.SeqRecord.SeqRecord], fast_encoding: bool = False) -> NucleotideAlignment
Build NucleotideAlignment from iterable of Bio.SeqRecord.SeqRecord. Such an iterable is, for example, returned by Bio.SeqIO.parse() or can be constructed using Bio.Align.MultipleSequenceAlignment().
- Parameters
- records: Iterable[‘Bio.SeqRecord.SeqRecord’]
Iterable of Bio.SeqRecord.SeqRecord. Such an iterable is, for example, returned by Bio.SeqIO.parse() or can be constructed using Bio.Align.MultipleSequenceAlignment().
- fast_encoding: bool
If True, a fast nucleotide encoding method without error checking will be used.
- Returns
- NucleotideAlignment
NucleotideAlignment object.
- Raises
- InvalidSeqIORecordsError
Raised if sequences have different lengths.
- classmethod from_file(file_path: str, file_format='auto', fast_encoding=False) -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment
Read nucleotide alignment from file.
- Parameters
- file_path: str
Path to alignment file.
- file_format: str
Alignment file format. Either “auto”, “fasta” or “phylip”. When “auto” the file format will be inferred based on the file extension.
- fast_encoding: bool
If True, a fast nucleotide encoding method without error checking will be used.
- Returns
- NucleotideAlignment
NucleotideAlignment instance.
- Raises
- InvalidAlignmentFileExtensionError
Raised if file_format is “auto” and the file extension is not understood.
- InvalidAlignmentFileFormatError
Raised if an invalid file_format is passed.
- property shape: Tuple[int, int]
Get alignment dimensions/shapes.
- Returns
- Tuple[int, int]
Number of samples n, number of sites m
- pyckmeans.io.nucleotide_alignment.read_alignment(file_path: str, file_format: str = 'auto') -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment
Read nucleotide alignment from file. Alias for NucleotideAlignment.from_file.
- Parameters
- file_path: str
Path to alignment file.
- file_format: str
Alignment file format. Either “auto”, “fasta” or “phylip”. When “auto” the file format will be inferred based on the file extension.
- Returns
- NucleotideAlignment
NucleotideAlignment instance.
- Raises
- InvalidAlignmentFileExtensionError
Raised if file_format is “auto” and the file extension is not understood.
- InvalidAlignmentFileFormatError
Raised if an invalid file_format is passed.
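A possible usage sketch, based only on the signature documented above: it writes a small FASTA file and reads it back with an explicit `file_format`. The file name and contents are made up for illustration, and the import is guarded because pyckmeans may not be installed; in that case the sketch only writes the file.

```python
import os
import tempfile

try:
    from pyckmeans.io.nucleotide_alignment import read_alignment
except ImportError:
    read_alignment = None  # pyckmeans not available; skip the read step

fasta_text = ">s1\nACGTACGT\n>s2\nACGTAC-T\n>s3\nACGAACGT\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fasta")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write(fasta_text)

    shape = None
    if read_alignment is not None:
        # Passing the format explicitly avoids relying on extension inference.
        aln = read_alignment(path, file_format="fasta")
        shape = aln.shape  # (number of samples, number of sites)
```

Passing `file_format="fasta"` or `file_format="phylip"` sidesteps `InvalidAlignmentFileExtensionError`, which is only raised when the format has to be inferred.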
pyckmeans.io.phylip module
phylip
Module for reading and writing PHYLIP files.
- exception pyckmeans.io.phylip.IncompatibleNamesError
Bases:
Exception
- exception pyckmeans.io.phylip.InvalidPhylipAlignmentError
Bases:
Exception
- exception pyckmeans.io.phylip.InvalidPhylipMatrixError
Bases:
Exception
InvalidPhylipMatrixTypeError
- pyckmeans.io.phylip.read_phylip_alignment(phylip_file: str, dtype: Union[str, numpy.dtype] = 'U') -> Tuple[numpy.ndarray, numpy.ndarray]
Read a PHYLIP alignment file. The file must be a valid alignment: it must contain at least 2 sequences, all of the same length including gaps.
WARNING: whitespace characters in entry names are NOT supported.
- Parameters
- phylip_file: str
Path to a phylip file.
- dtype: Union[str, numpy.dtype]
Data type to use for the sequence array.
- Returns
- Tuple[numpy.ndarray, numpy.ndarray]
Tuple of sequences and names, each as numpy array.
- Raises
- InvalidPhylipAlignmentError
Raised if header is malformed.
- InvalidPhylipAlignmentError
Raised if fewer than 2 entries are present in phylip_file.
- InvalidPhylipAlignmentError
Raised if number of entries does not match header.
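The three checks above can be sketched for a simple sequential PHYLIP file. This is an illustration only: `parse_phylip_alignment` is a hypothetical name, it assumes one entry per line with the name separated from the sequence by whitespace, and it raises `ValueError` where the library raises `InvalidPhylipAlignmentError`.

```python
def parse_phylip_alignment(text):
    """Hypothetical parser for sequential PHYLIP alignments (one entry per line)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # Header: two integers, the number of entries and the number of sites.
    header = lines[0].split() if lines else []
    if len(header) != 2 or not all(h.isdigit() for h in header):
        raise ValueError("malformed header")
    n_entries = int(header[0])

    names, seqs = [], []
    for ln in lines[1:]:
        # The first whitespace ends the name, so names cannot contain spaces.
        name, seq = ln.split(maxsplit=1)
        names.append(name)
        seqs.append(seq.replace(" ", ""))

    if len(seqs) < 2:
        raise ValueError("fewer than 2 entries")
    if len(seqs) != n_entries:
        raise ValueError("number of entries does not match header")
    return seqs, names


seqs, names = parse_phylip_alignment("2 4\ns1 ACGT\ns2 AC-T\n")
```

Splitting on the first whitespace is also why whitespace inside entry names cannot be supported in this layout.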
- pyckmeans.io.phylip.read_phylip_distmat(phylip_file: str) -> pyckmeans.distance.DistanceMatrix
Read distance matrix in PHYLIP format. Supports full and lower-triangle matrices.
- Parameters
- phylip_file: str
Path to distance file in phylip format.
- Returns
- pyckmeans.distance.DistanceMatrix
Distance matrix as pyckmeans.distance DistanceMatrix object.
- Raises
- InvalidPhylipMatrixError
Raised if the header is malformed.
- InvalidPhylipMatrixError
Raised if an empty line is encountered as second line.
- InvalidPhylipMatrixError
Raised if file format can neither be inferred as full nor as lower-triangle matrix.
- InvalidPhylipMatrixError
Raised if an empty line is encountered.
- InvalidPhylipMatrixError
Raised if expecting a full matrix but number of values does not match the header.
- InvalidPhylipMatrixError
Raised if an empty line is encountered.
- InvalidPhylipMatrixError
Raised if expecting lower-triangle matrix but number of values does not match the expected number of values for that entry.
- InvalidPhylipMatrixError
Raised if number of names does not match number of entries stated in the header.
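How a reader might distinguish full from lower-triangle matrices is sketched below in pure Python. This is not the pyckmeans implementation. `parse_phylip_distmat` is a hypothetical name, the layout is inferred from the first entry line, and the lower-triangle form is assumed to omit the diagonal, as in the classic PHYLIP layout. Errors are raised as `ValueError` rather than `InvalidPhylipMatrixError`.

```python
def parse_phylip_distmat(text):
    """Hypothetical parser for full or lower-triangle PHYLIP distance matrices."""
    lines = text.splitlines()

    header = lines[0].split() if lines else []
    if len(header) != 1 or not header[0].isdigit():
        raise ValueError("malformed header")
    n = int(header[0])

    if len(lines) < 2 or not lines[1].strip():
        raise ValueError("empty second line")

    # Infer the layout from the first entry: n values means a full matrix,
    # no values means lower-triangle without diagonal (an assumption).
    n_first = len(lines[1].split()) - 1
    if n_first == n:
        full = True
    elif n_first == 0:
        full = False
    else:
        raise ValueError("neither full nor lower-triangle matrix")

    names = []
    matrix = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        if not fields:
            raise ValueError("empty line")
        names.append(fields[0])
        values = [float(v) for v in fields[1:]]
        expected = n if full else i  # row i of a lower triangle holds i values
        if len(values) != expected:
            raise ValueError("unexpected number of values for entry")
        for j, value in enumerate(values):
            matrix[i][j] = value
            if not full:
                matrix[j][i] = value  # mirror into the upper triangle

    if len(names) != n:
        raise ValueError("number of names does not match header")
    return names, matrix


lower = "3\na\nb 0.1\nc 0.2 0.3\n"
full = "3\na 0.0 0.1 0.2\nb 0.1 0.0 0.3\nc 0.2 0.3 0.0\n"
names_lower, matrix_lower = parse_phylip_distmat(lower)
names_full, matrix_full = parse_phylip_distmat(full)
```

Both inputs describe the same symmetric matrix, so the two parses yield identical names and values.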
- pyckmeans.io.phylip.write_phylip_distmat(dist: pyckmeans.distance.DistanceMatrix, file_path: str, force: bool = False) -> None
Write distance matrix to file in PHYLIP matrix format.
- Parameters
- dist: pyckmeans.distance.DistanceMatrix
Distance matrix as pyckmeans.distance DistanceMatrix object.
- file_path: str
Output file path.
- force: bool, optional
Force overwrite if file exists, by default False
- Raises
- FileExistsError
Raised if file at file_path already exists and force is False.
- FileExistsError
Raised if file_path points to an existing directory.
- IncompatibleNamesError
Raised if names are incompatible with dist_mat.
Module contents
io
Module containing input and output functionality.