pyckmeans.io package

Submodules

pyckmeans.io.c_interop module

pyckmeans.io.c_interop.encode_nucleotides(alignment: numpy.ndarray) -> numpy.ndarray

Encode a nucleotide alignment in place. The input array is modified.

Parameters
alignment: numpy.ndarray

n*m NumPy alignment array, where n is the number of entries and m is the number of sites. Dtype must be ‘U1’ or ‘S’.

Returns
numpy.ndarray

The encoded alignment.

Raises
Exception

Raised if alignment has invalid dtype.

pyckmeans.io.csv module

csv

Comma Separated Value (CSV) input and output.

exception pyckmeans.io.csv.IncompatibleNamesError

Bases: Exception

exception pyckmeans.io.csv.InvalidMatrixShapeError

Bases: Exception

pyckmeans.io.csv.read_csv_distmat(file_path: str, header: Optional[int] = 0, index_col: Optional[int] = 0, sep: str = ',', **kwargs) -> pyckmeans.distance.DistanceMatrix

Read distance matrix from CSV file.

Parameters
file_path: str

Path to CSV file.

header: Optional[int]

Row of the CSV file that contains the sample names, by default 0 (the first row). Passed to pandas.read_csv().

index_col: Optional[int]

Column that contains the sample names (the index column), by default 0 (the first column). Passed to pandas.read_csv().

sep: str

Column separator, by default ‘,’. Passed to pandas.read_csv().

**kwargs

Additional keyword arguments passed to pandas.read_csv().

Returns
pyckmeans.distance.DistanceMatrix

DistanceMatrix object.

Raises
InvalidMatrixShapeError

Raised if matrix is not square.

IncompatibleNamesError

Raised if column and row names do not match.
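The two error conditions above can be illustrated without pandas. The following is a minimal sketch, not the pyckmeans implementation: `read_square_csv` is a hypothetical helper, the exception classes are local stand-ins that only share their names with the documented ones, and the sketch assumes a header row plus an index column, as the defaults `header=0` and `index_col=0` suggest.

```python
import csv
import io


class InvalidMatrixShapeError(Exception):
    """Local stand-in for pyckmeans.io.csv.InvalidMatrixShapeError."""


class IncompatibleNamesError(Exception):
    """Local stand-in for pyckmeans.io.csv.IncompatibleNamesError."""


def read_square_csv(handle, sep=","):
    """Parse a CSV distance matrix with a header row and an index column."""
    rows = [row for row in csv.reader(handle, delimiter=sep) if row]
    col_names = rows[0][1:]                    # header row without the corner cell
    row_names = [row[0] for row in rows[1:]]   # first column holds the sample names
    values = [[float(v) for v in row[1:]] for row in rows[1:]]

    # A distance matrix needs as many values per row as it has rows.
    if any(len(row) != len(values) for row in values):
        raise InvalidMatrixShapeError("distance matrix is not square")
    # Row and column labels must describe the same samples in the same order.
    if row_names != col_names:
        raise IncompatibleNamesError("row and column names do not match")
    return row_names, values


text = ",a,b,c\na,0.0,0.1,0.2\nb,0.1,0.0,0.3\nc,0.2,0.3,0.0\n"
names, matrix = read_square_csv(io.StringIO(text))
```

With pyckmeans installed, the documented entry point for the same task is read_csv_distmat(file_path), which returns a DistanceMatrix rather than plain lists.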

pyckmeans.io.csv.write_csv_distmat(dist: pyckmeans.distance.DistanceMatrix, file_path: str, force: bool = False) -> None

Write DistanceMatrix object to CSV.

Parameters
dist: pyckmeans.distance.DistanceMatrix

DistanceMatrix object.

file_path: str

CSV file path.

force: bool, optional

Force overwrite if file_path already exists, by default False.

Raises
FileExistsError

Raised if file at file_path already exists and force is False.

FileExistsError

Raised if file_path points to an existing directory.
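The overwrite rules above can be sketched with the standard library. This is an illustration under assumptions, not the library's code: `check_output_path` and `write_matrix_csv` are hypothetical helpers, the CSV layout (names in the header row and first column) is assumed, and the sketch assumes that a directory is rejected even when force is True.

```python
import os
import tempfile


def check_output_path(file_path, force=False):
    """Raise FileExistsError under the two documented conditions."""
    if os.path.isdir(file_path):
        # Assumption: a directory is never overwritten, regardless of force.
        raise FileExistsError(f"{file_path} is an existing directory")
    if os.path.exists(file_path) and not force:
        raise FileExistsError(f"{file_path} exists; pass force=True to overwrite")


def write_matrix_csv(names, matrix, file_path, force=False):
    """Write names and distances as CSV after checking the target path."""
    check_output_path(file_path, force)
    with open(file_path, "w", newline="") as handle:
        handle.write("," + ",".join(names) + "\n")
        for name, row in zip(names, matrix):
            handle.write(name + "," + ",".join(str(v) for v in row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dist.csv")
    data = [[0.0, 0.5], [0.5, 0.0]]
    write_matrix_csv(["a", "b"], data, path)            # first write succeeds
    try:
        write_matrix_csv(["a", "b"], data, path)        # second write is refused
        second_write = "overwritten"
    except FileExistsError:
        second_write = "refused"
    write_matrix_csv(["a", "b"], data, path, force=True)  # explicit overwrite
    with open(path) as handle:
        first_line = handle.readline().strip()
```

Checking the path before opening the file keeps an existing file untouched when the write is refused.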

pyckmeans.io.fasta module

fasta

Module for reading and writing FASTA files.

exception pyckmeans.io.fasta.InvalidFastaAlignmentError

Bases: Exception

pyckmeans.io.fasta.read_fasta_alignment(fasta_file: str, dtype: Union[str, numpy.dtype] = 'U') -> Tuple[numpy.ndarray, numpy.ndarray]

Read a FASTA alignment file. The file must be a valid alignment: it must contain at least 2 sequences of the same length, including gaps.

Parameters
fasta_file: str

Path to a FASTA file.

dtype: Union[str, numpy.dtype]

Data type to use for the sequence array.

Returns
Tuple[numpy.ndarray, numpy.ndarray]

Tuple of sequences and names, each as numpy array.

Raises
InvalidFastaAlignmentError

Raised if fewer than 2 sequences are present in fasta_file.

InvalidFastaAlignmentError

Raised if the sequences have different lengths.
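What "valid alignment" means here can be shown with a small parser. This is a sketch under assumptions, not the pyckmeans reader: `parse_fasta_alignment` is a hypothetical helper that returns plain lists instead of NumPy arrays, and the exception class is a local stand-in for the documented one.

```python
import io


class InvalidFastaAlignmentError(Exception):
    """Local stand-in for pyckmeans.io.fasta.InvalidFastaAlignmentError."""


def parse_fasta_alignment(handle):
    """Parse FASTA text into (sequences, names) and validate the alignment."""
    names, seqs = [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:])   # header line starts a new record
            seqs.append([])
        elif seqs:
            seqs[-1].extend(line)    # a sequence may span several lines

    if len(seqs) < 2:
        raise InvalidFastaAlignmentError("fewer than 2 sequences")
    if len({len(seq) for seq in seqs}) != 1:
        raise InvalidFastaAlignmentError("sequences have different lengths")
    return seqs, names


text = ">s1\nACGT\nAC\n>s2\nAC-TAC\n"
seqs, names = parse_fasta_alignment(io.StringIO(text))
```

The return order (sequences first, names second) follows the documented return value of read_fasta_alignment.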

pyckmeans.io.nucleotide_alignment module

nucleotide_alignment

Module for the representation of nucleotide alignments.

exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentCharacterError

Bases: Exception

exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentFileExtensionError

Bases: Exception

exception pyckmeans.io.nucleotide_alignment.InvalidAlignmentFileFormatError

Bases: Exception

exception pyckmeans.io.nucleotide_alignment.InvalidSeqIORecordsError

Bases: Exception

class pyckmeans.io.nucleotide_alignment.NucleotideAlignment(names: Iterable[str], sequences: numpy.ndarray, copy: bool = False, fast_encoding: bool = False)

Bases: object

Class for nucleotide alignments.

Parameters
names: List[str]

Sequence identifiers/names.

sequences: numpy.ndarray

n*m alignment matrix, where n is the number of entries and m is the number of sites.

copy: bool

If True, sequences will be copied. If False, the NucleotideAlignment will use the original sequences and may modify them.

fast_encoding: bool

If True, a fast nucleotide encoding method without error checking will be used. ATTENTION: This will modify sequences in place.

Attributes
shape

shape

Methods

copy()

Return a copy of the NucleotideAlignment object.

distance([distance_type, pairwise_deletion])

Calculate genetic distance.

drop_invariant_sites([in_place])

Remove invariant sites from alignment.

from_bp_seqio_records(records[, fast_encoding])

Build NucleotideAlignment from iterable of Bio.SeqRecord.SeqRecord.

from_file(file_path[, file_format, ...])

Read nucleotide alignment from file.

copy() -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment

Return a copy of the NucleotideAlignment object.

Returns
NucleotideAlignment

Copy of self.

distance(distance_type: str = 'p', pairwise_deletion: bool = True) -> pyckmeans.distance.DistanceMatrix

Calculate genetic distance.

Parameters
distance_type: str, optional

Type of genetic distance to calculate, by default ‘p’. Available distance types are p-distances (‘p’), Jukes-Cantor distances (‘jc’), and Kimura 2-parameter distances (‘k2p’).

pairwise_deletion: bool

How to handle missing data. If True, pairwise deletion is used; if False, complete deletion is applied. Gaps (“-”, “~”, “ ”), “?”, and ambiguous bases are treated as missing data.

Returns
pyckmeans.distance.DistanceMatrix

n*n distance matrix.
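The difference between pairwise and complete deletion is easiest to see on a toy alignment. The sketch below is illustrative only and does not reproduce the pyckmeans implementation: the helper names are hypothetical, sequences are plain strings, and every symbol other than A, C, G, T is treated as missing. The Jukes-Cantor correction uses the standard formula d = -3/4 ln(1 - 4p/3).

```python
import math

# Everything outside this set (gaps, "?", ambiguity codes) counts as missing.
VALID = set("ACGT")


def p_distance(seq_a, seq_b, sites=None):
    """Proportion of differing sites among sites where both bases are valid.

    Note: raises ZeroDivisionError if no site is comparable.
    """
    idx = range(len(seq_a)) if sites is None else sites
    pairs = [(seq_a[i], seq_b[i]) for i in idx
             if seq_a[i] in VALID and seq_b[i] in VALID]
    diffs = sum(a != b for a, b in pairs)
    return diffs / len(pairs)


def complete_sites(seqs):
    """Sites with no missing data in any sequence (complete deletion)."""
    return [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]


def jukes_cantor(p):
    """Jukes-Cantor correction of a p-distance."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seqs = ["ACGTACGT", "ACGAACG-", "A-GTACCT"]

# Pairwise deletion: only sites missing in this particular pair are skipped.
pairwise = p_distance(seqs[0], seqs[1])
# Complete deletion: sites missing in any sequence are skipped for every pair.
complete = p_distance(seqs[0], seqs[1], complete_sites(seqs))
```

For the first two sequences, pairwise deletion compares 7 sites while complete deletion compares only 6, because the gap in the third sequence removes a site from every comparison.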

drop_invariant_sites(in_place: bool = False) -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment

Remove invariant sites from alignment. A site is invariant if every entry has the same symbol at that site.

Parameters
in_place: bool, optional

Modify self in place, by default False.

Returns
NucleotideAlignment

NucleotideAlignment without invariant sites. If in_place is set to True, self is returned.
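The definition of an invariant site can be sketched on plain strings. This is a hypothetical stand-alone helper, not the method itself. It assumes that any symbol, including a gap, counts when deciding whether a column varies, which may differ from how the library treats missing data.

```python
def drop_invariant_sites(seqs):
    """Return the sequences restricted to sites with more than one symbol."""
    n_sites = len(seqs[0])
    # Keep a column only if the entries disagree on its symbol.
    keep = [i for i in range(n_sites) if len({s[i] for s in seqs}) > 1]
    return ["".join(s[i] for i in keep) for s in seqs]


seqs = ["ACGT", "ACTT", "AGGT"]
variable = drop_invariant_sites(seqs)
```

The helper returns new strings and leaves its input untouched, which corresponds to the default in_place=False behaviour described above.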

classmethod from_bp_seqio_records(records: Iterable[Bio.SeqRecord.SeqRecord], fast_encoding: bool = False) -> NucleotideAlignment

Build NucleotideAlignment from iterable of Bio.SeqRecord.SeqRecord. Such an iterable is, for example, returned by Bio.SeqIO.parse() or can be constructed using Bio.Align.MultipleSequenceAlignment().

Parameters
records: Iterable[‘Bio.SeqRecord.SeqRecord’]

Iterable of Bio.SeqRecord.SeqRecord. Such an iterable is, for example, returned by Bio.SeqIO.parse() or can be constructed using Bio.Align.MultipleSequenceAlignment().

fast_encoding: bool

If True, a fast nucleotide encoding method without error checking will be used.

Returns
NucleotideAlignment

NucleotideAlignment object.

Raises
InvalidSeqIORecordsError

Raised if sequences have different lengths.

classmethod from_file(file_path: str, file_format='auto', fast_encoding=False) -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment

Read nucleotide alignment from file.

Parameters
file_path: str

Path to alignment file.

file_format: str

Alignment file format. Either “auto”, “fasta” or “phylip”. When “auto” the file format will be inferred based on the file extension.

fast_encoding: bool

If True, a fast nucleotide encoding method without error checking will be used.

Returns
NucleotideAlignment

NucleotideAlignment instance.

Raises
InvalidAlignmentFileExtensionError

Raised if file_format is “auto” and the file extension is not understood.

InvalidAlignmentFileFormatError

Raised if an invalid file_format is passed.

property shape: Tuple[int, int]

Get alignment dimensions/shapes.

Returns
Tuple[int, int]

Number of samples n, number of sites m

pyckmeans.io.nucleotide_alignment.read_alignment(file_path: str, file_format: str = 'auto') -> pyckmeans.io.nucleotide_alignment.NucleotideAlignment

Read nucleotide alignment from file. Alias for NucleotideAlignment.from_file.

Parameters
file_path: str

Path to alignment file.

file_format: str

Alignment file format. Either “auto”, “fasta” or “phylip”. When “auto” the file format will be inferred based on the file extension.

Returns
NucleotideAlignment

NucleotideAlignment instance.

Raises
InvalidAlignmentFileExtensionError

Raised if file_format is “auto” and the file extension is not understood.

InvalidAlignmentFileFormatError

Raised if an invalid file_format is passed.
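How "auto" detection and the two errors fit together can be sketched as follows. This is an assumption-laden illustration: `resolve_format` is a hypothetical helper, the extension table is invented for the example (the extensions pyckmeans actually recognises are not listed in this reference), and the exception classes are local stand-ins for the documented ones.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSIONS = {
    ".fasta": "fasta",
    ".fas": "fasta",
    ".fa": "fasta",
    ".phylip": "phylip",
    ".phy": "phylip",
}


class InvalidAlignmentFileExtensionError(Exception):
    """Local stand-in for the pyckmeans exception of the same name."""


class InvalidAlignmentFileFormatError(Exception):
    """Local stand-in for the pyckmeans exception of the same name."""


def resolve_format(file_path, file_format="auto"):
    """Return 'fasta' or 'phylip', inferring from the extension when 'auto'."""
    if file_format == "auto":
        ext = os.path.splitext(file_path)[1].lower()
        if ext not in EXTENSIONS:
            raise InvalidAlignmentFileExtensionError(
                f"cannot infer format from extension {ext!r}"
            )
        return EXTENSIONS[ext]
    if file_format not in ("fasta", "phylip"):
        raise InvalidAlignmentFileFormatError(f"unknown format {file_format!r}")
    return file_format


inferred = resolve_format("alignment.phy")
explicit = resolve_format("alignment.txt", file_format="fasta")
```

Passing file_format explicitly sidesteps extension inference, which is the practical workaround when a file has an unusual extension.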

pyckmeans.io.phylip module

phylip

Module for reading and writing PHYLIP files.

exception pyckmeans.io.phylip.IncompatibleNamesError

Bases: Exception

exception pyckmeans.io.phylip.InvalidPhylipAlignmentError

Bases: Exception

exception pyckmeans.io.phylip.InvalidPhylipMatrixError

Bases: Exception

InvalidPhylipMatrixTypeError

pyckmeans.io.phylip.read_phylip_alignment(phylip_file: str, dtype: Union[str, numpy.dtype] = 'U') -> Tuple[numpy.ndarray, numpy.ndarray]

Read a PHYLIP alignment file. The file must be a valid alignment: it must contain at least 2 sequences of the same length, including gaps.

WARNING: whitespace characters in entry names are NOT supported.

Parameters
phylip_file: str

Path to a PHYLIP file.

dtype: Union[str, numpy.dtype]

Data type to use for the sequence array.

Returns
Tuple[numpy.ndarray, numpy.ndarray]

Tuple of sequences and names, each as numpy array.

Raises
InvalidPhylipAlignmentError

Raised if header is malformed.

InvalidPhylipAlignmentError

Raised if fewer than 2 entries are present in phylip_file.

InvalidPhylipAlignmentError

Raised if number of entries does not match header.
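The header and entry checks above can be sketched for the simplest case, a sequential (non-interleaved) PHYLIP file. This is not the pyckmeans reader: `parse_phylip_alignment` is a hypothetical helper returning plain lists, the exception class is a local stand-in, spaces inside a sequence are assumed to be block separators, and the final length check goes beyond the errors documented above.

```python
import io


class InvalidPhylipAlignmentError(Exception):
    """Local stand-in for pyckmeans.io.phylip.InvalidPhylipAlignmentError."""


def parse_phylip_alignment(handle):
    """Parse a sequential PHYLIP alignment into (sequences, names)."""
    lines = [line.strip() for line in handle if line.strip()]
    try:
        # The header holds exactly two integers: entries and sites.
        n_entries, n_sites = (int(field) for field in lines[0].split())
    except (IndexError, ValueError) as err:
        raise InvalidPhylipAlignmentError("malformed header") from err

    names, seqs = [], []
    for line in lines[1:]:
        # Names must not contain whitespace: the first token is the name.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise InvalidPhylipAlignmentError(f"entry without sequence: {line!r}")
        names.append(parts[0])
        seqs.append(parts[1].replace(" ", ""))

    if len(seqs) < 2:
        raise InvalidPhylipAlignmentError("fewer than 2 entries")
    if len(seqs) != n_entries:
        raise InvalidPhylipAlignmentError("number of entries does not match header")
    # Extra check, not among the documented errors.
    if any(len(seq) != n_sites for seq in seqs):
        raise InvalidPhylipAlignmentError("sequence length does not match header")
    return seqs, names


text = "2 4\ns1 ACGT\ns2 AC-T\n"
seqs, names = parse_phylip_alignment(io.StringIO(text))
```

Splitting each entry on the first run of whitespace is also why whitespace in entry names cannot be supported by this kind of parser.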

pyckmeans.io.phylip.read_phylip_distmat(phylip_file: str) -> pyckmeans.distance.DistanceMatrix

Read distance matrix in PHYLIP format. Supports full and lower-triangle matrices.

Parameters
phylip_file: str

Path to a distance file in PHYLIP format.

Returns
pyckmeans.distance.DistanceMatrix

Distance matrix as pyckmeans.distance DistanceMatrix object.

Raises
InvalidPhylipMatrixError

Raised if the header is malformed.

InvalidPhylipMatrixError

Raised if an empty line is encountered as second line.

InvalidPhylipMatrixError

Raised if file format can neither be inferred as full nor as lower-triangle matrix.

InvalidPhylipMatrixError

Raised if an empty line is encountered.

InvalidPhylipMatrixError

Raised if expecting a full matrix but number of values does not match the header.

InvalidPhylipMatrixError

Raised if an empty line is encountered.

InvalidPhylipMatrixError

Raised if expecting lower-triangle matrix but number of values does not match the expected number of values for that entry.

InvalidPhylipMatrixError

Raised if number of names does not match number of entries stated in the header.
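One way to tell a full matrix from a lower-triangle matrix is to look at the first entry: it carries n values in a full matrix and none in a lower-triangle one. The sketch below illustrates that idea and several of the checks listed above. It is not the pyckmeans parser: `parse_phylip_distmat` is a hypothetical helper, the exception class is a local stand-in, and, unlike the documented behaviour, blank lines are skipped rather than rejected.

```python
import io


class InvalidPhylipMatrixError(Exception):
    """Local stand-in for pyckmeans.io.phylip.InvalidPhylipMatrixError."""


def parse_phylip_distmat(handle):
    """Parse a full or lower-triangle PHYLIP matrix into (names, matrix)."""
    lines = handle.read().splitlines()
    try:
        n = int(lines[0].strip())
    except (IndexError, ValueError) as err:
        raise InvalidPhylipMatrixError("malformed header") from err
    if n < 1:
        raise InvalidPhylipMatrixError("malformed header")

    # Simplification: blank lines are skipped instead of raising.
    rows = [line.split() for line in lines[1:] if line.strip()]
    if len(rows) != n:
        raise InvalidPhylipMatrixError("number of names does not match header")

    names = [row[0] for row in rows]
    first_len = len(rows[0]) - 1
    if first_len == n:
        expected = [n] * n              # full matrix: n values per entry
    elif first_len == 0:
        expected = list(range(n))       # lower triangle: 0, 1, ..., n-1 values
    else:
        raise InvalidPhylipMatrixError("neither full nor lower-triangle matrix")

    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        values = [float(v) for v in row[1:]]
        if len(values) != expected[i]:
            raise InvalidPhylipMatrixError(f"unexpected number of values for {row[0]}")
        for j, value in enumerate(values):
            matrix[i][j] = value
            if first_len == 0:
                matrix[j][i] = value    # mirror the lower triangle
    return names, matrix


lower = "3\na\nb 0.1\nc 0.2 0.3\n"
full = "3\na 0 0.1 0.2\nb 0.1 0 0.3\nc 0.2 0.3 0\n"
names_lower, matrix_lower = parse_phylip_distmat(io.StringIO(lower))
names_full, matrix_full = parse_phylip_distmat(io.StringIO(full))
```

Both inputs describe the same distances, so the two parsed matrices are identical once the lower triangle has been mirrored.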

pyckmeans.io.phylip.write_phylip_distmat(dist: pyckmeans.distance.DistanceMatrix, file_path: str, force: bool = False) -> None

Write distance matrix to file in PHYLIP matrix format.

Parameters
dist: pyckmeans.distance.DistanceMatrix

Distance matrix as pyckmeans.distance DistanceMatrix object.

file_path: str

Output file path.

force: bool, optional

Force overwrite if file exists, by default False.

Raises
FileExistsError

Raised if file at file_path already exists and force is False.

FileExistsError

Raised if file_path points to an existing directory.

IncompatibleNamesError

Raised if names are incompatible with dist.
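For orientation, a full PHYLIP distance matrix consists of a header line with the number of entries, followed by one line per entry holding its name and distances. The sketch below formats such text from plain lists. It is illustrative only: `format_phylip_distmat` and its `precision` parameter are hypothetical, the exact layout written by pyckmeans may differ, and the sketch assumes that "incompatible names" means a mismatch between the number of names and the number of rows.

```python
class IncompatibleNamesError(Exception):
    """Local stand-in for pyckmeans.io.phylip.IncompatibleNamesError."""


def format_phylip_distmat(names, matrix, precision=4):
    """Format names and a square matrix as full PHYLIP distance-matrix text."""
    # Assumption: names are incompatible when their count differs from the rows.
    if len(names) != len(matrix):
        raise IncompatibleNamesError("one name per matrix row is required")
    lines = [str(len(names))]
    for name, row in zip(names, matrix):
        values = " ".join(f"{value:.{precision}f}" for value in row)
        lines.append(f"{name} {values}")
    return "\n".join(lines) + "\n"


text = format_phylip_distmat(["a", "b"], [[0.0, 0.25], [0.25, 0.0]])
```

The overwrite behaviour controlled by force is independent of the formatting and would be checked before any text is written to file_path.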

Module contents

io

Module containing input and output functionality.