Arthroverse Downloads
All datasets are available for free download. Click a file type to begin.
This TSV file lists each family by its unique identifier (e.g., IF00002, IF00067), along with the family's classification type, its number of members, and the counts of datasets and scaffolds associated with it.
This TSV file contains metadata for genomic families. Each row represents a family and includes details such as the number and percentage of metagenomes, metatranscriptomes, and isolates, as well as taxonomy group distributions across Bacteria, Archaea, Eukaryota, Viruses and Unclassified groups.
This TSV file contains Pfam domain annotations for each family, including the Pfam hit, HMM alignment start/end positions, genomic start/end positions, and an accuracy score for each alignment.
This TSV file contains representative sequence data for each family, including the representative sequence length, average family sequence length, the sequence header, and the sequence itself.
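The TSV files described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the column names (family_id, classification, members) and the sample values are hypothetical placeholders, so check the header row of the downloaded file and adjust the keys accordingly.

```python
import csv
import io


def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts keyed by the header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Hypothetical in-memory sample standing in for a downloaded TSV file.
# Column names and values are placeholders, not real Arthroverse data.
sample = io.StringIO(
    "family_id\tclassification\tmembers\n"
    "IF00002\texample_type\t120\n"
    "IF00067\texample_type\t45\n"
)

rows = read_tsv(sample)

# Keep only families with at least 100 members (values are read as strings).
large = [row["family_id"] for row in rows if int(row["members"]) >= 100]
print(large)
```

For a real download, replace the in-memory sample with open(path, newline="", encoding="utf-8"), where path points to the saved file.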
This archive contains FASTA files for all protein families. Each file corresponds to a specific family and includes its protein sequences for use in alignment, annotation, and phylogenetic analyses.
This archive contains FASTA files of aligned sequences for all protein families, where each file includes the multiple sequence alignment of the representative protein sequences within a family.
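Once an archive is extracted, each FASTA file can be parsed without extra dependencies. The sketch below assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name parse_fasta and the sample records are illustrative only. In aligned files, gap characters such as "-" are kept as part of the sequence.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample; headers and sequences are placeholders, not real data.
sample = io.StringIO(
    ">seq1 placeholder\n"
    "MKT\n"
    "AYI\n"
    ">seq2 placeholder\n"
    "MK-TA\n"
)

records = list(parse_fasta(sample))
for name, seq in records:
    print(name, len(seq))
```

Sequences split across several lines are joined into one string, so the reported length counts every residue and gap column in the record.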
This archive contains HMM profile files for all protein families. These are probabilistic models built from aligned sequences for use in sequence similarity searches and annotation.
This archive contains predicted protein structure files in CIF format for all protein families, each corresponding to the representative 3D structure of a family's protein sequence.