- ArthroVerse

Browse

The Browse menu is the main way to explore the contents of the database. It groups the data by the level you want to look at: protein Families, sequenced Samples, individual Scaffolds, the host the sample came from (By Host), or a geographic Map. Each view shares a common pattern: a filter form at the top, a sortable results table, and pagination controls at the bottom.

Families

Browse the protein families catalogued in ArthroVerse. Each row links to a detailed Family Info page with the representative sequence, taxonomic and order distributions, functional and structural annotations, and a 3D model where available. You can filter the list by:

Family name & Category — free-text search boxes.
Taxonomy — restrict to a chosen taxonomic group.
3D model quality — filter by the confidence of the predicted structure: High, Medium, or Low.
pLDDT and pTM score ranges — drag the sliders to keep only models within a confidence band (each runs from 0 to 1). Tick the checkbox to enable a range filter before searching.

Click any column header (Family, Members, pLDDT, …) to sort; click again to reverse direction. The Records and Page pills show how many results matched, and the Show selector changes how many rows appear per page.
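The filter, sort, and paging behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the site's implementation: the `FamilyRow` fields, the function names, and the choice of inclusive range bounds are all assumptions. Passing `None` for a range stands in for an unticked checkbox.

```python
from dataclasses import dataclass


@dataclass
class FamilyRow:
    # Hypothetical columns; the real table may name or scale them differently.
    family: str
    members: int
    plddt: float  # assumed to be on the 0 to 1 slider scale
    ptm: float    # assumed to be on the 0 to 1 slider scale


def filter_and_sort(rows, plddt_range=None, ptm_range=None,
                    sort_key="members", descending=True):
    """Keep rows inside every enabled range, then sort by one column."""
    kept = []
    for row in rows:
        if plddt_range is not None and not (plddt_range[0] <= row.plddt <= plddt_range[1]):
            continue
        if ptm_range is not None and not (ptm_range[0] <= row.ptm <= ptm_range[1]):
            continue
        kept.append(row)
    return sorted(kept, key=lambda r: getattr(r, sort_key), reverse=descending)


def paginate(rows, page, per_page):
    """Return the rows shown on a 1-based page number."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]


rows = [
    FamilyRow("F1", 10, 0.90, 0.80),
    FamilyRow("F2", 50, 0.40, 0.50),
    FamilyRow("F3", 30, 0.75, 0.70),
]
confident = filter_and_sort(rows, plddt_range=(0.7, 1.0))
print([r.family for r in confident])
```

Calling `filter_and_sort` again with `descending=False` mirrors a second click on the same column header.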

The Families page: filter form, sortable columns, and per-family quality & confidence scores.

Samples

Samples (datasets) are the sequenced metagenomes, isolates, and metatranscriptomes the families were derived from. Each row shows its TaxonOID, sample name, type, host, and the number of families detected in it. Filter by:

Sample — search by name.
Host — select the host organism.
Category — Isolate, Metagenome, or Metatranscriptome.
Number of families — enable the range slider to keep only samples within a chosen family count.

Click a TaxonOID to open the full Sample Metadata page (sequencing details, project & publication links, environment, sampling conditions, assembly stats, and a map of the collection site). The families count links to the families found in that sample, and the Map column opens the exact coordinates in Google Maps.

The Samples page, with the host-taxonomy filter and per-sample family counts.

Scaffolds

Scaffolds are the assembled contigs within each sample. The Scaffolds view lets you search across the whole database by Sample, Scaffold ID, Host, or Taxonomy, and sort by length or number of families.

Selecting a scaffold opens its Scaffold Information page, which summarises sample, taxonomy, environment, and lineage, and shows every protein family on that scaffold as a clickable honeycomb. Links out to IMG/JGI are provided for the original assembly record.

Scaffold search results; each scaffold links to its families shown as a honeycomb.

By Host

By Host lets you start from a host organism's taxonomy and drill down through Domain → Kingdom → Phylum → Class → Order → Family → Genus → Species. Each dropdown unlocks the next once a value is chosen, and a summary of your current selection appears below the form.
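One way to picture the cascading dropdowns is as a lookup that, given the values already chosen for the leading ranks, returns the options for the next rank. The sketch below assumes host records are plain dictionaries keyed by rank name; the function name and the two example lineages are illustrative and are not taken from the database.

```python
RANKS = ["Domain", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def next_options(records, selection):
    """List the distinct values available at the first rank not yet chosen.

    `selection` holds the values picked so far, in rank order.
    """
    depth = len(selection)
    if depth >= len(RANKS):
        return []  # every rank already chosen
    matching = [
        rec for rec in records
        if all(rec.get(RANKS[i]) == value for i, value in enumerate(selection))
    ]
    return sorted({rec[RANKS[depth]] for rec in matching if rec.get(RANKS[depth])})


# Illustrative host lineages only.
records = [
    dict(zip(RANKS, ["Eukaryota", "Metazoa", "Arthropoda", "Insecta",
                     "Diptera", "Drosophilidae", "Drosophila",
                     "Drosophila melanogaster"])),
    dict(zip(RANKS, ["Eukaryota", "Metazoa", "Arthropoda", "Insecta",
                     "Hymenoptera", "Apidae", "Apis", "Apis mellifera"])),
]

print(next_options(records, []))
print(next_options(records, ["Eukaryota", "Metazoa", "Arthropoda", "Insecta"]))
```

An empty selection yields the Domain options; each further choice narrows the records that feed the next dropdown.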

The results table lists every matching host-taxonomy record with its NCBI ID and counts of associated Families and Datasets. Use the Browse Families / Browse Datasets buttons (or the count links) to carry the host filter across to those views — you'll then see a Host Taxonomy filter summarised at the top of those pages.

The By Host page: cascading taxonomy dropdowns and a table of matching host records.

Map

The Map view plots samples geographically using their recorded latitude and longitude. Each marker is a sampling site; click a marker for its details. Maps also appear on individual Sample and Family pages to show where the underlying data was collected.

Note: some samples have no coordinate data and will not appear on the map.
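If you work with sample metadata offline, the same rule can be applied before plotting. The following is a minimal sketch that assumes each sample is a dictionary with hypothetical `name`, `latitude`, and `longitude` keys; it also drops out-of-range values, which is an extra check not described on this page.

```python
def plottable(samples):
    """Return (name, lat, lon) tuples for samples with usable coordinates."""
    points = []
    for sample in samples:
        lat = sample.get("latitude")
        lon = sample.get("longitude")
        if lat is None or lon is None:
            continue  # no coordinate data: leave the sample off the map
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # assumed sanity check, not documented site behaviour
        points.append((sample["name"], lat, lon))
    return points


samples = [
    {"name": "A", "latitude": 10.5, "longitude": -3.2},
    {"name": "B", "latitude": None, "longitude": 4.0},
    {"name": "C"},
    {"name": "D", "latitude": 95.0, "longitude": 0.0},
]
print(plottable(samples))
```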

Sampling locations plotted on an interactive map.

Sequence Search Tools

The Sequence Search Tools let you query the database with your own data — a sequence, a profile, a motif, or a structure — and find the matching protein families. Pick the tool that fits what you already have. Whichever you use, the hits link straight to the relevant Family Info page.

Diamond

Fast protein similarity search. Paste one or more amino-acid sequences (FASTA) and DIAMOND aligns them against the family representative sequences, returning the best hits ranked by alignment score and e-value. Use this when you have a sequence and want a quick "what is this similar to?" answer. A Load example button fills the box with a sample query.
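Before pasting a multi-sequence query, it can help to check that the text is well-formed FASTA. The sketch below is a local pre-check only: the function names are hypothetical, and the set of accepted residue letters (the 20 standard amino acids plus the IUPAC codes B, J, O, U, X, Z and the stop symbol `*`) is an assumption, since this page does not state what the server accepts.

```python
# Assumed alphabet: 20 standard residues plus B, J, O, U, X, Z and '*'.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not in the assumed alphabet."""
    return sorted(set(sequence) - VALID_RESIDUES)


query = ">q1 test\nmkt\nAYI\n\n>q2\nMK1T-\n"
for name, seq in parse_fasta(query):
    print(name, len(seq), invalid_residues(seq))
```

A non-empty list from `invalid_residues` flags characters, such as digits or gap symbols, that are worth removing before submitting.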

HMMER by Sequence

Profile-based search starting from a sequence. Submit a query sequence and HMMER scores it against the family hidden Markov models, which is more sensitive to remote homologs than a plain alignment. Hits are reported with bit scores and e-values, each expandable to show the alignment.

HMMER by Model

If you already have an HMM profile — for example one downloaded from this site, or built from your own alignment — upload it here to search the database with your model directly. Useful for comparing a curated profile against ArthroVerse families.

Pattern Search

Search by a sequence motif or pattern (e.g. a PROSITE-style expression) rather than a full sequence. Enter the pattern and ArthroVerse returns families whose sequences contain a match, with the matched region highlighted in the consensus. Ideal for locating short conserved sites or signatures.
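To see what a PROSITE-style expression matches, it can be translated into an ordinary regular expression and tried locally. The converter below is a simplified sketch covering the common syntax (`x` wildcards, `[...]` alternatives, `{...}` exclusions, `(n)` and `(n,m)` repeats, and `<` / `>` terminus anchors). The function name is hypothetical, rarer constructs are not handled, and it is not necessarily how ArthroVerse interprets patterns.

```python
import re

# One pattern element: a residue letter, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Za-z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                   # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("A-x(2)-{P}-[ST].")
hit = re.search(regex, "MAKLGSQ")
print(regex, hit.span(), hit.group())
```

The span returned by `re.search` gives the start and end of the matched region within the sequence.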

Structure Search

Search by 3D structure. Provide a structure file and the tool compares its fold against the family models, surfacing structurally similar families even when sequence similarity is low. Results connect through to the family's structure and feature viewers (PDBe Molstar, Feature Viewer) and external references such as PDB, CATH, and AlphaFold.

Statistics

The Statistics page gives a database-wide overview: total counts of families, samples, and scaffolds, distributions across taxonomic orders and sample types, and the spread of model quality and confidence scores. Use it to understand the overall scale and composition of ArthroVerse before drilling into individual records, or to cite summary figures.

Downloads

The Downloads section provides the underlying data for offline use. Depending on the record, you can obtain:

FASTA — family sequences, plus an aligned (MSA) version.
HMM — the hidden Markov model profile for a family.
CIF — the predicted 3D structure file.

Per-family download buttons also appear directly on each Family Info page, alongside the MSA, structure, and HMM logo viewers.

The Downloads page with the available data formats.

About

The About page describes the purpose and scope of ArthroVerse, how the protein families were constructed, the data sources used, and how to cite the resource. It also credits the tools that power the site's viewers and analyses — including DIAMOND, HMMER, PDBe Molstar, Feature Viewer, Skylign, and data from IMG/JGI, NCBI, Pfam/InterPro, CATH, and AlphaFold — along with contact details for the team.

Help

You're reading it. This Help page documents every section of the navigation menu. If something isn't covered here, or you spot an error in the data, please get in touch via the contact details on the About page.

Help & Documentation
