Automatic annotation using ARBA

Last modified August 10, 2020

UniProt's Automatic Annotation pipeline enriches the unreviewed records in UniProtKB (UniProtKB/TrEMBL) with automatic classification and annotation.

The Association-Rule-Based Annotator (ARBA) is one of the contributors to this pipeline. It is a multiclass learning system trained on expertly annotated entries in UniProtKB/Swiss-Prot. ARBA uses rule mining techniques to generate concise annotation models with the highest representativeness and coverage, based on InterPro group membership and taxonomy. It applies a data exclusion set that removes data unsuitable for computational annotation (such as specific biophysical or chemical properties), and it generates human-readable rules for each release.
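ARBA's exact mining procedure is not spelled out here, so the following is only a generic association-rule sketch under stated assumptions. Each training entry is reduced to a set of features (placeholder InterPro IDs such as `IPR_A` plus taxonomy terms), and a candidate rule "features → annotation" is scored by support (the fraction of all entries it correctly annotates) and confidence (the fraction of covered entries that actually carry the annotation). Supersets of an already accepted rule are skipped, which is one simple way to keep rules concise. The names `rule_stats` and `mine_rules`, the thresholds and the toy data are all hypothetical.

```python
from itertools import combinations

# Toy training set standing in for reviewed entries: each item pairs a set of
# features (placeholder InterPro IDs plus taxonomy terms) with its annotations.
TRAINING = [
    ({"IPR_A", "taxon:Bacteria"}, {"kinase activity"}),
    ({"IPR_A", "taxon:Eukaryota"}, {"kinase activity"}),
    ({"IPR_A", "IPR_B", "taxon:Eukaryota"}, {"kinase activity"}),
    ({"IPR_B", "taxon:Eukaryota"}, set()),
    ({"IPR_C", "taxon:Bacteria"}, set()),
]


def rule_stats(condition, annotation, entries):
    """Return (support, confidence) of the rule 'condition -> annotation'."""
    # Entries whose features contain every feature in the condition.
    covered = [annots for feats, annots in entries if condition <= feats]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for annots in covered if annotation in annots)
    return hits / len(entries), hits / len(covered)


def mine_rules(entries, annotation, max_len=2, min_support=0.2, min_confidence=0.9):
    """Enumerate feature combinations, keeping short rules that pass both thresholds."""
    features = sorted(set().union(*(feats for feats, _ in entries)))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(features, size):
            condition = frozenset(combo)
            # Skip supersets of an accepted rule so the rule set stays concise.
            if any(kept <= condition for kept, _, _ in rules):
                continue
            support, confidence = rule_stats(condition, annotation, entries)
            if support >= min_support and confidence >= min_confidence:
                rules.append((condition, support, confidence))
    return rules


for condition, support, confidence in mine_rules(TRAINING, "kinase activity"):
    print(sorted(condition), support, confidence)  # ['IPR_A'] 0.6 1.0
```

On this toy data only the single-feature rule `IPR_A → kinase activity` survives. Taxonomy terms alone are too unspecific, and every two-feature rule containing `IPR_A` is pruned as redundant.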

ARBA rules can annotate protein properties such as function, catalytic activity, pathway membership, subcellular location and protein names; feature predictions are currently excluded. Because rules are generated on the fly for each release, they evolve along with the content of UniProtKB with little or no manual intervention. This also provides a constant supply of potential 'seed rules' that curators can develop further into UniRule rules.

ARBA-based evidence for UniProtKB annotation (example: Q3TWF3)

UniProtKB entries contain evidence tags that describe the provenance of each annotation and link to a reference where applicable. When an annotation is added to an entry by an ARBA automatic annotation rule, the evidence tag indicates this:
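Evidence tags can also be inspected programmatically. This sketch assumes an entry has already been downloaded and parsed from UniProtKB's JSON format, and that each evidence object carries `source` and `id` fields, with `source` set to `"ARBA"` for ARBA-derived annotations. Verify these field names against the current entry format. The walker does not depend on where evidence objects sit in the entry. `SAMPLE_ENTRY` is a hypothetical fragment with placeholder rule IDs, not real data for any entry.

```python
def find_arba_evidence(node):
    """Recursively collect rule IDs from evidence objects whose source is ARBA."""
    found = []
    if isinstance(node, dict):
        if node.get("source") == "ARBA" and "id" in node:
            found.append(node["id"])
        for value in node.values():
            found.extend(find_arba_evidence(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_arba_evidence(item))
    return found


def arba_rule_ids(entry):
    """Return the sorted, de-duplicated ARBA rule IDs cited anywhere in an entry."""
    return sorted(set(find_arba_evidence(entry)))


# Hypothetical fragment shaped like a parsed JSON entry; all IDs are placeholders.
SAMPLE_ENTRY = {
    "proteinDescription": {
        "submissionNames": [
            {"fullName": {"value": "Example protein",
                          "evidences": [{"evidenceCode": "ECO:0000256",
                                         "source": "ARBA", "id": "ARBA00000002"}]}}
        ]
    },
    "comments": [
        {"commentType": "FUNCTION",
         "texts": [{"value": "Example function text.",
                    "evidences": [
                        {"evidenceCode": "ECO:0000256", "source": "ARBA", "id": "ARBA00000001"},
                        {"evidenceCode": "ECO:0000256", "source": "UniRule", "id": "UR000000001"},
                    ]}]}
    ],
}

print(arba_rule_ids(SAMPLE_ENTRY))  # ['ARBA00000001', 'ARBA00000002']
```

For a real entry, replace `SAMPLE_ENTRY` with the result of `json.load` on the downloaded JSON.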

ARBA evidence tag

When you click on the tag, you see a link to the relevant ARBA annotation rule:

Link to ARBA annotation rule

Searching ARBA rules

The ARBA dataset is available from the UniProt website. To search it for rules of interest, click the dropdown next to the search box and select 'ARBA', then enter a search term or rule ID. You can also use the advanced search to build your query.
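The same searches can likely be run without the web interface through UniProt's REST API. This sketch assumes an ARBA search endpoint at `https://rest.uniprot.org/arba/search` that accepts `query`, `format` and `size` parameters. Treat the URL and parameter names as assumptions to check against the current API documentation. `arba_search_url` only builds the URL, so it runs offline. `fetch_arba_rules` shows one way to retrieve JSON results and needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed REST endpoint for ARBA rule searches; confirm in UniProt's API docs.
ARBA_SEARCH_URL = "https://rest.uniprot.org/arba/search"


def arba_search_url(query, fmt="json", size=25):
    """Build a search URL for ARBA rules matching a free-text term or rule ID."""
    params = {"query": query, "format": fmt, "size": size}
    return f"{ARBA_SEARCH_URL}?{urlencode(params)}"


def fetch_arba_rules(query, size=25):
    """Fetch and decode JSON search results (performs a network request)."""
    with urlopen(arba_search_url(query, fmt="json", size=size)) as response:
        return json.load(response)


print(arba_search_url("ARBA00004016"))
# https://rest.uniprot.org/arba/search?query=ARBA00004016&format=json&size=25
```

`urlencode` quotes the query for you, so a multi-word term such as `kinase activity` is sent as `kinase+activity`.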

UniProt namespace dropdown, including ARBA

Exploring the ARBA rule pages

Example

An ARBA rule page contains the unique ARBA ID, a link to the UniProtKB entries annotated by the rule, and the full rule with its conditions and annotation. A rule consists of a set of conditions and the corresponding annotation, which is applied to a protein entry if the conditions are true.

ARBA rule page ARBA00004016

Conditions are listed on the left-hand side of the rule page and annotations on the right-hand side. If a condition set holds true, the corresponding annotation is applied. An ARBA rule only ever applies one annotation, but it can have multiple condition sets that lead to this annotation. Clicking on a condition highlights the annotation, and vice versa.
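A rule can be modelled as a small data structure: one annotation reached through alternative condition sets. This is a hypothetical in-memory model, not UniProt's internal representation. It assumes that every condition within a set must hold (logical AND) and that the sets are alternatives (logical OR). Features use placeholder IDs, and `ARBA_EXAMPLE` is not a real rule. `matching_condition_set` reports which set fired, loosely mirroring how the rule page links conditions to the annotation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ArbaRule:
    """Hypothetical model of a rule: alternative condition sets, one annotation."""
    rule_id: str
    condition_sets: Tuple[FrozenSet[str], ...]
    annotation: str

    def matching_condition_set(self, features) -> Optional[FrozenSet[str]]:
        """Return the first condition set whose conditions all hold, or None."""
        for conditions in self.condition_sets:
            if conditions <= features:
                return conditions
        return None

    def apply(self, features) -> Optional[str]:
        """Return the annotation if any condition set holds, otherwise None."""
        if self.matching_condition_set(features) is None:
            return None
        return self.annotation


rule = ArbaRule(
    rule_id="ARBA_EXAMPLE",  # placeholder, not a real ARBA ID
    condition_sets=(
        frozenset({"IPR_A", "taxon:Eukaryota"}),
        frozenset({"IPR_B"}),
    ),
    annotation="kinase activity",
)

print(rule.apply(frozenset({"IPR_A", "taxon:Eukaryota", "IPR_C"})))  # kinase activity
print(rule.apply(frozenset({"IPR_A", "taxon:Bacteria"})))            # None
print(rule.matching_condition_set(frozenset({"IPR_B"})))             # frozenset({'IPR_B'})
```

Keeping one annotation per rule makes evaluation a simple "any set is satisfied" check. Adding another route to the same annotation only means appending a condition set.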

UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health