SALVIAS TaxonScrubber


 


Summary

Type of tool

Application

Function

Data cleaning

Online / Desktop

Desktop

Computer infrastructure

Windows, MS Access

Development status

Dated. Version 1.2, September 2004

Time of use

Data preparation. When data is imported into ALA

Licence

GNU General Public Licence

A stand-alone tool for correcting and standardizing the spelling of plant species names,1 and for detecting and flagging standard and non-standard species names.2

 

Description

SALVIAS TaxonScrubber is a stand-alone application for automated standardization of taxonomic names. In addition to removing spelling errors in species names, TaxonScrubber splits concatenated information (such as Genus + specific_epithet + Author) and stores each value in a separate field. This can be used to restructure flat-file specimen data prior to importing to a relational database. Although designed primarily for standardizing inventory data for the SALVIAS plots database, TaxonScrubber can be used whenever large numbers of taxonomic records need to be error-checked and reformatted.3

 

TaxonScrubber performs four basic actions:4

  • Splitting of concatenated fields. Epithets and authorities contained in single fields are split into separate fields. For example, the input string "Quercus alba L." is split into three fields: Genus = "Quercus", Species_epithet = "alba", Sp_auth = "L.". TaxonScrubber can split up to two subspecific levels from a single name (e.g., Quercus alba var. gunnisonii Torr. fo. rugosa).
  • Recognition and removal of standard annotations.  TaxonScrubber contains an extensive library of Latin and English botanical annotations, their spelling variants, and abbreviations. Annotations such as "cf.", "aff.", "vel. sp. aff.", etc., are removed and stored in a separate field. Informal annotations of uncertainty, such as question marks, are treated as "cf." Any text not recognized as a standard annotation is stored in an additional annotation field, and flagged for inspection by the user.
  • Standardization of spelling. Once fields have been split and extraneous text removed, TaxonScrubber matches names to a standard list of validly published names. Currently, TaxonScrubber uses a world list of plant names; later releases will have the option of loading name lists for other taxa. After all names that match the standard list have been flagged, TaxonScrubber's "Hand scrub" utility provides pull-down menus for correcting the remaining names to the standard world list. Names still unmatched at the end of the process can then be flagged as morphospecies names (e.g., Miconia sp.3), or as indets (e.g., Miconia sp.).
  • Standardization of higher taxonomy. TaxonScrubber standardizes all family names to match taxonomic concepts and spellings of the Missouri Botanical Garden's TROPICOS database. Future versions will allow the user to update higher taxonomy according to alternative taxonomic concepts (for example, APG familial concepts; see The Angiosperm Phylogeny Website).
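
The splitting and annotation steps above can be illustrated with a short Python sketch. This is not TaxonScrubber's code (the application itself is written in Microsoft Access/Visual Basic); the function name split_name, the output field names, and the tiny annotation and rank lists are assumptions made for illustration only. Unlike TaxonScrubber, the sketch does not store unrecognized text in a separate field or cap the number of subspecific levels.

```python
# Hypothetical, much smaller lists than TaxonScrubber's own annotation library.
# Longer annotations come first so "vel. sp. aff." is not read as a bare "aff.".
ANNOTATIONS = [("vel.", "sp.", "aff."), ("cf.",), ("aff.",), ("?",)]
RANKS = {"subsp.", "var.", "fo."}


def split_name(raw):
    """Split a concatenated name string into separate fields (illustrative only)."""
    # Detach question marks so that "alba?" becomes the tokens "alba" and "?".
    tokens = raw.replace("?", " ? ").split()
    kept, annotations = [], []
    i = 0
    while i < len(tokens):
        for ann in ANNOTATIONS:
            if tuple(tokens[i:i + len(ann)]) == ann:
                # Informal uncertainty (a question mark) is recorded as "cf."
                label = "cf." if ann == ("?",) else " ".join(ann)
                if label not in annotations:
                    annotations.append(label)
                i += len(ann)
                break
        else:
            kept.append(tokens[i])
            i += 1

    genus = kept[0] if kept else ""
    rest = kept[1:]
    epithet = ""
    if rest and rest[0][:1].islower() and rest[0] not in RANKS:
        epithet = rest.pop(0)

    sp_auth, infra = [], []
    target = sp_auth  # authority tokens go to the most recent name level
    j = 0
    while j < len(rest):
        if rest[j] in RANKS and j + 1 < len(rest):
            entry = {"rank": rest[j], "epithet": rest[j + 1], "auth": []}
            infra.append(entry)
            target = entry["auth"]
            j += 2
        else:
            target.append(rest[j])
            j += 1
    for entry in infra:
        entry["auth"] = " ".join(entry["auth"])

    return {
        "genus": genus,
        "species_epithet": epithet,
        "sp_auth": " ".join(sp_auth),
        "infraspecific": infra,
        "annotation": "; ".join(annotations),
    }
```

Calling split_name("Quercus alba L.") returns the genus, epithet and authority in separate fields, and a name such as "Miconia cf. calvescens DC." comes back with "cf." moved to the annotation field. Matching whole tokens rather than substrings avoids stripping "aff." out of an author abbreviation that happens to end in those letters.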

 

During the scrubbing process, TaxonScrubber generates new fields containing the results of the splitting and cleaning process, and various "flag fields" indicating the status of each name component (family, genus, specific epithet, etc.). These fields may be retained or deleted as needed upon export of the cleaned, formatted file.
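
As a rough illustration of what such flag fields might look like, the Python sketch below checks each name component against a small reference list and classifies leftover names as morphospecies or indets. The flag names, the three-name reference sets, and both function names are hypothetical; the source does not describe TaxonScrubber's actual field names or matching rules.

```python
import re

# Hypothetical reference data; the real lookup table holds far more names.
FAMILIES = {"Fagaceae", "Melastomataceae"}
GENERA = {"Quercus", "Miconia"}
SPECIES = {("Quercus", "alba"), ("Miconia", "calvescens")}


def flag_record(family, genus, epithet):
    """Return one flag per name component: 1 = matched, 0 = needs inspection."""
    return {
        "family_matched": int(family in FAMILIES),
        "genus_matched": int(genus in GENERA),
        "species_matched": int((genus, epithet) in SPECIES),
    }


def classify_unmatched(epithet):
    """Label an epithet that did not match the standard list."""
    if re.fullmatch(r"sp\.?\s*\d+", epithet):
        return "morphospecies"  # e.g. "sp.3"
    if epithet in ("", "sp", "sp."):
        return "indet"          # e.g. "sp."
    return "unmatched"          # left for manual correction
```

Keeping the flags in their own fields, rather than overwriting the name fields, is what allows them to be dropped or retained independently at export time.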

 

Other TaxonScrubber features5

  • File management. TaxonScrubber imports, names, backs up, and manages source files within the database environment. Original files are left untouched until the user has completed the scrubbing process, and chooses to export the scrubbed file and replace the original.
  • Archiving of source names. Prior to scrubbing, TaxonScrubber archives the original names, unchanged, for comparison with the "scrubbed" versions. After scrubbing, these fields can be kept or deleted at the user's discretion.
  • Hand-scrubbing. TaxonScrubber features tools for manual inspection of taxonomic fields, including filters which display only records containing selected standard annotations, and matching to pull-down menus of standard names or names within the original file.
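
The idea of archiving the source name and offering a short menu of standard names for manual correction can be sketched in Python as follows. The source does not say how TaxonScrubber builds its pull-down menus, so the use of approximate string matching from Python's standard difflib module is an assumption, as are the function name build_menu, the field names, and the four-name reference list.

```python
from difflib import get_close_matches

# Hypothetical standard list; stands in for the much larger reference table.
STANDARD_GENERA = ["Quercus", "Quararibea", "Miconia", "Micropholis"]


def build_menu(original_genus, standard=STANDARD_GENERA, limit=5, cutoff=0.8):
    """Archive the source name and suggest standard names for hand-scrubbing."""
    matched = original_genus in standard
    suggestions = []
    if not matched:
        # Approximate matching is an assumption made for this sketch.
        suggestions = get_close_matches(original_genus, standard, n=limit, cutoff=cutoff)
    return {
        "genus_original": original_genus,  # archived copy, never modified
        "genus_matched": matched,
        "candidates": suggestions,         # what a pull-down menu could list
    }
```

For a misspelling such as "Qercus", the candidates list contains "Quercus", while a name already on the standard list gets an empty menu. The final choice is still left to the user, in line with the manual inspection described above.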

 

Screen shot of SALVIAS TaxonScrubber6

 

TaxonScrubber was developed by Brad Boyle in the Department of Ecology and Evolutionary Biology at University of Arizona, with support from the Center for Applied Biodiversity Science at Conservation International.7

 

Function

  • Data cleaning and manipulation
    • Data cleaning – spelling, misnaming
    • File restructuring
  • Taxonomy
  • Provider interaction
    • Data preparation
  • User interface
    • Personal use
    • Raw data

 

Why use this tool?

  • To correct and standardise the spelling of plant species names

 

 Who will use this tool?

  • Data capture
  • Data providers
    • Institutions
    • Private collections
  • ALA infrastructure

 

 How will the tool be used?

Two files are required to run TaxonScrubber:

  • The main application – TaxonScrubber
  • Taxonomic database file – World plant list

 

World plant list is a lookup table of nearly 1 million plant names. It is based on all names in a world list of vascular plant names from the Missouri Botanical Garden's TROPICOS database, with additional names of Old World plants from the IPNI source databases. Compilation date: May 2003; reformatted for TaxonScrubber Ver. 1.2, Sept. 2004.8

 

  • Windows, MS Access
  • Desktop application
  • User input is required

     

 Where in the data chain could this tool be used?

  • Data source
  • ALA central
  • Pathways between these

 

 When could this tool be used?

  • Before data is made available to ALA
  • As data is imported into ALA for storage
  • While data is stored with ALA

 

Availability

 

Comments

  • TaxonScrubber does not appear to be able to run as a batch job
  • TaxonScrubber hasn’t been updated for three years.

 

Q&A with Brad Boyle, TaxonScrubber creator, January 2008 9

 

Is TaxonScrubber still being maintained?

Yes and no. I originally developed it for my own use for cleaning data for import to SALVIAS. However, enough people were interested in it that I decided to make it available over our website. Although I have since issued a couple of updates, mostly bug-fixes, I will probably not be doing any further development, mostly because I do not want to continue working with Microsoft Access/Visual Basic. That said, I provide limited advice from time to time to people needing help with using the application.

 

If we were to use another taxonomic database file (for example, a current extraction of the TROPICOS database, or a compilation of other species databases), is this possible/sensible?

Yes. The download would need to be reprocessed into the format which TaxonScrubber can read. Much would depend on the format of the original download, and whether or not it itself needs any cleaning. I would have to take a look at the list before committing to anything; if a lot of time would be involved, I would have to consider charging a consulting fee to compensate for time lost to other projects. Or, if you are familiar with programming in Access, you are welcome to try to produce a new reference database yourself.

 

Can TaxonScrubber be applied to other organisms, e.g. fungi or animals?

Yes. Anything named with a Latin name. It's just a matter of having a taxonomic authority list. For example, I have used TaxonScrubber to check lists of North American birds against the AOU checklist.

 

 Is there scope for someone (you/us?) to alter the program - for example to run as a batch job?

You're certainly welcome to try. If you are familiar with Visual Basic, the code is pretty transparent (if inelegant). However, although it does not run from the command line, TaxonScrubber is still essentially a "batch" program, in the sense that it can process thousands of names at once.

 

Future directions

One of the reasons that I am no longer actively developing TaxonScrubber in its current form is that I would like to move it to a platform-independent Open Source version that would run as both a stand-alone and on the web. I haven't started work on this yet, but hope to release an initial version before the end of this year (2008). It's all a matter of finding the time, as usual.

 

 


