Data Tester
A set of tools to assist in checking the quality of biodiversity datasets
Description A generic Java framework for data cleaning and data validation. The idea behind this project was originally conceived within the biodiversity informatics field, following the establishment of the first global networks that served primary data from biological collections. As the amount of shared data grew and its users came to include researchers and policy makers, data quality naturally gained importance. In this context, some networks started to develop tools and interfaces to help with data cleaning and data validation. The main idea of this project was to gather the knowledge from those first data cleaning tools and to produce a new framework that could serve as common ground for implementing and running a large number of data tests.
The framework was originally developed as open source software by the Reference Center on Environmental Information (CRIA) with funding from the Global Biodiversity Information Facility (GBIF) and the Gordon and Betty Moore Foundation. Although it originated in the biodiversity informatics field, it is by no means bound or limited to this area. Its design pursued the following goals:
Two Java packages were created: one containing the framework itself, and another containing a set of generic tests that can be useful in different situations.1
Function This is a suite of data cleaning and data validation tools. Tests that can be executed include the following:2
Why use this tool? Data quality is extremely important to both data users and data providers.
Who will use this tool? DataTester can be employed directly by data providers, by other portals, or by anyone preparing to analyse retrieved data. The software is not limited to biodiversity data types: people working in fields other than biodiversity informatics can add tests for the kinds of errors that might be found in their own data sets.3
How will the tool be used? The software is particularly suited to reporting on XML data sets, but can be applied to other data formats or relational databases. It allows programmers to develop new tests and to generalize tests so that they can work against multiple data standards (e.g. Darwin Core and ABCD schema). Each test may be associated with a severity (error, warning, info) to make it easier to focus on the most significant issues.4
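The idea of a generalized test with an attached severity can be illustrated with a minimal Java sketch. All type and method names here (`Severity`, `Issue`, `RecordTest`, `LatitudeRangeTest`, `TestRunner`) are hypothetical and chosen only for illustration; they are not the Data Tester API. Records are modelled as plain maps, and the field key `LatitudeDecimal` used for the ABCD-style record is an assumed label.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical types for illustration only; not the Data Tester API.
enum Severity { ERROR, WARNING, INFO } // declared most severe first

final class Issue {
    final Severity severity;
    final String message;

    Issue(Severity severity, String message) {
        this.severity = severity;
        this.message = message;
    }
}

interface RecordTest {
    // Returns an issue if the record fails the test, otherwise empty.
    Optional<Issue> check(Map<String, String> record);
}

// A generalized test: the field name is a parameter, so the same logic
// can run against records that follow different data standards.
final class LatitudeRangeTest implements RecordTest {
    private final String field;

    LatitudeRangeTest(String field) {
        this.field = field;
    }

    @Override
    public Optional<Issue> check(Map<String, String> record) {
        String raw = record.get(field);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.of(new Issue(Severity.INFO, field + " is missing"));
        }
        try {
            double lat = Double.parseDouble(raw.trim());
            if (lat < -90.0 || lat > 90.0) {
                return Optional.of(new Issue(Severity.ERROR, field + " out of range: " + raw));
            }
            return Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.of(new Issue(Severity.ERROR, field + " is not a number: " + raw));
        }
    }
}

final class TestRunner {
    // Runs every test on every record and returns issues, most severe first.
    static List<Issue> run(List<RecordTest> tests, List<Map<String, String>> records) {
        List<Issue> issues = new ArrayList<>();
        for (Map<String, String> record : records) {
            for (RecordTest test : tests) {
                test.check(record).ifPresent(issues::add);
            }
        }
        issues.sort(Comparator.comparing((Issue i) -> i.severity)); // enum declaration order
        return issues;
    }
}

final class SeverityDemo {
    public static void main(String[] args) {
        // Same test logic, configured once per standard's field name.
        List<RecordTest> darwinCoreTests = List.of(new LatitudeRangeTest("decimalLatitude"));
        List<RecordTest> abcdTests = List.of(new LatitudeRangeTest("LatitudeDecimal")); // assumed key

        List<Map<String, String>> dwcRecords = List.of(
                Map.of("decimalLatitude", "-23.5"),
                Map.of("decimalLatitude", "95.0"));
        List<Map<String, String>> abcdRecords = List.of(Map.of("LatitudeDecimal", "north"));

        for (Issue issue : TestRunner.run(darwinCoreTests, dwcRecords)) {
            System.out.println(issue.severity + ": " + issue.message);
        }
        for (Issue issue : TestRunner.run(abcdTests, abcdRecords)) {
            System.out.println(issue.severity + ": " + issue.message);
        }
    }
}
```

Sorting by severity is one way to put the most significant issues at the top of a report; a real report might instead filter or group by severity.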
Written in Java, this is a desktop application. The tester comes as four files, all platform independent:
Where in the data chain could this tool be used?
When could this tool be used?
Availability
Comments GBIF have released three papers that discuss issues related to the quality of data:5