BRT


BRT – Boosted Regression Trees

Summary

Type of tool

Application

Function

Species modelling

Online / Desktop

Desktop

Computer infrastructure

Windows, Unix, R

Development status

Version 1.6, August 2007

Time of use

As a post process, after data is with the user

Licence

 

BRT – Boosted Regression Trees, also called stochastic gradient boosting.

 

Description

Boosted regression trees combine two algorithms: “boosting” is a method for developing multiple models and combining them; “regression trees” are single models that partition the predictor space into disjoint regions and predict a separate constant value in each of them.
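A single regression tree can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the gbm implementation: it assumes numeric predictors and squared-error splitting, and the helper names `best_split`, `fit_tree` and `predict_tree` are hypothetical. The tree greedily picks the split that most reduces squared error, recurses up to `max_depth`, and stores the mean response of each resulting region as that region's constant prediction.

```python
import numpy as np


def best_split(X, y, min_leaf=1):
    """Hypothetical helper: (sse, feature, threshold) of the lowest squared-error split, or None."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best


def fit_tree(X, y, max_depth=2, min_leaf=1):
    """Grow a tree: internal nodes are (feature, threshold, left, right); leaves are floats."""
    split = best_split(X, y, min_leaf) if max_depth > 0 else None
    if split is None:
        return float(y.mean())  # leaf: one constant prediction for this region
    _, j, t = split
    left = X[:, j] <= t
    return (j, float(t),
            fit_tree(X[left], y[left], max_depth - 1, min_leaf),
            fit_tree(X[~left], y[~left], max_depth - 1, min_leaf))


def predict_tree(tree, x):
    """Follow the splits for a single record x until a leaf constant is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Toy example: one predictor, the response jumps between x = 2 and x = 3.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
tree = fit_tree(X, y, max_depth=1)
print(tree)                       # (0, 2.5, 1.0, 5.0)
print(predict_tree(tree, [3.5]))  # 5.0
```

The depth-1 tree splits the single predictor at 2.5 into two disjoint regions and predicts a separate constant (1.0 or 5.0) in each.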

 

Boosting is used to overcome the inaccuracy of a single model and makes it possible to model a complex response surface. Regression trees can use both continuous and categorical predictor variables, can accommodate missing data, are insensitive to outliers, tend to exclude irrelevant variables, and can model interactions between predictors.

 

BRT are described in different ways in different disciplines. In the machine learning community, boosting is mainly interpreted as a method for finding many rough rules of thumb (i.e. many regression trees) that, when combined, are more accurate than any single rule. The boosting algorithm calls the regression tree algorithm repeatedly, each time giving it a re-weighted version of the data that emphasizes the records that were misclassified in the previous round. Finally, the suite of trees is combined by weighted averaging. Statisticians have reinterpreted boosting as a method for building a regression model in a forward stage-wise fashion, adding small modifications across the model space (via trees) to fit the data better. The final model has numerous terms, each of which is a regression tree.
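The statistical (forward stage-wise) view can be sketched as follows, under stated assumptions: squared-error loss, depth-1 trees (stumps) and Python/NumPy rather than R. Each round fits a stump to the current residuals, which are the negative gradient of squared-error loss, using a random subsample of the records, then adds a shrunken copy of that stump to the model. The random subsample is what makes the procedure "stochastic". The names `boost`, `learning_rate` and `bag_fraction` are hypothetical, chosen to echo gbm's `shrinkage` and `bag.fraction` arguments; for presence/absence data, gbm would normally be run with a Bernoulli loss instead.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares depth-1 tree fitted to r: (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # all predictor values identical: fall back to a constant
        return (0, np.inf, r.mean(), r.mean())
    return best[1:]


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def boost(X, y, n_trees=200, learning_rate=0.1, bag_fraction=0.5, seed=0):
    """Hypothetical sketch: forward stage-wise boosting of stumps under squared-error loss."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()                      # start from the best constant model
    pred = np.full(len(y), f0)
    n_sub = max(2, int(bag_fraction * len(y)))
    trees = []
    for _ in range(n_trees):
        residual = y - pred            # negative gradient of 0.5 * (y - f)^2
        idx = rng.choice(len(y), size=n_sub, replace=False)  # random subsample ("stochastic")
        stump = fit_stump(X[idx], residual[idx])
        pred += learning_rate * predict_stump(stump, X)      # small, shrunken step
        trees.append(stump)
    return f0, learning_rate, trees


def predict(model, X):
    """The final model is the initial constant plus the shrunken sum of all trees."""
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(predict_stump(s, X) for s in trees)


# Synthetic data: a step in the first predictor plus a linear trend in the second.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = np.where(X[:, 0] > 0.5, 2.0, 0.0) + X[:, 1]
model = boost(X, y)
print(np.mean((predict(model, X) - y) ** 2), np.var(y))
```

Smaller learning rates generally need more trees to reach the same fit, which is why the learning rate and the number of trees are usually tuned together.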

 

As boosting proceeds, model complexity increases until the model eventually over-fits the data. The number of trees in the boosted model is a natural measure of complexity and is chosen by measuring prediction accuracy on independent data. This identifies the most complex model that still predicts well, balancing training error against generalization error.[1]
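Choosing the number of trees on independent data can be sketched like this: boost stumps on the training records, record the error on held-out records after every added tree, and keep the tree count where that error is lowest. This is again a hypothetical Python/NumPy sketch (squared-error loss, no subsampling, names `staged_validation_error` and `choose_n_trees` invented here); gbm itself also offers cross-validation and out-of-bag estimates for this step.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares depth-1 tree fitted to r: (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # all predictor values identical: fall back to a constant
        return (0, np.inf, r.mean(), r.mean())
    return best[1:]


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def staged_validation_error(X_tr, y_tr, X_val, y_val, n_trees=300, learning_rate=0.5):
    """Boost stumps on the training data; return held-out MSE after each added tree."""
    f_tr = np.full(len(y_tr), y_tr.mean())
    f_val = np.full(len(y_val), y_tr.mean())  # the constant is learned from training data only
    errors = np.empty(n_trees)
    for m in range(n_trees):
        stump = fit_stump(X_tr, y_tr - f_tr)
        f_tr += learning_rate * predict_stump(stump, X_tr)
        f_val += learning_rate * predict_stump(stump, X_val)
        errors[m] = np.mean((y_val - f_val) ** 2)
    return errors


def choose_n_trees(errors):
    """Number of trees (1-based) with the lowest held-out error."""
    return int(np.argmin(errors)) + 1


# Noisy synthetic data: first 80 records for training, last 40 held out.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(0, 0.5, size=120)
errors = staged_validation_error(X[:80], y[:80], X[80:], y[80:])
best = choose_n_trees(errors)
print(best, errors[best - 1], errors[-1])
```

In this sketch the training error never increases as trees are added, so it cannot reveal over-fitting on its own; the held-out curve is what shows where additional trees stop helping.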

 

Function

  • Analysis tools
  • User interface
    • Personal use
    • Raw data

 

Why use this tool?

  • Species modelling

 

Who will use this tool?

  • Data users
    • Expert
  • Special skills are required

 

How will the tool be used?

  • BRT is implemented in the gbm package for R
  • Desktop application
  • User input required

 

Where in the data chain could this tool be used?

  • User’s machine

 

When could this tool be used?

  • As a post process, after data is with the user

 

Availability

R Project for Statistical Computing

 

Comments

For species distribution modelling, Elith et al. (2006) compared BRT with other methods, and BRT was among the better-performing ones.[2]

 

 


[1] Friedman and Meulman (2003), cited in Elith et al. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129-151. Available at: http://www.blackwell-synergy.com/toc/eco/29/2

[2] Elith et al. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129-151. Available at: http://www.blackwell-synergy.com/toc/eco/29/2
