Kepler


 


Summary

Type of tool: Application
Function: Scientific workflow design
Online / Desktop: Modern desktop
Computer infrastructure: Java, platform independent
Development status: Beta, version 1.0.0 beta3 (January 2007)
Time of use: Before ALA, during streaming, and as a post-process
Licence: Open source

Kepler is an open source software tool that allows scientists to design scientific workflows and execute them efficiently using emerging Grid-based approaches to distributed computation.1

 

Description

Kepler is a software application for the analysis and modeling of scientific data. Using Kepler's graphical interface and components, scientists with little background in computer science can create executable models called "scientific workflows," which are flexible tools for accessing scientific data (streaming sensor data, medical and satellite images, simulation output, observational data, etc.) and executing complex analyses on the retrieved data.

Kepler is developed by a cross-project collaboration that builds open source tools enabling scientists to create and run computational experiments.2

 

Figure: A scientific workflow.3

 

Each workflow consists of analytical steps that may involve database access and querying, data analysis and mining, and intensive computations performed on high-performance cluster computers. Each workflow step is represented by an “actor,” a processing component that can be dragged and dropped into a workflow via Kepler’s visual interface. Connected actors, together with a few other components, form a workflow, allowing scientists to inspect and display data on the fly as it is computed, make parameter changes as necessary, and re-run and reproduce experimental results.
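The actor idea can be illustrated outside Kepler with a minimal Python sketch. This is not Kepler's or Ptolemy II's API: the `Actor` and `Workflow` classes, the `fire` and `run` methods, and the three step functions are all hypothetical names chosen for illustration. The sketch assumes a purely linear workflow in which each actor's output feeds the next.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Actor:
    """Hypothetical workflow step: a named function plus tunable parameters."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)

    def fire(self, data: Any) -> Any:
        # Run this step on the incoming data with the current parameters.
        return self.func(data, **self.params)


@dataclass
class Workflow:
    """Hypothetical linear chain of actors; each output feeds the next actor."""
    actors: List[Actor]

    def run(self, data: Any, inspect: bool = False) -> Any:
        for actor in self.actors:
            data = actor.fire(data)
            if inspect:
                # Show intermediate results as they are computed.
                print(f"{actor.name}: {data}")
        return data


def drop_missing(values):
    # Data cleaning step: discard missing readings.
    return [v for v in values if v is not None]


def scale(values, factor=1.0):
    # Analysis step with one adjustable parameter.
    return [v * factor for v in values]


def mean(values):
    return sum(values) / len(values)


scale_actor = Actor("scale", scale, {"factor": 2.0})
workflow = Workflow([Actor("clean", drop_missing), scale_actor, Actor("mean", mean)])

readings = [1.0, None, 3.0]
first = workflow.run(readings)           # run with factor 2.0

scale_actor.params["factor"] = 10.0      # change a parameter...
second = workflow.run(readings)          # ...and re-run the same workflow

scale_actor.params["factor"] = 2.0       # restore the original setting
reproduced = workflow.run(readings)      # reproduces the first result
```

Calling `workflow.run(readings, inspect=True)` prints each actor's output in turn, which loosely mirrors inspecting data on the fly. Keeping parameters on the actor rather than in the function body is what makes the re-run and reproduce step a one-line change in this sketch.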

 

Workflows may represent theoretical models or observational analyses; they can be simple and linear, or complex and non-linear. One of the benefits of scientific workflows is that they can be nested, meaning that a workflow can contain “sub-workflows” that perform embedded tasks.4
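Nesting can be sketched in a few lines of Python, again as an illustration rather than Kepler's actual mechanism. The `Workflow` class below is hypothetical; the only assumption is that a workflow is itself callable, so it can be used as a step inside another workflow.

```python
from typing import Any, Callable, List

# A step is anything that takes data and returns data.
Step = Callable[[Any], Any]


class Workflow:
    """Hypothetical workflow that can itself serve as a step in another workflow."""

    def __init__(self, name: str, steps: List[Step]):
        self.name = name
        self.steps = steps

    def __call__(self, data: Any) -> Any:
        # Pass the data through every step in order.
        for step in self.steps:
            data = step(data)
        return data


# Sub-workflow: rescale values to the range [0, 1].
normalize = Workflow("normalize", [
    lambda xs: [x - min(xs) for x in xs],   # shift so the minimum is 0
    lambda xs: [x / max(xs) for x in xs],   # divide by the new maximum
])

# Outer workflow embeds the sub-workflow as an ordinary step.
analysis = Workflow("analysis", [
    lambda xs: [x for x in xs if x >= 0],   # drop negative readings
    normalize,                              # nested sub-workflow
    lambda xs: sum(xs) / len(xs),           # mean of the normalized values
])

result = analysis([4.0, -1.0, 8.0, 6.0])
```

Because the outer workflow only requires that each step be callable, the sub-workflow can be developed and tested on its own and then dropped in unchanged.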

 

Kepler builds upon the mature Ptolemy II framework, developed at the University of California, Berkeley.

"Ptolemy II is a software framework developed as part of the Ptolemy project, which studies modeling, simulation, and design of concurrent, real-time, embedded systems."5

 

Kepler is designed to support numerous scientific domains, including bioinformatics, ecoinformatics, geoinformatics, and others.6

 

Kepler includes distributed computing technologies that allow scientists to share their data and workflows with other scientists and to use data and analytical workflows from others around the world. Kepler also provides access to a continually expanding, geographically distributed set of data repositories, computing resources, and workflow libraries (e.g., ecological data from field stations, specimen data from museum collections, data from the geosciences, etc.).7

 

Function

  • Data analysis tools
    • Data cleaning
    • Data mining
  • Analysis tools
    • Simple
    • Complex

 

Why use this tool?

Kepler users with little background in computer science can create workflows with standard components, or modify existing workflows to suit their needs. Quantitative analysts can use the visual interface to create and share R and other statistical analyses. Users need not know how to program in R in order to take advantage of its powerful analytical features; pre-programmed Kepler components can simply be dragged into a visually represented workflow.8

 

Who will use this tool?

  • Data creators
  • Data capture
  • Data users

 

How will the tool be used?

  • Modern desktop
  • Java 1.4 or later required
  • Platform independent
  • Can use high-performance clusters
  • User input required
  • Supports EcoGrid access

 

Where in the data chain could this tool be used?

  • Data source
  • User’s machine

 

When could this tool be used?

  • Before data is made available to ALA
  • At the time of a user request
  • As a post process, after data is with the user

 

Availability

 

Comments

What are the data requirements?

What ALA architecture is required to support the tool?

 

