Ginko!

About this app

What is this?

This web app runs the pipeline that estimates the genetic diversity indicators from a KoboToolbox form. Its purpose is to estimate the indicators by country and to produce high-quality reports from data captured in Kobo using the template available at the Guideline materials and documentation for the Genetic Diversity Indicators of the monitoring framework for the Kunming-Montreal Global Biodiversity Framework.

Target users: practitioners or research teams who want to assess the genetic diversity indicators and examine the results themselves without having to process the data in R (due to lack of time, programming capacity, etc.).

Background: KoboToolbox is a tool for collecting data through web forms. We used it in the pilot multicountry assessment of the genetic diversity indicators. To estimate the indicators, the Kobo output first needs to be reformatted into tables from which the indicators can be estimated for each country or species. This is currently done by a series of previously developed R functions, but running them requires knowledge of R and time, resources that are commonly lacking. So we made this app to do it for you :)

How to use this app

The app runs a total of four steps. Each step needs at least one input file, and produces a series of output files needed for the subsequent step. It also produces a detailed report for each step.

You can run all steps one after another. After each step runs, you will be able to download the output data and the report, and then continue to the next step. You can also run only one step, assuming that you have the input data needed for it.

To use the app, simply click on each of the boxes detailing each step below and follow the instructions.


00 - Upload data

Upload the raw output exported from KoboToolbox in .csv format as downloaded using these instructions.

Tips:


01 - Raw data quality check

Description


Step 1 - Quality check

This step takes as input the output from KoboToolbox in .csv format as downloaded using these instructions.

It processes this file to look for common sources of error and flags those records for manual revision by the assessors who captured the data for each country. Specifically, it:

  1. Filters out records marked as “not_approved” in the manual Kobo validation interface (this means the country assessors determined there is something wrong with that particular record).
  2. Filters out any data entries containing the word “test”, as they are not real data.
  3. Checks for common data capture errors regarding the number of populations:
    • Are values of 0 correct?
    • Should 999 be -999 (the missing data label)?
    • Are extant and extinct confused?
  4. Checks that GBIF ID codes have the right number of digits.
  5. Checks that genus, species and subspecies are each a single word.
  6. Flags the records that need manual review and potentially correction.
  7. Asks the user whether to keep the taxa flagged in the previous step or filter them out.
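Some of the checks above can be illustrated with a minimal Python sketch. This is not the app's R code. The field names validation_status, genus, species and subspecies are hypothetical, and it is an assumption that the population counts appear in the raw file as n_extant_populations and n_extinct_populations. The GBIF ID check is left out because the expected number of digits is not stated here.

```python
import re

# Matches "test" as a whole word, in any letter case (an assumption about
# how "entries with the word test" should be interpreted).
TEST_WORD = re.compile(r"\btest\b", re.IGNORECASE)

def keep_record(record):
    """Return False for records that list items 1 and 2 would filter out."""
    if record.get("validation_status") == "not_approved":  # hypothetical field
        return False
    return not any(TEST_WORD.search(str(v)) for v in record.values())

def review_flags(record):
    """Return the reasons a record needs manual review (empty list if none)."""
    flags = []
    # Taxonomic fields should each be a single word (hypothetical field names).
    for field in ("genus", "species", "subspecies"):
        value = str(record.get(field) or "").strip()
        if value and len(value.split()) != 1:
            flags.append(f"{field} is not a single word")
    # Suspicious population counts: 0, or 999 typed instead of -999.
    for field in ("n_extant_populations", "n_extinct_populations"):
        value = record.get(field)
        if value == 0:
            flags.append(f"{field} is 0; is that correct?")
        elif value == 999:
            flags.append(f"{field} is 999; should it be -999?")
    return flags
```

A record with an empty list of flags would go straight to the clean file; any other record would appear in the to-check file with its reasons.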

Output:

  • A report of the quality check in HTML format.
  • A .csv file (called kobo_output_tocheck.csv) showing the records that need manual review or corrections, if any.
  • A .csv file (called kobo_output_clean.csv) with the data after processing (records flagged in the previous file may or may not be included according to user choice).

If any entries need corrections, go back to Kobo and correct the relevant entries. Once you are happy with how the data look, you can proceed to Step 2.


Downloads: to-check CSV, clean CSV, report (HTML), step log.

02 - Clean & extract indicators data

Description


Step 2 - Processing clean data to extract indicator data

This step performs the following:

  1. It reformats the data as output by Kobo into the shape needed to calculate each of the genetic diversity indicators. For example, in the Kobo output each species assessment is a single row with population data in different columns, but estimating the Ne indicator requires one row per population. This script does that format transformation for you.

  2. It transforms Nc to Ne based on a custom Nc:Ne ratio.

Notice that at this stage the indicator values are not calculated. This script only re-formats the data from the kobo-output so that you can use these data to estimate the indicators by yourself outside R (e.g. in Excel or other software), or continue to step 3 if you want to use the R functions and standard analyses of this repository.
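The one-row-per-assessment to one-row-per-population reshaping can be pictured with a small Python sketch. It is only an illustration under assumptions: the column naming pattern pop1_Ne, pop2_NcPoint and so on is hypothetical (the real Kobo export may name its columns differently), and the app itself does this in R.

```python
import re

def wide_to_long(row, id_field="X_uuid"):
    """Split one assessment row into one dict per population.

    Assumes (hypothetically) that population columns are named
    pop<number>_<variable>, for example pop1_Ne or pop2_NcPoint.
    """
    populations = {}
    for key, value in row.items():
        match = re.fullmatch(r"pop(\d+)_(\w+)", key)
        if match and value not in ("", None):
            number, variable = int(match.group(1)), match.group(2)
            populations.setdefault(number, {})[variable] = value
    # One output row per population, each carrying the assessment id.
    return [
        {id_field: row[id_field], "population": number, **fields}
        for number, fields in sorted(populations.items())
    ]
```

For example, a row with values in pop1_Ne and pop2_NcPoint would become two rows, one per population, each keeping the X_uuid of the assessment it came from.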

The input is the “clean kobo output” that was first cleaned in step 1.

Output

The outputs are the indicator data files, ready to be used to estimate the indicators.

  • indNe_data.csv file: data needed to estimate the Ne 500 indicator. Each population is a row and the population size data (either Ne or Nc) is provided in different columns.

  • indPM_data.csv file: data needed to estimate the PM indicator. Each row is a taxon of a single assessment, and the number of extant and extinct populations are provided.

  • indDNAbased_data.csv file: data needed to estimate the genetic monitoring indicator (number of species in which genetic diversity has been or is being monitored using DNA-based methods). Each row is a taxon.

  • metadata.csv file: metadata for taxa and indicators, in some cases including new useful variables, like the taxon name (joining genus, species, etc.) and whether the taxon was assessed a single time or multiple times.

Important note on transforming Nc to Ne data:

In the Kobo form, Ne and Nc data are collected as follows:

  • Ne (effective population size) from genetic analyses, i.e. from software like NeEstimator or GONE. The estimate and its lower and upper limits are stored as numbers in the columns Ne, NeLower, NeUpper. These columns are not modified during processing.

  • Nc (number of mature individuals) from point estimates, that is, quantitative data with or without confidence intervals. The estimate and its lower and upper limits, if available, are stored as numbers in the columns NcPoint, NcLower, NcUpper.

  • Nc (number of mature individuals) from quantitative range or qualitative data. These are the ranges that the Kobo form shows as options like “<5,000 by much” or “< 5,000 but not by much (tens or a few hundred less)”. The estimate is stored as text in the column NcRange.

This step uses the function transform_to_Ne() (see it here) to transform Nc estimates and their lower and upper limits to Ne based on the Nc:Ne ratio the user chooses.

For the NcPoint, NcLower and NcUpper columns (Nc from point estimates), Nc is transformed to Ne by multiplying each value by the desired ratio.

For the NcRange column (Nc from quantitative range or qualitative data), the range options (text) are first translated to numbers following this rule:

  • “more_5000_bymuch” to 10000
  • “more_5000” to 5500
  • “less_5000_bymuch” to 500
  • “less_5000” to 4050
  • “range_includes_5000” to 5001

This is stored in the new column Nc_from_range. Then, to transform Nc to Ne, it is multiplied by the desired ratio.

Regardless of whether the Nc data came from NcPoint or NcRange, after transforming it to Ne it is stored in the column Ne_from_Nc. Notice that the column NcType (one of the original Kobo form variables) states whether the Nc data came from NcPoint or NcRange. If the type was NcPoint and there were lower and upper limits, they are also transformed to Ne and stored in the columns NeLower_from_Nc and NeUpper_from_Nc.

Finally, a new column Ne_combined is created, combining the Ne from genetic estimates with the Ne obtained by transforming Nc using the ratio. If both exist, the Ne from genetic data is given preference.

For transparency, the column Ne_calculated_from specifies, for each population, where the data used to estimate Ne came from. Options are: “genetic data”, “NcPoint ratio”, and “NcRange ratio”, as explained above.
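The rules above can be summarized in a short Python sketch. It is not the transform_to_Ne() R function, only an illustration of the described logic for a single population. It assumes missing values are represented as None and decides between NcPoint and NcRange by which column has a value rather than by reading NcType; the function name nc_to_ne_sketch is hypothetical.

```python
# Translation of NcRange options to numbers, as listed above.
RANGE_TO_NC = {
    "more_5000_bymuch": 10000,
    "more_5000": 5500,
    "less_5000_bymuch": 500,
    "less_5000": 4050,
    "range_includes_5000": 5001,
}

def nc_to_ne_sketch(pop, ratio=0.1):
    """Return a copy of one population record with the derived Ne columns."""
    out = dict(pop)
    nc, source = None, None
    if pop.get("NcPoint") is not None:
        nc, source = pop["NcPoint"], "NcPoint ratio"
        # Lower and upper limits are only transformed for point estimates.
        for bound in ("Lower", "Upper"):
            if pop.get("Nc" + bound) is not None:
                out[f"Ne{bound}_from_Nc"] = pop["Nc" + bound] * ratio
    elif pop.get("NcRange") in RANGE_TO_NC:
        out["Nc_from_range"] = RANGE_TO_NC[pop["NcRange"]]
        nc, source = out["Nc_from_range"], "NcRange ratio"
    out["Ne_from_Nc"] = nc * ratio if nc is not None else None
    # Ne from genetic data takes preference over Ne derived from Nc.
    if pop.get("Ne") is not None:
        out["Ne_combined"], out["Ne_calculated_from"] = pop["Ne"], "genetic data"
    else:
        out["Ne_combined"], out["Ne_calculated_from"] = out["Ne_from_Nc"], source
    return out
```

With the default ratio of 0.1, a population whose NcRange is “less_5000” would get Nc_from_range = 4050 and an Ne_from_Nc of about 405, which falls below the Ne 500 threshold.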


Parameters

Input data source
Upload a CSV file with cleaned Kobo data (output from Step 1 or equivalent).

Nc:Ne ratio: this ratio is used to transform Nc (census size) to Ne (effective population size). The default value of 0.1 means Ne = Nc * 0.1.

Output files: indNe_data.csv, indPM_data.csv, indDNAbased_data.csv, metadata.csv, step log.

03 - Estimate indicators

Description


Step 3 - Estimate indicators

This step estimates the Genetic Diversity Indicators for each of the assessments (e.g. for each species assessed). The output includes the Ne 500 and PM indicator values per record along with key metadata in a single large table.

The inputs are the files produced by step 2:

  • indNe_data.csv
  • indPM_data.csv
  • indDNAbased_data.csv
  • metadata.csv

Ne 500 indicator

The Ne 500 indicator is estimated by dividing “the number of populations within a species with Ne > 500” by “the number of populations within a species with data to estimate Ne”.

Here the indicator is estimated not by taxon but by X_uuid (unique record of a taxon), because a single taxon could be assessed by different countries or more than once with different parameters.

This is done with the function estimate_indicatorNe() (see it here).
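As an illustration of the calculation (not the estimate_indicatorNe() R function itself), here is a minimal Python sketch. It assumes each population record carries its X_uuid and an Ne_combined value, with None meaning there is no data to estimate Ne; the function name ne500_by_record is hypothetical.

```python
from collections import defaultdict

def ne500_by_record(populations, threshold=500):
    """Proportion of populations with Ne above the threshold, per X_uuid."""
    # X_uuid -> [populations above threshold, populations with Ne data]
    counts = defaultdict(lambda: [0, 0])
    for pop in populations:
        ne = pop.get("Ne_combined")
        if ne is None:
            continue  # no data to estimate Ne, so excluded from the denominator
        tally = counts[pop["X_uuid"]]
        tally[1] += 1
        if ne > threshold:
            tally[0] += 1
    return {uuid: above / total for uuid, (above, total) in counts.items()}
```

In this sketch, a record whose populations all lack Ne data does not appear in the result at all, since the proportion is undefined for it.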

PM indicator

The Proportion of Maintained populations (PM indicator) is the proportion of populations within species which are maintained. This can be estimated based on n_extant_populations and n_extinct_populations, as follows:

n_extant_populations / (n_extant_populations + n_extinct_populations).
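A minimal Python sketch of this ratio follows. It is an illustration rather than the app's R code, and the handling of edge cases is an assumption: it returns None when either count is negative (which covers the -999 missing data label) or when there are no populations at all.

```python
def pm_indicator(n_extant_populations, n_extinct_populations):
    """Proportion of maintained populations, or None if it cannot be computed."""
    # Negative counts (such as the -999 missing data label) are treated as missing.
    if n_extant_populations < 0 or n_extinct_populations < 0:
        return None
    total = n_extant_populations + n_extinct_populations
    if total == 0:
        return None  # avoid dividing by zero
    return n_extant_populations / total
```

For example, a taxon with 8 extant and 2 extinct populations has a PM of 8 / (8 + 2) = 0.8.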

DNA-based genetic monitoring indicator

This indicator refers to the number (count) of taxa by country in which genetic monitoring based on DNA methods is occurring. This is stored in the variable temp_gen_monitoring as a “yes/no” answer for each taxon, so to estimate the indicator we only need to count how many said “yes”, keeping only one of the records when the taxon was assessed multiple times.
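The count can be sketched in Python as follows. The column names country and taxon are hypothetical; only temp_gen_monitoring comes from the description above. The sketch also makes one assumption about multiassessed taxa: a taxon is counted once for a country if any of its records answered “yes”, which may differ from how the app chooses which record to keep.

```python
def count_monitored_taxa(records):
    """Number of distinct taxa per country with DNA-based genetic monitoring."""
    monitored = {}
    for record in records:
        if record["temp_gen_monitoring"] == "yes":
            # A set keeps each taxon once, even if it was assessed several times.
            monitored.setdefault(record["country"], set()).add(record["taxon"])
    return {country: len(taxa) for country, taxa in monitored.items()}
```

Countries where no taxon answered “yes” are absent from the result, which corresponds to a count of zero.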

Output:

  • A .csv file (called indicators_full.csv) with the PM and Ne 500 indicators and the metadata in a single large table, in which each row is a taxon assessed.
  • A report of the step in HTML format, where you can also see the header of the indicator values.

Parameters

Input data source
Upload all 4 CSV files from Step 2 output (or equivalent).

This step uses the default Ne threshold of 500.

The Ne 500 indicator calculates the proportion of populations within species with an effective population size greater than 500.

Output files: indicators_full.csv, indicatorNe.csv, indicatorPM.csv, indicatorDNAbased.csv, step log.

04 - Country report


Description


Step 4 - Country (or countries) report

This step creates a report that includes plots, simple statistics summarizing the indicator values, and a short introduction and interpretation of the results.


Parameters

Input data source
Upload both CSV files from Step 3 output (or equivalent).

Enter the country name exactly as it appears in your data (lowercase, use underscores instead of spaces). Leave empty to use the first country found in the data.
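To make the expected format concrete, here is a small Python sketch of how a name could be converted to that form and how the default could work. Both function names are hypothetical, and the app may simply compare the text you type with the data without any conversion, so treat this as a guide for preparing your input.

```python
def normalize_country(name):
    """Lowercase a country name and join its words with underscores."""
    return "_".join(name.strip().lower().split())

def pick_country(user_input, countries_in_data):
    """Return the requested country, or the first one in the data if empty."""
    wanted = normalize_country(user_input)
    if not wanted:
        return countries_in_data[0]
    if wanted not in countries_in_data:
        raise ValueError(f"{wanted!r} not found in the data")
    return wanted
```

For example, “Costa Rica” would be entered as costa_rica.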

Output files: country_report.html, step log.