About this app

What is this?

This web app runs the pipeline to estimate the Genetic Diversity Indicators from a KoboToolbox form. Its purpose is to estimate the genetic diversity indicators by country and to produce high-quality reports from data captured in Kobo using the template available at the Guideline materials and documentation for the Genetic Diversity Indicators of the monitoring framework for the Kunming-Montreal Global Biodiversity Framework.

Target users: practitioners or research teams who want to assess the genetic diversity indicators themselves and examine the results without having to process the data in R (due to lack of time, programming skills, etc.).

Background: KoboToolbox is a tool for collecting data through web forms. We used it in the pilot multi-country assessment of the genetic diversity indicators. To estimate the indicators, the Kobo output first needs to be reformatted into tables from which the indicators can be estimated for each country or species. This is currently done by a series of previously developed R functions, but running them requires R knowledge and time, resources that are commonly lacking. So we made this app to do it for you :)

How to use this app

The app runs a total of four steps. Each step needs at least one input file, and produces a series of output files needed for the subsequent step. It also produces a detailed report for each step.

You can run all steps one after another. After each step runs, you will be able to download the output data and the report, and then continue to the next step. You can also run only one step, assuming that you have the input data needed for it.

Step 1 Raw data quality test: performs a quality check on the raw output exported from Kobo.
Step 2 Re-format and extract indicators data: re-formats the Kobo output so that the data can be used to estimate the indicators. There is an output for each of the 3 indicators (Ne 500, PM, DNA-based monitoring).
Step 3 Estimate indicators: estimates the Genetic Diversity Indicators for each of the assessments (e.g. for each species assessed). The output includes the Ne 500 and PM indicator values per record, along with key metadata, in a single large table.
Step 4 Country report: creates a report including plots and simple statistics summarizing the indicator values.

To use the app, click on the box for each step below and follow the instructions.

00 - Upload data

Upload the raw output exported from KoboToolbox in .csv format as downloaded using these instructions.

Tips:

Export the file as a UTF-8 encoded CSV.
Do not edit column names in the export.


01 - Raw data quality check

Description

Step 1 - Quality check

This step takes as input the output from KoboToolbox in .csv format as downloaded using these instructions.

It processes this file to look for common sources of error and flags those records for manual review by the assessors who captured the data for each country. Specifically, it:

Filters out records marked as “not_approved” in the manual Kobo validation interface (this means country assessors determined that something is wrong with that particular record).
Filters out any data entries with the word “test”, as they are not real data.
Checks for common data capture errors regarding the number of populations:
- Are 0 values correct?
- Should 999 be -999 (the missing data label)?
- Are extant/extinct confused?
Checks that GBIF ID codes have the right number of digits.
Checks that genus, species and subspecies are each a single word.
Flags the records that need manual review and potentially correction.
Asks the user whether to keep the taxa flagged in the previous step or filter them out.
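Several of the checks above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the app's R code: the record keys (validation_status, genus, species, subspecies, n_extant_populations, n_extinct_populations) are assumed names, and the GBIF digit check and the extant/extinct confusion check are left out because their exact rules are not stated here. The keep_to_check argument mirrors the parameter of the same name.

```python
"""Hypothetical sketch of Step 1-style quality checks (not the app's R code)."""

# Assumed column names; the real Kobo export uses its own names.
POP_COLUMNS = ("n_extant_populations", "n_extinct_populations")
NAME_COLUMNS = ("genus", "species", "subspecies")


def quality_check(records, keep_to_check=False):
    """Return (clean, to_check) lists from a list of record dicts."""
    clean, to_check = [], []
    for rec in records:
        # Drop records rejected in the Kobo validation interface.
        if rec.get("validation_status") == "not_approved":
            continue
        # Drop test entries: any field containing the word "test".
        if any("test" in str(v).lower().split() for v in rec.values()):
            continue

        reasons = []
        for col in POP_COLUMNS:
            if rec.get(col) == 999:
                reasons.append(f"{col} is 999; should it be -999 (missing data)?")
        if rec.get("n_extant_populations") == 0:
            reasons.append("n_extant_populations is 0; is that correct?")
        for col in NAME_COLUMNS:
            # An empty subspecies is fine; more than one word is not.
            if len(str(rec.get(col) or "").split()) > 1:
                reasons.append(f"{col} should be a single word")

        if reasons:
            to_check.append({**rec, "flags": reasons})
            if not keep_to_check:
                continue  # flagged records stay out of the clean output
        clean.append(rec)
    return clean, to_check
```

The two returned lists play the roles of kobo_output_clean.csv and kobo_output_tocheck.csv; writing them to disk is left out to keep the sketch short.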

Output:

A report of the quality check in HTML format.
A .csv file (called kobo_output_tocheck.csv) showing the records that need manual review or corrections, if any.
A .csv file (called kobo_output_clean.csv) with the data after processing (records flagged in the previous file may or may not be included according to user choice).

If any entries need corrections, go back to Kobo and correct the relevant entries. Once you are happy with how the data look, you can proceed to Step 2.

Parameters

Keep records flagged for manual review in the CLEAN output (keep_to_check)


02 - Clean & extract indicators data

Description

Step 2 - Processing clean data to extract indicator data

This step performs the following:

It re-formats the data as output by Kobo into the shape needed to calculate each of the genetic diversity indicators. For example, in the Kobo output each species assessment is a single row, with population data in different columns, but estimating the Ne indicator requires each population to be a row. This script does that format transformation for you.
It also transforms Nc to Ne based on a custom Nc:Ne ratio.
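The wide-to-long transformation described above can be sketched as follows. This is a hypothetical illustration only: it assumes per-population columns follow a naming pattern like pop1_Ne or pop2_NcPoint, which is an invented convention and not necessarily how the Kobo export names them. Columns that do not match the pattern are treated as assessment-level data and copied onto every population row.

```python
import re

# Assumed naming pattern for per-population columns, e.g. "pop2_NcPoint".
POP_COLUMN = re.compile(r"^pop(\d+)_(\w+)$")


def wide_to_long(assessment):
    """Split one assessment row (dict) into one dict per population."""
    shared = {}  # columns that describe the whole assessment
    pops = {}    # population index -> {field: value}
    for key, value in assessment.items():
        match = POP_COLUMN.match(key)
        if match:
            idx, field = int(match.group(1)), match.group(2)
            pops.setdefault(idx, {})[field] = value
        else:
            shared[key] = value

    rows = []
    for idx in sorted(pops):
        fields = pops[idx]
        # Skip population slots the assessor left completely empty.
        if all(v in (None, "") for v in fields.values()):
            continue
        rows.append({**shared, "population": idx, **fields})
    return rows
```

For example, an assessment with filled-in pop1_* and pop2_* columns and an empty pop3_* slot yields two population rows, each carrying the assessment's identifying columns.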

Note that the indicator values are not calculated at this stage. This script only re-formats the Kobo output so that you can use these data to estimate the indicators yourself outside R (e.g. in Excel or other software), or continue to Step 3 if you want to use the R functions and standard analyses of this repository.

The input is the “clean Kobo output” produced in Step 1.

Output

The outputs are the indicator data, ready to be used to estimate the indicators.

indNe_data.csv file: data needed to estimate the Ne 500 indicator. Each population is a row and the population size data (either Ne or Nc) is provided in different columns.
indPM_data.csv file: data needed to estimate the PM indicator. Each row is a taxon of a single assessment, and the number of extant and extinct populations are provided.
indDNAbased_data.csv file: data needed to estimate the genetic monitoring indicator (number of species in which genetic diversity has been or is being monitored using DNA-based methods). Each row is a taxon.
metadata.csv file: metadata for taxa and indicators, in some cases with new useful variables, such as the taxon name (joining genus, species, etc.) and whether the taxon was assessed only once or multiple times.
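The two derived metadata variables mentioned for metadata.csv can be sketched like this. It is a hypothetical illustration: the input keys (genus, species, subspecies) and the output names taxon and assessment_type with values "single"/"multiple" are assumptions, not the app's actual column names. Counting repeated taxon names is one simple way to tell single from multiple assessments.

```python
from collections import Counter


def add_taxon_metadata(records):
    """Add a joined taxon name and a single/multiple assessment label."""
    for rec in records:
        parts = (rec.get(k) for k in ("genus", "species", "subspecies"))
        # Join the non-empty name parts with spaces.
        rec["taxon"] = " ".join(p.strip() for p in parts if p and p.strip())
    counts = Counter(rec["taxon"] for rec in records)
    for rec in records:
        rec["assessment_type"] = "multiple" if counts[rec["taxon"]] > 1 else "single"
    return records
```

The records are updated in place and also returned, so the function can be used either way.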

Important note on transforming Nc to Ne data:

In the Kobo form, Ne and Nc data are collected as follows:

Ne (effective population size) from genetic analyses, i.e. estimated with software such as NeEstimator or GONE. The estimate and its lower and upper limits are stored as numbers in the columns Ne, NeLower and NeUpper. These columns are not modified during processing.
Nc (number of mature individuals) from point estimates, that is, quantitative data with or without confidence intervals. The estimate and its lower and upper limits, if available, are stored as numbers in the columns NcPoint, NcLower and NcUpper.
Nc (number of mature individuals) from quantitative ranges or qualitative data, i.e. the options shown in the Kobo form such as “<5,000 by much” or “< 5,000 but not by much (tens or a few hundred less)”. The estimate is stored as text in the column NcRange.

This step uses the function transform_to_Ne() (see it here) to transform Nc estimates and their lower and upper limits to Ne based on the Nc:Ne ratio chosen by the user.

For the NcPoint, NcLower and NcUpper columns (Nc from point estimates), Nc is transformed to Ne by multiplying each value by the chosen ratio.

For the NcRange column (Nc from quantitative range or qualitative data), the range options (text) are first translated to numbers following this rule:

“more_5000_bymuch” to 10000
“more_5000” to 5500
“less_5000_bymuch” to 500
“less_5000” to 4050
“range_includes_5000” to 5001

The result is stored in the new column Nc_from_range, which is then multiplied by the chosen ratio to transform Nc to Ne.

Regardless of whether the Nc data came from NcPoint or NcRange, the Ne obtained from it is stored in the column Ne_from_Nc. Note that the column NcType (one of the original Kobo form variables) states whether Nc data came from NcPoint or NcRange. If the type was NcPoint and lower and upper limits were given, they are also transformed to Ne and stored in the columns NeLower_from_Nc and NeUpper_from_Nc.
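The Nc-to-Ne transformation described above can be sketched in Python as follows. This mirrors the described logic but is not the app's R function transform_to_Ne(); the function name transform_to_ne is hypothetical, rows are assumed to be dicts keyed by the column names from the text, and the sketch decides between point and range data by checking which columns are filled rather than reading NcType, whose exact values are not given here.

```python
# Mapping of NcRange text options to numbers, as described for the NcRange column.
NC_RANGE_TO_NUMBER = {
    "more_5000_bymuch": 10000,
    "more_5000": 5500,
    "less_5000_bymuch": 500,
    "less_5000": 4050,
    "range_includes_5000": 5001,
}


def transform_to_ne(row, ratio=0.1):
    """Return a copy of `row` (dict) with Nc converted to Ne using `ratio`."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("Nc:Ne ratio must be between 0.0 and 1.0")

    def scale(value):
        return None if value is None else value * ratio

    out = dict(row)
    if row.get("NcPoint") is not None:
        # Point estimate: scale the estimate and any lower/upper limits.
        out["Ne_from_Nc"] = scale(row["NcPoint"])
        out["NeLower_from_Nc"] = scale(row.get("NcLower"))
        out["NeUpper_from_Nc"] = scale(row.get("NcUpper"))
    elif row.get("NcRange") in NC_RANGE_TO_NUMBER:
        # Range/qualitative option: translate text to a number, then scale.
        out["Nc_from_range"] = NC_RANGE_TO_NUMBER[row["NcRange"]]
        out["Ne_from_Nc"] = scale(out["Nc_from_range"])
    else:
        out["Ne_from_Nc"] = None  # no usable Nc data
    return out
```

With the default ratio of 0.1, a point estimate of 8000 mature individuals becomes an Ne of about 800, and the “less_5000” option becomes 4050 × 0.1, about 405.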

Finally, a new column Ne_combined is created by combining Ne from genetic estimates with the Ne obtained by transforming Nc with the ratio. If both exist, the Ne from genetic data is given preference.

For transparency, the column Ne_calculated_from specifies, for each population, where the data used to estimate Ne came from. Options are “genetic data”, “NcPoint ratio” and “NcRange ratio”, as explained above.
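The preference rule for Ne_combined and the matching Ne_calculated_from label can be sketched as below. It assumes a population row (dict) that already has Ne_from_Nc filled in, and it infers point versus range origin from whether NcPoint is present; both are simplifications for illustration, and combine_ne is a hypothetical name.

```python
def combine_ne(row):
    """Add Ne_combined and Ne_calculated_from, preferring genetic Ne."""
    out = dict(row)
    if row.get("Ne") is not None:
        # Genetic estimates take precedence over Nc-derived values.
        out["Ne_combined"] = row["Ne"]
        out["Ne_calculated_from"] = "genetic data"
    elif row.get("Ne_from_Nc") is not None:
        out["Ne_combined"] = row["Ne_from_Nc"]
        if row.get("NcPoint") is not None:
            out["Ne_calculated_from"] = "NcPoint ratio"
        else:
            out["Ne_calculated_from"] = "NcRange ratio"
    else:
        # No Ne data of any kind for this population.
        out["Ne_combined"] = None
        out["Ne_calculated_from"] = None
    return out
```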

Parameters

Input data source

Use output from Step 1

Upload my own file

Upload kobo_output_clean.csv


Upload a CSV file with cleaned Kobo data (output from Step 1 or equivalent).

Nc:Ne ratio (0.0-1.0)

This ratio is used to transform Nc (census size) to Ne (effective population size). Default value of 0.1 means Ne = Nc * 0.1


03 - Estimate indicators

Description

Step 3 - Estimate indicators

This step estimates the Genetic Diversity Indicators for each of the assessments (e.g. for each species assessed). The output includes the Ne 500 and PM indicator values per record, along with key metadata, in a single large table.

The inputs are the files produced by Step 2:

indNe_data.csv
indPM_data.csv
indDNAbased_data.csv
metadata.csv

Ne 500 indicator

The Ne 500 indicator is estimated by dividing “the number of populations within a species with Ne > 500” by “the number of populations within a species with data to estimate Ne”.

Here the indicator is estimated not by taxon but by X_uuid (the unique record of a taxon assessment), because a single taxon could be assessed by different countries, or more than once with different parameters.

This is done with the function estimate_indicatorNe() (see it here).
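The ratio described above, computed per X_uuid, can be sketched in Python as follows. This is not the app's R function estimate_indicatorNe(); it is a hypothetical equivalent that assumes one dict per population with the keys X_uuid and Ne_combined, and it treats a missing Ne_combined as “no data to estimate Ne”. As in the text, only Ne strictly greater than the threshold counts.

```python
from collections import defaultdict


def estimate_indicator_ne(populations, threshold=500):
    """Return {X_uuid: share of populations with Ne > threshold}."""
    above = defaultdict(int)      # populations with Ne above the threshold
    with_data = defaultdict(int)  # populations with any Ne value
    for pop in populations:
        ne = pop.get("Ne_combined")
        if ne is None:
            continue  # population has no data to estimate Ne
        uuid = pop["X_uuid"]
        with_data[uuid] += 1
        if ne > threshold:
            above[uuid] += 1
    return {uuid: above[uuid] / n for uuid, n in with_data.items()}
```

Assessments whose populations all lack Ne data are left out of the result rather than given a value of 0, since their denominator would be zero.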

PM indicator

The Proportion of Maintained populations (PM indicator) is the proportion of populations within species which are maintained. It can be estimated from n_extant_populations and n_extinct_populations as follows:

n_extant_populations / (n_extant_populations + n_extinct_populations)
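The PM formula can be written as a small function. This sketch is an illustration, not the app's code; returning None when there are no populations, or when a count holds a negative missing-data code such as -999, is an assumption made here to avoid dividing by zero or mixing in missing values.

```python
def pm_indicator(n_extant, n_extinct):
    """Proportion of maintained populations: extant / (extant + extinct)."""
    if n_extant < 0 or n_extinct < 0:
        return None  # e.g. the -999 missing-data label
    total = n_extant + n_extinct
    if total == 0:
        return None  # no populations recorded, indicator undefined
    return n_extant / total
```

For example, a taxon with 8 extant and 2 extinct populations has PM = 8 / (8 + 2) = 0.8.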

DNA-based genetic monitoring indicator

This indicator refers to the number (count) of taxa per country in which genetic monitoring based on DNA methods is occurring. This is stored in the variable temp_gen_monitoring as a “yes/no” answer for each taxon, so to estimate the indicator we only need to count how many taxa have “yes”, keeping only one record when a taxon was assessed more than once.
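The count described above can be sketched as follows. It assumes records are dicts with the keys country, taxon and temp_gen_monitoring (country and taxon are assumed names). Collecting taxon names in a set keeps each taxon once per country. One simplification to note: if a multi-assessed taxon has “yes” in one record and “no” in another, this sketch counts it, whereas the app keeps a single record.

```python
from collections import defaultdict


def count_dna_monitoring(records):
    """Count, per country, distinct taxa with temp_gen_monitoring == "yes"."""
    taxa = defaultdict(set)  # country -> set of monitored taxon names
    for rec in records:
        answer = str(rec.get("temp_gen_monitoring", "")).strip().lower()
        if answer == "yes":
            taxa[rec["country"]].add(rec["taxon"])
    return {country: len(names) for country, names in taxa.items()}
```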

Output:

A .csv file (called indicators_full.csv) containing the PM and Ne 500 indicators and the metadata in a single large table, in which each row is an assessed taxon.
A report of the step in HTML format, where you can also see the first rows of the indicator values.

Parameters

Input data source

Use output from Step 2

Upload my own files

Upload indNe_data.csv


Upload indPM_data.csv


Upload indDNAbased_data.csv


Upload metadata.csv


Upload all 4 CSV files from Step 2 output (or equivalent).

This step uses the default Ne threshold of 500.

The Ne 500 indicator calculates the proportion of populations within species with an effective population size greater than 500.


04 - Country report

Description

Step 4 - Country (or countries) report

This step creates a report including plots, simple statistics summarizing the indicator values, and a short introduction and interpretation of the results.

Parameters

Input data source

Use output from Step 3

Upload my own files

Upload indicators_full.csv


Upload indNe_data.csv


Upload indicators_full.csv (from Step 3) and indNe_data.csv (from Step 2), or equivalent files.

Country name

Enter the country name exactly as it appears in your data (lowercase, use underscores instead of spaces). Leave empty to use the first country found in the data.
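If you want to prepare the country name outside the app, the naming rule above (lowercase, underscores instead of spaces, first country as fallback) can be sketched like this. Both helpers are hypothetical and are not part of the app; countries_in_data is assumed to be the list of country names in the order they appear in the data.

```python
def normalize_country(name):
    """Lowercase a country name and replace runs of whitespace with "_"."""
    return "_".join(name.strip().lower().split())


def pick_country(requested, countries_in_data):
    """Return the country to report on, mirroring the rule described above."""
    if not requested.strip():
        # Empty input: fall back to the first country found in the data.
        return countries_in_data[0] if countries_in_data else None
    key = normalize_country(requested)
    if key not in countries_in_data:
        raise ValueError(f"{key!r} not found; available: {countries_in_data}")
    return key
```

For example, normalize_country(" Costa Rica ") gives "costa_rica".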

Ginko!
