User Stats API

The User Stats API provides comprehensive statistics about Project Sidewalk users and their contributions in Chicago, IL, including labels placed, distance explored, and validation activities. Each user is identified by an anonymized ID, which persists over time.

Endpoint

Retrieve statistics for all registered users or filter based on specific criteria. See Query Parameters below.

GET /v3/api/userStats

Examples

`/v3/api/userStats?filetype=json` Get all user stats for Chicago, IL in JSON (default).

`/v3/api/userStats?filetype=csv` Get all user stats for Chicago, IL in CSV.

`/v3/api/userStats?filetype=csv&highQualityOnly=true` Get stats in CSV for users marked as high_quality only.

`/v3/api/userStats?filetype=json&minLabels=10` Get stats in JSON for users with 10 or more labels.

`/v3/api/userStats?filetype=json&minLabels=10&min_accuracy=0.9` Get stats in JSON for users with 10 or more labels and a label accuracy of 90% or better.

Query Parameters

This endpoint accepts the following optional query parameters.

Parameter	Type	Description
`filetype`	`string`	Specify the output format. Options: `json` (default), `csv`.
`minLabels`	`integer`	Only include users with at least this many total labels. Default: 0 (no minimum).
`min_meters`	`number`	Only include users who have explored at least this many meters. Default: 0 (no minimum).
`min_accuracy`	`number`	Only include users with at least this label accuracy (0.0-1.0). Users without validation data are excluded when this parameter is set.
`highQualityOnly`	`boolean`	When set to `true`, only include users flagged as high quality contributors. Default: `false`.
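
As a minimal sketch, the parameters above can be assembled into a request URL with Python's standard library. The host in `BASE_URL` is a placeholder assumption, and `build_user_stats_url` is a hypothetical helper name rather than part of the API.

```python
from urllib.parse import urlencode

# Placeholder host (assumption); substitute the server for your deployment.
BASE_URL = "https://sidewalk.example.org"


def build_user_stats_url(base_url=BASE_URL, **params):
    """Build a /v3/api/userStats URL, skipping parameters left as None."""
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        # Booleans are written as lowercase true/false, as in the examples.
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = value
    url = f"{base_url}/v3/api/userStats"
    return f"{url}?{urlencode(query)}" if query else url


print(build_user_stats_url(filetype="csv", highQualityOnly=True))
print(build_user_stats_url(filetype="json", minLabels=10, min_accuracy=0.9))
```

Because every parameter is optional, calling the helper with no arguments returns the bare endpoint URL.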

Responses

Success Response (200 OK)

On success, the API returns an HTTP 200 OK status code and the requested data in the specified filetype format.

JSON Format (Default)

Returns an array of user statistics objects, each representing a single user's contribution data:

[
    {
        "user_id": "bfab6670-0955-440c-abe8-01c2d20696ba",
        "labels": 27,
        "meters_explored": 154.8437957763672,
        "labels_per_meter": 0.17436927556991577,
        "high_quality": true,
        "high_quality_manual": null,
        "label_accuracy": 0.9545454382896423,
        "validated_labels": 22,
        "validations_received": 22,
        "labels_validated_correct": 21,
        "labels_validated_incorrect": 1,
        "labels_not_validated": 5,
        "validations_given": 20,
        "dissenting_validations_given": 5,
        "agree_validations_given": 14,
        "disagree_validations_given": 6,
        "unsure_validations_given": 0,
        "stats_by_label_type": {
            "curb_ramp": {
                "labels": 16,
                "validated_correct": 15,
                "validated_incorrect": 1,
                "not_validated": 0
            },
            "no_curb_ramp": {
                "labels": 0,
                "validated_correct": 0,
                "validated_incorrect": 0,
                "not_validated": 0
            },
            // ... other label types
        }
    },
    // ... more user statistics objects
]
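
The counts in the sample object are internally consistent, which suggests how the derived fields relate, although the documentation does not state the formulas. The sketch below checks those relationships on a trimmed copy of the sample. Treat each formula as an assumption inferred from this one example; `check_user_stats` is a hypothetical helper.

```python
import json
import math

# Trimmed copy of the sample object above (comments and label types removed).
sample = json.loads("""
{
    "labels": 27,
    "meters_explored": 154.8437957763672,
    "labels_per_meter": 0.17436927556991577,
    "label_accuracy": 0.9545454382896423,
    "validated_labels": 22,
    "labels_validated_correct": 21,
    "labels_validated_incorrect": 1,
    "labels_not_validated": 5,
    "validations_given": 20,
    "agree_validations_given": 14,
    "disagree_validations_given": 6,
    "unsure_validations_given": 0
}
""")


def check_user_stats(user):
    """Return a dict mapping each assumed relationship to whether it holds."""
    validated = user["validated_labels"]
    return {
        "labels = validated + not validated": (
            user["labels"] == validated + user["labels_not_validated"]
        ),
        "validated = correct + incorrect": (
            validated
            == user["labels_validated_correct"] + user["labels_validated_incorrect"]
        ),
        "validations given = agree + disagree + unsure": (
            user["validations_given"]
            == user["agree_validations_given"]
            + user["disagree_validations_given"]
            + user["unsure_validations_given"]
        ),
        # The sample values look like single-precision floats, so compare loosely.
        "accuracy = correct / validated": math.isclose(
            user["label_accuracy"],
            user["labels_validated_correct"] / validated,
            rel_tol=1e-5,
        ),
        "labels_per_meter = labels / meters": math.isclose(
            user["labels_per_meter"],
            user["labels"] / user["meters_explored"],
            rel_tol=1e-5,
        ),
    }


for name, holds in check_user_stats(sample).items():
    print(f"{name}: {holds}")
```

A check like this is only meaningful for users with at least one validated label and a nonzero distance, since `label_accuracy` and `labels_per_meter` can be null.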

JSON Field Descriptions

Each user statistics object contains the following fields:

Field	Type	Description
`user_id`	`string`	Anonymized unique identifier for the user.
`labels`	`integer`	Total number of labels placed by the user.
`meters_explored`	`number`	Total distance explored by the user in meters.
`labels_per_meter`	`number | null`	Average number of labels placed per meter explored, or null if no distance explored.
`high_quality`	`boolean`	Whether the user is flagged as a high-quality contributor based on algorithmic assessment.
`high_quality_manual`	`boolean | null`	Manual override of high-quality status by administrators, or null if not set.
`label_accuracy`	`number | null`	Accuracy of the user's labels based on validations, ranging from 0.0 to 1.0, or null if no validations.
`validated_labels`	`integer`	Number of the user's labels that have been validated by others.
`validations_received`	`integer`	Total number of validations received on the user's own labels.
`labels_validated_correct`	`integer`	Number of the user's labels validated as correct.
`labels_validated_incorrect`	`integer`	Number of the user's labels validated as incorrect.
`labels_not_validated`	`integer`	Number of the user's labels that have not been validated.
`validations_given`	`integer`	Total number of validations performed by the user on others' labels.
`dissenting_validations_given`	`integer`	Number of validations where the user disagreed with the majority.
`agree_validations_given`	`integer`	Number of validations where the user agreed with the label.
`disagree_validations_given`	`integer`	Number of validations where the user disagreed with the label.
`unsure_validations_given`	`integer`	Number of validations where the user was unsure about the label.
`stats_by_label_type`	`object`	Breakdown of statistics by label type.

Label Type Statistics Fields

The stats_by_label_type object contains a key for each label type, with values that provide detailed statistics for that specific type of label:

Field	Type	Description
`stats_by_label_type.[type]`	`object`	Statistics for a specific label type (e.g., "curb_ramp", "obstacle"). The available label types match those in the Label Types API, but are provided in snake_case format.
`stats_by_label_type.[type].labels`	`integer`	Number of labels of this type placed by the user.
`stats_by_label_type.[type].validated_correct`	`integer`	Number of this type of label validated as correct.
`stats_by_label_type.[type].validated_incorrect`	`integer`	Number of this type of label validated as incorrect.
`stats_by_label_type.[type].not_validated`	`integer`	Number of this type of label not yet validated.
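
The per-type counts can be turned into a per-type accuracy, as sketched below. The API does not return this value; the ratio used here (validated_correct divided by all validated labels of that type) is an assumption that mirrors how `label_accuracy` appears to be computed, and `accuracy_by_label_type` is a hypothetical helper.

```python
def accuracy_by_label_type(stats_by_label_type):
    """Map each label type to its share of validated labels marked correct.

    Returns None for a label type with no validated labels.
    """
    result = {}
    for label_type, stats in stats_by_label_type.items():
        validated = stats["validated_correct"] + stats["validated_incorrect"]
        result[label_type] = (
            stats["validated_correct"] / validated if validated else None
        )
    return result


# Values copied from the sample response above.
sample_stats = {
    "curb_ramp": {
        "labels": 16,
        "validated_correct": 15,
        "validated_incorrect": 1,
        "not_validated": 0,
    },
    "no_curb_ramp": {
        "labels": 0,
        "validated_correct": 0,
        "validated_incorrect": 0,
        "not_validated": 0,
    },
}

print(accuracy_by_label_type(sample_stats))
# {'curb_ramp': 0.9375, 'no_curb_ramp': None}
```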

CSV Format

If filetype=csv is specified, the response body will be CSV data. The first row contains the header fields, with the stats_by_label_type object flattened into individual columns for each label type and statistic.

user_id,labels,meters_explored,labels_per_meter,high_quality,high_quality_manual,label_accuracy,validated_labels,...
bfab6670-0955-440c-abe8-01c2d20696ba,27,154.8437957763672,0.17436927556991577,true,,0.9545454382896423,22,...
814f4169-98a1-4afa-80da-3b46be1da405,687,9898.09765625,0.06940727680921555,true,,0.8013029098510742,614,...
...

CSV Column Descriptions

In CSV format, each row corresponds to a user, and the columns map to the JSON fields as follows:

The first set of columns matches the top-level attributes from the JSON format (e.g., user_id, labels, meters_explored).
The label type statistics are flattened into one set of columns per label type, named with the pattern [label_type]_[statistic].
For example: curb_ramp_labels, curb_ramp_validated_correct, curb_ramp_validated_incorrect, curb_ramp_not_validated.
This flattened structure makes the data easier to import into spreadsheet applications and data analysis tools.
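
The naming pattern can be illustrated by flattening a JSON user object by hand. This is a sketch of the [label_type]_[statistic] convention described above, not the server's implementation; `flatten_user_stats` is a hypothetical helper, and the column order of the real CSV may differ.

```python
def flatten_user_stats(user):
    """Flatten stats_by_label_type into [label_type]_[statistic] columns."""
    # Copy the top-level fields, leaving out the nested object.
    flat = {
        key: value for key, value in user.items() if key != "stats_by_label_type"
    }
    # Add one column per (label type, statistic) pair.
    for label_type, stats in user.get("stats_by_label_type", {}).items():
        for statistic, value in stats.items():
            flat[f"{label_type}_{statistic}"] = value
    return flat


# A shortened version of the sample user from the JSON example.
user = {
    "user_id": "bfab6670-0955-440c-abe8-01c2d20696ba",
    "labels": 27,
    "stats_by_label_type": {
        "curb_ramp": {
            "labels": 16,
            "validated_correct": 15,
            "validated_incorrect": 1,
            "not_validated": 0,
        },
    },
}

row = flatten_user_stats(user)
print(list(row))
```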

Error Responses

If an error occurs, the API will return an appropriate HTTP status code and a JSON response body containing details about the error.

400 Bad Request: Invalid parameter values.
404 Not Found: The requested resource does not exist.
500 Internal Server Error: An unexpected error occurred on the server.
503 Service Unavailable: The server is temporarily unable to handle the request.

Error Response Body

Error responses include a JSON body with the following structure:

{
    "status": 400, // HTTP Status Code
    "code": "INVALID_PARAMETER", // Machine-readable error code
    "message": "Invalid value for filetype parameter. Expected 'csv' or 'json'.", // Human-readable description
    "parameter": "filetype" // Optional: The specific parameter causing the error
}
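
A client might wrap this error body in an exception so that callers can inspect the machine-readable fields. The sketch below assumes the body matches the structure shown above; `UserStatsApiError` and `parse_error_body` are hypothetical names, not part of any Project Sidewalk library.

```python
import json


class UserStatsApiError(Exception):
    """Hypothetical exception type carrying the documented error fields."""

    def __init__(self, status, code, message, parameter=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.parameter = parameter  # Optional in the error body.


def parse_error_body(body_text):
    """Convert a JSON error body into a UserStatsApiError instance."""
    data = json.loads(body_text)
    return UserStatsApiError(
        status=data["status"],
        code=data["code"],
        message=data["message"],
        parameter=data.get("parameter"),
    )


# The example body from above, without the explanatory comments.
body = """
{
    "status": 400,
    "code": "INVALID_PARAMETER",
    "message": "Invalid value for filetype parameter. Expected 'csv' or 'json'.",
    "parameter": "filetype"
}
"""

error = parse_error_body(body)
print(error)
print(error.parameter)
```

Using `data.get("parameter")` keeps the parser tolerant of error bodies that omit the optional field.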

Data Analysis Tips

The following tips can help you draw meaningful conclusions from User Stats API data:

Consider using minimum thresholds for label count and validated labels to ensure sufficient data for meaningful analysis
Look beyond quantity: a high label count does not always mean high-quality data
Analyze the relationship between labels per meter and accuracy to understand contribution thoroughness
Compare validation patterns across different types of labels to identify where quality issues might exist
Use the Label Types API to get proper color coding and descriptions for visualizations
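
The first and third tips can be sketched as a client-side filter followed by a correlation. The thresholds, the helper names `filter_users` and `pearson`, and the third user record are all illustrative assumptions; the first two records reuse values from the CSV sample above. With so few users the coefficient carries no statistical meaning, so this only shows the mechanics.

```python
import math


def filter_users(users, min_labels=10, min_validated=5):
    """Keep users with enough labels and validations to analyze (assumed cutoffs)."""
    return [
        u for u in users
        if u["labels"] >= min_labels
        and u["validated_labels"] >= min_validated
        and u["label_accuracy"] is not None
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


users = [
    {"labels": 27, "validated_labels": 22,
     "labels_per_meter": 0.17436927556991577, "label_accuracy": 0.9545454382896423},
    {"labels": 687, "validated_labels": 614,
     "labels_per_meter": 0.06940727680921555, "label_accuracy": 0.8013029098510742},
    # Made-up user with too little data; the filter drops this record.
    {"labels": 3, "validated_labels": 0,
     "labels_per_meter": 0.5, "label_accuracy": None},
]

kept = filter_users(users)
r = pearson(
    [u["labels_per_meter"] for u in kept],
    [u["label_accuracy"] for u in kept],
)
print(len(kept), r)
```

The `minLabels` and `min_accuracy` query parameters can apply similar cutoffs on the server side; a minimum on validated labels has no documented parameter, so it is applied here on the client.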

Related APIs

For more comprehensive analysis, consider using the User Stats API in conjunction with:

Label Types API - Get information about the different types of accessibility issues
Raw Labels API - Access individual label data with geographic information and user ids
Label Clusters API - Work with clustered label data

Terms of Use

If you use Project Sidewalk data in your research, please cite the following paper (awarded 🏆 Best Paper at CHI 2019):

Manaswi Saha, Michael Saugstad, Hanuma Teja Maddali, Aileen Zeng, Ryan Holland, Steven Bower, Aditya Dash, Sage Chen, Anthony Li, Kotaro Hara, and Jon Froehlich. 2019. Project Sidewalk: A Web-based Crowdsourcing Tool for Collecting Sidewalk Accessibility Data At Scale. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). Association for Computing Machinery, New York, NY, USA, Paper 62, 1–14. https://doi.org/10.1145/3290605.3300292

Contribute

Project Sidewalk is an open-source project created by the Makeability Lab and hosted on GitHub. We welcome your contributions! If you found a bug or have a feature request, please open an issue on GitHub.

You can also email us at sidewalk@cs.uw.edu

Project Sidewalk in Your City!

If you are interested in bringing Project Sidewalk to your city, please read our Wiki page.