This We Know

Citizen portal to data.gov


"The public’s right to know is a cornerstone of our democracy. [ThisWeKnow] is designed to make what was once a difficult and time-consuming process into a faster and more streamlined experience." - Senator Patrick Leahy

On March 5, 2009, shortly after his appointment as the first Federal Chief Information Officer, Vivek Kundra announced the creation of Data.gov.[2] Kundra said, "Data.gov will publish data feeds, so we'll have a vast array of data".[3] Recognizing the importance of this initiative, and the need to foster compelling applications enabled by the presence of Data.gov, the Sunlight Foundation sponsored the Apps for America: The data.gov challenge competition.

Green River and Sway Design have always believed that breaking down data warehouse silos yields exponential value for understanding our world. This dovetailed with President Obama’s goal for data.gov: “...to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.” The U.S. government has thousands of disparate databases developed by multiple agencies, but no front end that pulls them together in ways that are useful to its citizens. For instance, there is no overarching way to find out everything the government knows about a given neighborhood. Our vision for ThisWeKnow was to provide a tool that integrates government data to answer citizens’ questions and empowers citizens to act on what is known.

Our long-term vision for ThisWeKnow was creating a citizen portal into the entire data.gov catalog. ThisWeKnow aimed to provide citizens with a single destination where they can search and browse all the information the government collects. We also planned to provide other application developers with a powerful standards-based API for accessing the data.

Loading governmental databases into a single, flexible data store breaks down silos of information and facilitates inferences across multiple data stores. For example, inferences can be made by combining census demographic data from the Department of Commerce, factory information from the Environmental Protection Agency, information about employment from the Department of Labor, and so on. Our prototype used geography as a lens into the data, but there are other useful facets to expose, such as organizations, issues, and time. We have only begun to imagine the discoveries that will become possible after all these datasets are loaded into an integrated repository.
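To make the idea concrete, here is a minimal sketch of a single store with geography as the lens. It is not ThisWeKnow's actual code: the store is a plain Python list of (subject, predicate, object) triples rather than a real RDF database, and every dataset name, place identifier, field name, and number below is a hypothetical placeholder.

```python
from collections import defaultdict

# Hypothetical records from three separate agency datasets, each keyed by a
# place identifier. All names and numbers are placeholders, not real data.
census_rows = [{"place": "place-A", "population": 1000}]
epa_rows = [
    {"place": "place-A", "facility": "Factory 1"},
    {"place": "place-A", "facility": "Factory 2"},
    {"place": "place-B", "facility": "Factory 3"},
]
labor_rows = [{"place": "place-A", "unemployed": 50}]


def load_triples():
    """Flatten every source into (subject, predicate, object) triples."""
    triples = []
    for row in census_rows:
        triples.append((row["place"], "population", row["population"]))
    for row in epa_rows:
        triples.append((row["place"], "has_facility", row["facility"]))
    for row in labor_rows:
        triples.append((row["place"], "unemployed", row["unemployed"]))
    return triples


def facts_about(triples, place):
    """Group everything the store knows about one place by predicate."""
    facts = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == place:
            facts[predicate].append(obj)
    return dict(facts)


def facilities_per_1000_residents(facts):
    """A cross-dataset inference: facility count over census population."""
    population = facts["population"][0]
    return 1000 * len(facts.get("has_facility", [])) / population


store = load_triples()
place_a = facts_about(store, "place-A")
print(place_a)
print(facilities_per_1000_residents(place_a))
```

Once all sources share one store and one key, a question that spans agencies becomes a single lookup. Swapping the key from place to organization, issue, or time would expose the other facets in the same way.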

We were thrilled by the response to ThisWeKnow. The app was a finalist in the Apps for America competition. We were invited, along with the other finalists, to a lunch with the nation’s Chief Information Officer, Vivek Kundra. At that lunch, we discussed his desire to leverage the enthusiasm and energy of the open source community to serve the public, and the obstacles to that vision. We were written up in our local paper, and the next thing we knew, Senator Leahy was describing ThisWeKnow on the floor of the Senate.

ThisWeKnow held over 50 million rows of data drawn from the US Census, EPA, Department of Labor, Federal Register, and other sources. Loading that much data and building the application in our spare time on nights and weekends may have been an accomplishment, but we believe we were, at most, only 20% of the way to a feature-complete release of a front end to Data.gov. Our next phase of work on ThisWeKnow would have included loading more datasets, data mopping, calculating rates, letting users query different spatial extents, supporting user-generated factoids, adding forms for users to give feedback on data quality, and building an architecture for data updates.

In October 2011 we had to take ThisWeKnow offline because we could not find a source of funding for the initiative. We invested everything we could in the tool, but with no way to keep the data current, it simply did not make sense to keep it online. We certainly tried: we wrote grants, cultivated business partnerships, and considered business models. But our government is not being realistic if it thinks front ends to its data will grow organically in the private sector without public support. We still very much believe in this vision and are looking for opportunities to collaborate on these kinds of initiatives in the future.

The app is no longer available online, but the source code is available under an MIT license at github.com/btucker/thisweknow.

Press About ThisWeKnow

ReadWriteWeb: ThisWeKnow: New Semantic Web App Tames Massive Data Sets from Data.gov

Data.gov launched in May this year to make huge data sets of information from federal agencies available in machine-readable formats. While incredibly valuable, these data sets are not particularly useful in their current format to anyone but researchers, statisticians, sociologists, developers, or others used to parsing databases searching for trends. At least for geographically relevant information, ThisWeKnow provides one use case for the data sets. Users can enter the name or ZIP code of any community and get details on all kinds of factors, from violent crime to companies releasing pollutants. Full ReadWriteWeb article

CIO.gov: Removing the Shroud of Secrecy: Making Government More Transparent and Accountable

"For example ThisWeKnow.org empowers citizens by presenting Government data in an easy to understand and consistent manner. Anyone can view cancer rates in San Diego, CA, or the level of toxicity in Beaufort, NC, or the number of bills introduced by Members of Congress since 1993 in Los Alamos, NM by simply typing in a zip code." - Vivek Kundra, Federal Chief Information Officer. Full CIO.gov article

Federal Computer Week: Web mashups put transparency to the test

Builders of these public interest Web sites say government could do more to make its data more accessible. This We Know was built as a Semantic Web mashup from the beginning. The site takes data that agencies have uploaded to Data.gov and converts it into an RDF database, which is used to organize and present the data according to geographic communities. Full Federal Computer Week article

Library of Congress: Powering the Public’s Right to Know — (Senate - September 10, 2009)

The public's right to know is a cornerstone of our democracy. By using technology, a site such as this can provide citizens with access to data that is relevant to them and that can enable and encourage them to make informed decisions. This site is designed to make what was once a difficult and time-consuming process into a faster and more streamlined experience. Senate Record 9/10/2009 [PDF]

Screenshot of This We Know application

The above screenshot from ThisWeKnow shows factoids that begin to tell a story about Los Angeles. The data come from a combination of sources:

  • 2005 Toxics Release Inventory
  • 2005-2007 American Community Survey
  • Local Area Unemployment Statistics
  • National Cancer Institute SEER Registries
  • GovTrack (legislative data)