Data Libraries: Difference between revisions
== Search Engines ==
*[http://www.wolframalpha.com Wolfram|Alpha]
*[http://lib.berkeley.edu/wikis/datalab/Main/GoogleSearch Social Science Data Search] Google Custom Search Engine @ Berkeley Library (targets 800+ academic, government agency, non-profit, and other web sites that provide high quality, downloadable statistical information and data sets. Emphasis is on data pertaining to the social sciences, health, developing countries, energy, natural resources, and the environment)
*[https://developers.google.com/search/docs/data-types/dataset Google Dataset Search]
== Misc ==
*[http://hcil.cs.umd.edu/localphp/hcil/vast/archive/ Visual Analytics Benchmark Repository]
*[http://www.ics.uci.edu/~mlearn/MLRepository.html UCI Machine Learning Repository]
*[http://kdd.ics.uci.edu/ UCI KDD Archive] UCI Knowledge Discovery in Databases Archive
*[http://lib.stat.cmu.edu/general/tsa/tsa.html Time Series Analysis and Its Applications] Book Example Data
*[http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/ Time Series Data Library]
*[http://www.bized.co.uk/timeweb/ TimeWeb] Web-based time series databanks
*[http://lisp.vse.cz/pkdd99/Challenge/ PKDD 2000 Challenge] Challenges of European Conferences on Principles and Practice of Knowledge Discovery in Databases
*[http://www.physionet.org/ PhysioNet] the research resource for complex physiologic signals
*[http://numbrary.com/ Numbrary] Numbrary is a free online service dedicated to finding, using and sharing numbers on the web.
*[http://infochimps.org/ Infochimps] Free Redistributable Rich Data Sets
*[http://www.redliondata.com/ Red Lion Data] Large catalog of retailer location datasets in CSV
*[http://dss.ucar.edu/catalogs/free.html CISL Research Data Archive] CISL Research Data Archive (RDA) - large and diverse collection of meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets
*[http://search.dss.ucar.edu/cgi-bin/rdabrowse?nb=true&c=list&cv=All+RDA+Datasets CISL Research Data Archive] CISL Research Data Archive (RDA) dataset selection according to variables, time resolution, etc.
*[http://odysseas.calit2.uci.edu/doku.php/public:online_social_networks#available_datasets Facebook social graph, weighted random walks, and applications] Representative sample of Facebook users, weighted sample of college Facebook users and full sample of Facebook applications over a period of ~6 months.
*[http://www.statistikbanken.dk/statbank5a/SelectTable/Omrade0.asp?PLanguage=1 StatBank Denmark] StatBank Denmark contains detailed statistical information on the Danish society. The database is free of charge and data can be exported in several file formats and presented as diagrams or maps.
*[http://data.3tu.nl/repository/ 3TU.Datacentrum] offers the knowledge, experience and the tools to archive research data in a standardized, secure and well-documented manner.
*[http://www.cc.gatech.edu/gvu/ii/jigsaw/datafiles.html Jigsaw Datafiles]
*[http://stat-computing.org/dataexpo/ American Statistical Association (ASA) Bi-Annual Data Exposition]
*[http://lib.stat.cmu.edu/DASL/ Data and Story Library (DASL)] An online library of data files and stories that illustrate the use of basic statistics methods, from Carnegie Mellon
*[http://sunsite3.berkeley.edu/wikis/datalab/ Berkeley Data Lab]
*[http://www.stat.ucla.edu/data/ UCLA Statistics Data Sets]
*[http://www.freebase.com/ Freebase] A community effort that mostly provides data on people, places, and things.
*[http://aggdata.com Aggdata] repository of for-sale datasets, mostly focused on comprehensive lists of retail locations.
*[http://aws.amazon.com/publicdatasets Amazon Public Data Sets]
*[http://www.githubarchive.org/ GitHub Archive] GitHub's public timeline is a huge time-oriented data source (e.g. commits to hosted open source projects)
*[http://www.correlatesofwar.org/ Correlates of War] All wars in a CSV file.
*[http://www.ihapss.jhsph.edu/data/NMMAPS/documentation/frame.htm NMMAPS] health, climate, pollution time series
*[http://www.quandl.com/ Quandl] financial, economic and social datasets
*[http://openfoodfacts.org/ Open Food Facts] Open Food Facts is a free, open and collaborative database of food products from the entire world.
*[https://www.wikidata.org/ Wikidata] Wikidata is a free linked database that can be read and edited by both humans and machines.
*[https://aws.amazon.com/public-data-sets/ AWS Public Data Sets] AWS hosts a variety of public data sets that anyone can access for free.
*[http://www.makeovermonday.co.uk/data/ Makeover Monday] Datasets available from the Makeover Monday initiative
*[https://datahub.io Datahub] the free, powerful data management platform from Open Knowledge International, based on the CKAN data management system.
== Medicine ==
*[http://physionet.org/challenge/ PhysioNet/Computing in Cardiology Challenges]
*[https://wiki.openmrs.org/display/RES/Demo+Data OpenMRS Demo Data]
== Geography ==
(based on a list in [[Yau, N.: Visualize This: The FlowingData Guide to Design, Visualizations, and Statistics, Wiley, 2011 | Nathan Yau's book "Visualize This"]])
*[http://www.census.gov/geo/www/tiger/ TIGER] From the Census Bureau, probably the most extensive detailed data about roads, railroads, rivers, and ZIP codes you can find.
*[http://www.openstreetmap.org/ OpenStreetMap]
*[http://www.geocommons.com/ GeoCommons] Both data and a mapmaker
*[http://www.gadm.org/country Global Administrative Areas] administrative borders for many countries in the world in different formats
*[http://www.datamaps.eu/2013/07/17/oesterreichs-verwaltungsgrenzen-im-geojson-format/ Austria's borders in GeoJSON]
== World ==
(based on a list in [[Yau, N.: Visualize This: The FlowingData Guide to Design, Visualizations, and Statistics, Wiley, 2011 | Nathan Yau's book "Visualize This"]])
*[http://www.globalhealthfacts.org/ Global Health Facts] Health-related data about countries in the world.
*[http://data.un.org/ UNdata] Aggregator of world data from a variety of sources.
*[http://www.who.int/research/en/ World Health Organization] Again, a variety of health-related datasets such as mortality and life expectancy.
*[http://stats.oecd.org/ OECD Statistics] Major source for economic indicators.
*[http://data.worldbank.org World Bank] Data for hundreds of indicators and developer-friendly.
== Open Government Data ==
*[http://aiddata.org AidData] Open Data for International Development
*[https://github.com/factbook/factbook.json factbook.json] World Factbook Country Profiles in JSON - Free Open Public Domain Data
== Open Government Data / Government and Politics ==
=== European Union ===
*[http://www.europeandataportal.eu/ European Data Portal]
=== Austria ===
*[http://gov.opendata.at Open Government Data Austria]
*[http://offener.datenkatalog.at/ Collection of Open Data in Austria]
* [http://datamarket.com/ DataMarket.com] is a data portal that provides access to statistics and structured data from various public and private sector organizations.
*[http://data.gv.at Offene Daten Österreich]
*[http://data.umweltbundesamt.at/ data.umweltbundesamt.at] Environment Agency Austria
*[http://www.datamaps.eu/2013/07/17/oesterreichs-verwaltungsgrenzen-im-geojson-format/ Austria's borders in GeoJSON]
*[http://www.aussda.at/ AUSSDA] Austrian Social Science Data Archive
=== UK ===
*[http://data.gov.uk/ Data.gov.uk] Catalog for data supplied by government organizations.
=== USA ===
*[http://www.census.gov/ Census Bureau] extensive demographics.
*[http://data.gov/ Data.gov] Catalog for data supplied by government organizations.
*[http://datasf.org/ DataSF] Data specific to San Francisco.
*[http://nyc.gov/data/ NYC] Data specific to New York.
*[http://www.followthemoney.org/ Follow the Money] Big set of tools and datasets to investigate money in state politics.
*[http://www.opensecrets.org/ OpenSecrets] provides details on government spending and lobbying.
== InfoVis Contest Datasets ==
* [http://eagereyes.org/InfoVisContest2007Data.html InfoVis 2007 Contest] Movie Database
* [http://www.merl.com/wmd/infovis.html InfoVis 2008 Contest] MERL motion sensor data
== BioVis Contest Datasets ==
* [http://www.biovis.net/year/2013/info/contest-data BioVis 2013 Contest] Protein Mutations and their effect on Protein Function
== Network Data ==
* [https://snap.stanford.edu/data/ SNAP] Stanford Large Network Dataset Collection by Jure Leskovec
== Other Lists ==
*[http://graphics.stanford.edu/~klingner/online_databases.html Jeff Klingner's List of Online Databases]
*[http://www.visualisingdata.com/index.php/2013/07/a-big-collection-of-sites-and-services-for-accessing-data/ Essential Resources: A big collection of sites and services for accessing data] (by Andy Kirk)
== Tools for Creating Synthetic Datasets ==
*[http://www.gris.tu-darmstadt.de/research/vissearch/projects/pcdc-synthetic-data-generation/index.html PCDC - On the Highway to Data] A Tool for the Fast Generation of Large Synthetic Data Sets (by TU Darmstadt)
== Map Data ==
*[http://www.gadm.org/ Global Administrative Areas] GADM is a spatial database of the location of the world's administrative areas (or administrative boundaries) for use in GIS and similar software.
*[http://vdstech.com/map-data.aspx VDS Technologies GIS & Mapping Components]
== Armed Conflicts Data ==
* Armed Conflict Location and Event Data (ACLED): C. Raleigh, A. Linke, H. Hegre, and J. Karlsen. “Introducing ACLED: An Armed Conflict Location and Event Dataset: Special Data Feature.” In: Journal of Peace Research 47.5 (2010), pp. 651–660.
* Uppsala Conflict Data Program – Georeferenced Event Dataset (GED): R. Sundberg and E. Melander. “Introducing the UCDP Georeferenced Event Dataset.” In: Journal of Peace Research 50.4 (2013), pp. 523–532.
* [http://www.start.umd.edu/gtd Global Terrorism Database (GTD)]
* Social Conflict Analysis Database (SCAD): I. Salehyan, C. S. Hendrix, J. Hamner, C. Case, C. Linebarger, E. Stull, and J. Williams. “Social Conflict in Africa: A New Database.” In: International Interactions 38.4 (2012), pp. 503–511.
* [https://www.meltt.net/ VEHICLE] Web application to analyze the results of hierarchically integrated conflict event data (Benedikt Mayer, 2024)
[[Category:Web resources]]
Latest revision as of 08:05, 13 August 2024
Search Engines
- Wolfram|Alpha
- Social Science Data Search Google Custom Search Engine @ Berkeley Library (targets 800+ academic, government agency, non-profit, and other web sites that provide high quality, downloadable statistical information and data sets. Emphasis is on data pertaining to the social sciences, health, developing countries, energy, natural resources, and the environment)
- Google Dataset Search
Misc
- Visual Analytics Benchmark Repository
- UCI Machine Learning Repository
- UCI KDD Archive UCI Knowledge Discovery in Databases Archive
- UW XML Repository University of Washington XML Data Repository
- StatLib Data, Software and News from the Statistics Community
- Time Series Analysis and Its Applications Book Example Data
- Time Series Data Library
- TimeWeb Web-based time series databanks
- PKDD 2000 Challenge Challenges of European Conferences on Principles and Practice of Knowledge Discovery in Databases
- PhysioNet the research resource for complex physiologic signals
- MatrixMarket A visual repository of test data for use in comparative studies of algorithms for numerical linear algebra, featuring nearly 500 sparse matrices from a variety of applications, as well as matrix generation tools and services.
- LifeLine project Visualising Migrations, Transitions and Trajectories
- 10x10 Every hour, 10x10 gathers the 100 most important words and pictures in the world, based on what's happening in the news.
- InfoVis Cyberinfrastructure Data Bases
- Enron Email Dataset
- Historical Stock Data Historical Data for S&P 500 Stocks
- Ensembl Provides sequence databases of gene, transcript and protein predictions.
- EPA air quality AirData Web site gives you access to air pollution data for the entire United States.
- Network intrusion dataset
- Internet Backbone Data Internet Mapping Project
- UCR Time Series Data Mining Archive A resource for researchers interested in the clustering, classification, indexing, segmentation, change point detection and rule extraction of time series. (by Eamonn Keogh)
- Deutsche Bundesbank Zeitreihen
- Source data sources of the Worldmapper project (Worldmapper project)
- Worldbank Development Data & Statistics
- Business Intelligence Network 2006 Data Visualization Competition (Excel spreadsheet)
- Timeseries by Eamonn Keogh et al.
- U.S. Department of Labor Bureau of Labor Statistics
- National Atlas
- UK Air Quality Archive Air Quality data in the UK from the present back to 1960
- Mitsubishi Electric Research Labs (MERL) Motion sensor data from a network of over 200 sensors for a year (WMD 2007).
- European Road Safety Observatory Road Safety Data
- CARE Community database on road accidents resulting in death or injury (EU)
- IRTAD international database that gathers data on traffic and road accidents from 28 out of the 30 OECD Member countries
- Trends Online A Compendium of Data on Global Change
- Finder! Finder is a browser-based application for finding, organizing and sharing GeoData in common formats.
- ICWSM 2009 Data Challenge data set containing 44 million blog posts; suitable for link analysis, social network extraction, analysis of influence among bloggers, ...
- CKAN Comprehensive Knowledge Archive Network
- Numbrary Numbrary is a free online service dedicated to finding, using and sharing numbers on the web.
- Infochimps Free Redistributable Rich Data Sets
- Red Lion Data Large catalog of retailer location datasets in CSV
- CISL Research Data Archive CISL Research Data Archive (RDA) - large and diverse collection of meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets
- CISL Research Data Archive CISL Research Data Archive (RDA) dataset selection according to variables, time resolution, etc.
- pachube a service that enables you to connect, tag and share real time sensor data from objects, devices, buildings and environments around the world.
- The World Bank Open Data
- Google Public Data Explorer
- Facebook social graph, weighted random walks, and applications Representative sample of Facebook users, weighted sample of college Facebook users and full sample of Facebook applications over a period of ~6 months.
- StatBank Denmark StatBank Denmark contains detailed statistical information on the Danish society. The database is free of charge and data can be exported in several file formats and presented as diagrams or maps.
- 3TU.Datacentrum offers the knowledge, experience and the tools to archive research data in a standardized, secure and well-documented manner.
- Jigsaw Datafiles
- American Statistical Association (ASA) Bi-Annual Data Exposition
- Data and Story Library (DASL) An online library of data files and stories that illustrate the use of basic statistics methods, from Carnegie Mellon
- Berkeley Data Lab
- UCLA Statistics Data Sets
- Freebase A community effort that mostly provides data on people, places, and things.
- Aggdata repository of for-sale datasets, mostly focused on comprehensive lists of retail locations.
- Amazon Public Data Sets
- GitHub Archive GitHub's public timeline is a huge time-oriented data source (e.g. commits to hosted open source projects)
- Correlates of War All wars in a CSV file.
- NMMAPS health, climate, pollution time series
- Quandl financial, economic and social datasets
- Open Food Facts Open Food Facts is a free, open and collaborative database of food products from the entire world.
- Wikidata Wikidata is a free linked database that can be read and edited by both humans and machines.
- AWS Public Data Sets AWS hosts a variety of public data sets that anyone can access for free.
- Makeover Monday Datasets available from the Makeover Monday initiative
- Datahub the free, powerful data management platform from Open Knowledge International, based on the CKAN data management system.
Medicine
- PhysioNet/Computing in Cardiology Challenges
- OpenMRS Demo Data
Geography
(based on a list in Nathan Yau's book "Visualize This")
- TIGER From the Census Bureau, probably the most extensive detailed data about roads, railroads, rivers, and ZIP codes you can find.
- OpenStreetMap
- GeoCommons Both data and a mapmaker
- Global Administrative Areas administrative borders for many countries in the world in different formats
- Austria's borders in GeoJSON
World
(based on a list in Nathan Yau's book "Visualize This")
- Global Health Facts Health-related data about countries in the world.
- UNdata Aggregator of world data from a variety of sources.
- World Health Organization Again, a variety of health-related datasets such as mortality and life expectancy.
- OECD Statistics Major source for economic indicators.
- World Bank Data for hundreds of indicators and developer-friendly.
- AidData Open Data for International Development
- factbook.json World Factbook Country Profiles in JSON - Free Open Public Domain Data
Open Government Data / Government and Politics
European Union
- European Data Portal
Austria
- Open Government Data Austria
- Collection of Open Data in Austria
- DataMarket.com is a data portal that provides access to statistics and structured data from various public and private sector organizations.
- Offene Daten Österreich
- data.umweltbundesamt.at Environment Agency Austria
- Austria's borders in GeoJSON
- AUSSDA Austrian Social Science Data Archive
UK
- Data.gov.uk Catalog for data supplied by government organizations.
USA
- Census Bureau extensive demographics.
- Data.gov Catalog for data supplied by government organizations.
- DataSF Data specific to San Francisco.
- NYC Data specific to New York.
- Follow the Money Big set of tools and datasets to investigate money in state politics.
- OpenSecrets provides details on government spending and lobbying.
InfoVis Contest Datasets
- InfoVis 2003 Contest Tree data
- InfoVis 2004 Contest Meta Data of publications
- InfoVis 2005 Contest Technology Trends in the United States
- InfoVis 2006 Contest 1% public use microdata sample from the 2002 Census
- InfoVis 2007 Contest Movie Database
- InfoVis 2008 Contest MERL motion sensor data
BioVis Contest Datasets
- BioVis 2013 Contest Protein Mutations and their effect on Protein Function
Network Data
- SNAP Stanford Large Network Dataset Collection by Jure Leskovec
Other Lists
- Jeff Klingner's List of Online Databases
- Essential Resources: A big collection of sites and services for accessing data (by Andy Kirk)
Tools for Creating Synthetic Datasets
- PCDC - On the Highway to Data A Tool for the Fast Generation of Large Synthetic Data Sets (by TU Darmstadt)
Map Data
- Global Administrative Areas GADM is a spatial database of the location of the world's administrative areas (or administrative boundaries) for use in GIS and similar software.
- VDS Technologies GIS & Mapping Components
Armed Conflicts Data
- Armed Conflict Location and Event Data (ACLED): C. Raleigh, A. Linke, H. Hegre, and J. Karlsen. “Introducing ACLED: An Armed Conflict Location and Event Dataset: Special Data Feature.” In: Journal of Peace Research 47.5 (2010), pp. 651–660.
- Uppsala Conflict Data Program – Georeferenced Event Dataset (GED): R. Sundberg and E. Melander. “Introducing the UCDP Georeferenced Event Dataset.” In: Journal of Peace Research 50.4 (2013), pp. 523–532.
- Global Terrorism Database (GTD)
- Social Conflict Analysis Database (SCAD): I. Salehyan, C. S. Hendrix, J. Hamner, C. Case, C. Linebarger, E. Stull, and J. Williams. “Social Conflict in Africa: A New Database.” In: International Interactions 38.4 (2012), pp. 503–511.
- VEHICLE Web application to analyze the results of hierarchically integrated conflict event data (Benedikt Mayer, 2024)