Friday, March 19, 2010

Open Data Sources, Web APIs and the Semantic Web

Links to various data sources that can be used in the push for a semantic web.

General reference
http://en.wikipedia.org/wiki/Linked_Data#Datasets (List of datasets)
http://www.programmableweb.com/ (Open APIs, mashups and the Web as platform)
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets (RDF data sets that are part of the emerging Web of Linked Data)
http://blog.visual.ly/data-sources/ (a great list of useful resources for finding data, March 2012)

Dataset providers
Note: these three really take the cake, so I'm going to start contributing to them rather than updating this post:
http://www.opendataday.org/wiki/Data (dataset listing page for #odhd)
http://datacatalogs.org/ (aims to be the most comprehensive list of open data catalogs in the world)
http://thedatahub.org/ (registry of open knowledge datasets and projects)


My old dataset providers list:
http://www.quora.com/Data/What-are-some-free-public-data-sets (quora.com free data set links)

http://wiki.dbpedia.org/Datasets (structured information from Wikipedia)
http://www.programmableweb.com/api/dbpedia/mashups

http://www.freebase.com/docs/data (developed by Metaweb)
http://www.programmableweb.com/api/freebase/mashups

http://www.mpi-inf.mpg.de/yago-naga/yago/ (YAGO2 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames)

http://archive.ics.uci.edu/ml/ (UC Irvine Machine Learning Repository)

http://dbtune.org/ (Serving music-related RDF)

http://www.factual.com/ (evolving data on thousands of topics)

http://www.odata.org/producers (services that expose their data using OData)

http://openspending.org/datasets (government and corporate spending)
http://wiki.openspending.org/Main_Page

http://thedatahub.org/ (WOW, awesome site, thousands of shared datasets)

http://data.gov.uk/ (UK government data)

http://okfn.org/about/ (Open Knowledge Foundation: good resource for projects using open data, with links to datasets)


Dataset indexing
http://www.sindice.com/ (Semantic web index)

Query languages/tools
http://en.wikipedia.org/wiki/SPARQL (SPARQL, the query language for RDF)
http://www.freebase.com/docs/mql/ (MQL for freebase)
http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language (Sharing predictive analytics and data mining models)
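To give a feel for how SPARQL is used against one of the datasets above, here is a minimal sketch that builds a SPARQL Protocol GET request using only the Python standard library. The DBpedia endpoint URL, the query itself, and the helper name build_sparql_request are my own illustrative choices (assumptions), not something from any of the linked pages. The query text travels in the `query` parameter and the Accept header asks for the standard SPARQL JSON results format.

```python
"""Build a SPARQL Protocol GET request for a public endpoint (sketch).

Assumption: the DBpedia endpoint and the query below are only examples;
any endpoint implementing the SPARQL Protocol takes the query in `query`.
"""
from urllib.parse import urlencode
from urllib.request import Request

# Example endpoint (assumption: swap in whichever SPARQL endpoint you use).
ENDPOINT = "http://dbpedia.org/sparql"

# Ten resources typed as dbo:Book, with their English labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT ?book ?label WHERE {
  ?book a dbo:Book ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Hypothetical helper: a GET request whose `query` parameter holds the SPARQL text."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, QUERY)
    print(req.full_url[:60])
    # To actually send it (needs network access):
    # import json, urllib.request
    # with urllib.request.urlopen(req) as resp:
    #     rows = json.load(resp)["results"]["bindings"]
```

The request is only built, not sent, so the sketch runs offline; the commented lines show where the JSON bindings would come back.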


People, blogs and groups
http://datavisualization.ch/ (premier news and knowledge resource for data visualization and infographics)
http://flowingdata.com/ (explores how designers, statisticians, and computer scientists are using data to understand ourselves better)
http://blog.kiwitobes.com/ (Toby Segaran, videos)
http://www.dataminingblog.com/list-of-blogs/ (yet another blog list)


Organisations and companies (semantic, bigdata)
http://linkeddata.org/home
http://www.metaweb.com/
http://www.google.com/publicdata/home
http://www.opencalais.com/
http://www.cloudera.com/
http://www.couchbase.com/
http://www.splunk.com/
http://timetric.com/
http://www.vertica.com/
http://www.asterdata.com/
http://www.quantcast.com/


Standards (yeah, need to add to this...)
RDF (Resource Description Framework)
XBRL (eXtensible Business Reporting Language)
SPARQL (SPARQL Protocol and RDF Query Language)
FOAF (Friend of a Friend)
OWL (Web Ontology Language)
XFN (XHTML Friends Network)
hCard, hCalendar, etc. (microformats in XHTML; see list)
PMML (Predictive Model Markup Language)
OData (Open Data Protocol by Microsoft)
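To make RDF and FOAF a bit more concrete, here is a minimal sketch (plain Python, no RDF library) that writes a few FOAF statements in the N-Triples serialization, one `subject predicate object .` line per triple. The RDF and FOAF namespace URIs are the real vocabularies; the example.org person URIs and the helper names are hypothetical, made up for illustration. For anything serious you'd use a proper RDF toolkit rather than hand-rolled escaping.

```python
"""Serialize a few FOAF statements as RDF N-Triples (sketch).

Assumptions: the example.org person URIs are hypothetical; only the RDF
and FOAF namespace URIs are real vocabulary terms.
"""
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """Emit one `subject predicate object .` line per triple."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

alice = iri("http://example.org/people#alice")  # hypothetical URI
bob = iri("http://example.org/people#bob")      # hypothetical URI

triples = [
    (alice, iri(RDF + "type"), iri(FOAF + "Person")),
    (alice, iri(FOAF + "name"), literal("Alice")),
    (alice, iri(FOAF + "knows"), bob),
    (bob, iri(RDF + "type"), iri(FOAF + "Person")),
]

print(to_ntriples(triples))
```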


Tuesday, March 2, 2010

Machine learning / AI links and libraries

Code libraries and frameworks

AForge
C# - extensive library for computer vision, AI, robotics, etc.
http://code.google.com/p/aforge/
http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=1181072

Emgu CV
.NET - wrapper for the Intel OpenCV image processing library
http://www.emgu.com/wiki/index.php/Main_Page
http://sourceforge.net/projects/emgucv/

Watchmaker Framework
Java - framework for evolutionary computation (evolutionary/genetic algorithms)
http://watchmaker.uncommons.org/
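For anyone who hasn't seen a genetic algorithm before, here is a minimal sketch of the loop that frameworks like Watchmaker manage for you. To be clear, this is NOT the Watchmaker API (and it's Python, not Java); it's a plain illustration of the same ideas: tournament selection, single-point crossover, bit-flip mutation and elitism. The OneMax problem (maximise the number of 1 bits) and all the parameters are assumptions picked just to keep it small.

```python
"""A minimal generational genetic algorithm on the OneMax problem (sketch).

NOT the Watchmaker API: a plain-Python illustration of selection,
crossover, mutation and elitism. OneMax and all parameters are assumptions.
"""
import random

def fitness(bits):
    # OneMax: count the 1s; the optimum is a string of all 1s.
    return sum(bits)

def tournament(pop, rng, k=3):
    # Tournament selection: the best of k randomly chosen individuals.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # Single-point crossover: head of one parent, tail of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(length=32, pop_size=40, generations=60, rate=0.02, elite=2, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Elitism: copy the best individuals unchanged into the next generation.
        nxt = [list(ind) for ind in pop[:elite]]
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt.append(mutate(child, rng, rate))
        pop = nxt
    best = max(pop, key=fitness)
    return best, history

if __name__ == "__main__":
    best, history = evolve()
    print("best fitness every 10 generations:", history[::10], "final:", fitness(best))
```

Elitism is the design choice worth noticing: because the best individuals always survive, the best fitness can never go down from one generation to the next.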

ALGLIB
.NET - cross-platform numerical analysis and data processing library
http://www.alglib.net/

Infer.NET
.NET - Microsoft framework for running Bayesian inference in graphical models, used in a wide variety of domains including information retrieval, bioinformatics, epidemiology and computer vision
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx
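Here's the smallest Bayesian inference example I can think of: estimating a coin's bias. This is not Infer.NET code; it just shows, in Python, the prior-plus-data-gives-posterior update that a Bayesian inference engine automates for much larger graphical models. Because the Beta distribution is conjugate to the Bernoulli likelihood, the posterior is just another Beta with the counts added in. The uniform Beta(1, 1) prior and the coin flips are made-up illustrative assumptions.

```python
"""Bayesian inference in the smallest possible model: a coin's bias (sketch).

Not Infer.NET code. The Beta(1, 1) prior and the observations are
illustrative assumptions.
"""
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    alpha: float
    beta: float

    def mean(self) -> float:
        # Posterior mean of the bias: alpha / (alpha + beta).
        return self.alpha / (self.alpha + self.beta)

    def update(self, observations) -> "Beta":
        # Conjugacy: Beta prior x Bernoulli likelihood -> Beta posterior.
        # Add the number of 1s to alpha and the number of 0s to beta.
        heads = sum(1 for x in observations if x)
        tails = len(observations) - heads
        return Beta(self.alpha + heads, self.beta + tails)

prior = Beta(1.0, 1.0)            # uniform prior over the bias
data = [1, 0, 1, 1, 1, 0, 1, 1]   # hypothetical flips: 6 heads, 2 tails
posterior = prior.update(data)
print(posterior, round(posterior.mean(), 3))  # Beta(alpha=7.0, beta=3.0) 0.7
```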

Microsoft Solver Foundation
.NET - An extensible framework to model and solve complex problems by using constraints, goals, and data.
http://code.msdn.microsoft.com/solverfoundation


Weka Machine Learning Project
A collection of algorithms for solving real-world data mining problems



Software Tools

RapidMiner
Open-source system for data and text mining (Java-based)
http://rapid-i.com/
http://sourceforge.net/projects/yale/

R Project
Free software environment for statistical computing and graphics


KNIME
The user-friendly and comprehensive open-source data integration, processing, analysis, and exploration platform.


Online books

A Field Guide to Genetic Programming (free pdf)
Poli, Langdon, McPhee (2009)
http://www.gp-field-guide.org.uk/

The Elements of Statistical Learning: Data Mining, Inference, and Prediction (free pdf)
Hastie, Tibshirani, Friedman (2009)
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Information Theory, Inference, and Learning Algorithms (free pdf)
David J.C. MacKay (2003)
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Evolutionary Computation in Java - A Practical Guide to the Watchmaker Framework (html)
Daniel W. Dyer
http://watchmaker.uncommons.org/manual/index.html

Planning Algorithms (free pdf)
Steven M. LaValle (2006)
http://planning.cs.uiuc.edu/

Evolution of Parallel Cellular Machines: The Cellular Programming Approach (free pdf)
Moshe Sipper (1997)
http://www.moshesipper.com/pcm/


Links and feeds

Interesting information visualisation blog
http://abeautifulwww.com/

Jürgen Schmidhuber's home page
http://www.idsia.ch/~juergen/

Stanford course in machine learning
http://www.youtube.com/view_play_list?gl=AU&hl=en-GB&p=A89DCFA6ADACE599

Podcast
http://www.biota.org/

AITopics: library of AI resources (AAAI)
http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/HomePage

Data Mining and Analytics Resources
http://www.kdnuggets.com/