General reference
http://en.wikipedia.org/wiki/Linked_Data#Datasets (List of datasets)
http://www.programmableweb.com/ (Open APIs, mashups and the Web as platform)
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets (RDF data sets that are part of the emerging Web of Linked Data)
http://blog.visual.ly/data-sources/ (a great list of useful resources for finding data, March 2012)
Dataset providers
Note: these three really take the cake, and I'm going to start adding to them rather than updating this post:
http://www.opendataday.org/wiki/Data (dataset listing page for #odhd)
http://datacatalogs.org/ (aims to be the most comprehensive list of open data catalogs in the world)
http://thedatahub.org/ (registry of open knowledge datasets and projects)
My old dataset providers list:
http://www.quora.com/Data/What-are-some-free-public-data-sets (quora.com free data set links)
http://wiki.dbpedia.org/Datasets (structured information from Wikipedia)
http://www.programmableweb.com/api/dbpedia/mashups
http://www.freebase.com/docs/data (developed by Metaweb)
http://www.programmableweb.com/api/freebase/mashups
http://www.mpi-inf.mpg.de/yago-naga/yago/ (YAGO2 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames)
http://archive.ics.uci.edu/ml/ (UC Irvine Machine Learning Repository)
http://dbtune.org/ (Serving music-related RDF)
http://www.factual.com/ (evolving data on thousands of topics)
http://www.odata.org/producers (services that expose their data using OData)
http://openspending.org/datasets (government and corporate spending)
http://wiki.openspending.org/Main_Page
http://thedatahub.org/ (WOW, awesome site, thousands of shared datasets)
http://data.gov.uk/ (UK government data)
http://okfn.org/about/ (Good resource, projects using data and links to datasets)
Dataset indexing
http://www.sindice.com/ (Semantic web index)
Query languages/tools
http://en.wikipedia.org/wiki/SPARQL (SPARQL)
http://www.freebase.com/docs/mql/ (MQL for freebase)
http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language (Sharing predictive analytics and data mining models)
People, blogs and groups
http://www.linkedin.com/groups?home=&gid=49970 (Semantic Web)
http://datavisualization.ch/ (premier news and knowledge resource for data visualization and infographics)
http://flowingdata.com/ (explores how designers, statisticians, and computer scientists are using data to understand ourselves better)
http://blog.kiwitobes.com/ (Toby Segaran, videos)
http://www.quora.com/What-are-the-best-blogs-about-data (another blog list)
http://www.dataminingblog.com/list-of-blogs/ (yet another blog list)
Organisations and companies (semantic, bigdata)
http://linkeddata.org/home
http://www.metaweb.com/
http://www.google.com/publicdata/home
http://www.opencalais.com/
http://www.cloudera.com/
http://www.couchbase.com/
http://www.splunk.com/
http://timetric.com/
http://www.vertica.com/
http://www.asterdata.com/
http://www.quantcast.com/
Standards (yeah, need to add to this...)
RDF (Resource Description Framework)
XBRL (eXtensible Business Reporting Language)
SPARQL (SPARQL Protocol and RDF Query Language)
FOAF (Friend of a Friend)
OWL (Web Ontology Language)
XFN (XHTML Friends Network)
hCard, hCalendar etc (Microformats in XHTML, see list)
PMML (Predictive Model Markup Language)
OData (Open Data Protocol by Microsoft)