Friday, December 10, 2010

Windows hosted virtual private server (VPS) set up

In early 2010 I migrated my websites from a shared web host that had begun to suck to a virtual private server (VPS) provided by Web24.

I did this for performance reasons, as the shared web host had degraded over the course of several years. I also wanted control over what was installed on the machine, just like I have at work.

Due to low traffic and budget constraints I'm running the VPS as a web server (IIS), DB server (SQL Server Express and MySQL) and an SMTP server. Here are my notes on the server installation and configuration so I can do it again fairly quickly if needed.

NOTE: these details were accurate as of March 2010; newer versions and service packs are probably available now.

VPS Configuration
Windows Server 2003 R2, 64 bit
IIS 6
1GB RAM
10GB disk space
Parallels Power Panel
Full remote desktop access

General Utilities Installed
7-Zip, Notepad++, WinMerge and Sysinternals Suite

Web Platform Installer
Used the WPI to install: .NET FW 2.0, .NET FW 3.5 SP1, MVC 2.0

Database Servers
MySQL:

SQL Server 2008 Express:

SMTP Mail Server
hMailServer:

Web Mail
roundcube (PHP application):

Yep, that's it, too easy really.  Overall I'm pretty happy with Web24 - faster than the old shared hosting, and (almost) complete control over the environment. Only downside is a few more sysadmin tasks.

Next thing to do is switch to Server 2008 with IIS 7.

Entity Framework and LINQ2SQL Links

EF
Migrating from LINQ to SQL to Entity Framework: Eager Loading
http://blogs.msdn.com/adonet/archive/2008/10/07/migrating-from-linq-to-sql-to-entity-framework-eager-loading.aspx

Known Issues and Considerations in LINQ to Entities
http://msdn.microsoft.com/en-us/library/bb896317.aspx


LINQ2SQL

Linq to SQL DataContext Lifetime Management (Rick Strahl)


Python Libraries

Here are some useful Python libraries I've been using for conducting and visualising experiments in machine learning, forecasting and statistical learning. The list is updated as I discover new libraries/applications.
Last updated April 2013.

Libraries

numpy - Numerical Python - adds a fast and sophisticated array facility to the Python language. NumPy is the most recent and most actively supported of the Python array packages (it supersedes the older Numeric and numarray).
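
As a rough illustration of the array facility (my own toy example, not taken from the NumPy docs), here is a minimal sketch that centres the columns of a small matrix without writing any Python loops. The variable names are made up for the example.

```python
import numpy as np

# Build a 3x4 array: rows are [0..3], [4..7], [8..11].
data = np.arange(12, dtype=float).reshape(3, 4)

# Mean of each column (axis=0 collapses the rows).
col_means = data.mean(axis=0)

# Broadcasting subtracts the per-column mean from every row.
centered = data - col_means
```

After this runs, every column of `centered` sums to zero, which is a quick sanity check that the broadcasting did what was intended.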

scipy - Scientific Library for Python - SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
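
To give a feel for two of those modules, here is a small sketch using `scipy.integrate` and `scipy.optimize`. The functions being integrated and minimised are arbitrary toy choices, picked because the exact answers are known.

```python
from scipy import integrate, optimize

# Numerically integrate x^2 over [0, 3]; the exact answer is 9.
# quad returns the estimate and an estimate of the absolute error.
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Minimise (x - 2)^2 with the default scalar minimiser;
# the true minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
best_x = result.x
```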

matplotlib - matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
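
Here is a minimal sketch of the "few lines of code" claim: a histogram rendered to an in-memory PNG. The data is invented, and I'm assuming the non-interactive `Agg` backend so it runs on a machine without a display (such as a server).

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up sample data

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn patches.
counts, bin_edges, _ = ax.hist(values, bins=5)
ax.set_xlabel("value")
ax.set_ylabel("count")

# Write the figure to a buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Swapping the buffer for a filename in `savefig` writes the image to disk instead.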

PyBrain -  Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

nltk - Natural Language Toolkit - a suite of open source Python modules, data and documentation for research and development in natural language processing.

PIL - Python Imaging Library adds image processing capabilities to your Python interpreter.

mlpy - Machine Learning PYthon - high-performance Python library for predictive modeling. Makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. The GNU Scientific Library (GSL) is also required. It provides high level procedures that support, with few lines of code, the design of rich Data Analysis Protocols (DAPs) for preprocessing, clustering, predictive classification, regression and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping.

networkx - High productivity software for complex networks - creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
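
A small sketch of creating and querying a network, using a made-up five-node graph (the node names are arbitrary):

```python
import networkx as nx

# Build an undirected graph from a list of edges;
# nodes are created automatically.
graph = nx.Graph()
graph.add_edges_from([
    ("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e"),
])

# Shortest path by hop count between two nodes.
path = nx.shortest_path(graph, "a", "e")

# Number of neighbours of each node.
degrees = dict(graph.degree())
```

Here the only two-hop route from `a` to `e` goes through `d`, which is also the best-connected node.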

neurolab - a simple and powerful Neural Network Library for Python. Contains basic neural networks, training algorithms and a flexible framework to create and explore other networks. Pure Python + numpy.
Includes: Single layer perceptron, Multilayer feed forward perceptron, Competing layer (Kohonen Layer), Learning Vector Quantization (LVQ), Elman Recurrent network, Hopfield Recurrent network. (Sep 2011)
[Note: after using this library I prefer PyBrain since it feels more stable and better documented. However, neurolab has an API similar to the MATLAB NN Toolbox.]

scikits.learn - Easy-to-use and general-purpose machine learning in Python. Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).  It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering. Part of SciKits. (Aug 2011)
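
A minimal sketch of the fit/score workflow. Note the assumptions: later releases of this project are imported as `sklearn` rather than `scikits.learn`, and the code below uses that newer name and the newer `model_selection` module. The choice of dataset and classifier is just for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for testing; fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple linear classifier and measure accuracy on the held-out set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Other estimators in the library follow the same `fit` / `score` pattern, so swapping the classifier is usually a one-line change.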

scikits.* - SciKits (short for SciPy Toolkits), are add-on packages for SciPy, hosted and developed separately from the main SciPy distribution. All SciKits are available under the 'scikits' namespace.
Selected examples that have recent updates and look maintained: (Sep 2011)

Tablib - allows you to import, export, and manipulate tabular data sets. Advanced features include segregation, dynamic columns, tags & filtering, and seamless format import & export. (Sep 2011)

pyneurgen - Python Neural Genetic Hybrids.  This software provides libraries for use in Python programs to build hybrids of neural networks and genetic algorithms and/or genetic programming. (Sep 2011)

pyml - machine learning in Python. PyML is an interactive object oriented framework for machine learning that focuses on SVMs and other kernel methods.
Features:
Classifiers: support vector machines, nearest neighbor, ridge regression
Multi-class methods (one-against-rest and one-against-one)
Feature selection (filter methods, RFE)
Model selection
Preprocessing and normalization
Syntax for combining classifiers
Classifier testing (cross-validation, error rates, ROC curves)
(Sep 2011)

Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.  R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame. (April 2013)
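
A quick sketch of the DataFrame in action, summarising some invented cross-validation results (the column names and numbers are made up for the example):

```python
import pandas as pd

# One row per (model, fold) with its error rate.
frame = pd.DataFrame({
    "model": ["a", "a", "b", "b"],
    "fold": [1, 2, 1, 2],
    "error": [0.20, 0.30, 0.10, 0.20],
})

# Average error per model, similar in spirit to aggregate() in R.
mean_error = frame.groupby("model")["error"].mean()

# Label of the model with the lowest average error.
best_model = mean_error.idxmin()
```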

PyTables - PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. (April 2013)

gensim - unsupervised semantic modelling from plain text.  Useful for determining similarity between pairs of documents.  Includes Latent Semantic Analysis and Latent Dirichlet Allocation. (April 2013)

ramp - Ramp is a python package for rapid machine learning prototyping. It provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently. At its core it’s a unified pandas-based framework for working with existing python machine learning and statistics libraries (scikit-learn, rpy2, etc.) (April 2013)

Statsmodels - Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator. (April 2013)

Blaze - Blaze is the next generation of NumPy, Python’s extremely popular array library. Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. (April 2013)



Applications

IPython - IPython provides a rich toolkit to help you make the most out of using Python interactively. Its main components are:
Powerful interactive Python shells (terminal- and Qt-based).
Support for interactive data visualization and use of GUI toolkits.
Flexible, embeddable interpreters to load into your own projects.
Tools for high level and interactive parallel computing.

Orange - Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Extensions for bioinformatics and text mining. Packed with features for data analytics.