Python SciPy Stack
If you do data analysis work and your primary language is Python, I highly recommend getting familiar with the Python SciPy Stack, which can boost your productivity significantly.
According to the home page, a Python distribution that wants to promote itself as providing the SciPy Stack should include the following 8 required components:
- Python, the implementation of the Python language itself. The most popular implementation is CPython (not to be confused with Cython, which is a very different thing). Less popular ones include IronPython, Jython, and PyPy.
- NumPy, the Python package for numerical computation.
- SciPy, a wide collection of numerical algorithms for, e.g., optimization and statistics. Much of SciPy's speed comes from the libraries it depends on, such as BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage), which are written in lower-level languages like Fortran and C. SciPy is also what the stack is named after.
- Matplotlib, the de facto standard library for 2D plotting in Python.
- Pandas, a library providing high-performance, easy-to-use data structures like DataFrame and Series, and data analysis tools for Python.
- SymPy, a library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) entirely written in Python.
- IPython, a rich interactive interface and a significantly more powerful replacement for the default interpreter interface provided by CPython.
- nose, a framework for testing Python code.
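To give a feel for how some of these components fit together, here is a minimal sketch that solves the same small linear system numerically with NumPy and SciPy and then symbolically with SymPy. The system and the variable names are made up for illustration, and the snippet assumes the three packages are already installed.

```python
import numpy as np
import sympy as sp
from scipy import linalg

# NumPy: describe the linear system A x = b with arrays.
# (Illustrative numbers, chosen only for this example.)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# SciPy: solve the system numerically in floating point.
x = linalg.solve(A, b)
print("numeric solution:", x)

# SymPy: solve the same equations symbolically, giving exact values.
u, v = sp.symbols("u v")
exact = sp.solve([3 * u + v - 9, u + 2 * v - 8], [u, v], dict=True)[0]
print("symbolic solution:", exact)
```

The numeric result is a floating-point array, while the symbolic result maps each symbol to an exact value, which is the practical difference between the two libraries.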
In addition, some other Python libraries that I have used and think are worth mentioning:
- scikit-image for image processing & scikit-learn for machine learning, both of which are scikits (short for SciPy Toolkits), i.e. add-on packages for SciPy. See here for a comprehensive list of scikits.
- pytest, another Python testing framework, which I find friendlier to use than the aforementioned nose. This is a very good presentation from Mozilla talking about their experience of switching from nose to pytest.
- Jupyter Notebook, originally named IPython Notebook, which provides a web interface for interactive programming in multiple languages.
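As a rough illustration of why pytest feels lightweight, the sketch below shows tests written as plain functions with bare `assert` statements. The file name `test_stats.py` and the `mean` helper are hypothetical, invented for this example only.

```python
# test_stats.py (hypothetical file name)

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# pytest collects functions whose names start with "test_";
# a plain assert is enough, no test class or special assert method is needed.
def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    assert mean([7]) == 7
```

Assuming pytest is installed, running `pytest test_stats.py` in the same directory should collect and run both tests.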
Getting familiar with all of the above packages will give you a highly interactive and productive experience when programming in Python.
To install the SciPy stack, Anaconda is recommended. For other options, see the installation guide.
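After installing, one quick way to check which components are importable is a small script like the one below. It is only a sketch: the `STACK` list and the `missing_packages` helper are names made up here, and the import names listed are the ones these packages are commonly imported under.

```python
from importlib.util import find_spec

# Import names of the stack components (an assumption of this sketch;
# note that IPython is imported with a capital "I" and "P").
STACK = ["numpy", "scipy", "matplotlib", "pandas", "sympy", "IPython", "nose"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every component in STACK can be imported.
print("missing:", missing_packages(STACK))
```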