Building a Simple Python Environment for Data Science and Development:

salbaroudi - March 13, 2019 - Development / Python3 / Tools

Impetus:

...So this all started when I tried to upgrade pip to v18+ on my laptop. I was running Kubuntu 16.04, and I had noticed that my version of pip (8.1.1) was about ten releases behind. The upgrade ended in an ImportError (cannot import name 'main'), and pip failed to run.

Why does this happen?

For my version of pip, I had installed Ubuntu's apt package for pip (python3-pip). Using that pip to upgrade itself means a user-space tool is trying to upgrade a system-wide package. I unfortunately ran the upgrade with sudo and bricked my system-wide pip. There are various solutions to this [1]. Some developers just patched the code that imports main and got pip running again. That is fine, but for the developers of pip, a more systemic problem was afoot [2].

The pip developer’s post [2] led me on a trail of research, where I learned a few disquieting things:

  • Other Ubuntu programs use the system-wide default Python; if I upgrade a system-wide package with pip install, I can brick other programs on the system [3][4].

  • Pip is slightly complicated. Because Python is still straddling two major versions (2 and 3), pip has to deal with both. Specifically, you can invoke pip as pip, pip2 or pip3 on the command line. Depending on which versions of Python you have installed, and the order you installed them in, plain pip can point to either pip2 or pip3.

  • You can install packages into your user-local folder with the --user option. But with many user projects all depending on the same libraries, you can again brick projects with a bad install or upgrade [5].

  • Apt repos almost always lag behind updates to Python and its ecosystem of packages. Relying on apt-get install alone to get code just isn’t feasible.
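Which interpreter a given pip command belongs to can be checked by reading the shebang (#!) line of its launcher script. Below is a minimal sketch, assuming Linux-style script launchers on the PATH; pip_interpreters is a hypothetical helper, not part of pip. (Running python3 -m pip --version avoids the ambiguity altogether, since it always uses the pip installed for that python3.)

```python
import shutil
import sys
from typing import Dict, Iterable, Optional


def pip_interpreters(
    names: Iterable[str] = ("pip", "pip2", "pip3"),
    path: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Map each pip launcher found on PATH to the interpreter in its shebang line.

    Assumes script launchers (as on Linux); a launcher without a "#!" line is
    reported by its own path. `path` defaults to $PATH, as in shutil.which.
    """
    result: Dict[str, Optional[str]] = {}
    for name in names:
        exe = shutil.which(name, path=path)
        if exe is None:
            result[name] = None  # not installed, or not on PATH
            continue
        with open(exe, "rb") as fh:
            first = fh.readline(512).decode("utf-8", "replace").strip()
        result[name] = first[2:].strip() if first.startswith("#!") else exe
    return result


if __name__ == "__main__":
    for name, interp in pip_interpreters().items():
        print("%-5s -> %s" % (name, interp))
    # The unambiguous alternative: go through a specific interpreter.
    print("python -m pip would use:", sys.executable)
```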

I also realized that some of the libraries I wanted needed newer versions of Python. Pandas needs v3.6+, whereas the system default Pythons on Kubuntu 16.04 are 3.5 and 2.7. This is alleviated by compiling a more recent version of Python myself.
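A small guard like the following makes such a constraint explicit: a script fails fast, with a readable message, when it is started under an interpreter that is too old. The 3.6 minimum mirrors the pandas example above and is only an illustration; require_python is a hypothetical helper.

```python
import sys
from typing import Optional, Tuple

Version = Tuple[int, int]


def require_python(minimum: Version = (3, 6), current: Optional[Version] = None) -> Version:
    """Raise RuntimeError when the running (major, minor) version is below `minimum`."""
    if current is None:
        current = (sys.version_info[0], sys.version_info[1])
    if tuple(current) < tuple(minimum):
        # %-formatting rather than an f-string, so this file still parses on
        # older interpreters such as 3.5, which it is meant to reject.
        raise RuntimeError(
            "Python %d.%d+ required, but this is %d.%d (%s)"
            % (minimum[0], minimum[1], current[0], current[1], sys.executable)
        )
    return current


if __name__ == "__main__":
    print("OK, running on Python %d.%d" % require_python())
```

Putting the check at the top of a script turns a confusing failure deep inside a library import into a one-line explanation.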

With all this in mind, we can have basic requirements for a development environment when coding up python projects:

  • System default python packages and modules are left alone.

  • I want my user space projects to be protected from each other. Each has a set of constraints to what versions of libraries they can support.

  • I would like to use a more recent version of Python.

  • I really don’t want to use python2 anymore (python3 only).

What is the solution? For me, it was the venv module, with more advanced pip3 usage.

There are many virtual environment tools available for Python. I chose venv due to comments made on Stack Exchange [6]. It is also the de facto tool in Python’s standard library [7].

One might ask: “What about Docker...?” [8]. I’m not looking to contain my entire system in an image, and I am not swapping operating systems for my personal projects. So venv it is.

One flaw with venv is that there is no “--python=version” option available. If you call venv with the default python3, you will end up generating an environment that points to the system python3.X interpreter. What if I want version X.Y of Python? Simple, just call:

pythonX.Y -m venv /path/to/env/directory
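The same step can be scripted with the standard-library venv.create() function, which, like python -m venv, ties the new environment to whichever interpreter runs it. This sketch (make_env, env_python and interpreter_version are hypothetical helper names) builds a throwaway environment and asks its interpreter for its version. Note that venv.create() defaults to with_pip=False, unlike the command-line tool; the sketch keeps that default to stay quick.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a venv (bin/ on POSIX, Scripts\\ on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def make_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a venv tied to the interpreter running this code, return its python."""
    # symlinks mirrors the CLI default: links on POSIX, copies on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def interpreter_version(python: Path) -> str:
    """Ask an interpreter for its major.minor version string."""
    out = subprocess.run(
        [str(python), "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        py = make_env(Path(tmp) / "test1")
        # Same major.minor as the interpreter that ran this script.
        print(py, "->", interpreter_version(py))
```

Running this file with python3.7 gives a 3.7 environment; running it with python3 gives whatever the system python3 is, which is exactly the behaviour described above.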

When venv is “activated” in a shell (such as bash), the activate script prepends the environment’s bin directory to $PATH and sets $VIRTUAL_ENV, so python, pip and other tools resolve to the environment’s copies. When pip or other setup tools are run, they install into the environment folder by default.
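In practice the activate script works through $PATH: it puts the environment’s bin directory first, so command lookup finds the environment’s python3 before the system one. The sketch below reproduces only that lookup step with shutil.which (the real script also sets $VIRTUAL_ENV and changes the prompt); activated_path and resolve_in_env are hypothetical helpers.

```python
import os
import shutil
from typing import Optional


def activated_path(env_dir: str, base_path: Optional[str] = None) -> str:
    """Return the PATH a shell would have after sourcing <env_dir>/bin/activate."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
    return bin_dir + os.pathsep + base_path


def resolve_in_env(command: str, env_dir: str, base_path: Optional[str] = None) -> Optional[str]:
    """Find `command` the way the shell would once the environment is active."""
    return shutil.which(command, path=activated_path(env_dir, base_path))


if __name__ == "__main__":
    print("without env:", shutil.which("python3"))
    # "test1" is a hypothetical env folder; if it is missing, lookup falls back to the system python3.
    print("with env:   ", resolve_in_env("python3", "test1"))
```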

Venv also deals with the pip upgrade problem I faced. It bootstraps a private copy of pip into the environment folder (via the standard-library ensurepip module, so you get the pip bundled with that Python release), and that copy can be upgraded freely without touching the system pip. From here, I can install any site-packages into my environment, fully sandboxed away from other projects and the system.

The pain of having to download packages over and over again can be mitigated with a requirements.txt file for pip. Making one of these files, or manually downloading every module, makes you aware of the dependencies your project has. For a low/mid-level developer, I think this is an important habit to build.

Compiling Python on its own can lead to some issues. Some standard-library modules are C extensions that link against other (apt) system packages. Consider sqlite3 on Ubuntu: its C implementation needs the headers from the libsqlite3-dev package. If that package is not installed when you build, the build quietly skips the _sqlite3 extension (make only lists it among the optional modules it could not build), so import sqlite3 fails later. Installing libsqlite3-dev with apt afterwards does not fix an existing build (...this is not the responsibility of apt!), so your python3.X needs to be configured and compiled again.

So overall, venv + pip3 seemed like the sensible combination.

Enough comments and caveats, let’s get started!

Building our Rudimentary Environment:

Installing Python 3.7.2:

Download the Python 3.7.2 tarball from the Python website and extract it into a user directory (for example with tar -xf Python-3.7.2.tgz). For my particular needs (data science, engineering, exploration, etc...) I found that I needed two external system libraries that Python references: the SQLite and OpenSSL development headers.

sudo apt install libsqlite3-dev libssl-dev

Note: there may be other system libraries you require. You are on your own to find them.

Once done, enter the extracted folder and type the following:

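./configure

make

sudo make altinstall

It is important to use altinstall here: it installs only the versioned python3.7 binary (into /usr/local/bin by default), whereas the regular “make install” also installs an unversioned python3 that can overwrite or shadow the system Python. Running make as your normal user and only the install step with sudo (the default /usr/local prefix is owned by root) keeps root out of the build itself.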
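One way to confirm that altinstall behaved is to list the python commands visible on your PATH: python3.7 should show up (under /usr/local/bin with the default prefix), while the first python3 found should still be the system one. A minimal sketch, assuming POSIX executables; interpreters_on_path is a hypothetical helper, not part of the Python build.

```python
import os
import re
from typing import Dict, List, Optional

# Matches python, python3, python3.7, python3.10 ... but not python3.7-config or python3.7m.
_INTERPRETER = re.compile(r"^python(\d+(\.\d+)?)?$")


def interpreters_on_path(path: Optional[str] = None) -> Dict[str, List[str]]:
    """Map each pythonX[.Y] command name to every PATH entry providing it, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found: Dict[str, List[str]] = {}
    for directory in path.split(os.pathsep):
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # empty, missing or unreadable PATH entry
        for name in names:
            full = os.path.join(directory, name)
            if _INTERPRETER.match(name) and os.path.isfile(full) and os.access(full, os.X_OK):
                found.setdefault(name, []).append(full)
    return found


if __name__ == "__main__":
    for name, locations in sorted(interpreters_on_path().items()):
        # The first location listed is the one the shell runs for that name.
        print(name, "->", ", ".join(locations))
```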

You’ll see a lot of text spewing out of your command line. After a while, it will finish up. Test Python out quickly:

python3.7 -V

And confirm it gives the correct version number. You can also go into interpreter mode and check that core packages are installed via import statements.
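Rather than importing modules one by one in the interpreter, a short script can check the standard-library modules that depend on optional system libraries. The module list is an assumption: sqlite3 and ssl correspond to the two system libraries installed above, and zlib, bz2 and lzma are other modules that are skipped when their development headers are missing. Run it with the freshly built interpreter (e.g. python3.7 check_build.py, where the file name is hypothetical).

```python
import importlib
from typing import Dict, Iterable

# sqlite3 needs libsqlite3-dev and ssl needs libssl-dev at build time;
# zlib, bz2 and lzma are included as other optional, library-backed modules.
OPTIONAL_MODULES = ("sqlite3", "ssl", "zlib", "bz2", "lzma")


def check_modules(names: Iterable[str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Return {module: importable?}; False usually means it was skipped at build time."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # includes ModuleNotFoundError, e.g. "No module named '_sqlite3'"
            status[name] = False
        else:
            status[name] = True
    return status


if __name__ == "__main__":
    missing = [name for name, ok in check_modules().items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```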

Prepping venv:

The good news is that for a properly compiled version of Python, venv is already included as a standard-library module. This is true for Python 3.3+.

Now, venv is a module (not a standalone tool) and does not have an option to choose a version of Python. We have to choose the version of Python to run venv with, and venv will point to it after setup. To initialize an environment, make a folder (say, test1), and then run:

python3.7 -m venv /absolute/path/to/new/environment/test1

OR

python3.7 -m venv ./rel/path/to/folder/test1

Venv will create the folder if it doesn’t exist. Take a moment to look inside the folders that venv has generated, and observe the following:

  • /bin: small launcher scripts for tools like pip3, and symbolic links to our compiled version of Python, such as python3.7. The activate scripts for our environment are also located here.

  • /lib/python3.7/site-packages: pip has already been installed here, for our version of Python. Other external libraries we install with pip will end up here too.

  • pyvenv.cfg file: you will see the following few lines:

home = /usr/local/bin

include-system-site-packages = false

version = 3.7.2

System site packages are not included (our sandboxing aspect!) and it references python 3.7.2 we compiled.
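Because pyvenv.cfg is just key = value lines, it is easy to read from a script, for instance to check which Python build an environment points at. A minimal sketch, assuming one key = value pair per line as in the file above; parse_pyvenv_cfg and read_pyvenv_cfg are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Union


def parse_pyvenv_cfg(text: str) -> Dict[str, str]:
    """Parse the `key = value` lines that venv writes into pyvenv.cfg."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")  # split on the first "=" only
        settings[key.strip()] = value.strip()
    return settings


def read_pyvenv_cfg(env_dir: Union[str, Path]) -> Dict[str, str]:
    """Read and parse <env_dir>/pyvenv.cfg."""
    return parse_pyvenv_cfg((Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = "home = /usr/local/bin\ninclude-system-site-packages = false\nversion = 3.7.2\n"
    cfg = parse_pyvenv_cfg(sample)
    print(cfg["home"], cfg["version"], cfg["include-system-site-packages"] == "false")
```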

Next, let’s activate our venv environment. This is done in a shell (such as bash), from inside the environment folder. Activation prepends the environment’s bin directory to $PATH (and sets $VIRTUAL_ENV), so the python and pip commands resolve to the environment’s copies. This is what protects the rest of the system when we install packages.

source bin/activate

Once it is activated, your prompt is prefixed with the environment name, e.g. (test1). Note that this only applies to the shell you are in: the environment will not be active if you open a new shell.
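Inside Python, the reliable signal is sys.prefix rather than the prompt: in a venv it points at the environment folder, while sys.base_prefix still points at the installation that created it. The $VIRTUAL_ENV variable, by contrast, is set only by the activate script, so running test1/bin/python directly gives a venv interpreter without it. A small sketch; in_virtualenv is a hypothetical helper.

```python
import os
import sys
from typing import Optional


def in_virtualenv(prefix: Optional[str] = None, base_prefix: Optional[str] = None) -> bool:
    """True when the interpreter belongs to a venv (sys.prefix differs from sys.base_prefix)."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("venv interpreter:", in_virtualenv())
    print("sys.prefix:      ", sys.prefix)
    print("sys.base_prefix: ", sys.base_prefix)
    # Set by `source bin/activate`; absent when the env's python is run directly.
    print("$VIRTUAL_ENV:    ", os.environ.get("VIRTUAL_ENV", "<not set>"))
```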

Next, let’s make a requirements.txt file [9], and install all of our packages into our environment folder using pip.

The formatting is as follows:

Packagename [(operator) versionnumber]

#Examples:

beautifulsoup4 >= 1.1

matplotlib

scipy == 0.5

...

The first line illustrates the syntax of each entry; the terms in square brackets are optional. Just list the packages you want, and type:

pip3 install -r <name of reqfile>
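To make the Packagename [(operator) versionnumber] format concrete, here is a small parser for exactly that subset. It is a sketch, not pip’s parser: real requirement files also allow extras, environment markers, URLs, options and comma-separated specifiers, which the packaging library’s Requirement class handles properly. parse_requirements is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

Entry = Tuple[str, Optional[str], Optional[str]]  # (name, operator, version)

# A name, then optionally one comparison operator and a version; a trailing "# comment" is allowed.
# Two-character operators are listed before ">" and "<" so ">=" is not read as ">".
_LINE = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:(==|!=|>=|<=|~=|>|<)\s*([^\s#]+))?"
    r"\s*(?:#.*)?$"
)


def parse_requirements(text: str) -> List[Entry]:
    """Parse lines of the form `name [op version]`, skipping blank lines and comments."""
    entries: List[Entry] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError("line %d: unsupported requirement %r" % (lineno, raw))
        name, op, version = match.groups()
        entries.append((name, op, version))
    return entries


if __name__ == "__main__":
    example = "#Examples:\nbeautifulsoup4 >= 1.1\nmatplotlib\nscipy == 0.5\n"
    for name, op, version in parse_requirements(example):
        print(name, op or "(any version)", version or "")
```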

Here is my requirements.txt file, as an example:

beautifulsoup4

scrapy

matplotlib

sqlalchemy

numpy

scipy

pandas

praw

jupyter

altair

pytest

hypothesis

Now that this is done, we need to set up a project folder. This does not have to be inside the environment folder; it can be anywhere. But remember: when you call python3 from the command line, it has to be in an activated environment.

When you are done developing for the day, type:

deactivate

The activate script defined deactivate as a shell function, so it works without the source command. Nice!

Setting up the Project Folder:

Project Requirements:

  • /test directory with pytest scripts, for testing as we develop code.

  • A code folder with separate sub folders for classes, and main code.

  • Version control with git.

  • /env kept in the project folder (but ignored by git).

  • **Usage of absolute imports.

**I spent quite a bit of time reading about imports in Python. There are a lot of caveats to Python importing, and additional restrictions in Python 3 (which, unlike Python 2, has no implicit relative imports). I can’t really do it justice in this post. I will simply say that for a shallow project (not too many nested subpackages), absolute imports are a simple way to ensure importing is done correctly. An excellent summary of these issues has been compiled by Chris Yeh [10].

To meet these requirements, I made a simple GitHub repo ([Click Here] to view it). When I start a new Python 3 project, I just clone it and go!
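For anyone who prefers generating the layout to cloning a repo, a short script can create it. The sketch below is an assumed stand-in, not the contents of that repo: app/ is a hypothetical package name for the code folder (it avoids clashing with the standard-library module called code), app/classes/ holds a sample class module, test/ holds one pytest test, and the .gitignore excludes env/ (plus __pycache__/, my addition). Every import is absolute, spelled from the project root; make_skeleton is a hypothetical helper.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical starter files; "app" stands in for the code folder.
FILES: Dict[str, str] = {
    "app/__init__.py": "",
    "app/classes/__init__.py": "",
    "app/classes/greeter.py": "def greet(name):\n    return 'Hello, %s!' % name\n",
    # Absolute import, written from the project root rather than relative to main.py.
    "app/main.py": "from app.classes.greeter import greet\n\n\ndef run():\n    return greet('world')\n",
    "test/test_main.py": "from app.main import run\n\n\ndef test_run():\n    assert run() == 'Hello, world!'\n",
    ".gitignore": "env/\n__pycache__/\n",
}


def make_skeleton(root: Union[str, Path], init_git: bool = False) -> List[str]:
    """Create the starter layout under `root`; existing files are never overwritten."""
    root = Path(root)
    created = []
    for rel, content in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(content, encoding="utf-8")
            created.append(rel)
    if init_git:
        subprocess.run(["git", "init"], cwd=str(root), check=True)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(sorted(make_skeleton(Path(tmp) / "myproject")))
```

From the project root, python -m pytest runs the test. Invoking pytest through python -m also adds the current directory to sys.path, which is what lets the absolute import from app.main import run resolve; the bare pytest command may not.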

Happy Developing!

References:

[1] GitHub: Post on pip breaking: https://github.com/pypa/pip/issues/5221

[2] Stack Exchange: Pip broken after upgrading: https://stackoverflow.com/questions/49836676/error-after-upgrading-pip-cannot-import-name-main

[3] Stack Exchange: Removing Python: https://stackoverflow.com/questions/34198892/ubuntu-how-do-you-remove-all-python-3-but-not-2

[4] Stack Exchange: sudo pip vs pip: https://stackoverflow.com/questions/29310688/sudo-pip-install-vs-pip-install-user

[5] Pip User Guide: https://pip.pypa.io/en/stable/user_guide/

[6] Stack Exchange: Differences in Python environment tools: https://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe

[7] Python venv documentation: https://docs.python.org/3/library/venv.html

[8] Stack Exchange: venv vs Docker: https://stackoverflow.com/questions/50974960/whats-the-difference-between-docker-and-python-virtualenv

[9] Pip Requirements File Format: https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format

[10] The Definitive Guide to Python Importing (Chris Yeh): https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html

END
