Building a Simple Python Environment for Data Science and Development:
So this all started when I tried to upgrade pip to v18+ on my laptop. I was running Kubuntu 16.04, and I had noticed that my version of pip (8.1.1) was about ten versions behind. The upgrade leads to an ImportError where main is not recognized, and pip fails to run, as seen below:
Why does this happen?
For my version of pip, I had installed the Ubuntu apt package of pip (python3-pip). Trying to then use pip to upgrade itself involves a user-space call trying to upgrade a system-wide package. I unfortunately sudoed and bricked my system-wide pip. There are various solutions to this. Some developers just fixed the main method code and got it running again. This is fine, but for the developers of pip, a more systemic problem was afoot.
The pip developers’ post led me on a trail of research, where I learned a few disquieting things:
Other Ubuntu programs use the system-wide default python; if I try to upgrade a system-wide package with a pip install, I can brick other programs in the system.
Pip is slightly complicated. Because python is still straddling two versions (2 vs 3), pip has to deal with both. Specifically, to invoke pip you can type pip, pip2, or pip3 on the command line. Depending on which versions of python you have installed, and the order you installed things in, pip itself can point to pip2 or pip3.
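One habit that sidesteps this ambiguity (my suggestion, not something these posts prescribe) is to never call the bare pip command at all, and instead ask a specific interpreter to run its own pip module:

```shell
# Instead of guessing where the bare "pip" command points,
# invoke pip through the interpreter you actually mean:
python3 -m pip --version   # always the pip belonging to python3
```

The same pattern works for any interpreter on your system (python3.7 -m pip, and so on).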
You can install packages in the user local folder with the --user option. But with many user projects all dependent on the same libraries, you can again brick projects with a bad install or upgrade.
Apt repos almost always lag behind updates for python and its ecosystem of packages. Relying on apt-get install alone just isn’t feasible.
I also realized that some of the libraries I wanted needed newer versions of Python. Pandas needs v3.6+, whereas the system default pythons are 3.5 and 2.7. This is alleviated by compiling a more recent version of Python.
With all this in mind, we can have basic requirements for a development environment when coding up python projects:
System default python packages and modules are left alone.
I want my user space projects to be protected from each other. Each has a set of constraints to what versions of libraries they can support.
I would like to use a more recent version of Python.
I really don’t want to use python2 anymore (python3 only).
What is the solution? For me, it was the venv module, with more advanced pip3 usage.
There are many virtual environment tools available for python. I chose venv due to comments made on Stack Exchange. It is also the de facto tool in the core python libraries.
One might ask: “What about Docker...?”. I’m not looking to contain my entire system in an image. I am also not swapping OS’es for my personal projects. So venv it is.
One flaw with venv is that there is no “--python=version” option available. If you call venv with the default python3, you will end up generating an environment that points to the system python3.X interpreter. What if I want version X.Y of python? Simple, just call:
pythonX.Y -m venv /path/to/env/directory
When venv is “activated” in a shell (such as bash), it rewrites $PATH so that python and pip resolve to the environment directory. When pip or other setup tools are run, they install to the environment folder by default.
Venv also deals with the pip upgrade problem I faced: it auto-installs a recent version of pip in the environment folder. From here, I can install any site-packages into my environment – fully sandboxed away from other projects and the system.
The pain of having to download packages over and over again can be mitigated with a requirements.txt file in pip. Making one of these files, or manually downloading every module, makes one aware of the dependencies their project has. For a low/mid-level developer, I think this is an important habit to build.
Compiling python on its own can lead to some issues. Python can rely on other (apt) system packages when being compiled. Consider the libsqlite3-dev package in Ubuntu. Its implementation is in C. If it is not installed, the ./configure script will skip it when generating the makefile. No python bindings to this external system library will be built, so the module will be missing when imported. If you download libsqlite3-dev with apt afterwards, it will install, but the python bindings will still not be generated (...this is not the responsibility of apt!). So your python3.X still needs to be recompiled again.
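A quick sanity check for this problem (my own habit, shown here with the generic python3 – substitute your freshly compiled interpreter) is to try importing the C-backed modules right after the build:

```shell
# If sqlite3 or ssl support was silently skipped at configure time,
# these imports raise an error instead of printing "ok":
python3 -c "import sqlite3, ssl; print('ok')"
```

If this fails, install the missing -dev packages and recompile.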
So overall, venv+pip3 seem to make sense to use.
Enough comments and caveats, let’s get started!
Building our Rudimentary Environment:
Installing Python 3.7.2:
Download a Python 3.7.2 tarball from the Python website. Extract the tarball into a user directory. For my particular needs (data science, engineering, exploration, etc...) I found that I needed two external system libraries that python references: libsqlite3-dev and libssl-dev
sudo apt install libsqlite3-dev libssl-dev
Note: there may be other system libraries you require. You are on your own to find them.
Once done, enter the extracted folder, and type the following:
./configure
make
sudo make altinstall
It is important **to use altinstall here**, as using the regular “make install” may overwrite the system core python.
You’ll see a lot of text spewing out of your command line. After a short time, it will finish up. Test python out quickly:
python3.7 -V
And confirm it gives the correct version number. You can also go into interpreter mode and check that core packages are installed via import statements.
The good news is that for a properly compiled version of python, venv is included already as a core package. This is true for python 3.3+.
Now, venv is a module (not a tool that runs on its own) and does not have an option to choose a version of python. We have to choose the version of python to run venv with, and venv will just point to it after setup. To initialize an environment, make a folder (say – test1), and then run:
python3.7 -m venv /absolute/path/to/new/environment/test1
python3.7 -m venv ./rel/path/to/folder/test1
Venv will create the folder if it doesn’t exist. Take a moment to look inside the folders that venv has produced, and observe the following:
/bin: Here you will find tools like pip3, and symbolic links to our compiled version of python, such as python3.7. The scripts to activate our environment are also located here.
/lib/python3.7/site-packages: pip3 has already been installed here, for our version of python. Other external libraries we install with pip will end up here as well.
pyvenv.cfg file: You will see the following few lines:
home = /usr/local/bin
include-system-site-packages = false
version = 3.7.2
System site packages are not included (our sandboxing aspect!) and it references the python 3.7.2 we compiled.
Next, let’s activate our venv environment. This is done in a shell (such as bash):
source /path/to/new/environment/test1/bin/activate
Note that this will rewrite $PATH and other shell variables, so that all references to python point to the environment folder. This is what protects the rest of the system when we install packages.
To see that this is activated, note that your command prompt will be modified (see picture below). These modifications only hold for the particular shell you are in; your environment will not be active if you open a new shell.
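Here is a minimal end-to-end sketch of what activation changes, using a throwaway /tmp/test1 environment and the default python3 purely for illustration:

```shell
python3 -m venv /tmp/test1       # create a throwaway environment
source /tmp/test1/bin/activate   # rewrites $PATH for this shell only
command -v python                # now resolves to /tmp/test1/bin/python
echo "$VIRTUAL_ENV"              # activate also exports this variable
deactivate                       # restores the original $PATH
```

Everything installed while the environment is active lands under /tmp/test1, not in the system folders.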
Next, let’s make a requirements.txt file, and install all of our packages to our environment folder using pip.
The formatting is as follows:
Packagename [(operator) versionnumber]
beautifulsoup4 >= 1.1
scipy == 0.5
The first line illustrates the syntax of each entry. The terms in the square brackets are optional. Just list the packages you want, and type:
pip3 install -r <name of reqfile>
Here is my requirements.txt file, as an example:
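As an aside, pip can also go the other direction and generate a requirements file from whatever is currently installed in the active environment (freeze is a standard pip command):

```shell
# Snapshot every installed package, pinned to its exact version:
pip3 freeze > requirements.txt
```

This is a handy way to record a working environment before handing the project to someone else.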
Now that this is done, we need to set up a project folder. This does not have to be inside the environment folder; it can be anywhere. But remember: when you call python3 from the command line, it has to be from within an activated environment.
When you are done developing for the day, type:
deactivate
The activate script already set up the shell to recognize deactivate without the source command. Nice!
Setting up the Project Folder:
/test directory with pytest scripts, for testing as we develop code.
A code folder with separate sub folders for classes, and main code.
Version control with git.
/env kept in the project folder (but ignored by git).
Usage of absolute imports.
I spent quite a bit of time reading about imports in python. There are a lot of caveats to python importing, and additional restrictions in python3 (vs python2). I can’t really do it justice in this post. I will simply say that for a shallow project (not too many nested subpackages), absolute imports are a simple way to ensure importing is done correctly. An excellent summary of these issues has been compiled by Chris Yeh.
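To make that concrete, here is a tiny throwaway layout (the names proj, mypkg, and helpers are invented for illustration) where main.py uses an absolute import rooted at the project folder:

```shell
# Build a minimal project to show an absolute import in action
mkdir -p /tmp/proj/mypkg
touch /tmp/proj/mypkg/__init__.py
cat > /tmp/proj/mypkg/helpers.py <<'EOF'
def greet():
    return "hello from mypkg"
EOF
cat > /tmp/proj/main.py <<'EOF'
# Absolute import: the full dotted path from the project root,
# rather than a relative "from .helpers import greet"
from mypkg.helpers import greet
print(greet())
EOF
cd /tmp/proj && python3 main.py   # prints "hello from mypkg"
```

Because main.py sits at the project root, python puts that root on sys.path, and the absolute path mypkg.helpers resolves the same way no matter how deep the package tree gets.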
To meet these requirements, I made a simple GitHub repo. [Click Here] to view it. When I start a new python3 project, I just clone and go!
GitHub: Post on pip breaking: https://github.com/pypa/pip/issues/5221
Stack Exchange: Pip broken after upgrading: https://stackoverflow.com/questions/49836676/error-after-upgrading-pip-cannot-import-name-main
Stack Exchange: Removing Python: https://stackoverflow.com/questions/34198892/ubuntu-how-do-you-remove-all-python-3-but-not-2
Stack Exchange: sudo pip vs pip: https://stackoverflow.com/questions/29310688/sudo-pip-install-vs-pip-install-user
Pip User Guide: https://pip.pypa.io/en/stable/user_guide/
Stack Exchange: Differences in python environment tools: https://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe
Python venv spec page: https://docs.python.org/3/library/venv.html
Stack Exchange: venv vs Docker: https://stackoverflow.com/questions/50974960/whats-the-difference-between-docker-and-python-virtualenv
Pip Requirements Specification: https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
The Definitive Guide to Python Imports (Chris Yeh): https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html