In my view, Linux is far ahead of Windows in security, customization, and software updates. It supports all major programming languages, such as C, C++, Go, Java, Perl, Python, and Ruby, and gives you access to a vast set of machine learning tools and utilities. This is why many programmers prefer Linux for development and deep learning.

When choosing a distro, pay attention to the following factors first:

  • Ease of use and customization. You are here to do machine learning, so the last thing you want is to spend a lot of time figuring out how to install the right tool, configure or update the system, or fix a problem in it.
  • Stability. The distro should be fairly stable, with as few bugs as possible.
  • Software availability. All tools needed for machine learning should be available and easy to install.

Popular distros with a large user community usually have all of the advantages described above. Developers of machine learning frameworks typically test on them and publish distro-specific installation instructions. For this reason, I chose the distros most popular among developers working on machine learning:

  • Ubuntu
  • Fedora
  • Manjaro
  • Arch Linux
  • openSUSE
  • Solus

Advantages of using Linux for Machine Learning

  1. Environment. You can test your software in the same environment it will run in once it goes into production. Your personal computer can carry the same software, with the same versions and settings, as the server that will host your application.
  2. Free. Almost all Linux distros are free of charge. The kernel is licensed under the GNU GPL, and most of the bundled software is under this or other open source licenses.
  3. Security. The security of Linux depends primarily on the user, but there is clearly less malware for Linux than for Windows. In addition, because software is installed from publicly maintained, curated repositories, there is rarely a need to download it from unverified sources.
  4. Compatibility. The Wine compatibility layer lets you run many Windows applications on Linux.
  5. Stability. Some distributions are very stable and can run for long periods without a reboot.
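
The "Environment" point above can be checked rather than assumed. Below is a minimal Python sketch that lists the Python packages installed locally and compares them with a snapshot taken on the server. The function names and the server snapshot are hypothetical and used only for illustration; in practice the snapshot would come from running the same function on the server.

```python
from importlib import metadata


def installed_versions():
    """Map each installed Python distribution name (lowercased) to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            versions[name.lower()] = dist.version
    return versions


def version_mismatches(local, server):
    """Return {package: (local_version, server_version)} for every difference.

    A package missing on one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(local) | set(server)):
        if local.get(name) != server.get(name):
            mismatches[name] = (local.get(name), server.get(name))
    return mismatches


if __name__ == "__main__":
    # Hypothetical server snapshot, for illustration only.
    server = {"numpy": "1.26.4", "torch": "2.3.0"}
    local = installed_versions()
    # Compare only the packages the server snapshot mentions.
    relevant = {name: local.get(name) for name in server}
    for name, (mine, theirs) in version_mismatches(relevant, server).items():
        print(f"{name}: local={mine} server={theirs}")
```

This only covers Python packages. System libraries and drivers would have to be compared through the distro's own package manager.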

Overview of Linux distros for machine learning

That still leaves the question of which Linux distro is right for you. Below is a brief overview of each distro to help you pick the most suitable one.

Ubuntu

Ubuntu, developed by Canonical, is one of the most popular Linux distributions. It is well suited to programming for both beginners and professionals, and is possibly the best Linux for machine learning. Most tools already ship prebuilt deb packages that work on Debian and the distributions based on it, including Ubuntu.

Choose LTS releases of Ubuntu: they receive five years of standard support, whereas interim releases are supported for only nine months, so an LTS system does not have to be upgraded every six months.

Because Ubuntu is so popular, many of the development tools you need are in the official repositories. For those that are not, there are many PPA repositories, as well as the snap package manager, which is installed by default and lets you quickly install a lot of useful software.

Fedora

Fedora is a fairly popular distro among developers and is sponsored by Red Hat. It ships new technologies early, many of which later make their way into Red Hat Enterprise Linux.

The official Fedora repositories contain many developer tools. There is not as much software as for Ubuntu, but it is enough. Fedora also supports the Flatpak package manager, which in my opinion is significantly better than the snap format favored by Ubuntu.

Each Fedora release is supported for a little over a year (about 13 months).

Manjaro

Manjaro is currently one of the most popular distros based on Arch Linux. The advantage of Arch Linux is that you can build a highly customizable desktop environment on top of it. However, installing and configuring Arch Linux is quite complex and time-consuming. Personally, I've never installed Arch Linux without documentation.

Manjaro has several editions with different desktop environments. You can use KDE, GNOME, or Xfce, depending on your preference.

Manjaro follows a rolling release model, although updated installation images are published from time to time. You can use the pacman package manager and the AUR (Arch User Repository) to get the tools you need for development and machine learning.

Arch Linux

Arch Linux is a rather difficult Linux distribution for a newbie. After installation, you get a bare system with a minimal set of packages and a command line interface. What you do with it from there is up to you.

Arch Linux has a rolling release update model, so you never have to reinstall your system to get a new version of the distro. The base system receives the latest fixes and new features continuously through regular package updates, so there are no big version upgrades to schedule.

I also want to mention the ArchWiki, where you will find the answer to almost any question that comes up when installing and configuring Arch. The documentation is so good that users of other distributions rely on it too. If a question is not covered in the wiki, you will most likely find the answer on the Arch forums. So if you run into problems with, for example, PyTorch or TensorFlow, the forums are a good place to ask.
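
When you ask about a framework problem on a forum, it helps to state exactly what is installed. The following is a small Python sketch, not an official tool, that collects the Python version, kernel release, and the installed versions of PyTorch and TensorFlow. The function name `environment_report` is made up for this example, and the package names `torch` and `tensorflow` assume the frameworks were installed under their usual PyPI names.

```python
import platform
import sys
from importlib import metadata

# Assumed PyPI package names for the two frameworks mentioned above.
FRAMEWORKS = ("torch", "tensorflow")


def environment_report(packages=FRAMEWORKS):
    """Collect basic system facts and installed package versions as a dict."""
    report = {
        "python": sys.version.split()[0],
        "kernel": platform.release(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            # Reads package metadata only; the framework itself is not imported.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Because the script reads package metadata instead of importing the frameworks, it still works when the import itself is what fails.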

Arch Linux is the perfect learning platform for anyone who wants to learn how Linux works, as it requires attention to documentation when using it. Therefore, if, in addition to machine learning and deep learning, you want to learn Linux, then Arch is the best choice.

openSUSE

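openSUSE comes from one of the oldest distribution families: its roots go back to S.u.S.E. Linux in the early 1990s, which makes it about as old as Red Hat Linux and much older than Ubuntu.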

openSUSE is currently available in two versions: Leap, which provides a stable platform with years of support, and Tumbleweed, which is a rolling release. openSUSE is often praised for its ease of configuration via YaST, extensive support for the Btrfs filesystem, and automatic filesystem snapshots.

openSUSE has its own package manager, zypper, which is built on the libzypp library. In my experience it feels faster than apt or yum. One interesting feature is its support for delta RPMs: where they are available, an update downloads only the differences between the installed package and the new one rather than the complete new package.

In short, this is a very stable distro that gives you enough freedom to put together what you need, with a minimum of problems and errors.

Solus

Solus is a beautiful, independent distribution that is not based on any other. Its flagship desktop environment is Budgie, which was originally created for Solus. Budgie is tightly integrated with the GNOME stack and uses many of its features.

Solus uses the eopkg package manager, a fork of the PiSi package manager that was developed for the Pardus distro. eopkg covers everything you need for managing packages and repositories.

Solus uses the rolling release update model. Therefore, the system is always up to date.
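
Each distro above ships a different package manager, so the same setup script has to behave differently on each. Here is a rough Python sketch of how a script could detect the distro from `/etc/os-release` and pick the matching install command. The ID strings in the table are what I would expect these distros to report, so treat them as assumptions and check them on your own system. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed os-release ID -> install command prefix for the distros in this article.
INSTALL_COMMANDS = {
    "ubuntu": "sudo apt install",
    "fedora": "sudo dnf install",
    "manjaro": "sudo pacman -S",
    "arch": "sudo pacman -S",
    "opensuse-leap": "sudo zypper install",
    "opensuse-tumbleweed": "sudo zypper install",
    "solus": "sudo eopkg install",
}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def install_command(fields):
    """Return the install command for ID, falling back to ID_LIKE, or None."""
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for distro_id in candidates:
        if distro_id in INSTALL_COMMANDS:
            return INSTALL_COMMANDS[distro_id]
    return None


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        fields = parse_os_release(path.read_text())
        print(fields.get("PRETTY_NAME", "unknown"), "->", install_command(fields))
```

The `ID_LIKE` fallback means that a derivative distro that declares its parent can still be matched to the parent's package manager.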

Conclusion

I assume that you are a beginner wondering which distro is best. I can confidently say that for a beginner, the best distro for machine learning is Ubuntu. Here are the main reasons why:

  • Ubuntu is incredibly popular, so if you run into a problem, someone has most likely already posted a solution.
  • Ubuntu is available with almost all desktop environments: Kubuntu (KDE Plasma), Xubuntu (Xfce), Lubuntu (LXQt), Ubuntu MATE, Ubuntu Budgie.
  • From what I've seen, Ubuntu is the distro used most often for machine learning.

But in any case, it's up to you which distro to use. If you have a different opinion on this topic, write in the comments.