Three much-talked-about languages in data science are Python, R and Julia. Python, created by the Dutchman Guido van Rossum and named after the famous show Monty Python’s Flying Circus, is tough to beat and has seen off many challengers during its time on top. R, the user-friendly, slower and perhaps more sophisticated data science language, also won’t be gone tomorrow. And then there is Julia, the new kid in town with the odd name. What is it, what does it do well and how does it do that well? I will tackle these questions at an abstract level. The language is focused on data science, which raises the question: will it beat the other data science languages?
Code readability spectrum
Before exploring Julia, we take a quick step back and look at one dimension of programming languages: how close a language is to machine code. Machine code is ultimately just ones and zeros, at least if we assume our machine uses the von Neumann architecture. This could change when neuromorphic computing takes off, about which you can read more in the following blog written by Wesley Kruijthof: https://digitalstrategy.rsm.nl//2020/09/29/biomimicry-from-neural-networks-to-neural-architecture/. Anyway, in the world of ones and zeros, some languages lie closer to machine code and others closer to human-readable code. Broadly speaking, there are three categories on this readability spectrum: assembly, compiled and interpreted.
Assembly languages are designed for a specific kind of processor, meaning they are highly adapted to the machine they’re supposed to run on. Some families of processors are x86, x64 (the 64-bit version of x86), ARM and MIPS. Each family has its own instruction set, and an assembly language is written directly against that instruction set, which allows for very efficient code. Small side note: Apple’s Macs have long used Intel’s x86 processors, but Apple is switching to ARM-based chips that it designs itself.
One level higher we find the compiled languages: C, C++, Go, Rust and many more. Before a program in one of these languages can run, a compiler translates the entire source code into machine code, which then runs directly on the machine.
Interpreted languages find a different path to the machine and are generally easier for beginners. I’m personally comfortable writing in Python, R or VBA (Excel specifically). Other examples of interpreted languages are JavaScript, BASIC, Perl and Ruby. When these languages are executed, the code is not compiled up front. Instead, an interpreter reads the code, conceptually line by line, and executes each line immediately. For Python the process looks roughly as follows. CPython, the standard Python implementation, is itself a program written in C. That C code has been compiled, for example by GCC (the GNU Compiler Collection), into machine code, and this machine code is the actual interpreter. The interpreter needs Python source code as input, which it translates into bytecode and ultimately executes step by step (GeeksforGeeks, 2020).
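To make the idea of an interpreter a bit more concrete: CPython does not run the source text directly but first turns it into bytecode, which the interpreter then executes instruction by instruction. The snippet below is a minimal sketch that peeks at this bytecode with the standard library’s `dis` module. The function `add` is made up for the example, and the exact instruction names printed differ between Python versions.

```python
import dis


def add(a, b):
    """Tiny example function; the name and body are purely illustrative."""
    return a + b


# CPython compiled the function body to bytecode when the function was defined.
bytecode = add.__code__.co_code  # raw bytes that the interpreter loop executes

# Human-readable instruction names (these vary between Python versions).
instruction_names = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(len(bytecode), "bytes of bytecode")
    dis.dis(add)  # prints one line per bytecode instruction
```

Every one of those instructions is looked up and carried out by the interpreter at run time, which is a large part of why interpreted code tends to be slower than compiled machine code.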
So where is Julia?
What does this mean for Julia? Or for R and Python, for that matter? Well, interpreted code generally runs much slower than compiled code, though many libraries exist to speed things up. Both R and Python are interpreted, and this is exactly where the difference with Julia lies. Julia looks like an interpreted language, swims like an interpreted language and quacks like an interpreted language, but it is a compiled one. But how? First some background on typing. In statically typed languages, which most compiled languages are, the types in the code are checked for errors while compiling. In dynamically typed languages the checking is done at run time, as each line is executed. To be clear, this kind of typing has nothing to do with pressing keys on a keyboard.
Julia uses Just-In-Time (JIT) compilation (Hall, n.d.). As opposed to ahead-of-time compilation, JIT doesn’t compile the entire code in one go but does it on the fly, and as opposed to an interpreter, it runs compiled machine code rather than interpreting the source. But many interpreted languages have already been adapted to use JIT, and these languages have existed for far longer. The difference is that Julia has some other tricks up its sleeve. Type stability is the notion that the output type of a method is determined by the types of its inputs, so only one type can come out (UCIDataScienceInitiative, n.d.). Julia scans the code, infers which type is expected and compiles code specialized for that type (Julia, n.d.). If it can derive which type the output will be, Julia can reach speeds close to those of C.
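Type stability is easier to see with a small example. The sketch below is written in Python purely to illustrate the concept. Python does not compile specialized code the way Julia does, and the function names are made up for the example. The first function returns a different type depending on the value of its input, while the second always returns the same type for the same input type.

```python
def unstable_half(n):
    """Return type depends on the VALUE of n: int for even n, float for odd n."""
    if n % 2 == 0:
        return n // 2  # integer division gives an int
    return n / 2       # true division gives a float


def stable_half(n):
    """Return type depends only on the input TYPE: an int always gives a float."""
    return n / 2


def return_types(func, inputs):
    """Collect the names of all result types func produces for the inputs."""
    return {type(func(x)).__name__ for x in inputs}


if __name__ == "__main__":
    print(return_types(unstable_half, range(4)))  # two different result types
    print(return_types(stable_half, range(4)))    # a single result type
```

For a function like `stable_half`, a compiler that knows the input is an integer also knows the output is a float and can generate fast code for exactly that case. For `unstable_half` it has to keep both possibilities open, which is the situation Julia’s performance tips advise against.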
Does this mean it’s better?
Well, not quite. Though speed is very important, many other factors determine whether a language is better or best, terms which obviously aren’t defined here. R, for example, is rather slow. It’s not that some guy from the R team, years into development, stood up at a conference and said: ah sh*t, we forgot to implement speed. R is for data scientists and is made to be simple and convenient. It’s not made to process terabytes or produce time-critical results. A complete beginner will get used to R quicker than to Julia, and the same applies to Python.
Versatility is also important. When you find yourself in a job interview for a data scientist position and are asked why R is popular in the field, the answer shouldn’t be that it’s easy to learn. R’s CRAN has over ten thousand official packages (CRAN, n.d.), which are made solely for data analysis and related fields. In that niche it dwarfs Python and demolishes Julia. The number of mature packages is directly related to the size of a language’s community. When more people use a language, it gets streamlined, because improvements come from many people spending a lot of time on it. If you google “Biggest R contributor” I can assure you the name Hadley Wickham will pop up very quickly. It’s safe to argue that R wouldn’t be nearly as popular without his contributions, which include easier-to-use packages, faster packages and books explaining them. Python, on the other hand, is the most versatile. Though this seems to contradict the dwarfing statement above, PyPI has the most packages of the three overall, by quite some distance. That’s because Python is also very well developed in other fields, like web development, databases and web scraping.
Future
So what about Julia’s prospects? Julia is reported to be growing at an annual rate of 101% (Simplilearn, 2020), meaning the community is growing and the number of packages is expanding. If you were to take only one thing about Julia from this blog, it’s that Julia combines the best of two worlds: high performance and ease of use. Until now, somebody who wanted the best of both worlds might actually have written programs in two different languages, prototyping in one and implementing in the other. With Julia that is no longer necessary. Solving this problem will create a user base, which in turn sparks some good old positive network effects. I haven’t even mentioned the possibility of calling Python and C packages from Julia yet. Julia might also be able to learn from its predecessors by creating equivalents of their popular packages. Almost forgot: like Python and R, Julia is free (looking at you, MATLAB and others).
I’m no computer scientist and currently just an aspiring data scientist, so I’m happy to hear about any additions to or errors in my blog. The future of Julia is discussed intensely in the science community, or so I’ve heard. What I’d love to hear is your opinion on its prospects. Is Julia going to make it to the top or not? And why?
Bibliography
CRAN. (n.d.). Contributed Packages. Retrieved October 8, 2020, from CRAN: https://cran.r-project.org/web/packages/index.html
GeeksforGeeks. (2020, June 8). Compiler vs Interpreter. Retrieved October 8, 2020, from GeeksforGeeks: https://www.geeksforgeeks.org/compiler-vs-interpreter-2/
Hall, M. (n.d.). Julia in a nutshell. Retrieved October 8, 2020, from AgileScientific: https://agilescientific.com/blog/2014/9/4/julia-in-a-nutshell.html
Julia. (n.d.). Performance tips. Retrieved October 8, 2020, from Julia: https://docs.julialang.org/en/v1/manual/performance-tips/
Simplilearn. (2020, January 21). Things to Know About Julia Programming Language. Retrieved from Simplilearn: https://www.simplilearn.com/things-to-know-about-julia-programming-language-article#:~:text=It%20may%20seem%20like%20an,annual%20rate%20of%20101%20percent.
UCIDataScienceInitiative. (n.d.). Why Does Julia Work So Well? Retrieved October 8, 2020, from UCIDataScienceInitiative: https://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia