I’ve been a long-time user of Xterm. I tried to switch to other terminal emulators several times because of Xterm’s broken Unicode support, especially regarding glyphs/emojis and multi-font substitution. These glyphs are part of many modern CLI tools and are often printed as blank squares in Xterm. More recently, I attempted to switch again, but every time I try, I’m discouraged by the additional latency added during typing. I’m not a super-fast typist. I average about 80 WPM for normal text with bursts for common terminal commands of up to 120 WPM. The text appears in the terminal, of course, but not as quickly as I would like. There is a noticeable delay, especially when comparing something like Xterm to xfce4-terminal.

I’ve placed some hope in the recent development of GPU-accelerated terminals, e.g., wezterm, but it still felt as slow as xfce4-terminal. When I read the benchmarks, they often show how fast a terminal can print a gigabyte text file to stdout, but honestly, this is something I’m not so interested in for everyday use. I found some other interesting benchmarks regarding terminal latency, but there were always some terminal emulators missing for which I would like to know the result, or the results were slightly outdated.
For the benchmark, I used Typometer[1], a tool designed to measure and analyze the latency of various applications that have text input. The test does not include keyboard latency or display latency, as Typometer emulates the keystrokes in software, and the screen capture is also in software, not via a physical camera in front of the screen. Hence, you can expect additional latency from the hardware, and these measurements represent only the latency that originates from the software stack. All versions should be either the latest stable version available via Arch Linux at the time of writing or the latest master commit. All tests were conducted on the same machine, a T14 Gen 1 (AMD Ryzen 7 PRO 4750U) with Arch Linux and Xorg (21.1.11-1).
Terminal Emulator | Min | Max | Avg | Stddev |
---|---|---|---|---|
xterm (389-1) | 2.8 | 9.8 | 5.3 | 1.1 |
alacritty (0.13.1-1) | 5.2 | 17.8 | 6.9 | 1.8 |
kitty-tuned (0.31.0-1) | 8.1 | 16.3 | 10.7 | 1.4 |
zutty (0.14-2) | 7.4 | 16.4 | 11.2 | 1.6 |
st (master 95f22c5) | 11.4 | 17.9 | 14.2 | 1.2 |
urxvt (9.31-4) | 18.4 | 22.7 | 20.4 | 0.8 |
konsole (24.02.0-1) | 16.4 | 26.8 | 20.7 | 2.2 |
kitty (0.31.0-1) | 11.5 | 34.4 | 23.8 | 2.6 |
wezterm (20230712.072601) | 11.3 | 40.9 | 26.1 | 7.2 |
gnome-terminal (3.50.1-1) | 29.0 | 32.3 | 30.2 | 0.8 |
xfce4-terminal (1.1.1-2) | 28.0 | 36.1 | 30.2 | 1.1 |
terminator (2.1.3-3) | 28.7 | 48.0 | 30.5 | 2.0 |
tilix (1.9.6-3) | 28.6 | 69.7 | 31.0 | 4.4 |
hyper (v3.4.1) | 28.1 | 58.9 | 39.8 | 5.7 |
Xterm yields the best results, and Hyper (a web-based terminal) has the worst results. This met my expectations and matches the results from other blog posts. Hyper, with about 40 ms latency, is not as bad as I thought. However, anything above 20 ms I consider a noticeable delay, and everything below 10 ms is fast enough for my needs. I find it quite interesting that kitty can be tuned to be more than twice as fast in terms of latency. For the tuned kitty I used the following settings:
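The settings block itself did not survive here; tuning along these lines (these are real kitty.conf options, but the exact values shown are my reconstruction of a plausible low-latency configuration) would look like:

```
input_delay 0
repaint_delay 2
sync_to_monitor no
```

`input_delay` and `repaint_delay` control how long kitty waits before processing input and repainting, and `sync_to_monitor` disables waiting for the monitor's vertical sync.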
I gathered these settings from another blog post about terminal latency[2], which is worth reading. Please note that the results in this blog post are not comparable with the results shown here because the author used a camera to measure the latency, which also includes the latency of the monitor.
I’ve also tested the following applications:
Application | Min | Max | Avg | Stddev |
---|---|---|---|---|
gvim (9.1.0000-1) | 4.3 | 31.7 | 8.0 | 5.4 |
alacritty+tmux+neovim (0.13.1-1+3.3_a-7+0.9.5-2) | 5.4 | 12.9 | 8.3 | 1.4 |
chromium (120.0.6099.216-1) | 9.1 | 28.6 | 19.6 | 6.2 |
firefox (121.0.1-1) | 10.3 | 28.3 | 24.1 | 2.5 |
Visual Studio Code (1.87.2-1) | 26.3 | 36.7 | 31.2 | 3.3 |
As we can see, the latency for Neovim inside tmux inside Alacritty (8.3 ms) is not much higher than just Alacritty (6.9 ms). Hence, tmux and Neovim add only about 1.4 milliseconds of latency, which is quite acceptable. We can also see that the latency of an HTML text area in Chromium or Firefox is more than double the Alacritty latency. So, if you often write in applications like Teams, then there is probably not much you can do about it, other than accept about 20 milliseconds of delay for typing. And you are also out of luck in terms of latency if your favorite code editor is Visual Studio Code, which clocks in at 31.2 ms before any hardware latency is even considered.
I’m quite satisfied with the results, especially now that I have found a decent alternative to Xterm, which has only 1.7 ms more latency: Alacritty. I’ve seen benchmarks in the past that measured higher values for Alacritty. Hence, I think the terminal latency has improved over time due to complaints on GitHub[3] that caught some attention from the maintainers (there’s also my thumbs up on that issue). For now, I will migrate my configs from Xterm to Alacritty and report back in the form of another blog post in case there are any issues.
One of the contributors[4] to the st terminal emulator reached out to me and mentioned the possibility of configuring and recompiling st for lower latency. One might ask why the latency settings are not zero altogether for the smallest possible delay. Here’s the answer by avih:
Generally speaking, there's a tradeoff between latency and throughput/flicker. The smaller the latency, the worse the throughput is (e.g. in cat huge.txt) because the terminal has to render more frequently, and the more flicker-prone it becomes, for instance when the terminal updates the screen before the application completed its “output batch” - which then requires another screen update once the output batch is complete, e.g. when holding page-down in auto-repeat in vim or less.
This behavior is not unique to st, but what is unique to st is that it’s configurable, and can be adaptive.
By default it’s adaptive between 8 and 32 ms, and tries to draw as soon as possible after the application-output batch completes, i.e. the terminal input becomes idle.
In response to this blog post, the default minlatency setting of st has now changed from 8 milliseconds to 2 to offer a smaller default latency for all users. I’ve chosen to rerun the tests, of course. Here are the new results:
Terminal Emulator | Min | Max | Avg | Stddev |
---|---|---|---|---|
st (master f20e169) | 4.5 | 10.4 | 6.2 | 1.0 |
st (custom f20e169) | 2.2 | 20.6 | 5.2 | 2.1 |
With the new commits, the master branch of st is now placed second, behind xterm and before Alacritty. However, if we custom-tune the settings for the lowest possible latency (I chose `minlatency = 0` and `maxlatency = 1`), then we have a new winner. Applying this custom tuning results in an average latency of 5.2 ms, which is 0.1 ms lower than xterm, and that’s with a much saner terminal without legacy cruft.
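For reference, this tuning lives in st’s `config.h`, which is compiled into the binary (so st must be rebuilt after changing it). The snippet below reflects the values I chose:

```c
/* st config.h: latency range in milliseconds.
 * st draws adaptively between these two bounds. */
static double minlatency = 0;
static double maxlatency = 1;
```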
Coq is a proof management system that provides a formal language to write mathematical definitions, executable algorithms, and theorems together with an environment for semi-interactive development of machine-checked proofs. This tutorial will guide you through the process of installing Coq on Windows, Mac, and Linux, and then how to write a simple proof using coqtop. coqtop is the Read-Eval-Print Loop (REPL) for Coq. It allows you to interactively develop proofs. If you come from Haskell, you can think of Coq like GHC and coqtop like GHCi.
Installing Coq on Windows
Installing Coq on Mac
```shell
brew install coq
```
Installing Coq on Linux
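The Linux instructions were lost in extraction. Coq is packaged by the major distributions, so installation is typically a one-liner (package names assumed from the respective distro repositories):

```shell
# Debian/Ubuntu
sudo apt-get install coq

# Arch Linux
sudo pacman -S coq
```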
Now, we aim to prove that 1 + 1 = 2 using Coq.
Let’s create a file named `hello_proof.v` and insert the following proposition that we seek to prove:
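The code block itself did not survive here; based on the theorem name and the goal shown later in the coqtop session, the file presumably contains just the statement:

```coq
Theorem one_plus_one_is_two : 1 + 1 = 2.
```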
If we attempt to compile our proof using `coqc hello_proof.v`, it will generate the following error, as expected:
```
Error: There are pending proofs in file ./hello_proof.v: one_plus_one_is_two.
```
That is because we have an unproven statement in our file. Now, let’s try to prove this proposition interactively.
Start coqtop by running `coqtop -load-vernac-source hello_proof.v` in your terminal.
First, we need to enter proof mode by writing `Proof.`, pressing enter, and then writing `Show.`, followed by another enter, to view the current proof goal.
```
Welcome to Coq 8.16.1

one_plus_one_is_two < Proof.

one_plus_one_is_two < Show.
1 goal

  ============================
  1 + 1 = 2
```
This is how it looks with vim (top) and coqtop (bottom) in tmux:
We have successfully loaded our proposition into coqtop and can now attempt to prove it using tactics. The first tactic I’d like to introduce is `simpl`. The `simpl` tactic reduces complex terms to simpler forms:
```
one_plus_one_is_two < simpl.
1 goal

  ============================
  2 = 2
```
As we can see, the `simpl` tactic has reduced the term `1 + 1` on the left side by evaluating it to `2`. Now, it’s quite obvious that the term `2 = 2` is indeed true. We can solve the last goal with `reflexivity`, another basic tactic that solves the goal if it is a trivial equality, as in our case.
```
one_plus_one_is_two < reflexivity.
No more goals.
```
After that, we can write `Qed.` to end our proof and leave proof mode.
```
one_plus_one_is_two < Qed.
```
We can now put all the steps together into our `hello_proof.v` file:
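The complete file was lost in extraction; assembling the steps from the coqtop session above, it should read:

```coq
Theorem one_plus_one_is_two : 1 + 1 = 2.
Proof.
  simpl.
  reflexivity.
Qed.
```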
and compile the proof with `coqc hello_proof.v`.
Let’s prove that for all natural numbers, `n - n = 0`. In Coq, we can write this as follows:
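The statement itself did not survive extraction; with a hypothetical theorem name, it presumably looked like:

```coq
Theorem n_minus_n_is_zero : forall n : nat, n - n = 0.
Proof.
  induction n.
```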
This starts the proof with the `induction` tactic, which sets up two goals: the base case and the inductive step.
```
2 goals

  ============================
  0 - 0 = 0

goal 2 is:
 S n - S n = 0
```
The base case is trivial: with `simpl.` we have `0 = 0`, and then we just tell Coq that both sides are really the same with `reflexivity.`
```
1 goal

  n : nat
  IHn : n - n = 0
  ============================
  S n - S n = 0
```
Now, we are left with the inductive step `S n - S n = 0`. We assume the proposition holds for n and show that it holds for n + 1. `simpl.` applies simplification, reducing the expression `S n - S n` to `n - n` by the definition of subtraction in Coq. `rewrite IHn.` uses the inductive hypothesis (IHn) to replace `n - n` with `0`. Lastly, `reflexivity.` closes the resulting goal `0 = 0`, which is trivially true after the rewrite. Thus, the proof demonstrates that the proposition `forall n, n - n = 0` holds for all natural numbers n.
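Putting the whole induction proof together (the theorem name is my choice):

```coq
Theorem n_minus_n_is_zero : forall n : nat, n - n = 0.
Proof.
  induction n.
  - simpl. reflexivity.
  - simpl. rewrite IHn. reflexivity.
Qed.
```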
You might wonder how to come up with all these different tactics. You can look them up, e.g., in the helpful Coq Tactics Cheat Sheet from Cornell University, or examine various examples online. It takes some trial and error with simple proofs to become more proficient at proving with Coq. I can highly recommend the Software Foundations online book, which can be considered the reference text.
After successfully installing Coq and crafting a straightforward proof using the coqtop REPL, your journey into the meticulous world of formalizing mathematical proofs and programming language semantics is just beginning. Coq offers a robust toolkit for exploring these domains further. Keep exploring, and happy proving!
A category $\mathcal{C}$ consists of a collection of objects, denoted $\mathrm{Ob}(\mathcal{C})$, and, for every two objects $A, B \in \mathrm{Ob}(\mathcal{C})$, a set of morphisms $\mathcal{C}(A, B)$, also called hom-sets, satisfying the following properties:
For every three objects $A, B, C$, there exists a binary operation $\circ : \mathcal{C}(B, C) \times \mathcal{C}(A, B) \to \mathcal{C}(A, C)$, called composition of morphisms, that satisfies the composition law:
Composition is associative: for all $f \in \mathcal{C}(A, B)$, $g \in \mathcal{C}(B, C)$, $h \in \mathcal{C}(C, D)$, we have: $h \circ (g \circ f) = (h \circ g) \circ f$
For each object $A$, there is a unique element $1_A \in \mathcal{C}(A, A)$ (identity morphism), such that, for every $f \in \mathcal{C}(A, B)$, we have the left and right unit laws: $1_B \circ f = f$ and $f \circ 1_A = f$
It is common to write $\mathrm{Hom}(A, B)$ instead of $\mathcal{C}(A, B)$, and when indicating that $f$ is a function from $A$ to $B$, it’s typically written as $f : A \to B$ rather than $f \in \mathcal{C}(A, B)$.
A category is a very general concept, the objects and morphisms can be anything, as long as they adhere to the stated conditions. The following is an example category with a collection of objects $\{A, B, C\}$ and a collection of morphisms denoted $\{f, g, g \circ f\}$, and the loops are the identity morphisms.
\begin{xy}\xymatrix{A \ar@(l,u)^{1_A}[] \ar_{g\ \circ\ f}[dr] \ar^f[r] & B \ar@(u,r)^{1_B}[] \ar^g[d]\\&C \ar@(d,r)_{1_C}[]}\end{xy}

One interesting aspect that follows from the left and right unit laws is that the identity morphism is unique, so there really is just one way to loop back to itself.
Proposition. The identity morphism is unique.
Proof. Suppose that each of $1_A$ and $1'_A$ is an identity morphism. Then by the left and right unit laws we have $1_A \circ 1'_A = 1'_A$ and $1_A \circ 1'_A = 1_A$, hence $1_A = 1'_A$. $\square$
In Haskell, Category is a type class that abstracts the concept of a mathematical category. In the context of Haskell, types are considered as objects and functions as morphisms.
In Haskell, one traditionally works in the category (->) called Hask, in which any Haskell type is an object and functions are morphisms. We can implement Hask as follows:
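The implementation did not survive extraction; a sketch matching the surrounding text (the class mirrors `Control.Category`, with the `(->)` instance for Hask) might look like this:

```haskell
import Prelude hiding (id, (.))

-- the Category class, as in Control.Category
class Category cat where
  id  :: cat a a                     -- identity morphism
  (.) :: cat b c -> cat a b -> cat a c  -- composition of morphisms

-- Hask: types are objects, functions are morphisms
instance Category (->) where
  id x = x
  (g . f) x = g (f x)

-- category laws (pseudo notation):
--   f . id == f == id . f
--   h . (g . f) == (h . g) . f
```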
As we can see, `cat` has simply been replaced by Haskell arrows. The `id` function is the identity morphism that leaves the object unchanged. The `(.)` function is the composition of morphisms, which obeys the category laws in pseudo notation:
Haskell faces some challenges when considered in its entirety as the category Hask due to features like non-termination and bottom values. Therefore, when speaking about Hask, one often refers to a constrained subset of Haskell that excludes these problematic aspects. Specifically, this subset only permits terminating functions operating on finite values. It also resolves other subtleties. In essence, this pragmatic subset removes everything preventing Haskell from being modeled as a category.
The Category typeclass[1] can also be used with other structures that can be viewed as categories, not just functions between types. For example, it can be used with the Kleisli category of a monad, where morphisms are functions of type `a -> m b`. The objects in this category are identical to the types in Haskell as found in Hask. However, the transformations between these entities are represented by Kleisli arrows.
Kleisli arrows are a way of composing monadic programs. They are a notational feature that can be useful, but they don’t provide any additional functionality beyond what the monad already provides. Using the Category instance this way in Haskell can be quite confusing, since it overwrites `id` and standard function composition `(.)`. Instead, Kleisli composition is often defined using the “fish” operator, as shown below:
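The definition was lost in extraction; a sketch of the fish operator (matching the `(<=<)` from `Control.Monad`), together with a hypothetical Maybe-based example, might look like this:

```haskell
-- Kleisli ("fish") composition, as in Control.Monad
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
g <=< f = \a -> f a >>= g

-- example: two partial functions in the Maybe monad
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x | x < 0     = Nothing
           | otherwise = Just (sqrt x)

-- compose them like ordinary functions
recipThenSqrt :: Double -> Maybe Double
recipThenSqrt = safeSqrt <=< safeRecip
```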
As we can see, `<=<` can be expressed in terms of `>>=` and vice versa, hence they form an isomorphism. All of the above is already implemented in the standard Haskell library, so you can simply open an interactive Haskell interpreter (ghci) and test the following examples:
Here are some more examples:
Let $\mathcal{C}$ and $\mathcal{D}$ be categories and $F$ and $G$ be functors $\mathcal{C} \to \mathcal{D}$. Then a natural transformation $\alpha$ from $F$ to $G$ is a family of morphisms that satisfies the following requirements:
For every object $A$ in $\mathcal{C}$, a natural transformation $\alpha$ from the functor $F$ to the functor $G$ assigns a morphism $\alpha_A : F(A) \to G(A)$ between objects of $\mathcal{D}$. The morphism $\alpha_A$ is called the component of $\alpha$ at $A$.
Components must be such that for every morphism $f : A \to B$ in $\mathcal{C}$ we have: $\alpha_B \circ F(f) = G(f) \circ \alpha_A$ (naturality condition)
These requirements can be expressed by the following commutative diagram:
\begin{xy}\xymatrix{A \ar[r]_{F\ \ \ } \ar[d]_{f} \ar@/^1.5pc/[rr]^{\alpha_{A}\ \circ\ F} & F(A) \ar[r]_{\alpha_{A}} \ar[d]_{F(f)} & G(A) \ar[d]_{G(f)} \\B \ar[r]^{F\ \ \ } \ar@/_1.5pc/[rr]_{\alpha_{B}\ \circ\ F} & F(B) \ar[r]^{\alpha_{B}} & G(B)}\end{xy}

Natural transformations are often denoted as double arrows, $\alpha : F \Rightarrow G$, to distinguish them in diagrams from usual morphisms:
\begin{xy}\xymatrix @=5pc {\mathcal{C} \rtwocell<5>^{F}_{G}{\alpha} & \mathcal{D}}\end{xy}

In other words, a natural transformation is a way of transforming one functor into another while respecting the internal structure of the categories involved. Natural transformations are one of the most important aspects of category theory. Saunders Mac Lane, one of the founders of category theory, once said:
I didn’t invent categories to study functors; I invented them to study natural transformations.
In Haskell, we can define a natural transformation like so:
Or we could also define it the following way, as an infix operator (~>):
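The definitions themselves were lost in extraction; a sketch consistent with the text (the type alias names are my choice) might look like this:

```haskell
{-# LANGUAGE RankNTypes, TypeOperators #-}

-- a natural transformation is a polymorphic function between two functors
type NatTrans f g = forall a. f a -> g a

-- the same definition as an infix operator
type f ~> g = forall a. f a -> g a

-- naturality condition (pseudocode): alpha . fmap h == fmap h . alpha

-- example component family: Maybe ~> []
maybeToList :: Maybe ~> []
maybeToList Nothing  = []
maybeToList (Just x) = [x]
```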
Again, the requirement of compatibility with the actions of the functors is not expressible as a type signature, but we can write it down as a law in pseudocode:
Now, Haskell supports parametric polymorphism, which means that a function acts on all types uniformly and thus automatically satisfies the naturality condition for any polymorphic function of the type `t :: F a -> G a`, where F and G are functors. The naturality condition in terms of Haskell means that it doesn’t matter whether we first apply a function, through the application of `fmap`, and then change the structure via a structure-preserving mapping, or first change the structure and then apply the function to the new structure, with its own implementation of `fmap`. [2]
Let’s have a look at the following example:
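The example code was lost in extraction; from the description that follows, it is the standard `safeHead`:

```haskell
-- a natural transformation from [] to Maybe
safeHead :: [a] -> Maybe a
safeHead []    = Nothing
safeHead (x:_) = Just x

-- naturality (pseudocode): fmap f . safeHead == safeHead . fmap f
```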
This function returns Nothing in case of an empty list and the first element of the list in case of a non-empty list. This function is called `safeHead` because there is also an “unsafe head” in the Haskell standard library, simply called `head`. The unsafe variant throws an exception in case the list is empty. We can prove by equational reasoning (or Coq if you like) that the naturality condition holds in the case of `safeHead`:
Here are some more natural transformations:
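The examples did not survive extraction; a few standard natural transformations (my selection, not necessarily the author’s) are:

```haskell
-- [] to Maybe: keep at most the first element
listToMaybe :: [a] -> Maybe a
listToMaybe []    = Nothing
listToMaybe (x:_) = Just x

-- [] to []: every polymorphic list rearrangement is natural
duplicate :: [a] -> [a]
duplicate xs = xs ++ xs

-- the composite functor [] . [] to []
flatten :: [[a]] -> [a]
flatten = concat
```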
As we can see, there are infinitely many natural transformations.
You can open an interactive Haskell interpreter (ghci), load the functions and test the following examples.
An applicative, in category theory, is a lax monoidal functor with tensorial strength. In the following, we will present the definition as an endofunctor, since the concept has its origin in the context of functional programming where every functor is an endofunctor.
Let $(\mathcal{C}, \otimes, I)$ be a monoidal category. A lax monoidal endofunctor is a functor $F : \mathcal{C} \to \mathcal{C}$ together with two coherence maps:

$\epsilon : I \to F(I)$ (the unit morphism)

$\phi_{A,B} : F(A) \otimes F(B) \to F(A \otimes B)$ (the multiplication)

such that the following diagrams commute:
\begin{xy}\xymatrix{(FA\ \otimes\ FB)\ \otimes\ FC \ar[r]^{\alpha} \ar[d]_{\phi_{A,B}\ \otimes\ FC} & FA\ \otimes\ (FB\ \otimes\ FC) \ar[d]^{FA\ \otimes\ \phi_{B,C}} \\F(A\ \otimes\ B)\ \otimes\ FC \ar[d]_{\phi_{A\ \otimes\ B,C}} & FA\ \otimes\ F(B\ \otimes\ C) \ar[d]^{\phi_{A,B\ \otimes\ C}} \\F((A\ \otimes\ B)\ \otimes\ C) \ar[r]_{F_{\alpha}} & F(A\ \otimes\ (B\ \otimes\ C)) \\}\end{xy}

The natural transformations $\alpha$, $\lambda$ and $\rho$ (the associator and the unitors) are part of the monoidal structure on $\mathcal{C}$.
Applicative functors are a relatively new concept. They were first introduced in 2008 by Conor McBride and Ross Paterson in their paper Applicative programming with effects.[1] In functional programming, where every functor is an endofunctor, every functor applied to the monoidal category $(\mathbf{Hask}, \times, ())$, with the tensor product replaced by the cartesian product, inherently possesses a unique strength, resulting in every functor within $\mathbf{Hask}$ being strong. In simpler terms, a strong lax monoidal functor is just a lax monoidal functor that also has the property of being a strong functor, and its strength coherently associates with the monoidal structure. When we apply this in the context of functors, this coherent association is automatically provided.[2]
The Applicative typeclass in Haskell looks slightly different than our definition of a lax monoidal functor. However, there is another typeclass in Haskell called Monoidal that reflects our definition. Moreover, there is an equivalence between the two typeclasses Applicative and Monoidal. This parallels our previous demonstration of the interchangeability between `join` and `>>=`, as discussed in my post on monads. Let me first introduce the typeclass Monoidal, and then we show that it is equivalent to Applicative.
Haskell Definition of Monoidal (Interface)
Please note that `f a -> f b -> f (a, b)` is actually the curried version of `(f a, f b) -> f (a, b)`. Haskell comes with `curry` and `uncurry` as part of its standard library, which together form an isomorphism.
Hence we can also phrase Monoidal this way, and it aligns seamlessly with our categorical definition of a strong lax monoidal functor:
We have the usual monoidal laws (pseudocode):
Now that we have established the definition of Monoidal, let’s have a look at the equivalent Applicative definition in Haskell.
Haskell Definition of Applicative (Interface)
This is how to recover Applicative in terms of Monoidal:
And this is the reverse direction, Monoidal in terms of Applicative:
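The code for the Monoidal class and the two translations was lost in extraction; a sketch consistent with the surrounding text (the method names `unit` and `**` and the helper names are my choice) might look like this:

```haskell
import Prelude hiding ((**))

-- the Monoidal interface: a functor with a unit and a pairing operation
class Functor f => Monoidal f where
  unit :: f ()
  (**) :: f a -> f b -> f (a, b)

-- monoidal laws (pseudocode):
--   fmap snd (unit ** v) == v
--   fmap fst (u ** unit) == u
--   fmap assoc (u ** (v ** w)) == (u ** v) ** w

instance Monoidal Maybe where
  unit = Just ()
  Just a ** Just b = Just (a, b)
  _      ** _      = Nothing

-- recovering Applicative's operations from Monoidal
pureM :: Monoidal f => a -> f a
pureM x = fmap (const x) unit

apM :: Monoidal f => f (a -> b) -> f a -> f b
apM ff fx = fmap (\(f, x) -> f x) (ff ** fx)

-- and the reverse direction, Monoidal's operations from Applicative
unitA :: Applicative f => f ()
unitA = pure ()

pairA :: Applicative f => f a -> f b -> f (a, b)
pairA fa fb = (,) <$> fa <*> fb
```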
We’ve now formulated a two-way translation between Applicative and Monoidal, illustrating that they are isomorphic. This equivalence between Applicative and Monoidal can also be shown in a computer-checked proof in Coq.
Though the compiler does not enforce it, a proper instance of Applicative should comply with the applicative laws:
Now, let’s have a look at some instances of Applicative.
An Instance of Applicative, the List Applicative
Another Instance, the Maybe Functor
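The interface and instance code did not survive extraction; a standalone sketch (using its own class name so it does not clash with the Prelude) of the Applicative interface and the two instances discussed above might look like this:

```haskell
import Prelude hiding (Applicative, pure, (<*>))

-- the Applicative interface
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

-- the List Applicative: apply every function to every argument
instance Applicative [] where
  pure x = [x]
  fs <*> xs = [f x | f <- fs, x <- xs]

-- the Maybe Applicative: fail as soon as either side is Nothing
instance Applicative Maybe where
  pure = Just
  Just f <*> Just x = Just (f x)
  _      <*> _      = Nothing
```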
All of the above is already implemented in the standard Haskell library, so you can also simply open an interactive Haskell interpreter (ghci) and test the following examples.
A functor, in category theory, is a structure-preserving mapping between categories. Given two categories $\mathcal{C}$ and $\mathcal{D}$, a functor $F : \mathcal{C} \to \mathcal{D}$ associates each object $A$ in $\mathcal{C}$ with an object $F(A)$ in $\mathcal{D}$ and each morphism $f : A \to B$ in $\mathcal{C}$ with a morphism $F(f) : F(A) \to F(B)$ in $\mathcal{D}$, such that:
$F(1_A) = 1_{F(A)}$ for every object $A$ in $\mathcal{C}$,
$F(g \circ f) = F(g) \circ F(f)$ for all morphisms $f : A \to B$ and $g : B \to C$ in $\mathcal{C}$
That is, functors must preserve identity morphisms and composition of morphisms. We can rephrase these conditions using the subsequent commutative diagram:
\begin{xy}\xymatrix{F(A) \ar[r]_{F(f)} \ar@/^1.5pc/[rr]^{F(g\ \circ f)} & F(B) \ar[r]_{F(g)} & F(C) \\A \ar[r]^{f} \ar@/_1.5pc/[rr]_{g\ \circ\ f} \ar[u]_{F} & B \ar[r]^{g} \ar[u]_{F} & C \ar[u]_{F}}\end{xy}

A Functor in Haskell is a typeclass that represents a type that can be mapped over, meaning that you can apply a function to every element of the type without changing its structure.
Haskell Definition of Functor (Interface)
The following condition must always hold:
An Instance of Functor, the List Functor
Another Instance, the Maybe Functor
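The code blocks did not survive extraction; a standalone sketch (using its own class name so it does not clash with the Prelude) of the Functor interface, its laws, and the two instances discussed above might look like this:

```haskell
import Prelude hiding (Functor, fmap)

-- the Functor interface
class Functor f where
  fmap :: (a -> b) -> f a -> f b

-- functor laws (pseudocode):
--   fmap id == id
--   fmap (g . f) == fmap g . fmap f

-- the List Functor: apply the function to every element
instance Functor [] where
  fmap _ []     = []
  fmap f (x:xs) = f x : fmap f xs

-- the Maybe Functor: apply the function under Just, keep Nothing
instance Functor Maybe where
  fmap _ Nothing  = Nothing
  fmap f (Just x) = Just (f x)
```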
All of the above is already implemented in the standard Haskell library, so you can simply open an interactive Haskell interpreter (ghci) and test the following examples.
Some more examples, covering basically everything that can be mapped over:
A semigroup is an algebraic structure $(S, \cdot)$ in which $S$ is a non-empty set and $\cdot$ is a binary associative operation on $S$, such that the equation $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ holds for all $a, b, c \in S$. In category theory, a semigroup is a monoid where there might not be an identity element. More formally, a semigroup is a semicategory (a category without the requirement for identity morphisms) with just one object and the following conditions:
The set of morphisms (hom-set) is closed under composition: for every pair of morphisms $f, g$ in $\mathcal{C}(A, A)$, their composition $g \circ f$ also belongs to $\mathcal{C}(A, A)$
The composition operation is associative: for any three morphisms $f, g, h$ in $\mathcal{C}(A, A)$, we have $h \circ (g \circ f) = (h \circ g) \circ f$.
A type qualifies as a Semigroup if it offers an associative function (<>), allowing the merging of any two type values into a single one.
Haskell Definition of Semigroup (Interface)
Associativity implies that the following condition must always hold:
An Instance of Semigroup, the List Semigroup
Another Instance, the Maybe Semigroup
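The code blocks did not survive extraction; a standalone sketch (using its own class name so it does not clash with the Prelude) of the Semigroup interface and the two instances discussed above might look like this:

```haskell
import Prelude hiding (Semigroup, (<>))

-- the Semigroup interface: an associative binary operation
class Semigroup a where
  (<>) :: a -> a -> a

-- associativity law (pseudocode): (a <> b) <> c == a <> (b <> c)

-- the List Semigroup: concatenation
instance Semigroup [a] where
  (<>) = (++)

-- the Maybe Semigroup: combine the payloads when both are present
instance Semigroup a => Semigroup (Maybe a) where
  Nothing <> b       = b
  a       <> Nothing = a
  Just x  <> Just y  = Just (x <> y)
```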
All of the above is already implemented in the standard Haskell library, so you can also simply open an interactive Haskell interpreter (ghci) and test the following examples.
Some more examples are:
The natural numbers without zero under addition. This forms a semigroup because addition is associative.
The natural numbers under multiplication. This forms a semigroup because multiplication is associative, but is also a monoid since 1 serves as the identity, as any number multiplied by 1 remains the same.
Non-empty strings under concatenation. This forms a semigroup because string concatenation is associative.
In category theory, a monoid is a triple $(M, \mu, \eta)$ in a monoidal category $(\mathcal{C}, \otimes, I)$ together with two morphisms: $\mu : M \otimes M \to M$ (the multiplication) and $\eta : I \to M$ (the unit).
These must satisfy the following coherence conditions:
We can rephrase these conditions using the subsequent commutative diagrams:
Monoids are a powerful abstraction that can be used to solve a wide variety of problems. Monoids are becoming increasingly important in computer science because they provide a versatile framework for combining elements, especially in the context of parallel and distributed computing. For example, if you need to combine values in a way that’s associative and has an identity, you can model your problem as a monoid.
Consider using monoids for aggregations on large data sets. Because of the associative property, the operation can occur in any order and still yield the same result, enabling parallel processing. The identity element provides a starting value for this computation. The classic examples of this are sum and multiplication on numbers, but also concatenation on strings or lists and more. Monoids are also used in the design of compilers and interpreters. For example, the abstract syntax tree of a program can be represented as a monoid.
The Monoid, by definition, requires us to implement two functions: the unit, which is called mempty in Haskell, where we have to provide a neutral element, and the multiplication `<>` (mappend).
Haskell Definition of Monoid (Interface)
These have to obey the Monoid laws (<> infix notation for mappend) in pseudo notation:
An Instance of Monoid, the List Monoid
Another Instance, the Maybe Monoid
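The code blocks did not survive extraction; a standalone sketch (using its own class name so it does not clash with the Prelude) of the Monoid interface and the two instances discussed above might look like this:

```haskell
import Prelude hiding (Monoid, mempty, mappend, Semigroup, (<>))

-- the Monoid interface: a neutral element and an associative multiplication
class Monoid a where
  mempty  :: a
  mappend :: a -> a -> a

-- <> is infix notation for mappend
(<>) :: Monoid a => a -> a -> a
(<>) = mappend

-- monoid laws (pseudocode):
--   mempty <> a == a == a <> mempty
--   (a <> b) <> c == a <> (b <> c)

-- the List Monoid: empty list and concatenation
instance Monoid [a] where
  mempty  = []
  mappend = (++)

-- the Maybe Monoid: Nothing is neutral, payloads combine pointwise
instance Monoid a => Monoid (Maybe a) where
  mempty = Nothing
  mappend Nothing  b        = b
  mappend a        Nothing  = a
  mappend (Just x) (Just y) = Just (mappend x y)
```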
All of the above is already implemented in the standard Haskell library, so you can also simply open an interactive Haskell interpreter (ghci) and test the following examples.
Some more examples are:
The natural numbers under addition. This forms a monoid because addition is associative, and 0 serves as the identity, as any number plus 0 remains the same.
The natural numbers under multiplication. This forms a monoid because multiplication is associative, and 1 serves as the identity, as any number multiplied by 1 remains the same.
Strings under concatenation. This forms a monoid because string concatenation is associative, and the empty string serves as the identity, as any string concatenated with the empty string remains the same.
A Monad is a triple $(T, \eta, \mu)$ where $T : \mathcal{C} \to \mathcal{C}$ is an endofunctor, $\eta : 1_{\mathcal{C}} \Rightarrow T$ is a natural transformation (the unit), and $\mu : T \circ T \Rightarrow T$ is a natural transformation (the multiplication).
These must satisfy the following coherence conditions, known as the Monad laws:
This means that for any object $A$ in $\mathcal{C}$, we have: $\mu_A \circ T(\mu_A) = \mu_A \circ \mu_{T(A)}$ (associativity) and $\mu_A \circ T(\eta_A) = 1_{T(A)} = \mu_A \circ \eta_{T(A)}$ (unit laws).
We can rephrase these conditions using the subsequent commutative diagrams:
We can also write down the natural transformations in terms of their components. For each object $A$ of $\mathcal{C}$, the unit is a morphism $\eta_A : A \to T(A)$, and the multiplication is a morphism $\mu_A : T(T(A)) \to T(A)$, such that the following diagrams commute:
An application of this concept is that monads provide a way to express computations (in terms of morphisms) that include additional structure or side-effects (captured by the endofunctor $T$) in such a way that these computations can be chained together (via the $\mu$ natural transformation) and lifted over the monadic structure (via the $\eta$ natural transformation), and they do so in a way that is consistent (respecting the associativity and unit laws).
The Monad, by definition, requires us to implement two functions: the unit, which is called return in Haskell, where we just have to lift a value into the Monad (e.g., put a value into a list), and the multiplication `join`.
Haskell Definition of Monad (Interface)
These have to obey the Monad laws:
We can now draw the commutative diagram for the Haskell definition of Monad:
The definition of a monad given here is equivalent to the one we typically use in Haskell.
We can easily define `>>=` with `join` and `fmap`.
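The definition itself was lost in extraction; under a fresh name (to avoid clashing with the Prelude's `>>=`), it is:

```haskell
import Control.Monad (join)

-- bind, defined from join and fmap:
-- map the Kleisli arrow over the monad, then collapse one layer
bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = join (fmap f m)
```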
This operation is called bind (or is sometimes referred to as flatMap). The bind function can be used if you need to operate on the lifted value before collapsing. We can also translate the other way around and define `join` in terms of `>>=` and `id`:
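The definition itself was lost in extraction; under a fresh name (to avoid clashing with `Control.Monad.join`), it is:

```haskell
-- join, defined from >>= and id:
-- bind applied to the identity function collapses one monadic layer
join' :: Monad m => m (m a) -> m a
join' mma = mma >>= id
```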
Hence `join` is bind applied to the identity function. These two constructions are inverses of each other, and they translate the monad laws correctly. Now let’s have a look at some concrete examples (instances of Monad).
An Instance of Monad, the List Monad
Another Instance, the Maybe Monad
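The instance code did not survive extraction; with the post's return/join formulation (as a standalone class, so it does not clash with the Prelude), the two instances might be sketched as:

```haskell
import Prelude hiding (Monad, return)

-- the post's formulation: unit (return) and multiplication (join)
class Functor m => Monad m where
  return :: a -> m a
  join   :: m (m a) -> m a

-- the List Monad: return wraps, join concatenates
instance Monad [] where
  return x = [x]
  join     = concat

-- the Maybe Monad: join collapses one layer of Maybe
instance Monad Maybe where
  return = Just
  join (Just mx) = mx
  join Nothing   = Nothing
```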
All of the above is already implemented in the standard Haskell library, so you can simply open an interactive Haskell interpreter (ghci) and test the following examples.
GPT (Generative Pre-trained Transformer) models, such as GPT-4 by OpenAI, have been revolutionizing natural language processing (NLP) with their incredible capabilities to generate human-like text, translations, and even code. However, their large sizes make them computationally expensive, limiting their real-world deployment. Quantization is a technique that can significantly optimize these models, reducing the memory footprint and speeding up inference without sacrificing much of their performance. In this blog post, we will explore quantization in the context of GPT models and discuss its benefits, challenges, and practical applications.
Quantization, in the context of neural networks, is a technique that reduces the precision of model weights and activations to lower numerical formats, such as integers or lower-precision floating-point representations, while retaining a model’s performance as much as possible. In the context of GPT models, quantization can reduce the memory footprint and computational requirements by converting 32-bit floating-point (FP32) weights and activations to more efficient formats such as 16-bit floating-point (FP16), 8-bit integers (INT8), or even lower[1][2]. Suppose we have a 16-bit floating-point parameter, 3.1415. We can quantize it to a 4-bit integer, 3, which reduces the size by a factor of 4. Although this process sacrifices precision, the result, 3, can be sufficient in many cases.
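As a toy illustration of the idea (a simple symmetric absmax scheme, not the exact method any particular GPT deployment uses), one can map a list of float weights to 8-bit integers and back:

```python
def quantize(weights, bits=8):
    """Symmetric (absmax) quantization: floats -> signed ints plus a scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for INT8
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 0.9]
q, s = quantize(weights)
approx = dequantize(q, s)
# rounding to the nearest level bounds the error per weight by scale / 2
```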
One of the most significant benefits of quantization is model compression. By converting continuous-valued weights and activations to discrete representations, the memory footprint of the model can be reduced dramatically. This not only allows GPTs to be stored on devices with limited storage capabilities but also reduces the amount of data that needs to be transferred when deploying the models in cloud-based or distributed systems. The following table provides an overview of the memory usage of different llama.cpp models, to get some idea of the reduction possibilities:[3]
Parameters | Original Size (16-bit) | Quantized Size (4-bit) |
---|---|---|
7B | 13 GB | 3.9 GB |
13B | 24 GB | 7.8 GB |
30B | 60 GB | 19.5 GB |
65B | 120 GB | 38.5 GB |
Furthermore, quantization can result in substantial reductions in the computational power required to perform inference with GPT models. This is especially important when deploying GPTs on edge devices or mobile platforms, where energy efficiency is a crucial concern. By using quantized models, these devices can run GPT-based applications with lower latency and reduced power consumption.
While quantization offers significant benefits in terms of model compression and deployment, it is essential to consider the potential impact on performance and accuracy. Quantizing a GPT model inevitably introduces some approximation errors due to the conversion of continuous-valued parameters to discrete representations. The degree of error depends on the specific quantization technique employed and the number of discrete levels used. In order to counteract such effects, it is beneficial to integrate quantization-aware training methods into the model’s learning procedure.
Quantization-aware training[4] involves simulating the effects of quantization during the training process, allowing the model to learn to adapt to the approximations introduced by quantization. This can be achieved by incorporating quantization operations into the forward and backward passes of the training algorithm. By doing so, the model learns to compensate for the quantization errors, leading to more robust performance when the final quantized model is deployed.
Moreover, fine-tuning is another crucial step in optimizing quantized GPT models. Once the model has been quantized, it can be further refined using a smaller dataset, typically specific to the target application. This fine-tuning process helps to adapt the quantized model to the particular nuances of the application domain, ensuring optimal performance and accuracy.
Recent research has demonstrated that GPT models can be effectively quantized with minimal impact on performance. Techniques such as mixed-precision training[5] have proven to be particularly effective in maintaining the accuracy of quantized GPT models. By carefully selecting the appropriate quantization strategy and fine-tuning the model, it is possible to achieve a balance between model compression and performance that meets the requirements of specific applications.
Here is a collection of practical use cases for quantized GPT models:
Consumer Hardware: Quantized GPT models can be integrated into mobile and desktop applications, offering on-device natural language understanding capabilities without relying on cloud services. This enables privacy-sensitive applications and reduces latency.
Edge Computing: Quantized GPT models can be deployed on edge devices, such as IoT gadgets, to offer real-time NLP capabilities. This approach allows for decentralized processing and reduces the need for constant communication with centralized servers, saving bandwidth and improving responsiveness.
Data Center Optimization: Deploying quantized GPT models in data centers can lead to more efficient resource utilization, lowering energy consumption and reducing operational costs. This is particularly beneficial for large-scale NLP services that handle high volumes of user queries.
Quantization is an essential technique for optimizing GPT models, making them more accessible and deployable in real-world applications. By reducing the memory footprint, speeding up inference, and improving energy efficiency, quantization unlocks the potential of GPT models on memory-constrained devices and enables real-time NLP capabilities. Despite its challenges, such as the potential loss of accuracy and hardware compatibility issues, quantization is a critical step toward the widespread adoption of GPT models across various platforms and applications.
As the adoption of GPT models continues to grow, the need for optimization techniques like quantization becomes increasingly important. Researchers and practitioners must keep exploring novel quantization methods to further improve the efficiency of these models, addressing challenges and hardware limitations along the way. By investing in these optimization efforts, we can ensure that GPT models become even more accessible and scalable, revolutionizing the field of natural language processing and enabling a wide range of applications across various industries.
Future research in the field of quantization for GPT models will likely focus on developing new techniques to further optimize the trade-off between model compression and performance. Additionally, the development of hardware accelerators specifically designed to handle quantized models could help to unlock the full potential of quantization in GPTs.
Have you ever found yourself contemplating the mysteries of the cosmos? Questions like: how will the universe end? Will a new universe emerge? Are there multiple universes? What happened before the Big Bang? Throughout history, various religions have attempted to answer these questions by positing the existence of an almighty being or deity.
However, not everyone is satisfied with this approach. Some people prefer to disregard these questions altogether, arguing that they have no bearing on our daily lives. Others might jokingly accept “42” as the answer, as proposed by Douglas Adams in his novel, “The Hitchhiker’s Guide to the Galaxy.”
But for those who are genuinely curious and seek answers beyond religious explanations, the field of science offers intriguing theories and hypotheses. While no single theory can definitively answer all these questions, and empirical evidence remains scarce, it is nonetheless fascinating to explore current scientific ideas about the fate of our universe.
In this blog post, we’ll delve into the world of cosmology and examine some of the leading theories that attempt to shed light on the ultimate destiny of our universe.
The Big Crunch is a theoretical model predicting the universe’s demise, in which the universe contracts and collapses into itself, essentially reversing the Big Bang. This implosion would be caused by the gravitational force of all matter in the universe, drawing everything back together. In this scenario, all matter would eventually compress into a singularity—an infinitely dense point surrounded by a black hole, containing all matter and energy that once made up the universe, and from which nothing, including light, could escape.
Evidence for the Big Crunch is rooted in the observed acceleration of the universe’s expansion, believed to be driven by dark energy—a mysterious force causing the universe to expand at an increasing rate. If dark energy’s influence persists, the universe could eventually expand faster than the speed of light, trapping all matter and energy, and culminating in the Big Crunch.
However, several factors could prevent the Big Crunch, such as dark energy’s influence waning over time, causing the universe’s expansion to slow and cease. Alternatively, if the universe contained enough matter, its gravitational force could halt the expansion and trigger contraction. However, this would require far more matter than currently observed, leaving the existence of such a vast amount of matter uncertain.
The Big Crunch remains a speculative concept with an unknown likelihood. The universe may follow one of the other proposed models or an entirely different, yet-to-be-understood path.
The Big Bounce theory builds upon the Big Crunch, suggesting that after the universe collapses into a massive black hole, it could reform and experience another Big Bang. In this cyclic model, the universe would undergo endless cycles of expansion and collapse, driven by the gravitational attraction of matter. The idea of a cyclic universe has appeared in various cosmological models and has even been a part of some religious and philosophical beliefs throughout human history.
However, since current observations do not support the Big Crunch, the Big Bounce is an even more improbable outcome. One major challenge for the Big Bounce theory is explaining how the universe could transition from a contracting state to an expanding state. Some models propose that the universe passes through a “bounce phase,” during which the forces of gravity and dark energy balance each other out. Other models suggest that new physics or undiscovered particles may play a role in the transition from contraction to expansion.
Despite its intriguing nature, the Big Bounce remains a speculative theory with limited empirical evidence. More research and observations are required to understand whether a cyclic universe is a plausible outcome for the cosmos.
The Big Rip is a hypothetical scenario that predicts the universe’s end, originating from the concept of dark energy—a mysterious force responsible for the universe’s expansion. Theories propose that dark energy could eventually become so powerful that it tears apart the very fabric of space and time, leading to the ultimate destruction of the universe.
The process of the Big Rip would start with the acceleration of the universe’s expansion, as dark energy’s influence intensifies and drives further acceleration. Over time, dark energy would reach a critical threshold, causing galaxies, stars, and planets to disintegrate.
As dark energy continues to grow stronger, it would eventually attain the ability to dismantle even atoms themselves. The universe would be reduced to a chaotic mixture of subatomic particles, with all matter and structure obliterated. The timeline for the Big Rip is uncertain, with estimates ranging from tens of billions of years to a more rapid escalation of dark energy, potentially resulting in an earlier occurrence.
In conclusion, the Big Crunch, Big Bounce, and Big Rip represent three theoretical possibilities for the universe’s fate, each driven by different cosmic forces and mechanisms. While our understanding of these concepts is still limited, ongoing research in cosmology and astrophysics continues to unravel the mysteries of the universe, providing valuable insights into the nature of existence and our place within it.
Current data indicates that the universe began with the Big Bang around 13.8 billion years ago and has been expanding ever since. Observations show that this expansion is accelerating[1], driven by dark energy. In a few trillion years, all but the nearest galaxies will be too far away to see. The most likely scenario for the universe’s end, based on our present understanding of physics, is the heat death.
The heat death of the universe, also known as the Big Freeze or Big Chill, is a hypothetical future scenario in which the universe becomes incapable of supporting life or other forms of complexity. This would happen as the universe reaches maximum entropy, with matter and energy evenly distributed, preventing the formation of new celestial structures. Persistent acceleration and increasingly diffuse matter would make it ever harder for matter to interact, leading to a decline in temperature until all motion ceases.
Despite the potential for a heat death, the process would be gradual, unfolding over an immeasurable amount of time. It is likely that other catastrophic events, such as a Big Crunch, will occur long before the heat death becomes a reality. The future of the universe is largely unknown, and other factors may prevent the heat death from ever happening.
In an expanding universe with a non-zero cosmological constant, mass density decreases over time. This would lead to the ionization and fragmentation of all matter into solitary stable particles, causing complex structures to vanish. Star formation will eventually cease as dense stellar remnants lock up any remaining material, possibly around 100 trillion years from now.
In around a googol years, the last objects in the universe, supermassive black holes, will evaporate through Hawking radiation. Following this, the cosmos enters the Dark Era, where matter is only a distant memory. Over vastly longer timescales, another universe could potentially be created by random quantum fluctuations or quantum tunneling, and a spontaneous entropy decrease would eventually occur via the Poincaré recurrence theorem.
Although the heat death of the universe is a hypothetical scenario, it underscores the cosmos’ vastness and complexity, reminding us that the universe continues to evolve and change in ways beyond our understanding. While the future remains uncertain, the pursuit of knowledge and understanding about the universe will always be a captivating and inspiring endeavor.
The Boltzmann brain concept, originating from the Austrian physicist Ludwig Boltzmann, challenges our understanding of consciousness, the universe, and the arrow of time. The idea suggests that self-aware entities can spontaneously arise from random fluctuations in a state of maximum entropy, complete disorder, and randomness. This phenomenon raises questions about the likelihood of the universe’s properties, the existence of intelligent beings, and whether our existence is just a random fluctuation rather than a result of a complex evolutionary process.
In string theory, the anthropic principle is invoked to address the challenge of selecting the correct vacuum state from a vast number of possibilities. It proposes that we should weight the probability of each universe based on its ability to support intelligent life. This principle forces us to think deeply about the nature of our universe and the conditions necessary for life to exist.
The distinction between a Boltzmann brain and the cosmos is that the latter can endure, whereas the former is a momentary quantum-mechanical fluctuation. These brains appear briefly in our universe before disappearing and ceasing to exist. Some string-theory calculations suggest that we are statistically more likely to be Boltzmann brains than ordinary inhabitants of our universe.
The concept of Boltzmann brains raises questions about our existence and challenges the arrow of time. If these brains are possible, events could move from a state of disorder to order, suggesting that the arrow of time is not a fundamental property of the universe but a result of initial conditions.
Despite the challenges, Boltzmann brains remain an essential and fascinating topic for physicists and philosophers. Researchers study them using statistical mechanics, thermodynamics, and quantum mechanics to develop mathematical models and explore their implications. The study of Boltzmann brains has the potential to provide new insights into the nature of consciousness and the origin of the universe.
In conclusion, Boltzmann brains are a captivating and challenging concept that raises many questions about the nature of the universe and our place in it. The idea challenges our understanding of the arrow of time and the origins of consciousness and has the potential to provide new insights into the fundamental nature of the universe. As an active and exciting area of research, the study of Boltzmann brains will continue to fascinate and intrigue scientists and philosophers alike.
If you’re a programmer, you are likely already using Monads quite frequently. One popular example is `Future` in Java or `Promise` in JavaScript. You have probably seen or written code like `fetch(url).then(do this).then(do that)`, or you’ve used the `async`/`await` syntax. You might have noticed that you cannot get a value out of the `Promise`. As soon as you write `await`, your function has to be `async` (it has to be a Promise itself). And when you use `.then()`, you cannot get the value out of the `then`. This is because the value only exists after your Promise (the computation) has been resolved, e.g. an API call has been made and the result has arrived.
In other languages like Haskell, things like input/output, e.g. printing to the terminal or reading from a file, also happen inside a Monad, similar to the Promise in JavaScript. Just as in JavaScript you have to add the keyword `async` to your function definition in order to use `await`, in Haskell you have to specify the Monad type in the type annotation. For instance, `String -> IO String` is a function that takes the name of an environment variable, e.g. `$HOME`, and gives you the contents of that variable.[1] This operation has the type `IO String` and not just `String` because the contents of the environment variable depend on your system and can change over time, but pure functions by definition always produce the same output for the same input. Therefore you get back an `IO String`, which basically means that as soon as the containing IO operation has been executed, you will get back the contents of the environment variable. This is actually pure, because you always get back the same IO computation given the same input.[2] Now, if you have a function like `String -> String`, you can be sure that there are no side effects and no IO computation involved. And most of your functions can look like that, which makes it very easy to reason about what kind of things can happen in your program, just by reading the type signature.
In this post I want to show you how easy it is to write your own Monad instance in Python. I have chosen to implement a list Monad, because it provides easy intuition and has many similarities with the Haskell implementation. We start by writing a Functor and an Applicative instance for list in Python.
This is how Functor is defined in Haskell. Although it is called a class, you can think of it as an interface. This interface states that in order for something to be a Functor, it has to implement the `fmap` (or `map`) function. A Functor is also often called “mappable”, because it is something we can map over, like a list.
In Python we can use type hints to implement the Functor interface for list.
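A sketch of what this can look like, following the description below (the linked gist may differ in details such as naming):

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Functor for list: fmap applies a function to every element.
def fmap(fa: List[A], f: Callable[[A], B]) -> List[B]:
    return [f(x) for x in fa]

# Chaining fmap calls, mirroring (++"0") <$> (show) <$> (+ 1) <$> [1,2,3]
result = fmap(fmap(fmap([1, 2, 3], lambda x: x + 1), str), lambda s: s + "0")
# result == ["20", "30", "40"]
```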
As you can see, the implementation of the list Functor is very straightforward. The `fmap` function simply takes a list `fa` and applies the function given in the second argument of `fmap` to every element of the list. We are now able to apply lambda functions to every list element and chain the results by repeated `fmap` calls. There is also a built-in function in Python for `[f(x) for x in fa]`, simply called `map`. The same code, using the existing implementation, in Haskell would look like `(++"0") <$> (show) <$> (+ 1) <$> [1,2,3]`, where `<$>` stands for `fmap` as an operator.
Next thing we are going to implement is Applicative. The interface for Applicative in Haskell looks as follows:
We have to implement two functions. The first one is very simple: given a value of some type `a`, let’s say Integer, we simply have to put it into our structure, e.g. `Int -> [Int]`. The second function states that, given two Applicatives, e.g. two lists, where one contains functions, we apply every function from that list to every element of the other list.
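A sketch of these two functions for lists in Python, following the Haskell names `pure` and `<*>` (the gist may name things differently):

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# pure: lift a plain value into the structure, e.g. 1 -> [1]
def pure(x: A) -> List[A]:
    return [x]

# apply (<*>): apply every function in fs to every element of xs,
# like the Haskell list instance fs <*> xs = [f x | f <- fs, x <- xs]
def apply(fs: List[Callable[[A], B]], xs: List[A]) -> List[B]:
    return [f(x) for f in fs for x in xs]

result = apply([lambda s: s + "0"],
               apply([str],
                     apply([lambda x: x + 1], [1, 2, 3])))
# result == ["20", "30", "40"]
```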
This is very similar to the implementation of Applicative for lists in Haskell, `fs <*> xs = [f x | f <- fs, x <- xs]`, and `[(++"0")] <*> ([(show)] <*> ([(+ 1)] <*> [1,2,3]))` yields the same result.
The Monad interface requires us to implement two functions. The `return` is very similar to `pure`: we just have to lift a value into the Monad, e.g. put a value into a list. The second function, `>>=`, also called bind, takes a function that converts an `a` to an `m b`, e.g. `a -> [b]` in the case of lists.
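A sketch of the list Monad in Python (`ret` stands in for `return`, which is a reserved word in Python; the gist may differ in details):

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# return: lift a value into the Monad (here: a list).
def ret(x: A) -> List[A]:
    return [x]

# bind (>>=): apply f :: a -> [b] to every element and flatten the
# result, i.e. concat + map, also known as flatmap.
def bind(xs: List[A], f: Callable[[A], List[B]]) -> List[B]:
    return [y for x in xs for y in f(x)]

result = bind(bind(bind([1, 2, 3], lambda x: [x + 1]),
                   lambda x: [str(x)]),
              lambda x: [x + "0"])
# result == ["20", "30", "40"]
```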
Now let’s compare our Python list Monad implementation with the Haskell implementation. As we can see, the implementations are very similar: the bind function consists of `concat` and `map`, also called `flatmap`. And this is how we can use the monadic bind function in Haskell: `[1,2,3] >>= \x -> [(x+1)] >>= \x -> [(show x)] >>= \x -> [x ++ "0"]`. Here you can find the full implementation of Functor, Applicative and Monad for list in Python:
https://gist.github.com/madnight/b0ae13f7908641655da688ebe7de22cb
Have you ever been in the situation where you just wanted to install a single new package, but pacman couldn’t find it because your local package database is outdated? If so, you usually have two options: perform a full system upgrade with `pacman -Syu`[1] (and a potential reboot in case of a new kernel), or do a partial upgrade, i.e. upgrade your local package database and install only the package, plus all its dependencies, in the newest version. The problem is that partial upgrades are unsupported[2]. Therefore, sooner or later, you might end up with a broken installation (missing .so files, wrong glibc version, a kernel that does not boot…). This might not be a big deal for a seasoned Archer: all you have to do is arch-chroot from a live USB stick (some might have an Arch Linux live USB stick always plugged in, just in case) and fix the system. But this is at least time-consuming and maybe a bit annoying. However, there’s a third, lesser-known option.
The Arch Linux Archive (ALA) stores snapshots of the official repositories, ISO images, and bootstrap tarballs across time. It keeps packages for a few years on a daily basis. The most common use case for the ALA is a full system downgrade in case something went wrong. With the Arch Linux Archive you are able to pin all your packages to a specific point in time by defining the ALA as the only mirror in your pacman mirrorlist. This allows you to install any package from core, extra or community even if you are, let’s say, two months behind the current Arch Linux upstream.
Replace your `/etc/pacman.d/mirrorlist` with the following content:[3]
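A minimal version looks like this, using the documented archive.archlinux.org URL format:

```
##
## Arch Linux repository mirrorlist, pinned to the Arch Linux Archive
##
Server=https://archive.archlinux.org/repos/2042/01/01/$repo/os/$arch
```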
Then replace the date `2042/01/01` with the current date or any date you wish (>= 2014). Now you are always able to install any package without upgrading. But upgrade you should. Since `pacman -Syu` will not offer any new updates against a pinned mirror, you have to update the mirror first. You can either manually edit the mirrorlist, bump the date, and run `pacman -Syu` again, or you can put the following bash function in your `.bashrc` or `.zshrc`, which will do that for you automatically:
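A sketch of such a function (the name `upgrade` is arbitrary, and it assumes the Server line follows the `archive.archlinux.org/repos/YYYY/MM/DD` format shown above):

```shell
# Bump the ALA date in the mirrorlist to today, then do a full upgrade.
upgrade() {
    sudo sed -i \
        "s|/repos/[0-9]\{4\}/[0-9]\{2\}/[0-9]\{2\}|/repos/$(date +%Y/%m/%d)|" \
        /etc/pacman.d/mirrorlist
    sudo pacman -Syu
}
```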
There are different kinds of performance indicators for disks. The most commonly known is probably throughput: when you use a USB stick, you will immediately notice whether it’s USB 2.0 or 3.x in the case of a GiB-sized data transfer. Another important performance indicator for storage is IOPS, the number of I/O operations that the disk can handle per second. For example, a typical 7,200 RPM SATA HDD has about 75 IOPS, while a Samsung SSD 850 PRO has about 100k IOPS[1]. You’ve probably noticed a significant performance boost after replacing the HDD holding your operating system with an SSD.
To reach the maximum IOPS and throughput limits, applications have to issue I/O requests with enough parallelism. If they don’t, disk latency becomes the bottleneck. Disk latency is the amount of time it takes to process an I/O transaction. It can be measured with a tool called ioping, which works much like the ping tool for hosts.
ioping is available for common Linux distributions and BSD.
If you are running Windows, you can download ioping-1.2-win32.zip, unzip it, and run the ioping executable. In case your OS is not listed, you can try to build ioping from source; see https://github.com/koct9i/ioping
If you know how ping works, then you already know how to use ioping: just run the command and give it a directory path as an argument. Here you can see an example run for the current directory:
As we can see, the average response time for my SSD is about 800 us (0.8 ms), which results in 1260 sequential IOPS. Even though the SSD could achieve something like 100k IOPS in parallel, it can do only a little more than 1k IOPS on sequential requests.
It’s also possible to ping RAM in case `/tmp` is mounted in RAM (tmpfs), which is the default on many Linux distributions. If you want to ping the memory 10 times, run `ioping -c 10 /tmp`. I did that for RAM and some other devices and collected the results in the following table:
Device | Latency | IOPS | Note |
---|---|---|---|
RAM | 22 us | 48000 | DDR3 1600MHz (PC3L 12800S) |
SSD | 796 us | 1240 | TOSHIBA THNSNJ12 |
iSCSI | 1.5 ms | 649 | Hetzner Cloud Storage (Ceph block device) |
HDD | 14 ms | 73 | HGST HTS725050A7 |
SSHFS | 26 ms | 40 | Hetzner VPS (20 ms network ping) |
As we can see, a fast SSD over a network mount can easily beat a local HDD. The I/O latency for Hetzner Cloud Storage is about 1.5 ms. This tells us that their SSD-based Ceph cluster must be in the same data center as the VPS, which makes sense. A mount over SSH with sshfs reveals that sshfs itself adds about 6 ms of latency on top of the network latency. Although the drive itself is an SSD, the network latency turns it into a very slow filesystem mount with about half the performance of a local HDD. It is possible to calculate sequential IOPS, since it follows directly from the latency: IOPS = 1/L, where L is the latency in seconds.
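Plugging the measured latencies into that formula is a quick sanity check (small deviations from the table come from the rounded latency values):

```python
# Sequential IOPS follow directly from latency: IOPS = 1 / latency.
def seq_iops(latency_seconds):
    return 1.0 / latency_seconds

print(round(seq_iops(796e-6)))  # SSD, ~0.8 ms -> about 1256
print(round(seq_iops(14e-3)))   # HDD, 14 ms   -> about 71
print(round(seq_iops(26e-3)))   # SSHFS, 26 ms -> about 38
```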
We can conclude that I/O latency plays an important role for many applications, because I/O operations often happen to be sequential. This is similar to the importance of single-thread performance in a multi-core CPU architecture: while it is good to have many cores available, the single-thread performance of each core is still very relevant for overall CPU speed in practice. The reason is that many applications run single-threaded, like all coreutils except sort[2], and can only utilize one core at a time.
In category theory, hom-sets are sets of morphisms between objects. Given objects A and B in a locally small category, the hom-set Hom(A, B) is the set of all morphisms from A to B.
\begin{xy}\xymatrix{A \ar@/^/[r] \ar@/^1pc/[r] \ar[r] \ar@/_/[r] \ar@/_1pc/[r] &B}\end{xy}Hom-sets themselves give rise to another category, in which hom-sets are the objects and the arrows between hom-sets are hom-set morphisms. A hom-set morphism is defined as:
Hom(A, f) : Hom(A, B) → Hom(A, D), for f : B → D,
Hom(h, B) : Hom(A, B) → Hom(C, B), for h : C → A,
such that the following diagram commutes:
\begin{xy}\xymatrix@!C=4.0cm@!R=1.0cm{\text{Hom}(A,B) \ar[dr]|-{\text{Hom}(h,f)} \ar[d]_{\text{Hom}(A,f)} \ar[r]^{\text{Hom}(h,B)} &\text{Hom}(C,B) \ar[d]^{Hom(C,f)} \\\text{Hom}(A,D) \ar[r]_{\text{Hom}(h,D)} & \text{Hom}(C,D)}\end{xy}This can be translated to Hask, the category with Haskell types as objects and functions as morphisms. The previous morphisms are now functions:
`f :: b -> d` and `h :: c -> a`, such that:
\begin{xy}\xymatrix@!C=4.0cm@!R=1.0cm{a\ \rightarrow\ b \ar[dr]|-{c\ \rightarrow\ a\ \rightarrow\ b\ \rightarrow\ d} \ar[d]_{a\ \rightarrow\ b\ \rightarrow\ d} \ar[r]^{c\ \rightarrow\ a\ \rightarrow\ b} &c\ \rightarrow\ b\ \ar[d]^{c\ \rightarrow\ b\ \rightarrow\ d} \\a\ \rightarrow\ d\ \ar[r]_{c\ \rightarrow\ a\ \rightarrow\ d} & c\ \rightarrow\ d}\end{xy}Let’s have a closer look at the morphism from (a → b) to (c → b). So we have an (a → b), and the morphism labelled (c → a → b) tells us we can take a (c → a), follow it with our (a → b), and end up with a (c → b). Now, this makes sense if we remember how function composition works:
\begin{xy}\xymatrix { c \ar[r]^{f'} \ar[dr]_{g' \circ {f'}} & a \ar[d]^{g'} & \\ & b &&&&&}\end{xy}We can see that (c → a → b) is the function composition g′ ∘ f′. Thus, a hom-set morphism like Hom(h, B) is simply morphism pre-composition (− ∘ h), and Hom(A, f) is just morphism post-composition (f ∘ −). Furthermore, Hom(A, −), a hom-set with only its first argument fixed, is called a covariant hom-functor or representable functor from the category C to the category Set, mapping B ↦ Hom(A, B) and f ↦ Hom(A, f).
When the second argument is fixed instead, Hom(−, B), it is called a contravariant hom-functor. Consequently, if none of the arguments of the hom-functor is fixed, we have a hom bifunctor, also called a profunctor; see for instance the diagonal arrow in the diagram above, which applies pre- and post-composition at once.
It’s often said that simple things which matter in the real world are quite hard to write in Haskell. And indeed it requires more syntax to program in an imperative, mutating style in Haskell compared to Python or JavaScript.
In this post we will compare these three languages with simple examples and see how much more syntax we need to write in Haskell to do imperative style programming.
A simple for loop with print to stdout.
In Python:
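A minimal sketch (the exact loop bounds in the original snippet are an assumption):

```python
# Print the numbers 0..10 to stdout (the range is an assumed example).
for i in range(11):
    print(i)
```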
In JavaScript:
In Haskell:
Result:
The similarities are quite obvious, and none of these snippets is more or less complex than the others. One should note that the `for` in Python and JavaScript is a native language keyword, whereas the Haskell version `forM_` is a function.
One common task in programming is to create an endless loop that does something, sleeps for a while, and repeats. The following example implements a `while true` loop and logs the current date to a temporary file.
In Python:
In JavaScript:
In Haskell:
Result:
Please note that in order to use the `await` keyword in JavaScript, it is necessary to put it in an `async` function. I omitted this kind of extra “noise” to make the syntax comparison of the `while` loop clearer. I also removed the import statements for Python and Haskell; Haskell’s `main` function is also omitted. As we can see, there is not much difference between the code examples. The Haskell version of doing I/O (reading the time, writing to a file and sleeping) is syntactically the shortest one. Of course, high expressiveness, and thus fewer lines of code, does not imply simplicity. A language that is very terse tends to be more complex to understand, because it requires the programmer to know the meaning of many short names, standard functions and operators.
In Python:
In JavaScript:
In Haskell:
Result:
As we can see, modifying the state of a list requires a bit more syntax in Haskell than in Python or JavaScript. The benefit of this additional syntax is that the state is fully encapsulated in the State Monad, which makes this computation absolutely pure.[1] This is similar to frameworks like React + Redux, where you modify the state in a pure manner, but it is natively available in Haskell and considered the default for state manipulation.
The code snippets show the similarities between the languages. Given that it is not necessary to develop a deep understanding of mathematical concepts from category theory, e.g. Functor or Monoid, Haskell can be seen as a nice, strongly typed, purely functional programming language with many similarities to scripting languages like JavaScript or Python. So one approach is to learn Haskell in an imperative, pure, functional style and simply ignore the higher concepts of the language until you are either really interested in them or you think you need them.
(Just a) small collection of Haskell operators, with some funny names for them. Haskell tutorials and discussions can sometimes be very dry, so let’s “Put the fun back into computing” (DistroWatch).
Package | Module | Description |
---|---|---|
base | Control.Applicative | Sequential application. |
Definition:(<*>) :: f (a -> b) -> f a -> f b
(<*>) = liftA2 id
Example:
Other names: ap, apply
Package | Module | Description |
---|---|---|
base | Control.Applicative | Associative binary operation. |
Definition:(<|>) :: f a -> f a -> f a
(<|>) = (++)
Example:
Other names: or, alternative
Package | Module | Description |
---|---|---|
base | Control.Applicative | A variant of <*> with arguments reversed. |
Definition:(<**>) :: f a -> f (a -> b) -> f b
(<**>) = liftA2 (\a f -> f a)
Example:
Package | Module | Description |
---|---|---|
base | Control.Monad | Left-to-right Kleisli composition of monads. |
Definition:(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \x -> f x >>= g
Example:
Package | Module | Description |
---|---|---|
base | Control.Monad | (>=>), with arguments flipped. |
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
(<=<) = flip (>=>)
Example:
Package | Module | Description |
---|---|---|
base | Control.Category | Left-to-right composition. |
(>>>) :: cat a b -> cat b c -> cat a c
f >>> g = g . f
Example:
Package | Module | Description |
---|---|---|
base | Prelude | Right-to-left composition. |
Definition:(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g = \x -> f (g x)
Example:
Notation | Signature | Description |
---|---|---|
>>= | (>>=) :: Monad m => m a -> (a -> m b) -> m b | bind |
>> | (>>) :: Monad m => m a -> m b -> m b | then |
-> | to | |
<- | bind | |
<$> | (<$>) :: Functor f => (a -> b) -> f a -> f b | fmap |
<$ | (<$) :: Functor f => a -> f b -> f a | map-replace by |
!! | (!!) :: [a] -> Int -> a | index |
! | strict | |
++ | (++) :: [a] -> [a] -> [a] | concat |
[] | ([]) :: [a] | empty list |
: | (:) :: a -> [a] -> [a] | cons |
:: | of type | |
\ | lambda |
@ | as | |
~ | lazy |
The topic of this post is a comparison of different programming language popularity rankings and why I decided to add yet another programming language popularity index. Many different programming language popularity rankings can be found on the Internet. Three popular ones that show up on the first page of a Google search are:
It’s not easy to choose an objective indicator for a programming language popularity ranking. The following table provides a small overview of existing solutions.
Features | TIOBE | RedMonk | PYPL |
---|---|---|---|
Data source | Search engines | GitHub | Google Trends |
Available dataset | $5,000[1] | GitHub Archive | Google Trends |
Metric | Opaque | Pull requests | Tutorial searches |
Years covered | Since 2001 | Since 2004 | Since 2004 |
TOP 3 in 2018 | Java, C, C++ | JavaScript, Java, Python | Java, Python, PHP |
The major criticism of the TIOBE index is that it lags well behind what’s actually going on in the programming community; the authors of PYPL, for example, describe the TIOBE index as a lagging indicator.[2] Another problem is the availability of the TIOBE dataset: it’s fairly safe to assume that almost no one is willing to pay $5,000 to see how the ranking is actually built. This makes the index opaque and hard to reproduce. Even if one were to buy the dataset, it’s still a mystery why and how TIOBE chose different percentage weights for different data sources, resulting in an arbitrary metric where the ranking can be adjusted just by playing with the parameters. We can therefore conclude that the TIOBE index is a slow and subjective indicator, which is in many cases the opposite of what you want. Let’s see if RedMonk can do any better.
The RedMonk Programming Language Ranking is published at irregular intervals. The latest release was in January 2018. According to RedMonk, they use the following criteria to build their ranking:
The RedMonk ranking makes some sensible decisions on how to improve the quality of the raw dataset by applying filters (e.g. excluding forks) while trying to stay as neutral as possible. However, it lacks features such as historical data and a historical chart.
The PYPL PopularitY of Programming Language Index[3] is created by analyzing how often language tutorials are searched for on Google: the more often tutorials for a certain programming language are searched for, the more popular the language is assumed to be. Its dataset is based on Google Trends, which makes it quite transparent. The main drawback of PYPL is that it only compares the TOP 22 languages (as of July 2018) and there are no historical rankings; only a historical chart is provided.
There are other rankings worth noting. The Octoverse is an official GitHub source and shows some additional interesting statistics about GitHub. However, its ranking only shows the top 15 programming languages, offers no history, and does not reveal any information on how the underlying data has been aggregated. Another one is the Programming Language Popularity chart on https://archive.is/f1ZBZ, which shows a bubble chart with the StackOverflow (tags) / GitHub (lines changed) ratio for every language on GitHub in February 2013. The chart doesn't seem to have received any update since then and can therefore be considered outdated.
Since none of the presented approaches matches my criteria for a sensible, neutral, reproducible, forkable, up-to-date solution, I decided to create a page that is not based on multiple indicators and weights, but on raw data from one source with a flexible, unbiased metric. The idea of creating a neutral, open-source analysis of the popularity of the programming languages used on GitHub is nothing new. The GitHub project http://githut.info provides a good solution. Unfortunately, it received its last update in 2014 and there are many unresolved issues on GitHub, so we can consider the project outdated and unmaintained. Therefore I came up with a new approach: GitHut 2.0, a successor of GitHut. It shows a ranking of the top 50 languages based on the last quarter. A language's trend is calculated as the difference from the same quarter of the previous year. The percentages shown are the actual fractions of pull requests, pushes, stars and issues, which represent the underlying metric of the ranking.
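To make that metric concrete, here is a minimal sketch of how such a share and trend could be computed. The event counts are made up for illustration, not real GitHub Archive numbers:

```shell
#!/bin/sh
# Hypothetical quarterly event counts for one language vs. all languages
# (pull requests + pushes + stars + issues, summed).
lang_events=12000
total_events=100000

# The ranking metric is simply the language's share of all events.
awk -v lang="$lang_events" -v total="$total_events" \
    'BEGIN { printf "share: %.2f%%\n", 100 * lang / total }'

# The trend is the difference from the same quarter one year earlier
# (previous share again hypothetical).
awk -v now=12.00 -v prev=10.50 \
    'BEGIN { printf "trend: %+.2f\n", now - prev }'
```

Running this prints `share: 12.00%` and `trend: +1.50`; the real page computes the same fractions per event type from the GitHub Archive dataset.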
I have to admit that it's impossible to find a data source that approximates the entire programming community, because many projects are closed source. Another consideration comes to mind: developers are not necessarily happy with their bread-and-butter language at work; for instance, the Microsoft kernel is written in C++. Should a popularity index somehow account for whether developers chose the language freely rather than being forced to use it? On the other hand, there are developer surveys in which languages that are not widely used in industry, such as Haskell, have many fans who write their hobby projects in them on weekends. However, I think that my approach offers a good approximation of what is currently considered a popular programming language choice in the developer community.
TIOBE Frequently Asked Questions (FAQ)
Q: I would like to have the complete data set of the TIOBE index. Is this possible?
A: We spent a lot of effort to obtain all the data and keep the TIOBE index up to date. In order to compensate a bit for this, we ask a fee of 5,000 US$ for the complete data set. The data set runs from June 2001 till today ↩︎
PYPL FAQ
The TIOBE Index is a lagging indicator. It counts the number of web pages with the language name. Objective-c programming has over 20 million pages on the web, while C programming has only 11 million. This explains why Objective-C has a high TIOBE ranking. But who is reading those Objective-C web pages? Hardly anyone, according to Google Trends data. Objective C programming is searched 30 times less than C programming. In fact, the use of programming by the TIOBE index is misleading (see next question). ↩︎
PYPL Description
The PYPL PopularitY of Programming Language is an indicator based on Google Trends, reflecting the developers searches for programming language tutorial, instead of what pages are available. ↩︎
Although this post is about bspwm (binary space partitioning window manager), let's start with i3 and why I don't use it, despite its excellent documentation. i3 is a tiling window manager, started as a fork of wmii, written by Michael Stapelberg. One popular feature of most tiling window managers is visual gaps between application windows. The main problem for users who like gaps is that i3 lacks the gaps feature. Is this really a thing? Well, just compare the i3 repository with the i3-gaps fork. This fork only contains the missing gaps feature. It's hard to find a comparable fork on GitHub that has such a major impact on the community just by adding a tiny feature. Another indicator is the subreddit /r/unixporn with 64k subscribers. Many so-called nixers and ricers, that is, people who customize their Linux distribution, prefer the i3-gaps variant over stock i3. This leads to a simple question: why is the gaps feature not part of stock i3?
The answer to that question is quite simple: the creator and maintainer of i3 thinks that gaps don't serve any purpose[citation needed]. The reasons why many people think otherwise vary: some like the visual separation of their applications, others simply like the fancy look, and some have a specific use case in mind. However, it seems impossible to convince Michael to merge the i3-gaps fork maintained by Airblader.
Now the final conclusion could be: use i3-gaps and be happy with a fork of the most popular and best-documented tiling window manager. But i3-gaps has its own problems. First of all, it is not officially supported; there might be bugs, there might be a problem with the gaps, there might be a missing gaps feature. Bspwm allows very fine-grained control over the gaps. They can be set dynamically, and each side (north, east, south, west) can be controlled individually. This can be useful, for instance, if you want a permanently visible conky on one side of your screen. Not so with i3-gaps: it's not supported, and Airblader currently has no time to implement it, fair enough. So let's switch to the second most popular tiling window manager, bspwm, with native gaps support.
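As a sketch of that fine-grained control, here is what the gaps section of a bspwmrc might look like. The padding-based per-side setup is an assumption on my part; check `man bspc` for the exact settings your version supports:

```shell
#!/bin/sh
# Excerpt from a hypothetical ~/.config/bspwm/bspwmrc

# Uniform gap between windows, changeable at runtime.
bspc config window_gap 12

# Per-side space can be expressed as monitor padding, e.g. to reserve
# room on the right edge for a permanently visible conky.
bspc config right_padding 200
bspc config top_padding 0
```

Since bspc is just a client talking to the running bspwm instance, any of these values can also be changed on the fly from a shell or a keybinding.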
Bspwm is a great alternative to i3-gaps, but unfortunately it lacks documentation. Not only that, the commands change over time, so every guide and every piece of documentation will become outdated sooner or later. This leads to confusion: outdated bspwm configuration syntax floats around all over the Internet, forcing users to constantly update their own configs to handle breaking changes. There is a relatively small bspwm Arch Linux Wiki page that describes some basic features, the README.md on GitHub, and a manpage. Having a manpage is a big plus, but it is insufficient without further documentation. The man page describes the command-line parameters, but there is no introduction, no guide, no tutorial and no workflow. Airblader provides some basic config examples, but without description or context.
Bspwm could be much more popular (see the Slant screenshot above) if it provided good documentation, be it a set of Markdown files, a PDF, or a website. Therefore I decided to start a simple documentation website for bspwm that is auto-generated from the few available official resources. The advantage of this approach is that, whenever something changes, the documentation page can simply be regenerated and therefore always stays up to date. Furthermore, I opened a corresponding GitHub issue that addresses this problem.
All major browsers, such as Chrome, Firefox and Safari, are capable of exporting the currently visited website from HTML to PDF. This can be done via the print or save-as function. However, there is no standard way of doing so on the command line. Let's build a command-line tool with the following goals in mind: a small footprint, minimal dependencies, composition of existing technology, many options for full control of the PDF generation process, and simple usage on Linux, Mac and Windows. To achieve this, I decided to use the well-documented and maintained command-line utility wkhtmltopdf and an Alpine-Linux-based Docker image, to make it available on all platforms with Docker support.
Building Docker images based on Debian or Ubuntu often results in image sizes of a few hundred megabytes or more. This is a well-known problem, and therefore many Docker image distributors also offer an Alpine-Linux-based image. The Alpine Linux distribution is a very common Docker base distribution because of its very small size of about 5 MB. After a quick Google search, the wkhtmltopdf package from the official Alpine repositories shows up in the results. Interestingly, the given binary size is just about 202 KB, which would be perfectly fine if it weren't for the dependency list. It contains 7 items, including qt5-qtwebkit. Unfortunately, this dependency alone requires 28 MB (installed size) and pulls in Xorg. Not only that, an Xorg server needs to be running in order to use the binary.
Since wkhtmltopdf uses the WebKit engine to render PDFs, there is no way around the qt5-qtwebkit dependency. However, it is possible to avoid Xorg. I was able to find a GitHub repository that provides a solution by compiling a Qt WebKit version that does not need Xorg.
Unfortunately, this led to a new problem: compiling the whole Qt library including the necessary patches took about 4 hours on an EC2 m1.large in 2016. It would be fine to do this once, but Docker requires you to do it every time you build the container, unless you already have that Docker layer cached. At first, I thought I could address the issue by pushing the build to Docker Hub, which compiles Dockerfiles and provides a built Docker image that can be pulled from their servers. But Docker Hub has a build timeout of 2 hours, so it wasn't able to finish the build.
Therefore, I built the Docker image on my computer, pushed the resulting binary to the GitHub repository, copied it into the Dockerfile, and pushed everything to Docker Hub.
I found that it is now possible to build the patched wkhtmltopdf Alpine binary on Travis CI, although the documentation says otherwise:
It is very common for test suites or build scripts to hang. Travis CI has specific time limits for each job, and will stop the build and add an error message to the build log in the following situations:
- When a job produces no log output for 10 minutes.
- When a job on a public repository takes longer than 50 minutes.
- When a job on a private repository takes longer than 120 minutes.
Source: https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts
As you can see on Travis CI, the build takes more than one hour, despite being a job on a public repository. Now that the build runs in CI, I removed the binary from the git repository and instead copy it from the builder image into the final Docker image.
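That final setup can be sketched as a multi-stage Dockerfile. The builder image name and the runtime package list below are illustrative assumptions, not the actual published artifacts:

```dockerfile
# Builder stage: hypothetical image in which the patched, Xorg-free
# wkhtmltopdf binary was compiled (substitute the real CI-built image).
FROM example/wkhtmltopdf-builder:latest AS builder

# Final stage: small Alpine image with runtime dependencies only.
FROM alpine:3.8
# Assumed runtime libraries and fonts; adjust to what the binary links against.
RUN apk add --no-cache libstdc++ fontconfig freetype ttf-dejavu
COPY --from=builder /bin/wkhtmltopdf /bin/wkhtmltopdf
ENTRYPOINT ["wkhtmltopdf"]
```

A container built from such a file could then be used along the lines of `docker run --rm -v "$PWD:/data" <image> /data/page.html /data/page.pdf`, without any Xorg server involved.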