Scramble: R’ L F B2 D2 U R’ D L2 F B’ L2 U R’ L F B2 U2 D2 R2

Annotated solve:
D’ L F B’ D B2 (6; 2×2×2 block)
U’ F’ U2 F’ U’ F2 (6/12; 2×2×3 block)
(F) U F2 L2 F (4/16; EO)
L U2 L2 U’ (L’) (4/20; F2L)
(L) R U2 L’ R’ B’ U F’ U2 B U’ F U’ (12/32; G perm)

U F’ U B’ U2 F U’ B R L U2 R’ U L2 U2 L’ F’ L2 F2 U’ F U F U2 F U B2 D’ B F’ L’ D (32f)

I tried to substitute the G perm with an antisune to get a skeleton (lol, skeleton with 20+ moves), but none of the four possible eight-mover insertions had any move cancellations and they all gave longer solutions. Besides, that’s one nice OCLL skip.


Scramble: R F D U R2 D’ L’ R U B’ L’ U2 L D B U F B’ L’ U2

Annotated solve:
R D’ L F2 D2 F (6; 2×2×2 block)
U’ B L2 U L (5/11; 2×2×3 block)
B’ L U L’ (4/15; extra 2×2 block)
L U L’ U’ (4/19; EO)

B2 U2 B’ U’ B U (B’) (6/25; F2L)
(B U) B’ U B U’ B’ U B U2 B’ U2 (10/35; double Sune)
B2 (1/36; fix pseudoblock)

Solve: R D’ L F2 D2 F U’ B L2 U L B’ L U L’ B2 U2 B’ U’ B U2 B’ U B U’ B’ U B U2 B’ U2 B2 U L U’ L’ (36f)

First time NISS was actually useful for me. Without the second inversion, F2L and 2GLL would’ve taken the same number of moves total, except there wouldn’t be move cancellations to reduce the move count. The bad thing about having 2GLL is that it’s impossible to use a short alg to fix edges and then insert a corner 3-cycle, but on the other hand I know a handful of optimal 2GLL algs. Also, it’s easy to write the moves with my right hand while executing them with my left.

I’ll try doing some more of the past weekly scrambles for practice.

Programming is hard(?)

Like I mentioned before, I’m taking CS1010S Programming Methodology this semester, which is an introductory course on programming taught in Python. It’s part of the program requirements (pun, etc.) for applied maths and statistics majors, but there’re also plenty of random people like myself around who are just taking it for fun.

Side note: I only recently found out that the “evil counterpart” CS1101S Programming Methodology, which has a functional programming focus, is taught in… JavaScript. And yes, it’s officially nicknamed CS1010S’s evil counterpart. Probably understandable, because how the heck would you teach functional programming in JavaScript?

I have about ten years of programming experience if you want to be really generous with what the words “programming” and “experience” mean, but that’s still a gigantic head start compared to my classmates, most of whom seemingly had zero prior programming experience.

The course materials for the module are not hosted on IVLE, but on this platform called Coursemology, which does this very buzzword-y thing of trying to gamify the learning process to better engage the students. (Not to be confused with trying to game the learning process, which would have the opposite effect.) I’ve mentioned the leaderboard feature before, and of course I’m still near the top, though I probably won’t be able to claim first or second place by the end of the semester because I’ve lost too many points to careless mistakes. There’s still an open contest for writing a 2048 AI, which I believe I have a high chance of winning (and getting extra points), but it’s not like I can tell what my classmates have implemented. We’ll see. (Edit: the results were already announced during the one lecture I overslept through, and apparently my code was so fantastic I got a bonus in addition to the first-place prize. The lecturer also flashed my smugshot on the projector, which was, uh, extra lols.)

The site also has forums, which were quite active until the post rate mysteriously plummeted about two weeks ago. There, we can start threads to clarify our doubts and to seek help in resolving problems, and, oh dear, the questions they ask. I don’t mean that the questions themselves are bad; it’s just that I hang out mostly with programmers on IRC and I’m a somewhat seasoned programmer myself, so it’s a bit of a culture shock to notice that there actually are a lot of people who aren’t programmers!

It seems that I’ve forgotten that I spent a few years completely struggling with code before I reached my current standard, and in comparison, the CS1010S module is taught over less than four months. No wonder so many people have difficulty keeping up with the pace. Taking this module also opened my eyes as to how hard teaching programming can be. Python is very simple to understand if you don’t touch anything remotely hairy. Like, say, if you restrict yourself to playing with integers. Except you can’t divide, because then you’d get into the territory of floating-point arithmetic, which is a 9 out of 10 on the Hairiness Scale. (It’s not hard to understand per se, but most people don’t even try and spread misconceptions about float arithmetic all the time.)
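The canonical example of the kind of thing that trips people up (nothing specific to the module, just the usual suspect):

print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004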

The very concept of a variable isn’t exactly easy to convey either. We could go with a bottom-up approach in teaching how variables work, e.g. “in Python a variable is really just a pointer to some data”, except Python doesn’t have pointers so now you’re referring to a concept that’s external to the language. It’s a notion that should be immediately familiar to anyone who’s used any mainstream programming language, but the problem is that a lot of people haven’t.
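The behaviour itself is easy enough to demonstrate (a minimal sketch, not how the course presents it):

a = [1, 2, 3]
b = a           # b is another name for the same list object, not a copy
b.append(4)
print(a)        # [1, 2, 3, 4]: the change is visible through both names
print(a is b)   # True

Explaining why the last two lines print what they do, without reaching for the word “pointer” or “reference”, is the hard part.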

Functions are also seemingly hard for many students to grok. Functions were taught in calculus classes, right? Right?! Well, sort of. In introductory calculus classes, there often isn’t much of a strict distinction between the notion of variables being related by an equation and that of a variable being a function of other variables. They’re sort of interchangeable because we can divine the exact meanings through context, and that’s why you see the derivative of a function f being written as df/dx instead of df(x)/dx even though the former is technically an abuse of notation. It might be because we’re so used to this abuse of notation that, when it’s taken away from us, our feeble minds start breaking down.

It doesn’t help that languages like Visual Basic and MATLAB try to support it by having f = result_of_doing_some_operations_on_the_input syntax, so the students who have programming experience but in the wrong languages might have even more difficulty understanding why mainstream high-level programming languages don’t work in the same way.
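To spell out the contrast (my own toy example, nothing from the course materials):

x = 3
f = x * x          # evaluated once, right now: f is just the number 9
def g(x):
    return x * x   # an actual function: re-evaluated on every call
x = 5
print(f)       # still 9
print(g(x))    # 25

Someone used to writing f = ... inside a VB or MATLAB function body can reasonably expect the first form to define a function; in Python it just computes a number once.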

The brain damage caused by conventional high-school-level mathematics education to our notions of what functions are has the side effect that understanding “higher-order functions” is a lot harder than it should be. If you think of a function as just a thing that maps things to other things (as in conventional college-level mathematics education), higher-order functions are just the same as ordinary functions! And here, it doesn’t help that “higher-order function” is the disjunction of two related but distinct concepts: a function that takes in a function as input, and a function that returns a function as output. Of course students are confused.
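For what it’s worth, both flavours fit in a handful of lines (again, my own toy example):

def twice(f):                 # takes a function as input...
    return lambda x: f(f(x))  # ...and returns a new function as output

def add3(x):
    return x + 3

print(twice(add3)(10))   # 16

If “function” already means “thing that maps things to other things”, there’s nothing new here; twice maps functions to functions the same way add3 maps numbers to numbers.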

A lot of this post is extrapolated from what I’m seeing people ask about in the forums and during the tutorial sessions, rather than any personal difficulty I’m experiencing with programming or this module, so there’s a reasonable chance that my interpretations are off the mark. I think there’s some interesting research to be done with the course data (especially with the Coursemology platform, which allows tutors to view every single student’s submissions), but I mainly just want to sate my own curiosity.

Ideas about comb sort

Uh, I’m not dead yet. I just kind of forgot to write anything here for a while. There’s a partially written post about prime numbers in the queue, but I noticed a flaw in my arguments and then couldn’t be bothered to fix it up. So anyway.

Let’s talk about comb sort.

Say we have a list A of n distinct elements, and assume there’s a total order on them because you know, we’re talking about sorting algorithms. The indexing is assumed to be zero-based (from 0 to n-1, inclusive).

A comparator is a routine that takes in two distinct indices i and j and swaps the values at these indices if A[i]>A[j]. (Note that we’re keeping this definition even if i>j, in which case the comparator actually “reverses” the values, though we won’t make use of this in this particular blog post.)

A bubble pass is the composition of comparators (l,l+1),(l+1,l+2),(l+2,l+3),\dots,(u-2,u-1),(u-1,u) (applied from left to right), where l is the lower bound and u is the upper bound. Bubble sort is then the composition of bubble passes from (0\to n-1),(0\to n-2),\dots,(0\to1); as an optimisation, if in any bubble pass no swaps are made, the rest of the bubble passes may be skipped. We generalise the concept of a bubble pass to a k-bubble pass, where we have the comparators (l,l+k),(l+1,l+k+1),\dots,(u-k,u).
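Transcribed directly into Python (just my own literal transcription of the above, zero-based as stated):

def comparator(A, i, j):
    # swap so that A[i] <= A[j]; if i > j this "reverses" the values
    if A[i] > A[j]:
        A[i], A[j] = A[j], A[i]
        return True
    return False

def k_bubble_pass(A, k, l=0, u=None):
    # comparators (l, l+k), (l+1, l+k+1), ..., (u-k, u), applied left to right
    if u is None:
        u = len(A) - 1
    swapped = False
    for i in range(l, u - k + 1):
        swapped |= comparator(A, i, i + k)
    return swapped

A bubble pass is then just k_bubble_pass(A, 1), and bubble sort with the early-exit optimisation keeps calling it until it returns False.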

The complexity of an algorithm will be measured in terms of the number of comparisons or the number of swaps. These will generally be within a constant factor of each other, so we won’t qualify which exactly we’re referring to unless the distinction matters. We don’t consider time complexity per se due to transdichotomous-model-related annoyances.

Bubble sort is known to be Very Bad among sorting algorithms, but we literally just described it in a few lines of simple English above. (The similarly simple insertion sort is still faster, though.) The failure mode of bubble sort is when there are turtles—small elements near the end of the array. In fact, even a single turtle will bring the algorithm to its worst case \Theta(n^2) complexity, and since most random permutations will have some turtles, this means that the average complexity is also \Theta(n^2).

Cocktail sort is a slight modification of bubble sort that alternates the directions of the bubble passes, which effectively kills turtles. Define a reverse bubble pass as the composition of comparators (u-1,u),(u-2,u-1),\dots,(l+1,l+2),(l,l+1). A cocktail sort is then (0\to n-1),(0\gets n-2),(1\to n-2),(1\gets n-1),\dots; it still retains the \Theta(n^2) average/worst case complexity, though the constant factor is reduced. (But insertion sort still beats this!)

Comb sort is another modification of bubble sort which uses k-bubble passes, with the gap k reducing over each pass until it reaches 1, where it switches to bubble sort. The complexity of comb sort depends on the gap values used, and it is known that there exists a sequence of gap values such that comb sort attains \Theta(n(\log n)^2) complexity. One such example is the sequence of 3-smooth numbers, as shown by Pratt some decades ago.
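Here’s a sketch of the usual formulation (with the factor-of-1.3 gaps mentioned further down, not Pratt’s 3-smooth sequence):

import random

def comb_sort(A, shrink=1.3):
    # one k-bubble pass per gap, gaps shrinking geometrically down to 1;
    # once the gap is 1 this is plain bubble sort with the early-exit optimisation
    gap, swapped = len(A), True
    while gap > 1 or swapped:
        gap = max(1, int(gap / shrink))
        swapped = False
        for i in range(len(A) - gap):
            if A[i] > A[i + gap]:
                A[i], A[i + gap] = A[i + gap], A[i]
                swapped = True
    return A

A = random.sample(range(1000), 1000)
assert comb_sort(A) == list(range(1000))

With the 3-smooth numbers (in decreasing order) as the gap schedule instead, this becomes the \Theta(n(\log n)^2) variant mentioned above.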

A bidirectional variant of comb sort can also be considered, where we alternate the direction of the bubble passes as in cocktail sort. This is sometimes called shaker sort. Shellsort is another similar algorithm where, instead of applying a single bubble pass for each gap value, all the k subarrays [A[0],A[k],A[2k],\dots],[A[1],A[k+1],A[2k+1],\dots],\dots are sorted at each pass. Call these k subarrays the k-subarrays.

Now it’s time for some public shaming: in Bronislava Brejová’s “Analyzing variants of Shellsort”, a method of converting a sequence of Shellsort gaps into a sequence of shaker sort gaps is given, with O(\sqrt{T(n)}) overhead, where T(n) is the complexity of Shellsort with the original gap sequence. By using Pratt’s sequence of 3-smooth numbers, a bound of O(n^{3/2}(\log n)^3) is derived. However, the entire reason the 3-smooth numbers give such a good asymptotic bound for Shellsort is that, by using the 3-smooth numbers as gaps, every k-subarray will have the property that every element is at most one position off from the sorted state, so a single bubble pass suffices to sort it. In other words, comb sort works exactly identically to Shellsort in this case (and shaker sort does redundant reverse bubble passes for a constant factor overhead), so we get the complexity of \Theta(n(\log n)^2).

Since it’s an otherwise respectable publication, some people have quoted this as an example of one of the best worst-case bounds we have for comb sort. Hurfdurf.

On the other hand, people usually take comb sort to mean that the gaps are approximately in a geometric progression. With the 3-smooth numbers, we have \Theta((\log n)^2) gaps to slog through, compared to only \Theta(\log n) gaps if the sequence is approximately a geometric progression. Practically speaking, comb sort with the sequence of 3-smooth numbers tends to be much slower than comb sort with gaps decreasing by a factor of about 1.3, even though we have no nontrivial bounds on the complexity for the latter sequence.

Side note: We observe something similar with Shellsort, except that we have solid bounds for more types of sequences. Pratt’s 3-smooth sequence (and similar constructions based on numbers divisible by only a few primes) is asymptotically the best known, but the gap sequence of (3^i-1)/2 (with the asymptotically worse complexity O(n^{4/3})) has better practical performance.

An argument from entropy (see Vitányi’s “Analysis of Sorting Algorithms by Kolmogorov Complexity”) shows that, with p passes, the average complexity of comb sort is \Omega(n^2/2^p). However, this lower bound is nontrivial only when p is smaller than \log_2n, which is not the case with a gap sequence decreasing by a factor smaller than 2. For instance, with a gap sequence decreasing by a factor of 1.3, we would get a lower bound of \Omega(n^2/n^{\log_{1.3}2})=\Omega(n^{-0.642}). Yes, the exponent is negative; this lower bound is completely useless.

Recall that we can always use a standard counting argument to show that any comparison sort must have \Omega(\log n!)=\Omega(n\log n) complexity on average. There are n! permutations and each comparison gives at most one bit of information, so the average-case complexity must be at least \log_2n!.

In spite of the above, there are many sources that misinterpret the \Omega(n^2/2^p) bound to mean that comb sort is \Theta(n^2) worst/average case for geometric gaps. It still probably is, but we have no proof of that.

I’ve done my fair share of Googling for relevant papers, but the closest I’ve found was this newsgroup thread giving a construction for sequences that are “near” the worst case for geometric gap sequences. The main useful insight is to note that the comparisons made in every k-bubble pass before the final bubble sort are data-agnostic and so can be considered as components of a sorting network, where the 0-1 principle can be used to simplify analysis. This is definitely an approach worth looking into.
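To make that concrete, the brute-force experiment the 0-1 principle suggests looks something like this (my own sketch, not the construction from the thread):

from itertools import product

def k_bubble_pass(A, k):
    for i in range(len(A) - k):
        if A[i] > A[i + k]:
            A[i], A[i + k] = A[i + k], A[i]

def inversions(A):
    return sum(A[i] > A[j] for i in range(len(A)) for j in range(i + 1, len(A)))

def worst_01_leftover(n, gaps):
    # run only the data-agnostic part (the passes with gap > 1, supplied in gaps)
    # on every 0-1 input of length n, and report the most work left over
    # for the final bubble sort
    worst = 0
    for bits in product((0, 1), repeat=n):
        A = list(bits)
        for g in gaps:
            k_bubble_pass(A, g)
        worst = max(worst, inversions(A))
    return worst

print(worst_01_leftover(12, [9, 6, 4, 3, 2]))   # the gaps a 1.3 shrink factor gives for n = 12

Leftover inversions on 0-1 inputs don’t directly give a bound for arbitrary inputs, but it’s a cheap way to poke at how much work the pre-passes actually leave behind.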

Minimising the size of an algorithm set for a two-alg ZBLL

Some preliminary information: ZBLL is the state of the cube where only the U layer is left to be solved and the edges are oriented (i.e. there’s a cross formed). This can be considered to be a subgroup of 4!²⋅3³/2 = 7776 elements, but a common convention is to somehow quotient out the effect of U turns (aka “adjust U face”, AUF), which can be done by considering the subgroup of order 7776/4 = 1944 with the ULF corner fixed (but possibly misoriented). There are other schemes to reduce the number of states to consider, but this is the simplest that still preserves the group structure. (Note: I don’t think there’s a surjective homomorphism from the whole ZBLL group to this subgroup, but we don’t need that anyway.) The number of “looks” is defined as the number of times the solver has to inspect the cube state to decide what algorithm to use, but note that this is not a very precisely defined term as, technically, blindfold solves are “one-look” but involve very many substeps and possibly visualising the cube state mentally.

The ZBLL group is itself a subgroup of the whole last layer group of index 8, as there are eight possible edge orientations. An earlier search by forum member Tempus led to a set of eighteen algorithms (up to AUF, mirroring and inversion) such that every state in the LL group is either given by a single alg from the set or by the composition of two algs from the set.

Such a one-look two-alg system provides a reasonable tradeoff between a fully one-look system and a fully two-look system. It retains the relatively low algorithm count needed for a two-look system (a full one-look LL needs over 1200 algs, compared to only 44 for the two-look system CLL/ELL) while allowing for optimisations in choosing different pairs of algs for the same result. Furthermore, it may even require fewer algs than a two-look system for comparable efficiency. However, memorising the optimal choice of alg for any given state is still necessary (so it’s still 1200+ cases to remember), which may be why two-alg systems have not gained much traction.

ZBLL is a much smaller group in comparison, and unsurprisingly also a lot more tractable. About two weeks ago, I discovered a set of 12 algs such that all combinations of two algs (i.e. the square of the set) cover exactly the whole ZBLL group. This was inspired by my laziness to learn all the COLLs and all the 2GLLs, so I just combined the Sune cases from either (5 from COLL, 8 from 2GLL, one alg shared between the two) and realised this was enough to span ZBLL. This combination apparently had the lowest alg count of any two-alg ZBLL system known then (at least, I did my part in reading all the thousand-odd posts in the concept/method/substep thread on the forum), and more importantly, it doesn’t even require memorising over a hundred cases to be applied; it’s more like only six cases in addition to all the algs. (I have a post written elaborating on this system which I’ll clean up and publish when I have the time to.)

Yes, had the lowest known alg count. More recently (last weekend), I wrote a program to test the ZBLL coverage of any given input alg set, and after some experimentation, I found a set of just ten, six of which were heuristically-chosen “seed” algs that I already knew (Sune, another 2GLL, G perm, Buffy, L3C commutator, Bob), and the remaining four were found with (human) trial and error. Is ten minimal? Probably not, but it took me quite a few hours to even improve on an 11-alg set I’d found earlier. (I think an 8-alg set might be possible, absent a counting argument demonstrating otherwise.) This, of course, loses the low case count of the “Sunes” method of the previous paragraph, and three of the four non-seed algs are kind of nasty to execute IRL, so it’s probably useless for speedcubing purposes, but it’d be interesting to know what the theoretical limit is.
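The coverage test itself is nothing deep; stripped of the cube-specific plumbing (representing each alg, and each AUF, as a permutation of the last layer pieces), the brute-force core is roughly this:

from itertools import product

def compose(p, q):
    # permutations as tuples; apply q first, then p
    return tuple(p[i] for i in q)

def uncovered(algs, aufs, targets):
    # every state reachable as u1*a*u2*b*u3 with a, b taken from the alg set;
    # the identity is included so single-alg (and zero-alg) solutions count too
    identity = tuple(range(len(next(iter(targets)))))
    pool = list(algs) + [identity]
    reachable = set()
    for u1, a, u2, b, u3 in product(aufs, pool, aufs, pool, aufs):
        state = identity
        for move in (u1, a, u2, b, u3):
            state = compose(move, state)
        reachable.add(state)
    return [t for t in targets if t not in reachable]

With ten algs and four AUFs that’s only a few thousand products to check against the 1944 states, so brute force is plenty.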


Everything’s going south again. Totally couldn’t’ve seen that one coming.

The arithmetic derivative

The arithmetic derivative is defined over the nonnegative integers by the following relations:

\displaystyle\forall p\text{ prime}\quad p'=1\\  \forall a,b\in\mathbb Z_{\ge0}\quad(ab)'=a'b+ab'

The second one is effectively the product rule. We can actually assign any value to the derivative of the primes and then we can use the product rule to extend it over all the real numbers, but having p'=1 for all prime p leads to what looks like the nicest arithmetic derivative. One justification is that the product rule gives (p^n)'=np^{n-1}p' (see also: chain rule), so by taking p'=1, we get a simple expression for this special case and the resulting function x\mapsto x' is nontrivial. (Another choice is p'=0, which maps every value to zero. There’s also the option of p'=p, which we might investigate in a later post.)
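For playing along at home, a throwaway implementation (trial division, so keep the inputs smallish):

from fractions import Fraction

def arith_deriv(n):
    # n' = n * sum_p nu_p(n)/p, with 0' = 1' = 0
    if n < 2:
        return 0
    total, m, p = Fraction(0), n, 2
    while p * p <= m:
        while m % p == 0:
            total += Fraction(1, p)
            m //= p
        p += 1
    if m > 1:
        total += Fraction(1, m)
    return int(n * total)

assert arith_deriv(7) == 1    # primes go to 1
assert all(arith_deriv(a * b) == arith_deriv(a) * b + a * arith_deriv(b)
           for a in range(30) for b in range(30))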

The arithmetic derivative, unlike the usual differential operator, is not linear: 3'+4'=5\ne1=7'. Or, well, that’s obvious since a linear map on the nonnegative integers must have the form n\mapsto cn for some c\in\mathbb Z_{\ge0}.

The product rule can also be written in terms of the “log derivative”: \tfrac{(ab)'}{ab}=\tfrac{a'}a+\tfrac{b'}b. Which is cool until you realise it’s nothing more than just dividing the product rule equation by ab.

For any positive integer n, we have that n=\prod_pp^{\nu_p(n)}. From the product rule, we derive n'=n\sum_p\nu_p(n)/p. (Pun intended.) Suppose we want to know what the growth rate of the arithmetic derivative is like, but obviously this depends a lot on the number of prime factors and their multiplicities, so we’ll smooth out the fluctuation in values by considering \sum_{k\le n}k'/k instead.

But first, a simple lemma: \sum_p1/(p\log p) converges. This can be proven by pulling in the PNT, but that’s way overkill. It’s been quite a while since I did any nontrivial number theory, so let’s see how far I get before I give up. The sum over integers \sum_k1/(k\log k) diverges, so all we need to show is that the primes are sparse enough.

Consider the binomial coefficient \binom{2n}n for n>1, which is singly divisible by all the primes between n and 2n (exclusive). We have that \binom{2n}n<2^{2n}, so \sum_{n<p<2n}\log p<(\log4)n. The LHS can further be bounded below by (\pi(2n)-\pi(n))\log n, so we get \pi(2n)-\pi(n)<(\log4)\tfrac n{\log n}. This is actually only a constant factor worse than what the PNT gives us, so that’s cool.

\displaystyle\sum_p\frac1{p\log p}=\sum_{k\ge0}\sum_{2^k<p\le2^{k+1}}\frac1{p\log p}\\  =\frac1{2\log2}+\sum_{k\ge1}\sum_{2^k<p<2^{k+1}}\frac1{p\log p}\\  <\frac1{2\log2}+(\log4)\sum_{k\ge1}\frac{2^k}{k\log2}\cdot\frac1{2^k(k\log2)}\\  =\frac1{2\log2}+\frac2{\log2}\sum_{k\ge1}\frac1{k^2}\\  =\frac1{\log2}\left(\frac12+\frac{\pi^2}3\right)\\  <\infty


Getting back to the arithmetic derivative:

\displaystyle\sum_{k\le n}\frac{k'}k=\sum_{k\le n}\sum_{p|k}\frac{\nu_p(k)}p\\  =\sum_p\frac1p\sum_{k\le n,~p|k}\nu_p(k)\\  =\sum_p\frac1p\sum_{1\le j}\left\lfloor\frac n{p^j}\right\rfloor\\  =\sum_p\frac1p\sum_{1\le j\le\log_pn}\left(\frac n{p^j}+O(1)\right)\\  =\sum_p\frac1p\left(\sum_{1\le j}\frac n{p^j}+O(\log_pn)\right)\\  =\sum_p\frac1p\left(\frac n{p-1}+\frac{O(\log n)}{\log p}\right)\\  =\sum_p\frac n{p(p-1)}+O(\log n)\sum_p\frac1{p\log p}\\  =T_0n+O(\log n)

where T_0\approx0.773 is a constant defined as \displaystyle T_0=\sum_p\frac1{p(p-1)}. In other words, roughly speaking, the derivative is T_0 times the input if we somehow smooth out all the fluctuations. This error bound is probably tight (in that it’s not o(\log n)) because it comes from approximating the floor function with the identity. Notice that the estimate given on the Wikipedia article (which comes from E. J. Barbeau) is the unnecessarily loose T_0n+O(\log n\log\log n), which was brought up on the talk page as well. It’s funny that, because Barbeau wrote the first major paper, everyone just cites his results, including his inaccurate approximation of T_0.

Anyway, Wikipedia also has \sum_{k\le n}k'=T_0n^2/2+O(n^{1+\epsilon}), which again is unnecessarily loose. We could follow a derivation similar to the one above (which was what I initially did), or we could just shortcut it by using summation by parts, which is a Cool Technique for quickly evaluating related sums of this type when the hard work has already been done for one case.

\displaystyle\sum_{k\le n}k'k^a=\sum_{k\le n}\frac{k'}kk^{a+1}=n^{a+1}(T_0n+O(\log n))-(a+1)\sum_{k\le n}(T_0k\cdot k^a)=T_0\frac{n^{a+2}}{a+2}+O(n^{a+1}\log n)

I skipped some steps but you get the idea. The above is valid for all nonnegative real a; for a below −2, the series converges so there’s no asymptotic growth to speak of. Plug in a=0 and we get \sum_{k\le n}k'=T_0n^2/2+O(n\log n) with barely any additional effort.
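A quick numerical sanity check (my own throwaway script; partial sums only, so don’t expect many digits):

N = 10 ** 5

# smallest-prime-factor sieve; k' is then k/p summed over the prime factors of k
# (with multiplicity), since k' = k * sum_p nu_p(k)/p
spf = list(range(N + 1))
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:
        for q in range(p * p, N + 1, p):
            if spf[q] == q:
                spf[q] = p

def deriv(k):
    d, m = 0, k
    while m > 1:
        p = spf[m]
        d += k // p
        m //= p
    return d

T0 = sum(1 / (p * (p - 1)) for p in range(2, N + 1) if spf[p] == p)
S1 = sum(deriv(k) / k for k in range(1, N + 1))
S2 = sum(deriv(k) for k in range(1, N + 1))
print(T0, S1 / N, 2 * S2 / N ** 2)   # all three should land around 0.773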

That is not how you cite things from the Internet

Here we are, in the glorious twenty-first century, where nearly everyone has internet access yet seemingly no one knows what the world wide web is or how it works.

Let’s get this clear: if you want to cite user-generated content, you don’t cite the website hosting it; you cite the bloody user. The user is the publisher, not the host!

Yet almost every time I flip open the newspapers I get to see marvels like “Source: YouTube” along with a shitty screenshot of an overcompressed video upscaled a thousand times. Or sometimes there’s the equally useless “Source: Twitter” caption.

Maybe I should stop reading MyPaper.


I must’ve missed the memo saying that CM1111 Inorganic Chemistry 1 was only for chemistry majors or something, because CORS didn’t offer it at all. The other chemistry module I could take was CM1131 Physical Chemistry 1, but the minimum bid for that was higher than all the bid points I’d been given over three semesters. The line-up ended up being Algebra I, Combinatorics and Graphs I, Ordinary Differential Equations, Programming Methodology, and Japanese 1.

Funny thing though, the Physical Chem 1 class is right before ODE in the same lecture theatre, and I was quite surprised when I saw some NUS High students seated in the LT when I got there. (They’re taking level-3000 modules already?!) Until they all stood up and left as the ODE class started. Uh huh.

I don’t think they’d recognise me as “that strange guy who loitered around hidden corners of the school” because, well, hidden corners are hidden, and it’s also been about four years since I graduated.

Anyhow, this semester’s scheduling is pretty awful, though I guess it could’ve been worse. The Japanese class is all the way over at FASS, which is about twenty minutes walking distance. (My mnemonic says my walking speed is about 100 metres per minute, so that’s 2 km.) It’s not much further away from FoS than UTown is, anyway, so what the heck, I’ll walk.

SoC, however, is even further. Screw that. I mean, I already skipped one recitation session for Programming Methodology because I couldn’t find my way to the freaking seminar room until I was already 25 minutes late, and while I did attend the first tutorial, I’m having this strong inclination to just skip that too because the only way to commute to/from SoC is to take the shuttle bus, and oh god, the shuttle bus. They’re almost always packed like sardines. I hate that. On one hand, the tutorial and recitation sessions are surrounded by one-hour breaks. On the other, do I really want to spend all that time walking? No thanks. I’ll probably skip the recitation tomorrow as well, just for consistency, because, y’know, I’m already acing the homework section. I’m sure I can absorb losing a few percentage points of the attendance grade by maxing out everything else. There’s actually a leaderboard and I’m number 8 out of about 190 right now, so yeah, I’m literally not worrying about it.

English classes are right in the morning at 8 am, and I can’t skip them, so RIP sanity. Or, I could skip them, and then everything would be a repeat of last semester. Sounds like an extremely bad idea. Anyhow, I’m finding myself zoning out during the classes because of insufficient sleep. But poorer life decisions have been made.

Shitposting IRL

I’m taking ES1102 again, and this time I’m going to… idk.

I don’t really care anymore.

Whatever motivation I had in scoring well already died out a few weeks into the previous semester, so now it’s time for the spiral of depression to start again.

Have I mentioned that it’s almost 3 am, the English class is at 8 am, and I have eight hours of classes in total tomorrow? I have no idea how I’m going to pull through that with just three hours of sleep. I probably won’t. Caffeine doesn’t really do much for me, at least not in the form of coffee. All drinking coffee does is make my urine smell like coffee.

If tutorial registration plans go too far south, Tuesdays could end up being packed with eleven hours of classes from 8 am to 8 pm. Unfortunately, I missed the memo on CS1010S’s tutorial timeslots when picking the modules, so things were already going south.


It’s fairly trivial that the product of the gcd and the lcm of two positive integers is equal to the product of those integers, and the easiest way to prove this is to use the unique prime factorisation given by the fundamental theorem of arithmetic.

How can we prove that without resorting to prime factorisation? Preferably, without invoking the notion of primes at all.

For starters, \tfrac{ab}{\gcd(a,b)} is a common multiple of a,b (since \tfrac{ab}{\gcd(a,b)}=a\cdot\tfrac b{\gcd(a,b)}=b\cdot\tfrac a{\gcd(a,b)}), and so we have \mathrm{lcm}(a,b)|\tfrac{ab}{\gcd(a,b)}. Multiplying by the gcd, we get \gcd(a,b)\cdot\mathrm{lcm}(a,b)|ab.

On the other hand, \tfrac{ab}{\mathrm{lcm}(a,b)} is a common divisor of a,b (since \tfrac{ab}{\mathrm{lcm}(a,b)}=a/\tfrac{\mathrm{lcm}(a,b)}b=b/\tfrac{\mathrm{lcm}(a,b)}a), so \tfrac{ab}{\mathrm{lcm}(a,b)}|\gcd(a,b). Multiplying by the lcm, we get ab|\gcd(a,b)\cdot\mathrm{lcm}(a,b).

Since a,b were assumed to be positive, the above divisibility statements imply ab=\gcd(a,b)\cdot\mathrm{lcm}(a,b). QED.

Two lemmas were implicitly used above. The first is that the least common multiple is a divisor of every common multiple, and the second is that the greatest common divisor is a multiple of every common divisor. These are also very straightforward to prove through the Euclidean algorithm.


There’re two sets of lights in my room, which we’ll call the left and right lights, with the reference frame being me facing my old laptop’s monitor. As the old laptop had this reflective casing (wtf?), and having the lights reflected directly into my eyeballs is kind of annoying, I pretty much never used the left lights. Of all the people I live with, my maid is the sole person to have realised that I use the right lights 99% of the time.

Also, since the monitor was glossy (wtf?!), I often used the computer in darkness anyway. And then every so often my mother would come in, blabber a bit about how “it’s bad for your eyes” and whatever, and then switch on the left lights before leaving. Every time this happened, I got extremely pissed, because (i) it’s patently false that having the unnecessarily bright lights reflected directly into my eyes after they were accustomed to the darkness is any better than using the computer in darkness and (ii) oftentimes, I was about to go to sleep so I had to go switch off the lights anyway. But hey, if she wants to believe she’s doing something useful for her son, whatever; I’ll play along.

Last year I got a new monitor, but some hurfdurfery about Linux graphics drivers left me unable to use it with the old computer, so the situation didn’t really change much. (Unless I used VGA, which does work perfectly with Linux, but for some reason that has lots of ugly horizontal ringing.)

Anyhow, the left lights are roughly neutral, but the right lights were recently changed to be warm/orangey. Needless to say, this also frustrates me plenty. I could use the left lights and be annoyed by reflections, or I could use the right lights and be annoyed by the whole room being orange. The net effect is that I’m unhappy whenever I’m in my room in the evening and not sleeping, and then I spend time thinking about how unhappy I am instead of doing whatever I should be doing.

It’s all rather depressing.

PS: Oh, and apparently I lost the water bottle I got as a souvenir from national service on Friday. Not really sure how that happened since I usually check my surroundings before leaving, but okay. It was a shitty bottle, though still better than not having one at all.


I didn’t want to admit this, but I got addicted to cubing once again, and for just about the whole of July I’ve done roughly nothing of value because I spent a lot of time playing with this cube-shaped toy. So much so that my nails are breaking and chipping.

I never was much of a speedcuber, and while getting a better cube has improved my time by a few seconds, I still haven’t managed to break the 30-second barrier consistently. I’ve been assuming that this could be attributed to my fingers being slower than the “average” person’s; after all, I also use the Dvorak keyboard layout and I’m only just on par with the average Qwerty user in the same age group. (I’m a lot faster than people who don’t need to type fast, but that’s not something to brag about.) I don’t know, but that’s not really the point!

My right thumb nail cracked, which was when I switched to playing one-handed, achieving a completely unremarkable average of ~80 seconds. And then my left pinky nail cracked, but I continued anyway. This probably forced bad form, because I was actively trying to avoid using my pinky, and the weird posture I adopted is already burnt into my muscle memory.

There’s something ridiculously appealing about solving cubes. With the standard 3×3×3 cube, it takes less than 20 seconds to scramble and I can solve it in under 40 seconds, so a solve-scramble cycle pretty much always takes around a minute. A single solve is not a huge time investment, and it’s easy to fall into the trap of “one more won’t take long, so I might as well go for it”. It’s easy enough to squeeze in a minute here and there, and then for the past few weeks I was carrying the cube with me practically everywhere throughout the house all the time. (Other than the washroom. That’d’ve been unhygienic.) Even if I had to use my right hand for whatever, that left one hand free to cube.

Cubing, however, is not an activity that only involves the hands. It involves attention too, and that meant that I was constantly being distracted from everything else I should’ve been doing. That sucked.

I have a megaminx too, and my most recent timings were about 400 seconds average. I don’t remember if that was better or worse than my timings years ago, but again, that’s not the point! Doing a megaminx solve or a 4×4×4 solve is a considerably larger time investment, not just because the solving takes five to ten times longer, but also because the scrambling takes that much longer too. The difficulty of fairly scrambling a megaminx without a computerised random number generator puts me off it. That’s a bit sad, because I feel like I’m done with cubing and won’t improve any further, so I want to have fun with dodecahedroning instead (still knowing I won’t even get near world record speeds!), but it’s harder to sink time into that.

I don’t know what to make of this. So I’m sick of cubing, but it’s a “fast” time killer so I do it anyway and I accidentally killed too much time.


Is this a thing? Probably not. It doesn’t seem very useful, because the preceding step would be simultaneously phasing and permuting corners, and I don’t think anyone’s ever bothered generating algs for that. That also corresponds to a coset space of 18 elements, so there wouldn’t be too many cases.

If we fix the corner permutation, 2GLL ∩ ZZLL is a subgroup of order 108. Thanks to symmetry there are a lot fewer cases. (m/n means there are n cases and I know m of them. It’s not a probability, because not all cases are equiprobable.)

Oriented corners (1/27): 3/3 cases.
Sune (8/27): 4/4 cases.
U (4/27): 3/3 cases.
T (4/27): 3/3 cases. (One case is shared with U.)
L (4/27): 2/4 cases.
Double Sune (2/27): 2/3 cases.
Bruno (4/27): 1/3 cases.

So that’s a total of 21 cases (not counting the trivial case), 16 of which I know, and the probability of hitting a known case is 43/54 ≈ 80%. (Update: now I’m at 19 cases out of 21, with the only two I haven’t bothered learning being the only-all-corners-misoriented cases, which also happen to have the longest algs.)

If we look at all the 2GLL cases, the breakdown is more like:

Oriented corners: 4/4 cases.
Sune: 8/12 cases.
U: 4/7 cases. (p = 1/2)
T: 4/7 cases. (Again, one case shared with U; p = 1/2)
L: 2/8 cases. (p = 1/6)
Double Sune: 4/5 cases. (p = 11/12)
Bruno: 2/7 cases. (p = 1/4)

26 out of 48 nontrivial cases covered, with probability 83/162 ≈ 51% of hitting a known case.

Man, I think it would be pretty fun to tally up all the algs I know. I recently picked up a bunch of COLLs too, but like I said before, some of them are plain useless. Only two out of six of the Sune COLL cases have short and very fast algs, so it’s on average better to just use Sune/Niklas and do PLL. (The diagonal-corner-swap PLLs can always be avoided too, which saves about a third of a move.) I’m not particularly intent on learning the whole set of COLLs because memorising things is hard; of the 33 nontrivial cases, I know 21 of them. Since CP/2GLL and COLL/EPLL split the last layer into different substeps, there’s some overlap between the list of COLLs and the list of 2GLLs above.

Counting the PLLs and non-edge-preserving OLLs too, that puts the total number of last layer algs I know at about 60, and most of them are not what I’d use in a speedsolve in the first place. I’m kinda just learning COLL and 2GLL “for fun”, so there’s that, though COLL is sorta useful on larger cubes with the sandwich method.

But anyway, I wanted to mention something about antisymmetry. I think antisymmetry’s pretty cool.

Take the Rune algorithm, for instance. The effect of the move sequence is antisymmetric, but the move sequence itself (R U R’ U R’ U’ R2 U’ R2 U2 R) is not, which is kind of lame. On the other hand, consider this Sune + H perm alg: (R U R’ U’ (R’ U’ R) U (R U’ R’) U R U’ R’). Here, not just the effect is antisymmetric; the move sequence is too!

There’re some other examples, like U M’ U2 M U being the x-axis mirror of its inverse. This is an edge 3-cycle; by conjugating this with F2 or M2, we get an edge 3-cycle in one layer. There’s also Bruno (R U2 R2 U’ R2 U’ R2 U2 R), which is the z-axis mirror of its inverse.

R U2 R2 U2 R’ U2 R U2 R’ U2 R2 U2 R is another example that’s the z-axis mirror of its inverse.

R U R U R U’ R’ U’ R’ U’, an edge 3-cycle, is the (x−y)-axis mirror of its inverse. Incidentally, conjugating this with R U’ gives the standard 2-gen U perm alg, but the resulting move sequence is not antisymmetric.

Niklas (L’ U R U’ L U R’) is the x-axis mirror of its inverse.

F R’ U2 R F’ R’ F U2 F’ R (an L3C/COLL alg) is the (x−z)-axis mirror of its inverse.

This antisymmetry seems to make memorising the alg easier, because it effectively compresses the alg length by half.
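These claims are easy to fumble by hand, so here’s a small checker sketch (face turns plus M only; the diagonal mirrors above aren’t handled):

INVERT = {"": "'", "'": "", "2": "2"}
SWAP = {"x": {"L": "R", "R": "L"}, "y": {"U": "D", "D": "U"}, "z": {"F": "B", "B": "F"}}
SLICE = {"x": "M", "y": "E", "z": "S"}  # the slice lying in the mirror plane keeps its direction

def inverse(seq):
    # reverse the sequence and invert each move
    return [m[0] + INVERT[m[1:]] for m in reversed(seq.split())]

def mirror(moves, axis):
    # swap the two faces on the mirror axis; every move's direction flips,
    # except for the middle slice lying in the mirror plane
    out = []
    for m in moves:
        face, suffix = m[0], m[1:]
        if face != SLICE[axis]:
            suffix = INVERT[suffix]
        out.append(SWAP[axis].get(face, face) + suffix)
    return out

def antisymmetric(seq, axis):
    return mirror(inverse(seq), axis) == seq.split()

print(antisymmetric("U M' U2 M U", "x"))                    # True
print(antisymmetric("L' U R U' L U R'", "x"))               # True (Niklas)
print(antisymmetric("R U2 R2 U' R2 U' R2 U2 R", "z"))       # True (Bruno)
print(antisymmetric("R U R' U R' U' R2 U' R2 U2 R", "x"))   # False (the Rune sequence)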

I’m apparently really bad at counting cases so some of the values I’ve reported before might be wrong. I don’t count mirrors separately (I just mentally swap the roles of my hands) but most websites do, so it’s hard to corroborate my values with external sources. Grain of salt, etc.

So there’s this double Sune with H perm edges alg I found with Cube Explorer that’s pretty cool. It goes (r U2 r’ U’ r2 R’ U R’ U’ r2 R’ U’ R’), and if you replace all the wide turns with single-layer turns, then this becomes just a double anti-Sune. Alternatively, we can rewrite it as a double anti-Sune with inserted M-slice turns (M’ R U2 M R’ U’ M2 R U R’ U’ M2 R U’ R’).

Executing this alg doesn’t require any regrips at all, which is extra cool. (Same holds true for the usual anti-Sune, actually, and this extends to applying the anti-Sune twice.) We can keep the left hand holding the LF and LB edges, and start with the right thumb on DR, index on UFR, middle on UR. Its inverse, which was what Cube Explorer actually found, does require regrips; this disparity comes from how fingertricking a U’ turn is harder than a U turn with the right index finger.

It could be done with just R and U turns, but that’d need 15 moves instead of only 13.

I am pleased to announce that since I last posted about cubing, my left ring fingernail snapped, my right index fingernail chipped off, and both my left pinky nail and my right thumb nail have cracks along their sides.

Actually, no, I’m not pleased at all. The moral of the story is that speedcubing, stickerless cubes, and long nails do not go together. I’m assuming the presence of stickers actually provides a little bit of cushioning for the nails impacting ABS at high speed because this is the first time my nails are getting worn down and broken so rapidly, but it could just be that I’ve been spending a lot more time on cubing than I ever have before.

PAD, 2015 edition

Sorry if this seems a bit uninspired because I put this off for a whole week and only started this morning, and ended up with nothing I hadn’t already come up with a few months ago.

Remember when I asked about noncontrived series for π with an initial term of 22/7? Well, here’s a not-quite-example.


Obtained from forcing a splitting ratio in transforming the Gregory-Leibniz series so that the first term is 11/14 (22/7 divided by four) and choosing the free parameter so that the quadratic coefficient in the numerator vanishes. It’s certainly contrived! However, it’s rather coincidental that the first two terms under the summation actually cancel each other out (1/(3⋅5⋅7⋅11) = 5/(5⋅7⋅11⋅15)), so we could equally well rewrite the above as


PLL timings, more last layer notes, and some more FMC

Preface: Maybe you’re sick of reading about all this cubing stuff already, but I’m not, so you’ll have to bear with it. Usual programming will resume eventually.

G perm not included because I still can’t immediately recognise them. For the rest, I did ((alg) y)12 or something similar, to avoid “optimisations” when the first and last moves are the same. Times given are for doing the algorithm 12 times (along with 12 cube rotations). I have no idea how large the standard deviation is, so maybe take that to be around 1.5 seconds. (I could do more runs to get the sample standard deviation, but all this speedcubing has been filing my right index fingernail down faster than it grows.)

35.60 U perm, cw, ftm/stm-optimal (F2 U M’ U2 M U F2)
35.18 U perm, ccw, ftm/stm-optimal (F2 U’ M’ U2 M U’ F2)
33.08 U perm, cw, stm-optimal, 〈M,U〉 2-gen (M2 U’ M’ U2 M U’ M2)
35.47 U perm, ccw, stm-optimal, 〈M,U〉 2-gen (M2 U M’ U2 M U M2)
38.77 H perm, stm-optimal, 〈M,U〉 2-gen (M2 U M2 U2 M2 U M2)
43.10 H perm, ftm-optimal including AUF (L R U2 L’ R’ y L’ R’ U2 L R)
45.70 H perm, ftm/stm-optimal excluding AUF (R2 S2 R2 U R2 S2 R2)
44.25 Z perm, qtm-optimal (L F U’ R …)
40.77 Z perm, qtm-optimal (F R U R’ …)
50.63 Z perm, ftm/stm-optimal (M S2 M’ D’ M2 u M2)
41.41 A perm, ccw, ftm/stm-optimal (R2 B2 …)
41.39 A perm, cw, ftm/stm-optimal (R’ F R’ …)
35.16 A perm, ccw, ftm/stm-optimal (l’ R’ D2 …)
36.98 A perm, cw, ftm/stm-optimal (l’ U R’ …)
49.27 E perm (l U’ R’ D …)
45.32 F perm (R’ U R U’ R2 …)
35.84 T perm, ftm-optimal, right-handed, 〈U,R,F〉 3-gen (R U R’ U’ …)
39.03 T perm, ftm-optimal, left-handed, 〈U,L,F〉 3-gen (L’ U’ L U …)
35.81 J perm, unflipped, qtm-optimal, 〈L,U,R〉 3-gen (L’ U R …)
37.95 J perm, flipped, qtm-optimal, 〈L,U,R〉 3-gen (R U’ L’ …)
42.91 J perm, unflipped, qtm/ftm-optimal (B2 L U L’ …)
41.57 R perm, ftm-optimal, UBL/UL block (R U2 R’ U2 …)
40.96 R perm, ftm-optimal, UFL/UL block (R’ U2 R U2 …)
51.40 N perm, right block, ftm-optimal, 〈L,U,R〉 3-gen (R U’ L …)
51.35 N perm, left block, ftm-optimal, 〈L,U,R〉 3-gen (L’ U R’ …)
39.00 V perm, ftm-optimal, (R’ U R’ d’ …)
40.24 Y perm (F R U’ R’ …)

Writing CSS is hard, so deal with this stupid layout. I presume this is quite slow compared to “real” speedcubers, and there are some aberrations like my U perm being the fastest even though it’s not an 〈R,U〉 2-gen algorithm, and all the serious speedcubers seem to swear by 2-gen algorithms. (Note: “2-gen” means it’s in a subgroup generated by two moves.) Cases I know multiple algorithms for have been listed multiple times (U, H, Z, A, T, J); there are slight surprises like the standard A perm being slower to execute than my rotated version, but that’s probably because the rotation conjugated with the y-axis rotation I applied after every algorithm simplified to a single z-axis rotation. The net effect is that the average amount of cube rotating in this speed test is less than what a real solve would entail.

(Friendly reminder that in cubing notation, up/down is the y-axis and front/back is the z-axis, just in case Rubik’s cubes weren’t confusing enough.)

Sorting these by speed, it goes like this: U perm (both types), A perm (anticlockwise cycle), J perm (right-handed), T perm, A perm (clockwise cycle), J perm (left-handed), H perm, V perm, Y perm, Z perm, R perm (both types), F perm, E perm, N perm (both types), and finally G perm (all four types).

Well, G perm is only slow now because I still suck at recognising it (and using dumb mnemonics…), but given that it’s the most common PLL (with a 2/9 chance of coming up), I’ll probably get better over time.

The very tiny amount of corner control I exert during corner orientation (by keeping a 2×2 block if it exists) restricts the PLL in special cases to A/J/V, all of which are among the faster ones execution-wise for me, and V also happens to be the rarest. (Probabilities of occurrence are 1/3, 1/3 and 1/6, respectively.) It probably doesn’t make sense to learn all the cases because it’s pretty rare to begin with. Unless I switch to a 2-look/3-look last layer that starts by solving a 2×2 block…

How would that work? I’ve been pondering a bit since I blogged about it. Let’s assume edge orientation is done, as usual. If there are no misoriented corners, go straight to PLL, which can be done in two looks with a bare minimum of {A,U,J,V}, where A, J, V are already needed later on anyway. Otherwise, there is at least one misoriented corner. Picking any such corner as a reference, the two edges that should be adjacent to it can be in 12 possible (relative) locations. This gives a total of 24 cases to learn, or 12 if mirrors aren’t included. Not too big, and I bet most of the cases can be dealt with with a single algorithm applied in different locations.

The next step is a combination of L3C and “L3C plus permutation parity”. The former is well-studied and has 22 cases. As for the latter, the two swapped edges give us a fixed orientation in space, so there are 3²⋅3 = 27 cases. If you exclude mirrors, L3C has 12 cases and L3C+parity has 15 cases. (None of the L3C+parity states are symmetrical, except for the V perm and V perm + diagonally misoriented corners; similarly, in L3C, only the two-corner pure rotation states have reflection symmetry.)

To elaborate a bit on splitting PLL into CPLL and EPLL, H/Z can be done with two U perms (apply any U perm, and there will be three wrong edges left). All the diagonal-corner-swap cases can be done with V followed by U; for the Y perm, the only thing to “watch out for” is that this can end with a H perm instead of a U perm if you swap the wrong pair of corners. The adjacent-corner-swap cases can be classified by edge permutation, where F and T have Z-permuted edges (apply any J perm to fix corners), and the rest have U-permuted edges (apply any A perm to fix corners) or correctly-permuted edges (exactly the A perm). (Actually, all the cases in the U-permuted edges category can be handled with the J perm, but the previous sentence gives a one-liner summary that completely avoids special-casing.)

This has net 40 algorithms (not counting mirrors) for a purely 2-look (edge-oriented) last layer. For comparison, OCLL/PLL has net 20 algorithms, also not counting mirrors. Another problem here is probably that recognition of the first half involves finding the two adjacent edges, and since the edges are oriented, their colours won’t be on the U face.

[excised into new post for organisation and readability purposes]

On another note, I had a go at FMC again. (I got sick of failing at BLD.)

Scramble: F2 L2 F2 L2 U B2 D F2 L2 U2 R2 D’ F2 L D L’ D2 L R2 F’ D’ U2
(actually Cube Explorer generated a different scramble, but I screwed up while applying it, so I manually scrambled it some more, then fed it to Cube Explorer to get this)

First attempt (44):
B’ D R2 B’ U2 D2 L2 D (8; block)
U’ L U D’ B’ D (6; edge orientation)
B2 L’ B2 L2 (4; one corner-edge pair)
D B D’ B D B2 D’ (7; edge permutation using Sune)
U’ B’ L2 B’ R’ B L2 B’ R B2 (10; corner 3-cycle)
D’ B2 D U L R B2 L’ R’ (9; H perm)

Second attempt (not counting intermediate attempts that used more moves) (41):
B’ D R2 B’ U2 D2 L2 D (8; block)
U’ L U D’ B’ D (6; edge orientation)
B2 L’ B2 L2 (4; one corner-edge pair)
R B R’ B R B’ R’ B R B2 R’ (11; corner orientation using double Sune)
(D L’ D’ L)3 (12; corner swaps)

Third attempt (31):
R B2 (2; 2×2×2 block with U2 pre-move)
F’ D2 R F’ (4; 2×2×3 block)
R D2 R’ D’ R2 D2 R2 U2 (8; first two layers and undoing premove)
F L2 B’ L’ B L’ F’ (7; edge orientation using wide anti-Sune)
L’ D’ L D’ L D L2 D L2 D2 L’ (11; 2GLL using Rune)

This was actually an unusually good run, though I also exceeded the one-hour limit by quite a lot. I’m running a search for the optimal F2L and Cube Explorer’s search has reached depth 14 without finding anything shorter, so 14 moves is actually optimal. I didn’t bother trying to fix the bad edges after making the block, because the F2L pieces happened to magically line up and, well, an optimal F2L happened. The last layer part is coincidentally also 14 moves optimal, so there’s a wastage of four moves in my solution. That said, I exhausted all the two-edge-flipping OLLs I knew and the right wide anti-Sune was the only one that left me with a last layer I already knew how to solve optimally.

I don’t have a “very huge” repertoire of last layer algorithms (the edge 3-cycles, some corner 3-cycles, the PLLs, random short OLLs, random 2GLLs, random COLLs, etc.) so this “brute-force through algorithms until something works” technique tends to fail more often than it works. Petrus’s site has a list of forty short-ish ZBLL algorithms such that all edge-oriented last layer cases can be solved with at most two of these algorithms. Sounds kinda useful, but >memorising 40 algorithms. It took me seven years to get around to memorising all of OCLL and PLL, and that’s only 28 algorithms!

Oh, and while I was trying to practise the Roux method, all the blocks just lined up, I switched back to Petrus, and I got a 30-move solve without even backtracking. That probably wasn’t a good scramble.