Even more scatter plots

Inversive congruential generators modulo 2^17, with all 2^16 odd-valued points plotted. The first two are given by x^{-1}+2 and the other two by x^{-1}+45678, where the first/third plots have lag 1 and the second/fourth plots have lag 2. (Edit: actually, I just noticed I forgot to comment out x ^= x>>>1, which provides a little bit of mixing, so these plots are not exactly of the ICG, but of the ICG conjugated with this xor-shift.)

ICGs have, in some sense, an optimal lack of lattice structure when the modulus is prime, but obviously that doesn’t carry over to prime power moduli. This statement apparently applies to both recursive ICGs (defined by a transition function x_{n+1}=f(x_n)=ax_n^{-1}+b) and to explicit ICGs (defined by x_n=(an+b)^{-1}).

We can prove that a recursive ICG modulo 2^m\ge8 with the transition function f(x)=ax^{-1}+b attains the maximum period of \phi(2^m)=2^{m-1} if and only if a\equiv1\pmod4 and b\equiv2\pmod4. The “only if” part is trivial. The rest will be proven by induction (boilerplate: “suppose this holds for some m blah blah blah”).

Lemma: 1 has exactly four square roots modulo 2^{m+1}, namely, \pm1 and 2^m\pm1.
Proof: That the four given values are square roots is easily verified. Let x be some square root of 1 not congruent to \pm1, and write x=a2^b\pm1 with a an odd integer and b\ge2 (the sign can always be chosen to make b\ge2). Then x^2=a^22^{2b}\pm a2^{b+1}+1\equiv1\pmod{2^{m+1}}; since 2b\ge b+2, the quantity x^2-1 is 2^{b+1} times an odd number, so 2^{b+1}\equiv0\pmod{2^{m+1}}\implies b=m.

Since reciprocation gives a perfect matching among the 2^m-4 odd non-unity-square-roots, considering the 2^{m-1}-2 odd numbers between 3 and 2^m-3 (inclusive), an even quantity of these must map within the same set (i.e. they don’t toggle the high bit), and so the remaining numbers (which toggle the high bit) must be even in quantity as well. By the bijectivity of reciprocation, we similarly have that an even number of odd numbers with the high bit set toggle the high bit under reciprocation.

Using the same trick as before of summing the overflow over the period of the low m bits, we note that the overflow from reciprocation cancels itself out, as does the overflow from the high bit of b and the overflow from multiplying by a. That leaves the overflow from adding b, but we already know that that’s 2^{m-1}\cdot2\pmod{2^{m-1}\cdot4}, or equivalently, 2^m\pmod{2^{m+1}}. This is exactly the same as saying that the high bit is toggled after every 2^{m-1} iterations, thereby showing that the maximum period of 2^m is attained modulo 2^{m+1} under the given conditions.
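As a quick empirical check of this (not part of the proof), here’s a minimal brute-force period computation at a small modulus; the parameters a=5, b=2 are an arbitrary pair satisfying the conditions, and the inverse is the Newton iteration discussed further down.

#include <stdint.h>
#include <stdio.h>

/* Brute-force check of the ICG period claim for the modulus 2^M.
   f(x) = a*x^{-1} + b on odd residues; a = 1 (mod 4) and b = 2 (mod 4)
   should give the maximum period of phi(2^M) = 2^(M-1). */
enum { M = 12 };
#define MASK ((1u << M) - 1)

static uint32_t inverse(uint32_t x)   /* inverse of odd x, by Newton's method */
{
    uint32_t y = x;                   /* 3 correct bits to start */
    for (int i = 0; i < 4; i++)
        y *= 2 - x*y;                 /* 3 -> 6 -> 12 -> 24 -> 48 correct bits */
    return y & MASK;
}

int main(void)
{
    uint32_t a = 5, b = 2;            /* arbitrary parameters meeting the conditions */
    uint32_t x = 1, n = 0;
    do { x = (a*inverse(x) + b) & MASK; n++; } while (x != 1);
    printf("period = %u (expected %u)\n", n, 1u << (M - 1));
    return 0;
}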


With “small” values of b, the ICG seems to have bad structure, which might be explainable with the help of Laurent series. For instance, we have 1/(x+2)=1/x(1-2/x+4/x^2-8/x^3+\cdots), which is actually a finite series because 2 is nilpotent. Or maybe it makes more sense to defer to the fact that the transition function is a fractional linear transformation and do something about that.

Reciprocation

Let’s say you want to find the multiplicative inverse of an odd number modulo some power of two. We’ve covered this before in the post about summing 2-adic series for π.

Start with a 1, then apply Newton’s method for division. It’s fast; it works.

But we can micro-optimise it. If we start with a 1 (as I did in the code snippet in the earlier blog post), the first iterate gives 2−x, and we can check that this guarantees only two correct bits. (Id est, it is correct modulo 4, but not modulo 8 in general.) So we’re using one subtraction to get to two bits of precision, but that turns out to be one more subtraction than necessary.

Behold: x alone already has three correct bits, because an odd square is always 1 mod 8. If we use this, we save on one subtraction and roughly 1.5 iterations (exact value being log2(3)), for a net saving of 2.5 subtractions and 3 multiplications over starting with 1.

In the event that the number of bits is fixed and known in advance, we can hardcode the number of Newton’s method iterations necessary, and since the number of bits is usually itself a power of two, we save only one iteration rather than two with the above optimisation. For a 32-bit inverse (where the least significant bit is 1), four iterations are needed, which gives a total cost of 8 multiplications and 4 subtractions. Without the optimisation, five iterations would be needed, and assuming we also hardcode the first iterate, it’d be a total cost of 8 multiplications and 5 subtractions.
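In C, the whole micro-optimised routine comes out to something like the following sketch (the function name is mine); the four assignments are the four hardcoded Newton iterations, each costing two multiplications and one subtraction.

#include <stdint.h>

/* Inverse of an odd x modulo 2^32. Starting from y = x gives three correct
   bits (odd squares are 1 mod 8), and each Newton step y <- y*(2 - x*y)
   doubles the count: 3 -> 6 -> 12 -> 24 -> 48 >= 32. */
uint32_t inverse32(uint32_t x)
{
    uint32_t y = x;
    y *= 2 - x*y;
    y *= 2 - x*y;
    y *= 2 - x*y;
    y *= 2 - x*y;
    return y;
}

Total cost: 8 multiplications and 4 subtractions, matching the count above.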

Well, that’s not very impressive, but that’s why it’s called a micro-optimisation, right?

I did a brute-force search over cubic polynomials with small coefficients providing more than six bits of precision, and found just (16n−6)x+(7−16n)x^3, where n is any integer. This gives seven bits, but going from six bits to seven is not enough of a win to justify having an extra multiply.
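For the sceptical, here’s a one-off verification of the n=0 member of that family, 7x^3−6x (the claim being that it agrees with x^{-1} to seven bits for every odd x):

#include <stdint.h>
#include <stdio.h>

/* Verify that y = 7x^3 - 6x satisfies x*y = 1 (mod 128) for all odd x. */
int main(void)
{
    for (uint32_t x = 1; x < 128; x += 2) {
        uint32_t y = 7*x*x*x - 6*x;   /* wraps mod 2^32, which is harmless here */
        if ((x*y & 127) != 1) {
            printf("fails at x = %u\n", x);
            return 1;
        }
    }
    printf("seven correct bits for every odd x\n");
    return 0;
}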

More scatter plots

First one is RANDU, second one is x\mapsto (65539x)\oplus123456789, third one is from weaving their bits. These have 12000 points, are mod 2^32, and have been projected onto the zx-plane (akin to decimating by a factor of 2). All three suck as random number generators, just in somewhat different ways.

No, the pictures haven’t been mislabelled. You’re really seeing that mul-xor has much worse structure in two dimensions than the infamously bad LCG that is RANDU. So much for the idea that nonlinearity is good. To make things more confusing, the strong autocorrelation for the mul-xor generator seems to happen only with lag 2; lag 1 or any larger lag looks fine. I don’t get what’s going on. Decimating by a factor of 3 (or larger) kills the obvious structure, but obviously also slows it down by the same factor, so it might be better to just use another PRNG.

One thing to note is that unlike LCGs, the set of mul-xor functions is not closed under composition, so chaining them can actually improve the “randomness” quality. With LCGs, we know that affine functions are closed under composition, so decimating an LCG with a well-chosen multiplier would actually result in a worse multiplier.

There’ll probably be more of this

How many days have I spent fooling around with this shit?

[render: xorqcg 3D scatter plot]

3D scatter plots for a vanilla QCG (given by 2x^2+3x+3) and an xor-perturbed QCG (given by (x\oplus1329)((2x)\oplus7423)). These are modulo 8192, and each plot contains all 8192 lagged triples (x_n,x_{n+1},x_{n+2}) as points.

The renderer was hacked up from my earlier 3D figure rendering code, and since that didn’t properly support rendering things by depth, adding axes to the scatter plots was out of the question. I mean, it probably could be done, but what the hell. Because the whole point of making the scatter plots was to visually inspect linear relations, perspective foreshortening was commented out, but since this made figuring out which side was front and which was back impossible, the dots are scaled by distance. So this ended up being a sorta-perspective-but-not-really projection.

Anyway, going through the scatter plot for the QCG revealed many sets of many collinear points, which is a gigantic red flag, and this weakness persists even when decimating the generated sequence by a factor of 3 or 5. (Here, “decimating” by a factor of n means to keep only every nth term.) This is probably related to the use of a tiny quadratic coefficient. On the other hand, merely choosing a quadratic coefficient of, say, 128 wouldn’t be useful; if the quadratic coefficient is divisible by a large power of 2, the points tend to end up lying in a much smaller number of lines than expected.

So maybe if you want to inject lots of nonlinearity into a QCG, what you should do is to use a singly even quadratic coefficient. Emphasis on “maybe”.

I don’t know how relevant or irrelevant these results are when scaled up to more reasonable moduli. The main reasons the scatter plots were only done with 13 bits were, firstly, that rendering 16k points on canvas wasn’t something that my computer could handle in realtime, and secondly, that even if it could render 16k points in realtime, it’s too cluttered to see any structure in it. After all, even bad PRNGs become good if you throw enough state at them, so if we want to find defects in a certain class of PRNGs, we should look at versions with less state than usual.

Maybe a fruitful approach to finding good parameters would be to test for avalanche; or slightly more generally, by treating the recurrence function as a hash function of sorts and applying the standard hash function tests for it. Then again, the only hash function test I know is avalanche, so mentioning that was kind of pointless.

More on congruential generators

Previously, we had an inductive proof that, if a\equiv3\pmod4 and b\equiv1\pmod2, then x\mapsto (ax)\oplus b has full period modulo 2^m\ge4, and also mentioned in passing that a similar proof can be done for linear congruential generators. I won’t bother presenting that; that’s left as an exercise for the reader. (Of course, the real reason is that I didn’t bother to fully work it out. Merely knowing that it can be done is enough for me!)

There’s this really amazing fact I just realised, which is that the above proofs rely only on the fact that the low bits are full period, without exploiting any other structure in it, so we can use an arbitrary bit mask to combine any two full-period congruential generators of either the mul-xor kind or the linear kind to get another one. In fact, by a parity argument, this is true for congruential generators (mod 2^m) in general.

Why is this such a powerful result? It means we can immediately improve lousy LCGs simply by combining them with another! Consider a perturbed RANDU given by x\mapsto 65539(x\oplus b). By picking two sufficiently different values of b and interleaving the bits of the two generators, the “15 planes” are completely broken apart. In C, something like:

#include <stdint.h>

/* Interleave the even bits of one perturbed RANDU with the odd bits of another. */
uint32_t f(uint32_t x)
{
    uint32_t a = 65539*(x^1);
    uint32_t b = 65539*(x^123456789);
    return (a & 0x55555555) | (b & 0xaaaaaaaa);
}

It’s RANDU, but a bit less bad. (Also, mod 2^32 instead of mod 2^31.)


RANDU mod 2^32 consistently gets p-values below 10^{-11} on the permutations test with two million 5-permutations tested. Not unexpectedly, the b=1 perturbation doesn’t affect the p-value much. What is surprising is that using any large-ish value of b seems to have the effect of making RANDU worse wrt the permutations test rather than improving it. b=123456789 actually drops the p-value to somewhere around 10^{-300}, and b=970201 drops it to around 10^{-2400}. The combined generator given above gets to around 10^{-560}, which is somehow worse than its constituents. (These values were extrapolated from lower sample sizes because they’re beyond what double precision arithmetic can handle.)

You win a little in not having a lattice structure, you lose very big somewhere else. Oops.

I’m not sure I understand what’s going on here. Maybe there’s something wrong with my incomplete gamma function calculation code. It behaves weirdly for extreme values, which is probably what’s happening. Incidentally, the Wikipedia article suggests to use the continued fraction to evaluate the lower incomplete gamma function, but it turned out to be both more annoying to implement and less numerically stable than the series representation. For the upper incomplete gamma function (which is used for getting the p-values above), my code evaluates directly with a continued fraction if z<s, falling back to the lower incomplete gamma otherwise.

Also, fun fact: my code runs into overflow issues when trying to do the permutations test for 6-permutations, because \Gamma(\tfrac{6!-1}2) is just too large and everything ends up being infinity divided by infinity. Not sure if this can be worked around somehow.

Bleh

[render: mul-xor spectral test]

Been fooling around a bit with the pseudo-LCG. Or maybe I should call it xormul or something else with fewer syllables. Anyway, I was curious about how it’d fare in the spectral test, so I picked a particularly bad multiplier just to make things obvious. The above pic uses the multiplier −385 modulo 2^16. Just for comparison, here’s what a comparable LCG’s 3D plot looks like with exactly the same rotation matrix applied.

[render: LCG spectral test]

The nonlinearity of xor causes some “smearing”, and while it doesn’t remove the lattice structure entirely (it’s still approximately linear in \mathbb Z_{2^m}), you have to keep in mind that the multiplier used in this example is kinda pathological. Then again, with better multipliers, even the LCG doesn’t have immediately visible defects, so this seems to be a significant improvement only in the awkward region where the LCG is bad, but not too bad.

In other news, I have figured out a proof for the exact conditions for an xormul generator to have full period, following the same idea of keeping track of the overflowing bit. In fact, we can also use this idea to get an alternative proof for the LCG’s conditions, but I guess I’ll leave that for another time.

Consider things modulo 2^{m+1}, and for any integer n in [0,2^{m+1}), we can define n_l as the low m bits of n and n_h as the high bit of n. More precisely, n_l:=n\,\bmod\,2^m and n_h:=\lfloor n/2^m\rfloor. Likewise, let f_l:=f\,\bmod\,2^m and f_h:=\lfloor f/2^m\rfloor.

Let f(x)=(ax)\oplus b where a\equiv3\pmod4 and b\equiv1\pmod2, and suppose that f\,\bmod\,2^m has a full period.

{f(x_l+x_h2^m)=(ax_l+ax_h2^m)\oplus b=(ax_l+x_h2^m)\oplus b=(ax_l)\oplus b\oplus(x_h2^m)=f(x_l)\oplus(x_h2^m)=f_l(x_l)\oplus(x_h2^m)\oplus(f_h(x_l)2^m)}

\displaystyle f^n(x)=f^n(x_l+x_h2^m)=f_l^n(x_l)+\left(x_h+\sum_{k=0}^{n-1}f_h(f_l^k(x_l))\right)2^m

This gives us a way to consider just the values of the high bit. In particular, subbing in n=2^m and assuming x_h=0, we get:

\displaystyle f^{2^m}(x)=f_l^{2^m}(x)+\sum_{k=0}^{2^m-1}f_h(f_l^k(x_l))2^m

By the inductive hypothesis, k\mapsto f_l^k(x_l) is a permutation on [0,2^m), and since f_h(x)=(ax)_h\oplus b_h (where the 2^m copies of b_h contribute no net toggle), we can rearrange the terms of the sum to get:

\displaystyle{(f^{2^m}(x))_h2^m=\sum_{k=0}^{2^m-1}(ak)_h2^m=\sum_{k=0}^{2^m-1}(ak-(ak)_l)=\sum_{k=0}^{2^m-1}(a-1)k=\frac{a-1}22^m(2^m-1)=2^m}

Noting that (a-1)/2 and 2^m-1 are odd integers, the high bit is toggled every 2^m terms, so f has full period modulo 2^{m+1}. Therefore, by induction, the conditions a\equiv3\pmod4 and b\equiv1\pmod2 are necessary and sufficient.

PS: I’ve been rather loose with the notation, keeping the domains implicit to avoid having to type \mathbb Z_{2^m} all the time, though keeping track of whether an operation uses integer or \mathbb Z_{2^m} semantics shouldn’t be too hard.

Xor-add

In 2008, I came up with the “genius” idea of making a PRNG out of x_{n+1}=(x_n\oplus a)+b.

Basically an XORADD RNG generates numbers by xoring the current state with a value, then adding it with another value. Simple, and well, doesn’t hold up very well to randomness tests, in it’s purest form, because like polynomial congruential generators with full period with a base divisible by 2, the parity alternates.

(Yeah, I used “it’s” instead of “its”. I know better now.)

In the previous post, we contemplated the pseudo-LCG given by the recurrence x_{n+1}=(ax_n)\oplus b, where we proved some necessary conditions for this to have full period, with the conjecture that those conditions were sufficient as well. That turned out to be too hard, so let’s get to something even simpler.

The necessary-and-sufficient conditions for xor-add to have full period for modulus 2^m\ge4 are a\equiv0\pmod2 and b\equiv1\pmod2. Necessity is trivial (just brute-force the other three possible combinations of values of a,b mod 2).

Sufficiency is trivial for m=2. We will now proceed by induction. Suppose the aforementioned conditions are sufficient for some m\ge2, and let a,b\in[0,2^{m+1}-1] satisfy those conditions. These parameters give a full-period generator modulo 2^{m+1} iff x_{2^m}=x_0\oplus2^m. (Note that this can be quantified equally well by “for all x_0” or by “there exists x_0”.)

The first observation we can make is that (a,b) and (a+2^m,b+2^m) generate identical sequences, because the highest bit gets toggled twice. In fact, what we should do is to focus on how many times the high bit gets toggled over 2^m iterations.

There are three sources of high bit toggling we have to take into account: from addition overflow in the low m bits, from the high bit of a, and from the high bit of b. The latter two sources are irrelevant because they will each toggle either 0 or 2^m times, resulting in no net toggling. By the inductive hypothesis, the low m bits cover all values from 0 to 2^m-1, so overflow of the low m bits happens exactly b\,\bmod\,2^m times, and since b is odd, b\,\bmod\,2^m is also odd, so we have that x_{2^m}=x_0\oplus 2^m, just as we needed to show.

By induction, the stated conditions are sufficient for all m\ge2.


Some conjectures (assuming the modulus is a sufficiently large power of 2):

  • (ax)\oplus b: full period iff a\equiv3\pmod4\text{ and }b\equiv1\pmod2
  • (x\oplus a)((bx)\oplus c): full period iff a\equiv1\pmod2\text{ and }b\equiv0\pmod2\text{ and }c\equiv3\pmod4 (note that b=0 gives a conjugated version of the above)
  • (x\oplus a)(bx+c): full period iff a\equiv1\pmod2\text{ and }b\equiv0\pmod2\text{ and }c\equiv3\pmod4
  • (x+a)((bx)\oplus c): full period iff a\equiv1\pmod2\text{ and }b\equiv0\pmod2\text{ and }c\equiv1\pmod4 (note that b=0 reduces this to a quadratic congruential generator)

Is checking overflow the way to go for proving these too?
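In the meantime, brute force can at least lend the conjectures some weight. Below is a sketch of an exhaustive test of the (x\oplus a)((bx)\oplus c) form at a small modulus (swapping out the update line tests the other forms); it reports any (a,b,c) where the conjectured condition and the observed period disagree.

#include <stdint.h>
#include <stdio.h>

/* Exhaustively compare the conjectured full-period condition for
   x -> (x^a) * ((b*x)^c) against the actual period, modulo 2^M. */
enum { M = 6 };
#define MASK ((1u << M) - 1)

static uint32_t period(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t x = 0, n = 0;
    do {
        x = ((x ^ a) * ((b*x) ^ c)) & MASK;
        n++;
    } while (x != 0 && n <= MASK);    /* bail out if 0 is never revisited */
    return x == 0 ? n : 0;
}

int main(void)
{
    for (uint32_t a = 0; a <= MASK; a++)
        for (uint32_t b = 0; b <= MASK; b++)
            for (uint32_t c = 0; c <= MASK; c++) {
                int full = period(a, b, c) == (1u << M);
                int conj = (a & 1) == 1 && (b & 1) == 0 && (c & 3) == 3;
                if (full != conj)
                    printf("mismatch at a=%u b=%u c=%u\n", a, b, c);
            }
    return 0;
}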

Previously, previously.

More little observations

Let’s start small here. Consider the PRNG defined by the following recurrence, where a,b are some constants and arithmetic is done modulo 2^m. Bitwise xor is represented with a circled plus, as is standard in the literature.

f(x):=(ax)\oplus b\\x_{n+1}=f(x_n)

We’ve previously established that for this PRNG to have full period (i.e. a period of 2^m), a\equiv3\pmod4 and b\equiv1\pmod2 are necessary conditions. Empirically, these seem to be sufficient too, but how would we go about proving this?

Constant multiplication is linear as is constant xor, but they’re linear over different rings so the composition of these two operations is nonlinear in both \mathbb Z_{2^m} and \mathbb Z_2^m.

We’ve already solved the b=\pm1 case (see hyperlink above), where this “pseudo-LCG” is effectively a perturbed LCG, but to be more precise:

If b=-1, then f(x)=(-a)x+(-1), which is an LCG, so it has full period iff -a\equiv1\pmod4 iff a\equiv3\pmod4.

If b=1 and x_k\equiv0\pmod2, then if a\equiv3\pmod4, f(f(x_k))=(a((ax_k)\oplus1))\oplus1=(a(ax_k+1))\oplus1=a^2x_k+a-1. This is again an LCG, where by a change of variables to y=x/2 (modulo 2^{m-1}), we see that f\circ f has period 2^{m-1} and so f has period 2^m.

Non-starters: multiplication does not distribute over xor, so there’s no (obvious) way to use a change of variables to simplify the problem to the b=\pm1 case, which we’ve already solved.

Actually, let’s review how to prove the full period conditions for an LCG. Take g(x):=ax+b as the defining function, where again we work modulo 2^m. We have as a lemma that a congruential generator must have a power-of-two period. By considering the case of m=2 (i.e. only the two least significant bits), we get a\equiv1\pmod4 and b\equiv1\pmod2 as necessary conditions for all m\ge2. We will now prove that this is also sufficient by a method similar to the above. Considering g\circ g, we get g\circ g(x)=a^2x+b(1+a), where we still have the linear coefficient satisfying a^2\equiv1\pmod4, but the constant coefficient satisfies b(1+a)\equiv2\pmod4. Dividing the even-valued subsequence of generated numbers from the LCG by two gives an LCG with modulus 2^{m-1}, linear coefficient a^2 and constant coefficient b(1+a)/2, where we can now apply copious amounts of induction to prove our claim.
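Spelling out that last change of variables: writing an even iterate as x=2y, the half-speed generator is

\displaystyle\frac{g(g(2y))}2=\frac{a^2\cdot2y+b(1+a)}2=a^2y+\frac{b(1+a)}2\pmod{2^{m-1}},

which is again an LCG satisfying the same conditions, since a^2\equiv1\pmod4 and b(1+a)/2 is odd.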

And actually, since we didn’t clearly define what a “congruential generator” is, we should do that. (Treat this paragraph as a self-contained section.) For a given modulus M, a function f:\mathbb Z_M\to\mathbb Z_M is said to be congruential iff, for all positive d|M and x\in\mathbb Z_M, we have that f(x\,\bmod\,d)\equiv f(x)\pmod d. A congruential generator is then derived from the n-fold composition of f applied to a seed value. (In particular, if M is a power of two, this means that high bits cannot touch low bits; if M is prime, then any function f:\mathbb Z_M\to\mathbb Z_M would trivially be congruential.)
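For power-of-two moduli, the definition says the low d bits of the output may depend only on the low d bits of the input, which makes it directly testable. A sketch:

#include <stdint.h>

/* Test the congruential property for a function on Z_{2^m}: for every
   divisor 2^d, f(x mod 2^d) must agree with f(x) in the low d bits. */
int is_congruential(uint32_t (*f)(uint32_t), int m)
{
    for (int d = 1; d < m; d++) {
        uint32_t dmask = (1u << d) - 1;
        for (uint32_t x = 0; x < (1u << m); x++)
            if ((f(x & dmask) ^ f(x)) & dmask)
                return 0;
    }
    return 1;
}

For instance, any mul-xor map x\mapsto(ax)\oplus b passes this test, since neither multiplication nor xor lets high bits touch low bits.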

Is there a way to generalise this proof for LCGs to the pseudo-LCG? I probably should spend a couple of hours working it out on paper instead of trying to do it in my head while facing a computer and getting distracted by everything.


For a bijection f:\mathbb Z_M\to\mathbb Z_M chosen at random, we have the nice result that the mean cycle length over all the elements is (M+1)/2. However, if M is a power of two and we condition f on being congruential in the sense defined earlier, then we get a considerably smaller value. It’s a bit “late” so my reasoning ability is shot; enjoy this entirely heuristic argument.

Each of the m:=\log_2M bits of the generator is equally likely to double the cycle length as it is to leave it alone, so the distribution of the binary log of the cycle length over all elements in all possible congruential bijections is (probably) a binomial distribution. Ergo, the expected cycle length is given by

\displaystyle\frac1{2^m}\sum_{k=0}^m\binom mk2^k=\frac{3^m}{2^m}=M^{\log_2(3/2)}\approx M^{0.58}.

This is kind of awful, but at the same time it looks like creating a power-of-two-modulus congruential generator out of addition, multiplication and xor always gives a period of the form 2^{m-a} for some a that possibly depends on the seed, but not on m.

Some little PRNG observations

Just a very minor sort of remark to make about a very simple LFG. Take the congruential recurrence relation x_n\equiv x_{n-2}+x_{n-3}+x_{n-4}+x_{n-5} modulo 2^m. The polynomial 1-x^2-x^3-x^4-x^5 is primitive over \rm{GF}(2), so this attains the maximal period for an LFG, (2^5-1)\cdot2^{m-1}.

Obviously, we don’t expect an LFG with this little state to be good, and some paper I was skimming through mentioned that for a four-tap LFG to be “good”, the offset should be at least two orders of magnitude larger. This paper also mentions that a two-tap LFG is always bad, so let’s take a moment to look into that. (Actually, I misread the paper and it was referring to LFSRs, not LFGs, but the theory is close enough anyhow.)

For a two-tap LFG defined by a recurrence relation x_n\equiv x_{n-j}+x_{n-k} (where we assume k>j), we have a linear relation between three variables. Wait, that sounded a bit too trivial. How about this: for three variables x_n,x_{n-j},x_{n-k}, assuming that they happen to be pairwise distinct, you’d expect that all 3!=6 orderings are equally likely. However, only four of these six orderings are possible; x_{n-j}<x_n<x_{n-k} and x_{n-k}<x_n<x_{n-j} are impossible. (The easiest way to prove this is to rescale the problem to modulo 1, generalise to real numbers, and then split into cases.)
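Only the pair of lagged values matters here, so the claim is easy to check exhaustively at a small modulus: tally the ordering pattern of the triple over all pairs and watch two of the six tallies come out as zero. A quick sketch, modulo 256:

#include <stdint.h>
#include <stdio.h>

/* For x_n = x_{n-j} + x_{n-k} (mod M), enumerate all pairs of lagged
   values (u, v) and tally the orderings of the triple (u, w, v), where
   w = u + v mod M is the new term. */
int main(void)
{
    enum { M = 256 };
    long tally[6] = {0};
    for (int u = 0; u < M; u++)
        for (int v = 0; v < M; v++) {
            int w = (u + v) % M;
            if (u == v || u == w || v == w) continue;   /* ignore ties */
            tally[(u > v) + 2*((u > w) + (v > w))]++;   /* ordering index */
        }
    for (int i = 0; i < 6; i++)
        printf("ordering %d: %ld\n", i, tally[i]);
    return 0;
}

The two empty bins are exactly the orderings with w strictly between u and v: without wrap-around w is at least max(u,v), and with wrap-around it is less than min(u,v).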

Even if the low bits are chopped off (a common trick to improve LCGs), the high bits will still approximately satisfy the linear relation and the ordering probabilities will still be as biased. Of course, it's (trivially) true that for a two-tap LFG, there will be no deducible relationship without looking at at least k+1 elements, so picking a very large value of k mitigates this somewhat.

This is the permutations test, a version of which is also in Marsaglia's DIEHARD suite of PRNG tests. Basically, generate a large number of n-tuples of pseudorandom numbers, compute the ordering for each of these n-tuples, then check whether the distribution of orderings is close to uniform. Incidentally, back in 2009, I mentioned that this test is “the single, most effective killer of LCGs” but what do you know, I was (probably) full of shit. In 2013, when I was slacking around in the office, I wrote an LCG in JavaScript with modulus 2^16 (or was it 2^24) that didn’t manage to conclusively fail a hastily-coded version of the permutations test for 3, 4, 5 or 6 elements. Maybe it’s just because I eyeballed the chi-squared statistics rather than computing p-values and doing a KS test. (These things were not things I knew how to derive on paper and I sure as hell don’t have a table of KS statistic values memorised.)
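For concreteness, here’s a minimal sketch of the 3-permutation version with non-overlapping triples. The LCG under test is an arbitrary full-period stand-in (not a generator from this post), ties are ignored since they’re vanishingly rare at 32 bits, and turning the statistic into a p-value means comparing against a chi-squared distribution with 5 degrees of freedom.

#include <stdint.h>
#include <stdio.h>

enum { N = 1000000 };

static uint32_t state = 1;
static uint32_t next_u32(void)        /* the PRNG under test */
{
    return state = state*1664525u + 1013904223u;
}

/* Index of the ordering of (a, b, c) among the 3! = 6 possibilities. */
static int ordering(uint32_t a, uint32_t b, uint32_t c)
{
    return (a > b) + 2*((a > c) + (b > c));
}

int main(void)
{
    long count[6] = {0};
    for (int i = 0; i < N; i++) {
        uint32_t a = next_u32(), b = next_u32(), c = next_u32();
        count[ordering(a, b, c)]++;
    }
    double expect = N/6.0, chi2 = 0;
    for (int i = 0; i < 6; i++) {
        double d = count[i] - expect;
        chi2 += d*d/expect;
    }
    printf("chi-squared = %.3f (5 degrees of freedom)\n", chi2);
    return 0;
}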

… Anyway. As would be expected, our toy 4-tap LFG passes the 5-permutation test, since there is no linear relation for a window of fewer than six elements. However, for 6-permutations, there are deviations of around 15% (order-of-magnitude estimate), and for 7-permutations, 1680 of the 7!=5040 permutations are outright impossible. This is like, really bad, but we shouldn’t have expected more of it in the first place, given that the sixth output sample is exactly determined by the previous five. It is, however, interesting to note that even though the 6-permutation distribution is definitely nonuniform, it’s not so skewed that any of the 720 6-permutations are impossible.


Time and again I come to the not-really-a-problem of wanting a simple PRNG in JavaScript that is deterministic, and I’ve used a whole lot of different solutions for this over the years because I’m this much of an indecisive person. And every time I reach this stage I end up doing a little bit of research into PRNGs because hey, why not?

LCGs are bad, but at the same time, they’re also known to be bad. You can’t just use a nothing-up-my-sleeve number as a parameter because it turns out that most parameters are bad. On the other hand, it’s easy to remember what the conditions for a full-period LCG are, so I could easily type out, say, x=(x*3456789+1)&0xfffffff and get something that works. Oh, and because this is JS we’re talking about and Math.imul only became a thing somewhat recently, we have to make sure our multiplies fit in 53 bits.

We also have LFGs, which are faster and generally less bad, but require multiple variables of state and have subtle correlation problems that can appear unexpectedly. Also, it’s hard to remember primitive polynomials. Off the top of my head I can only name 1+x+x^2 and 1+x^{24}+x^{55}. On the other hand, it’s not too difficult to check whether a polynomial is primitive. A sorta inefficient way is to completely ignore characteristic polynomials and search directly for a full-period LFSR by matrix exponentiation in GF(2). Testing a polynomial would then take about O(k^4\omega(2^k-1)), where \omega is the number of distinct prime factors. Choosing a Mersenne prime here would speed things up.
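Alternatively, instead of the matrix route, the same test can be phrased with polynomial arithmetic in GF(2)[x]/(f), which packs into machine words nicely: f of degree k is primitive iff x^{2^k-1}\equiv1 and x^{(2^k-1)/p}\not\equiv1 for each prime p dividing 2^k-1 (the factorisation is still needed, and a Mersenne prime 2^k-1 again trivialises that part). A sketch, using the quintic from the toy LFG above as the example:

#include <stdint.h>
#include <stdio.h>

/* Polynomials over GF(2) packed into a uint64_t, bit i = coefficient of x^i. */

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t f, int k)
{
    uint64_t r = 0;                   /* a*b mod f, bit-serial schoolbook */
    for (int i = k - 1; i >= 0; i--) {
        r <<= 1;
        if (r >> k) r ^= f;           /* reduce when the degree reaches k */
        if ((b >> i) & 1) r ^= a;
    }
    return r;
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t f, int k)
{
    uint64_t r = 1;
    for (; e; e >>= 1) {
        if (e & 1) r = mulmod(r, a, f, k);
        a = mulmod(a, a, f, k);
    }
    return r;
}

/* f (degree k) is primitive iff x^(2^k-1) = 1 and x^((2^k-1)/p) != 1 for
   every prime p | 2^k - 1; the caller supplies those primes. */
static int is_primitive(uint64_t f, int k, const uint64_t *p, int np)
{
    uint64_t order = (1ull << k) - 1;
    if (powmod(2, order, f, k) != 1) return 0;    /* 2 encodes the polynomial x */
    for (int i = 0; i < np; i++)
        if (powmod(2, order/p[i], f, k) == 1) return 0;
    return 1;
}

int main(void)
{
    uint64_t f = (1u<<5) | (1u<<4) | (1u<<3) | (1u<<2) | 1;   /* 1+x^2+x^3+x^4+x^5 */
    const uint64_t primes[] = {31};   /* 2^5 - 1 is itself a (Mersenne) prime */
    printf("primitive: %d\n", is_primitive(f, 5, primes, 1));
    return 0;
}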

Then we have QCGs, the full-period conditions for which are a bit harder to remember (and a lot harder to derive), but it’s possible to use the LCG conditions as a mnemonic since LCGs are a special case of QCGs. So we could have, say, x=((((2*x*x)&0x3ffffff)+3*x+1234567)&0x3ffffff) as an implementation of x_{n+1}\equiv2x_n^2+3x_n+1234567. Again, 53-bit arithmetic limitation, so here we have a 26-bit QCG and we have to do modular reduction twice to make sure everything is in range. In LCGs the constant coefficient can be reduced to 1 (or any other odd number) via a change of variables and so is practically useless, but this is apparently not true for QCGs. At the very least, even if such a change of variables to reduce the constant coefficient exists, it must affect the linear coefficient, and since we need the linear coefficient to be small-ish (to avoid overflow when multiplying), we might as well pick a large-ish constant coefficient instead.

I’ve mentioned the Coveyou generator before, which is effectively a 4x^2+5x+1 QCG after a change of variables. The coefficients are small, so it takes around \log_2 m iterates for it to recover from nonrandomness where 2^m is the modulus. This is not really a big deal (it’s a double-log in the modulus!), but it’s easily improved by adding a single constant. In JS, this could be written as x=((4*x+1)*(x+1))&0x1ffffff (25 bits) or x=(x*(x+1))&0x3ffffff (effectively 24 bits). I’ve not actually used the Coveyou generator anywhere, but I guess this could be for reference. (Some other simple QCG recurrences along the same lines are 2x^2+3x+1=(2x+1)(x+1) (full period) and (x+1)(x+2) (half period), which are equivalent up to change of variables.)

For slightly more overkill scenarios, there’s also RC4, which I’ve written maybe twice or thrice. (Code reuse? What’s that?) I don’t have the algorithm memorised, but I do remember that it involves a state consisting of a counter and a 256-permutation updated at every iteration. Anyway, RC4 is pretty simple (it’s only a few lines of code), but it’s not memorising-friendly (it’s more than just one line of code) and initialising the permutation itself takes multiple lines of code. One trick I’ve used was to use the output of a full-period mod-256 LCG as the permutation, where the parameters of the LCG are derived from the seed.
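That trick is tiny: a full-period mod-256 LCG visits every byte exactly once, so its first 256 outputs form a permutation. A sketch, with the usual full-period conditions as the parameter constraints (how a, c and the starting value get derived from the seed is left open here):

#include <stdint.h>

/* Fill an RC4-style S-box with a permutation of 0..255 by running a
   full-period mod-256 LCG; requires a = 1 (mod 4) and c odd. */
void init_sbox(uint8_t S[256], uint8_t a, uint8_t c, uint8_t x)
{
    for (int i = 0; i < 256; i++) {
        S[i] = x;
        x = (uint8_t)(a*x + c);       /* arithmetic is mod 256 */
    }
}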

Another one on the list is Bob Jenkins’s “small noncryptographic PRNG” (which apparently has no other name), which I used for benchmarking my rotate4 solver on lots of random states. Generating random permutations requires lots of random bits (\Theta(n\log n)), which rules out generators with only ~32 bits of state. I didn’t feel like using RC4 then, and I didn’t want to risk an LFG’s correlations significantly biasing the permutation distribution, which was why I chose this “small PRNG”. (A pretty ad hoc rationale, sure, but it worked.)

Something that might be worth looking into would be modified LCGs/QCGs where some additions are replaced with xor. I recall reading a paper about that, though (iirc) it came to no useful conclusion.

Previously.

???

Probability exam was easier than expected, and Geometry was much easier than expected. (I got stuck on one question though; didn’t manage to see the self-intersecting cyclic quadrilateral until I set up the thing in GeoGebra and dragged stuff around until it was no longer self-intersecting.) But anyway.

Suppose you have a standard deck of 52 cards, dealt to four hands of 13 cards each. What are the statistics like on the number of pairs within each hand? (Just to be clear, here a pair refers to two cards of the same numerical value not belonging to another pair; four cards of equal numerical value count as two pairs (not six), and three cards of equal numerical value count as only one pair.)

By symmetry, the mean and variance of the number of pairs are equal for all four hands. The covariance between any two distinct hands should also be choice-independent. Because I’m too lazy to work it out right now, it’s time for some Monte Carlo magic.

import random
from collections import Counter

def count_pairs(l):
    a = Counter(l)
    pairs = 0
    for e in a:
        pairs += a[e]//2
    return pairs

def calc_stats(n=1000000):
    s_x = [0,0,0,0]
    s_xy = [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
    cards = list(range(13))*4
    for i in range(n):
        random.shuffle(cards)
        counts = [count_pairs(cards[j:j+13]) for j in range(0,52,13)]
        for j in range(4):
            s_x[j] += counts[j]
            for k in range(4):
                s_xy[j][k] += counts[j]*counts[k]
    print('s_x  = %s\ns_xy = %s'%(str(s_x),str(s_xy)))
    E = [s_x[i]/n for i in (0,1,2,3)]
    mean = sum(E)/4
    covmat = [[(s_xy[i][j]/n-E[i]*E[j])*n/(n-1) for j in range(4)] for i in range(4)]
    var = 0
    cov = 0
    for i in range(4):
        var += covmat[i][i]/4
        for j in range(i):
            cov += covmat[i][j]/6
    print('mean = %f\nvar  = %f\ncov  = %f'%(mean,var,cov))

This gives the following output. (The line breaks in the s_xy line are added for readability and are not part of the program output.)

>>> calc_stats(1000000)
s_x  = [3380736, 3379025, 3381239, 3378845]
s_xy = [[12219206, 11435312, 11443571, 11433551],
        [11435312, 12208265, 11437240, 11429323],
        [11443571, 11437240, 12226235, 11436153],
        [11433551, 11429323, 11436153, 12208325]]
mean = 3.379961
var  = 0.791369
cov  = 0.011721

The covariance is rather tiny, but is likely a small positive number rather than exactly 0; the correlation coefficient works out to be 0.015. Now that we have some “experimental” data, we should probably work out the exact values. And there are 39 cases.

Thirty-nine.

Working it out is looking like not so good an idea now, but since I’ve already enumerated all the cases, I might as well just do it, right? (In the case lists below, “*” marks the base case with no triplets or quadruplets, and each count is the number of 13-card hands of that type.)

No pair (one case)
All different: 4^13=67108864

1 pair (two cases)
*: 3925868544
One triplet: 3598712832

2 pairs (four cases)
*: 40485519360
One triplet: 40485519360
Two triplets: 6747586560
One quadruplet: 749731840

3 pairs (six cases)
*: 121456558080
One triplet: 106274488320
Two triplets: 21254897664
Three triplets: 984023040
One quadruplet: 5060689920
One quadruplet + one triplet: 1180827648

4 pairs (nine cases)
*: 119558799360
One triplet: 79705866240
Two triplets: 13284311040
Three triplets: 632586240
Four triplets: 6589440
One quadruplet: 7970586624
One quadruplet + one triplet: 2214051840
One quadruplet + two triplets: 105431040
Two quadruplets: 36900864

5 pairs (ten cases)
*: 35867639808
One triplet: 14944849920
Two triplets: 1423319040
Three triplets: 29652480
One quadruplet: 3321077760
One quadruplet + one triplet: 711659520
One quadruplet + two triplets: 29652480
One quadruplet + three triplets: 183040
Two quadruplets: 39536640
Two quadruplets + one triplet: 2471040

6 pairs (seven cases)
*: 2241727488
One triplet: 320246784
One quadruplet: 266872320
One quadruplet + one triplet: 22239360
Two quadruplets: 5559840
Two quadruplets + one triplet: 205920
Three quadruplets: 11440

To summarise, the table below gives the number of 13-card hands containing n pairs.

0: 67108864 (0.01057%)
1: 7524581376 (1.185%)
2: 88468357120 (13.93%)
3: 256211484672 (40.35%)
4: 223515122688 (35.20%)
5: 56370041728 (8.877%)
6: 2856863152 (0.4499%)

This gives the mean as 70382/20825 ~ 3.379688 and the variance as 370854753256/468808755625 ~ 0.791058. The experimental values we got before were accurate to three decimal places, so the code that generated those is probably correct.

And since we already went to the trouble of classifying all thirty-nine cases, what other stats can we pull out from that? We could classify hands by the number of quads.

No quads: 613295870464 (96.58%)
1 quad: 21633003392 (3.407%)
2 quads: 84674304 (0.01333%)
3 quads: 11440 (0.000001802%)
Mean: 0.034334
Variance: 0.033422

Rather unsurprisingly, quads are rare, but obviously also not as rare as in five-card poker. On average, out of the four 13-card hands, there will be a total of 0.137334 quads and 0.136801 hands with at least one quad. In other words, roughly once in every eight deals there will be some hand that has a quad. Monte Carlo suggests that the correlation coefficient of the number of quads between different hands is around 0.0096±0.0018.

Classifying by the number of distinct values gives the following table.

4: 400400 (0.00006305%)
5: 96164640 (0.01514%)
6: 3499651584 (0.5511%)
7: 37026941952 (5.831%)
8: 145979817984 (22.99%)
9: 237641564160 (37.42%)
10: 162691809280 (25.62%)
11: 44084232192 (6.942%)
12: 3925868544 (0.6182%)
13: 67108864 (0.01057%)

I have no idea where I’m going with this. Something that might be worth looking at is the covariance or correlation between different statistics on different hands, rather than just the covariance of a single statistic. Like, it’s obvious that having lots of distinct values in your hand will reduce the chances of someone else having a quad, as long as you interpret “lots” as being “at least 10”, but is this really true or is it just some headcanon interpretation of reality my brain came up with?

Hypocrisy

Throughout high school I held the view that education should be for education purposes, not merely to get a certificate of education to increase chances of employment and shit. It was this incredibly naive and idealistic point of view that led me to putting in practically zero effort in schoolwork (which led to a lot of trouble for everyone—my teachers, parents, classmates, and even the school principals), and yet at the same time I have never regretted it, because I still believe this.

But it’s exam week now, and what have I been doing? Just like the students I’ve been despising since about forever, I spent time studying to the test (where the ends is merely to get a good grade) instead of actually trying to understand the subject matter. Do I have to be this much of an unprincipled person?

About two months ago I happened to meet up with an NUS High schoolmate (this is a big deal because I cut contact with most of them), and he brought up the Pandora’s box that was the topic of studying for the sake of grades. I didn’t think much of it at the time—such is the nature of small talk—but it certainly is thought-provoking now. I mean, yeah, good grades are nice to have, but since when have I ever prioritised grades? Is my desire to get good grades only spurred by the fact that my grades for the first semester just happened to be nigh perfect?

I feel like I didn’t put in a particularly huge amount of effort last semester. Two of the four modules I took were hardly new (Linear Algebra I’s contents were covered in NUS High’s maths honours programme, and Fundamental Concepts of Mathematics was just introductory axiomatic set theory), and I thought I could wing the other two. It turns out that I did, but only barely. My performance on those two exams was horrible, and the fact that most others did worse was more comforting than it should have been.

I messed up my scheduling very badly this semester thanks to the English module, which I ended up dropping despite all that blogging earlier. Sunk cost is a bitch, but the sunk cost fallacy is an even greater bitch. Long story short, I got moderately depressed over everything and couldn’t find the motivation in myself to wake up early to go for morning classes. This persisted even after dropping English, though at that point it was more because my sleep schedule was fubar’d and it wouldn’t have been possible to fix it. (The really cool thing about this is that I get to undergo the same depressing thoughts for at least the next two semesters.)

Anyway, of the two exams thus far, in an ironic twist of events, the one I attended all the classes for was the one I messed up lots of questions on, and the one I skipped for the second half of the semester was the one I had no difficulty with. (Other than wasting 15 minutes trying to find where in my Gaussian elimination I made a mistake.) In all likelihood this trend will continue for the remaining three exams. I have no confidence in scoring well for Geometry (where I attended every lecture), but I expect Probability to be guaranteed-A+ and Linear Algebra II to be manageable.

I don’t think I regret taking Geometry despite my weakness in geometric intuition. I mean, I regret it slightly, but there weren’t really a lot of other options about what to take anyway, and on the plus side, at least my weak geometric intuition is somewhat less weak now than it was at the beginning of the semester.

Yeah so tl;dr I’ll probably be spending the rest of today doing last-minute studying-to-the-test for Linear Algebra II. I hate myself.

Today

Lol today.

1(a)

Let n\mapsto a_n be a sequence of positive reals such that \lim_{n\to\infty}a_n=\alpha. Show that \lim_{n\to\infty}a_n^n/n!=0.

For sufficiently large n>N, we can bound a_n above by \tfrac32\alpha, and, choosing N\ge2\alpha, we can also bound n! below by (N!)(2\alpha)^{n-N}; then a_n^n/n!\le\frac{(3\alpha/2)^N}{N!}\left(\frac34\right)^{n-N}\to0. (If \alpha=0, run the same argument with any positive constant in place of \tfrac32\alpha.)

1(b)

Let x_1:=1 and x_{n+1}:=3/(1+x_n) for n\in\mathbb N. Show that \lim_{n\to\infty}x_n exists.

For all n\in\mathbb N, we have 1\le x_n\le2. (This may be shown with induction.)

\displaystyle x_{n+1}=\frac3{1+x_n}\\  x_{n+2}=\frac3{1+\tfrac3{1+x_n}}=\frac3{\tfrac{4+x_n}{1+x_n}}=\frac{3(1+x_n)}{4+x_n}\\  x_{n+1}-x_n=\frac{3-x_n-x_n^2}{1+x_n}\\  x_{n+2}-x_{n+1}=3\frac{(1+x_n)^2-(4+x_n)}{(1+x_n)(4+x_n)}=3\frac{-3+x_n+x_n^2}{(1+x_n)(4+x_n)}\\  (x_{n+2}-x_{n+1})/(x_{n+1}-x_n)=-\frac3{4+x_n}\in\left[-\frac35,0\right)

Therefore n\mapsto x_n is a contractive sequence and thus converges. (Oh god I just realised I made a very dumb mistake in the exam itself.)

2(a)

Suppose f is a monotone increasing function on the interval [a,b] such that f(a)\ge a and f(b)\le b. Show that there exists a fixed point x_0\in[a,b].

Couldn’t figure this one out. It’s kind of intuitively obvious, but eh. (If f is continuous we can just apply IVT.)

Suppose to the contrary that f has no fixed point; thus for all x\in[a,b], either f(x)>x or f(x)<x. Let S:=\{x\in[a,b]|f(x)<x\} and y:=\inf S.

If y\notin S, then let \varepsilon:=f(y)-y>0. Thus \exists x\in S such that x<y+\varepsilon=f(y)\le f(x), but also x>f(x) from the definition of S, which is a contradiction.

If y\in S, for all x\in(f(y),y) (where this open interval is well defined because f(y)<y), f(x)>x>f(y), contradicting the monotonicity of f.

Therefore the initial supposition is false and there exists a fixed point.

(Apparently, I’m horrible at utilising infimum/supremum to prove things.)

2(b)

Suppose \gamma is a strictly monotone increasing function on the interval [a,b]. Let n\mapsto x_n be a sequence where a\le x_n\le b for all n\in\mathbb N. Show that \lim_{n\to\infty}\gamma(x_n)=\gamma(b)\implies\lim_{n\to\infty}x_n=b.

Holy shit I misread this question. Though I guess the proof I gave was slightly more general so I’m in the clear. Probably. We’ll prove the more general form here where if \lim_{n\to\infty}\gamma(x_n)=\gamma(c) for some c\in[a,b], then \lim_{n\to\infty}x_n=c.

Suppose otherwise; i.e. there exists some sequence n\mapsto x_n such that \lim_{n\to\infty}\gamma(x_n)=\gamma(c) and \neg(\lim_{n\to\infty}x_n=c).

There exists some \varepsilon>0 such that for all N\in\mathbb N, there exists n\ge N where x_n>c+\varepsilon or x_n<c-\varepsilon. Let \epsilon:=\min\{\gamma(c)-\gamma(c-\varepsilon),\gamma(c+\varepsilon)-\gamma(c)\}. (If c\in\{a,b\}, then disregard the invalid value.)

Since n\mapsto\gamma(x_n) converges, there exists N'\in\mathbb N such that for all n'\ge N', |\gamma(x_{n'})-\gamma(c)|<\epsilon, a contradiction. The proposition is proven.

3(a)

Show that the series \sum_{n\ge1}H_n\tfrac{\sin(n\pi/2)}n converges, where H_n is the nth harmonic number.

I quoted the master theorem here because I was short on time and being rigorous with only the theorems taught in class would’ve been too slow. Shooting myself in the foot, as it were. Anyway. This is an alternating series if you disregard the zero-valued terms, so the only important thing to be shown is that n\mapsto H_n/n is decreasing and tends to zero.

Actually, I just realised that the master theorem can’t even be applied here because this case is too trivial for it, so I guess it’s shooting myself in both feet or something.

3(b)

Find all the sequences n\mapsto a_n such that the series \sum_{n\ge1}a_n converges and for all n\ge0, a_n=\sum_{k\ge1}a_{n+k}^2.

I figured this one out on my way home, which was unfortunately about half an hour too late. There’s a hint right in the question because we wouldn’t otherwise need a_0. As a_n is a sum of nonnegative numbers, it is itself nonnegative for all n, and the sequence is clearly decreasing. Consequently, it is also convergent.

\displaystyle a_n-a_{n+1}=a_{n+1}^2\\  a_{n+1}=\sqrt{\frac14+a_n}-\frac12

We may thus consider sequences uniquely defined by a seed value a_0\ge0 and the above recurrence relation, and since solving for the fixed point gives \lim_{n\to\infty}a_n=0, this exhausts every possible sequence satisfying the stated criteria.

Edit (2015-04-28): So I thought I had it figured out, and I was wrong. The above involves a circular argument and obviously that doesn’t work very well. It turns out that even though we can construct a sequence n\mapsto a_n recursively as above, the partial sums of this sequence diverge because a_n doesn’t converge to 0 fast enough, unless a_0=0. Just in case I’m wrong again, I should probably work out the details now. Suppose a_1>1 and that a_n>1/n for some n\in\mathbb N.

\displaystyle a_{n+1}=\sqrt{\frac14+a_n}-\frac12\\  >\sqrt{\frac14+\frac1n}-\frac12\\  =\sqrt{\left(\frac12+\frac1{n+1}\right)^2+\frac1{n(n+1)^2}}-\frac12\\  >\frac12+\frac1{n+1}-\frac12=\frac1{n+1}

If a_1>0 and a_1\le1, add dummy negative-indexed terms and try again. Then by the comparison test \sum a_n diverges at least as fast as the harmonic series. Therefore the only sequence satisfying the criteria in the question is the zero sequence.

4(a)

Let f be a continuous function on the closed interval [a,b]. Suppose x_1,\dots,x_n are n points in the interval [a,b]. Let \lambda_1,\dots,\lambda_n be n positive reals such that \sum_k\lambda_k=1. Show that there exists x_0\in[a,b] such that f(x_0)=\sum_k\lambda_kf(x_k).

Tutorial 11, question 8. I initially thought of using induction but a direct proof is pretty straightforward too (which was what I wrote).

Since f is continuous over a closed interval, there exist x_{\min},x_{\max}\in[a,b] such that f(x_{\min})=\min_{x\in[a,b]}f(x) and f(x_{\max})=\max_{x\in[a,b]}f(x). Then \sum_k\lambda_kf(x_k)\in\sum_k\lambda_k[f(x_{\min}),f(x_{\max})]=[f(x_{\min}),f(x_{\max})], so by IVT x_0 exists.

4(b)

Evaluate \displaystyle\lim_{x\to0}\left(\frac{2^{x^2}+3^{x^2}}{2^x+3^x}\right)^{1/x}.

\displaystyle\frac1x\left(\frac{2^x+3^x}2-1\right)=\frac1{2x}(2^x-1)+\frac1{2x}(3^x-1)\to\frac12(\log2+\log3)=\log\sqrt6\\  \left(\frac{2^x+3^x}2\right)^{1/x}\to\sqrt6\\  \left(\frac{2^{x^2}+3^{x^2}}2\right)^{1/x}\to(\sqrt6)^x\to1\\  \left(\frac{2^{x^2}+3^{x^2}}{2^x+3^x}\right)^{1/x}\to\frac1{\sqrt6}

Ugly brute forcing; some steps not reproduced here because typing them out would be annoying.

5(a)

Find all continuous functions on \mathbb R satisfying |f(x)|\le1 and f(x+y)+f(x-y)=2f(x)f(y) for all x,y\in\mathbb R. (Hint: If 0<f(a)\le1, then there exists 0\le\theta<\pi/2 such that f(a)=\cos\theta. Then use induction to show that f(na)=\cos n\theta and f(a2^{-n})=\cos\theta2^{-n}.)

The hint given in this question is not quite correct and also slightly misleading (though it’s still useful anyway).

The very first thing you should show is that f must be an even function (substitute -y\to y), which I forgot to write down. Time pressure does funny things to people. Anyway, we’ll focus on what f does to the nonnegative reals.

Substituting x=y=0 in the functional equation, we get f(0)=f(0)^2, so either f(0)=0 or f(0)=1.

Suppose f(0)=0. Substituting y=0 in the functional equation, 2f(x)=0, so f vanishes everywhere. We may check that there is another trivial solution given by f:x\mapsto1.

Suppose f(0)=1 and f is not identically 1. There exists some a such that f(a)\ne1; if f(a)\le0, IVT implies the existence of some a'\in(0,a) such that f(a')=1/2, so WLOG suppose f(a)\in(0,1).

Let a':=\inf\{x>0|f(x)=f(a)\}, where a'\ge0 by the continuity of f. (More specifically, there exists a sequence converging to the infimum, so we may move the limit outside, giving f(a')=f(a)\ne f(0)\implies a'\ne0.) Again, WLOG suppose a=a'. Applying the contrapositive of the IVT, we have that x\in(0,a)\implies f(x)\in(f(a),1). Let \theta(x)=\arccos f(x).

Since f(a)=\cos\theta(a) and f(a/2)>0, we have \displaystyle f(a/2)=\sqrt{\frac{1+\cos\theta(a)}2}=\cos(\theta(a)/2), where by induction \displaystyle f(a2^{-n})=\cos(\theta(a)2^{-n}) for all n\in\mathbb N.

By (strongly) inductively applying the functional equation, we also have that if f(x)=\cos\theta, then f(nx)=\cos n\theta for all x\in\mathbb R,~n\in\mathbb N. (Details omitted.)

\displaystyle f(x)=f\left(\lim_{n\to\infty}\frac{\lfloor2^nx/a\rfloor}{2^n}a\right)\\  =\lim_{n\to\infty}f\left(\frac{\lfloor2^nx/a\rfloor}{2^n}a\right)\\  =\lim_{n\to\infty}f\left(\lfloor2^nx/a\rfloor\frac a{2^n}\right)\\  =\lim_{n\to\infty}\cos(\lfloor2^nx/a\rfloor\theta(a)2^{-n})\\  =\cos(x\theta(a)/a)

Thus our function must be of the form x\mapsto\cos(bx) for some real b, where it is readily checked that all such functions satisfy both the absolute value bound and the functional equation. This gives us the following solution set: \{x\mapsto0\}\cup\{x\mapsto\cos bx|b\in\mathbb R\}.

5(b)

Suppose that \lim_{n\to\infty}\frac{a_1+\cdots+a_n}n=\alpha. Show that \lim_{n\to\infty}\frac{a_n}n=0.

Writing S_n:=a_1+\cdots+a_n, we have \frac{a_n}n=\frac{S_n}n-\frac{n-1}n\cdot\frac{S_{n-1}}{n-1}\to\alpha-1\cdot\alpha=0.

Well. I guess there’s still some chance of getting A+ for the module but with so many mistakes, getting just an A might be more likely.

Some fixed-point theorem

Let I be a closed interval and f:I\to I be a function satisfying |f(x)-f(y)|<|x-y| for all distinct x,y\in I. What we want to show is that \lim_{n\to\infty}f^n(x) converges to the unique fixed point of f.

Proofs for the existence and uniqueness of the fixed point are standard. Additionally, since |f\circ f(x)-f\circ f(y)|\le|f(x)-f(y)|<|x-y| for distinct x,y\in I, f\circ f also has a unique fixed point. We also need the “trivial” result that if \lim_{n\to\infty}f^n(x) converges, it converges to the unique fixed point, and likewise for f\circ f; what’s left is to show that this limit exists regardless of starting point.

For any x_0\in I, let x_n=f^n(x_0). Also let y be the fixed point of f. Suppose to the contrary that \lim_{n\to\infty}x_n does not exist. Consider the sequence n\to|x_n-y|, which is clearly positive and decreasing, and thus convergent. If it converges to 0, then \lim_{n\to\infty}x_n=y, a contradiction. Therefore z:=\lim_{n\to\infty}|x_n-y| must be positive, and there exists some N\ge0 such that for all n\ge N, z<|x_n-y|<2z.

If n\to x_n-y is eventually always positive or eventually always negative, then \lim_{n\to\infty}(x_n-y) exists, which is a contradiction. If n\to(x_n-y)(x_{n+1}-y) is eventually always negative, then n\to(x_{2n}-y) is eventually always positive or eventually always negative, where as before we get a contradiction.

Consequently, there exist some i,j\ge N such that

(x_i>y\wedge x_{i+1}>y\wedge x_j>y\wedge x_{j+1}<y)\text{ or}\\  (x_i<y\wedge x_{i+1}<y\wedge x_j<y\wedge x_{j+1}>y).

Then we have

|x_{i+1}-x_{j+1}|=|(x_{i+1}-y)+(y-x_{j+1})|=|x_{i+1}-y|+|y-x_{j+1}|>2z,

but also, since x_i\ne x_j,

|x_{i+1}-x_{j+1}|<|x_i-x_j|=|(x_i-y)-(x_j-y)|<z<2z,

a contradiction. Therefore the initial supposition that \lim_{n\to\infty}x_n is divergent is false, and the proposition is proven.

There’s probably a better way of proving this. This one relies on the structure of the real numbers even though it should hold true when generalised to arbitrary metric spaces. The caveats are that we can’t use the intermediate value theorem to show the existence of a fixed point and that it’s not necessarily true that an ordering exists at all, so splitting into the positive/negative/alternating/none-of-the-above cases is an invalid operation.

Last-minute hurfdurfery

MA2108S, Tutorial 10, let’s go. 16th April, 23:34. All the files are on my home server but it’s not like it’s public anyway, so maybe I’ll upload them to Mediafire some time. I lost focus somewhere around the last part of question 6, so the times are kinda meaningless.

1

Claim: f is continuous on (a,b).

Let x\in(a,b) and \varepsilon:=\min\{x-a,b-x\}. Then since x\in[a+\varepsilon,b-\varepsilon] and f is continuous over that interval, f is continuous at x. Therefore f is continuous on (a,b).

Claim: f is not necessarily continuous on [a,b].

Consider f on [0,1] given by f(0):=0 and f:x\mapsto1/x elsewhere, which is continuous on every [\varepsilon,1-\varepsilon] but clearly discontinuous at 0.

(23:38)

2

For any x\in[a,b], the truncation of the (canonical) binary expansion of x (i.e. \lfloor2^nx\rfloor/2^n) is a rational sequence converging to x also bounded by [a,b], and since f is stated to be continuous,

\displaystyle f(x)=f\left(\lim_{n\to\infty}\frac{\left\lfloor2^nx\right\rfloor}{2^n}\right)=\lim_{n\to\infty}f\left(\frac{\left\lfloor2^nx\right\rfloor}{2^n}\right)=\lim_{n\to\infty}0=0.

(23:44)

3

Um.

\min\{f(x),g(x)\}=\tfrac12(f(x)+g(x)-|f(x)-g(x)|)\\  \max\{f(x),g(x)\}=\tfrac12(f(x)+g(x)+|f(x)-g(x)|)

Since the set of continuous functions is closed under addition, scalar multiplication, and taking the absolute value, the set is also closed under \min and \max.

(23:53)

4

Assumed above, but I guess this actually needs to be proven? (If we assume 3, we can use |f(x)|=\max\{0,f(x)\}+\max\{0,-f(x)\}, but circular arguments are circular.)

Let a be some element of f‘s domain and assume that it is a cluster point too. (If it’s not a cluster point, continuity is trivial.)

\lim_{x\to a}g(x)=\lim_{x\to a}|f(x)|=|f(a)|=g(a)

(00:00)

5(a)

Consider x':=x\mod\pi. If x'\in[0,\pi/6)\cup(5\pi/6,\pi), then f(x)=x. If x'\in\{\pi/6,5\pi/6\}, then f(x)=x/2. Otherwise, |2\sin x|>1, so f(x)=0.

f is continuous over \mathbb R\backslash\pi(\{1/6,5/6\}+\mathbb Z) and discontinuous elsewhere.

5(b)

Continuous over the irrationals, discontinuous over the rationals. Gone through in class etc.

5(c)

Continuous at 0, discontinuous elsewhere. Given any nonzero x, for any \delta, consider the \min\{\delta,|x|/2\}-neighbourhood of x. There exists both a rational and an irrational in this neighbourhood; let these be x_r,x_i respectively. (WLOG assume x,x_r,x_i are all positive.)

x_r>x/2\implies g(x_r)=x_r>x/2\\  g(x_i)=0

Therefore g is discontinuous at every nonzero value.

5(d)

This is clearly discontinuous at the reciprocals of the square roots of nonsquare integers, and continuous between the reciprocals of the square roots of integers. This leaves the cases |x|>1 and x=1/n for some nonzero n.

In the former case, h(x)=0 is constant and hence continuous at x.

In the latter case, suppose WLOG that n>0. We have that h(x)=0 while h(x-)=(n^2+1)(-1)^n\ne0, so h is not continuous at such x.

To summarise, h is discontinuous on \{1/\sqrt n|n\in\mathbb N\}\cup\{-1/\sqrt n|n\in\mathbb N\} and continuous elsewhere.

(00:45) (I got distracted, okay.)

6(a)

\displaystyle{\left(\frac{1+x2^x}{1+x3^x}-1\right)/x^2=\frac{2^x-3^x}x\frac1{1+x3^x}=\left(\frac{2^x-1}x-\frac{3^x-1}x\right)\frac1{1+x3^x}\to\left(\log2-\log3\right)1=\log\frac23}

\displaystyle{\left(\frac{1+x2^x}{1+x3^x}\right)\uparrow x^{-2}=\left(\left(\frac{1+x2^x}{1+x3^x}\right)\uparrow\left(\frac{1+x2^x}{1+x3^x}-1\right)^{-1}\right)\uparrow\left(\left(\frac{1+x2^x}{1+x3^x}-1\right)/x^2\right)\to\exp\log\frac23=\frac23}

6(b)

\displaystyle\frac{(x+e^x)-1}x=1+\frac{e^x-1}x\to2

\displaystyle(x+e^x)^{1/x}=\left((x+e^x)^{1/(x+e^x-1)}\right)^{(x+e^x-1)/x}\to e^2

6(c)

\displaystyle x^{\sin x}=\exp((\log x)(\sin x))=\exp\left((\log x)x\left(\frac{\sin x}x\right)\right)\to\exp(0\cdot1)=1

6(d)

Apparently, for this kind of questions you’re supposed to use cheating methods like Taylor series to get a guess of what the behaviour is, then rigorously prove it without Taylor series! Seems kinda backwards.

\displaystyle\log(x+\sqrt{1-x^2})=\log\left(1+x-\frac{x^2}2+O(x^3)\right)=x-x^2+O(x^3)\\  \log(nx+\sqrt{1-n^2x^2})=\log\left(1+nx-\frac{(nx)^2}2+O(x^3)\right)=nx-(nx)^2+O(x^3)

Hmm. Not quite the right approach.

\displaystyle(x+\sqrt{1-x^2})^{1/x}=\left((x+\sqrt{1-x^2})^{1/(x+\sqrt{1-x^2}-1)}\right)^{(x+\sqrt{1-x^2}-1)/x}\to\exp1=e

\displaystyle{\lim_{x\to0}\frac{\log(nx+\sqrt{1-(nx)^2})}{\log(x+\sqrt{1-x^2})}=\lim_{x\to0}n\frac{\log(nx+\sqrt{1-(nx)^2})^{1/(nx)}}{\log(x+\sqrt{1-x^2})^{1/x}}=n\lim_{x\to0}\frac{\log(nx+\sqrt{1-(nx)^2})^{1/(nx)}}{\log(x+\sqrt{1-x^2})^{1/x}}=n\frac{\log e}{\log e}=n}

6(e)

\displaystyle0\le\sin\log\left(1+\frac1x\right)<\log\left(1+\frac1x\right)\to0\implies\sin\log\left(1+\frac1x\right)\to0\\  \sin\log(x+1)-\sin\log x=2\left(\sin\log\frac{x+1}x\right)\cos\log\sqrt{x(x+1)}\to0

6(f)

Second derivative?

\displaystyle\frac{\log(x+h)+\log(x-h)-2\log x}{h^2}=\frac{\log(1+h/x)+\log(1-h/x)}{h^2}=\frac{\log(1-h^2/x^2)}{h^2}=x^{-2}\log(1-h^2/x^2)^{x^2/h^2}\to-x^{-2}

(01:57)

7

Let f:x\mapsto[x=0] and consider behaviour around x=0.

\displaystyle\lim_{h\to0}|f(x+h)-f(x-h)|=0\\  \lim_{h\to0}|f(x+h)-f(x)|=1

(02:25)

Actually, I might as well do tutorial 11 too.

1

Follows from intermediate value theorem and induction; in this case it would be easier to prove the more general statement for any convex combination, not just the mean.

2

Again, intermediate value theorem. For sufficiently large and sufficiently small x, f(x)>0, and since f(0)=-1<0, IVT gives us the existence of at least two distinct roots.

3

Bi-Lipschitz blah blah. f(x) is well defined (cf. tutorial 9 question 1), so all that remains is to show that f is continuous.

|x-x_0|=|y-y_0-\varepsilon(\sin y-\sin y_0)|\ge|y-y_0|-\varepsilon|\sin y-\sin y_0|\ge|y-y_0|(1-\varepsilon)\\  |y-y_0|\le|x-x_0|\frac1{1-\varepsilon}

f is Lipschitz continuous and therefore also continuous. (A special case of a more general theorem on how a small Lipschitz perturbation of the identity function gives a bi-Lipschitz result.)

4

\forall n\in\mathbb N\quad g(x_n)-f(x_n)=f(x_{n+1})-f(x_n)

If n\mapsto x_n is either nonincreasing or nondecreasing, then it is a convergent sequence, where we may let x_0:=\lim_{n\to\infty}x_n and f(x_0)=g(x_0).

Otherwise, there exists some n for which (x_{n+1}-x_n)(x_{n+2}-x_{n+1})\le0.

\mathrm{sgn}((f(x_{n+1})-f(x_n))(f(x_{n+2})-f(x_{n+1})))=\mathrm{sgn}((x_{n+1}-x_n)(x_{n+2}-x_{n+1}))\le0

But also

\mathrm{sgn}((f(x_{n+1})-f(x_n))(f(x_{n+2})-f(x_{n+1})))=\mathrm{sgn}((g(x_n)-f(x_n))(g(x_{n+1})-f(x_{n+1}))),

where, by IVT, there exists some value of x between x_n and x_{n+1} such that f(x)=g(x).

5

This is a homework question.

6

Suppose for contradiction that f is continuous. Then it attains a minimum value a at two points x_1\ne x_2, and likewise a maximum value b for two other points x_3\ne x_4. WLOG let x_1<x_2 and x_3<x_4.

If x_1\ne0, let \epsilon:=\min\{(x_1)/2,(x_2-x_1)/3\}; if x_2\ne1, let \epsilon:=\min\{(x_2-x_1)/3,(1-x_2)/2\}. Let a':=\min f(\{x_1-\epsilon,x_1+\epsilon,x_2-\epsilon,x_2+\epsilon\}\cap[0,1]). If a=a', we have a contradiction. Suppose a\ne a'. By IVT, at least three of these statements are true (with the fourth possibly failing due to asserting the existence of an element in an empty set):

\displaystyle\exists x\in(x_1-\epsilon,x_1)\quad f(x)=(a+a')/2\\  \exists x\in(x_1,x_1+\epsilon)\quad f(x)=(a+a')/2\\  \exists x\in(x_2-\epsilon,x_2)\quad f(x)=(a+a')/2\\  \exists x\in(x_2,x_2+\epsilon)\quad f(x)=(a+a')/2

As the open intervals are all disjoint, this implies there are at least three distinct values of x such that f(x)=(a+a')/2, a contradiction.

Therefore x_1=0 and x_2=1. By the same argument, x_3=0 and x_4=1. However, this means that \max_{0\le x\le1}f(x)=f(x_3)=f(0)=f(x_1)=\min_{0\le x\le1}f(x); in other words, f is a constant function. This is also a contradiction.

Consequently, the initial assumption that f is continuous is untenable.

7

“A continuous endomorphism has a fixed point.”

If f(0)=0 or f(1)=1 we are done. Suppose otherwise, and let g(x):=f(x)-x.

g(0)>0 and g(1)<0, so by IVT there exists some x\in(0,1) such that f(x)=x.

8

This is exactly what I said for question 1.

9

Definition quoting time!

Suppose that f is uniformly continuous and that \lim_{n\to\infty}(x_n-y_n)=0. For any \varepsilon>0, there exists some \delta such that |x-y|<\delta\implies|f(x)-f(y)|<\varepsilon, and there exists some N such that for all n\ge N, |x_n-y_n|<\delta. Putting two and two together, \lim_{n\to\infty}(f(x_n)-f(y_n))=0.

In the other direction, suppose that f is not uniformly continuous. Then there exists some \varepsilon>0 such that for all \delta>0 there exist some x(\delta),y(\delta) such that |x-y|<\delta and |f(x)-f(y)|>\varepsilon. Then construct sequences x_n:=x(1/n) and y_n:=y(1/n), where \lim_{n\to\infty}(x_n-y_n)=0 but |f(x_n)-f(y_n)|>\varepsilon for all n, so f(x_n)-f(y_n)\not\to0.

10

Oh god it’s 4 am I can’t think any more.

Counterexamples

It turns out that my mathematical intuition is the visual sort of intuition and for some reason, in spite of that, I’m terrible at reasoning about geometrical objects. Is it merely due to lack of experience? Probably. But no, that’s not the point of this post.

MA2108S, Tutorial 8, question 4.

Suppose f(x)\ge M for some constant M>0 and all x\in[a,\infty). Suppose that \lim_{x\to\infty}f(x+1)/f(x)=\alpha exists. Show that \lim_{x\to\infty}\sqrt[x]{f(x)}=\alpha.

Sounds like this could be true, right? After all, if you consider f(x) as just a discrete sequence f(a),f(a+1),\dots this follows immediately from Stolz’s theorem. In fact, if f is bounded over every interval [a',a'+1] where a'\ge a, this statement is true. If we impose the requirement that f is continuous, this implies boundedness over every such interval and thus the proposition holds. (Proof of this is omitted, because hey, look at the post title. But basically the idea is to prove that f(a+n+x)^{1/(a+n+x)} cannot differ too much from f(a+n)^{1/(a+n)} when n\in\mathbb N is large enough and 0\le x\le 1.)

However, finding examples in analysis is all about pathological functions too hard to graph and comprehend. A bounded function won’t do, so we need an unbounded function. But we can’t have a function that has singularities either (like x\mapsto1/x). If we can’t have a function that’s nicely defined, we might as well make it super-pathological.

\begin{array}{rl}f:[0,\infty)\to&[1,\infty)\\  x\mapsto&\begin{cases}1\text{ if }x\notin\mathbb Q\\n\text{ if }x\in\mathbb Q\text{ and }x=m/n,~\gcd(m,n)=1,~n>0\end{cases}  \end{array}

This is sort of like the reciprocal of Thomae’s function. Just happened to be the first thing that came to mind because we covered that in class a while ago, though I can’t think of a simpler and equally suitable function. Anyway.

\displaystyle\forall x\in\mathbb R^+\quad f(x+1)=f(x)\\  \implies f(x+1)/f(x)=1\\  \implies\lim_{x\to\infty}f(x+1)/f(x)=1

Kinda obvious, yes? The problem is that now we can also show that \limsup_{x\to\infty}\sqrt[x]{f(x)}=+\infty, and we know that 1 is not infinite. Probably. Come to think about it, we never actually proved the finiteness of 1. Let’s assume that.

For any positive integer N, let \displaystyle x:=N+\frac1{N^{N+1}}>N.

\displaystyle f(x)=N^{N+1}\\  \sqrt[x]{f(x)}=N^{(N+1)/\left(N+\frac1{N^{N+1}}\right)}\ge N

Thus \sqrt[x]{f(x)}\ge N along this sequence of x values, so \limsup_{x\to\infty}\sqrt[x]{f(x)}=+\infty, and consequently, \lim_{x\to\infty}\sqrt[x]{f(x)} diverges as well.

Actually, the crux of this proof is that for every a'\ge a, f is unbounded over the interval [a'+n,a'+n+1] for some nonnegative integer n, and put that way, it’s a lot easier to come up with other counterexamples.