March 27, 2024
Substack recently introduced native survey functionality, which is a neat feature. The survey editor in the dashboard suggests that writers ask their readers such questions as “How much total combined money did all members of your HOUSEHOLD earn last year?” (capitals sic), “What is the highest level of education you have completed or the highest degree you have received?”, “In a typical day, about how much time do you spend watching videos?”, “What is your relationship status?”, “What is your mailing address?”, and many, many others like this.
I put together my own survey for readers of Orbis Tertius. I think I have come up with questions that will really capture the insights about readers that the Board and I are most interested in. It would mean a lot to us if you took a few minutes to fill it out:
June 29, 2024
I’m terribly sorry I haven’t been able to share any writing with you all lately. It’s been three months since I sent out that reader survey, and since then, the Orbis Tertius Board of Directors has been pressuring me to deliver insights from the survey data to inform the direction of the publication. Since it went out, I haven’t had the space to work on any drafts—in fact, if they found out that I was writing this note right now (it’s after midnight, I have a sticky note over my webcam, and a towel under my laptop to dampen the sound of my typing) I have no doubt they would fire me. It’s a very critical time in the fiscal year and in the broader trajectory of the organization, and they—the members of the Board—have made it very clear that my fate hangs in the balance of the insights rendered from the very survey that you all filled out a few months ago.
Amazingly, there were over four hundred thousand responses to the survey, a sample size actually well in excess of the population, which makes the statistical possibilities nothing less than miraculous. The survey went out a couple weeks before the second quarter Board meeting, and so I conducted what I thought was a very compelling analysis of the survey data to show to the Board. I sliced the audience into several different cohorts, I constructed a few personas, I conducted look-alike analyses and clusterings, I made beautiful geographic and demographic visualizations, and I projected revenue for the rest of the fiscal year.
I joined the meeting as usual, confident and relaxed, right on time and with my camera on. The Board members all joined the meeting in their usual fashion, eight minutes late and with their cameras off. I presented the results of my analysis to the familiar grid of squares: K, B, C, D, D, C, T, H, N, J, M, a thirty-minute extrapolation of my methods, figures, and results.
To be perfectly blunt, the Board was not impressed. “Dawson,” K said after unmuting, “this is all very interesting, but where are the insights?”
I could detect agreement from the other mute letters.
“Well, I think the insight lies in our ability to…” I began to say, but hesitated, trying to gather my thoughts.
T unmuted and said: “Yes, unfortunately I just don’t think any of this is very actionable.”
“Well,” I started to say, “maybe we should define the outcomes—”
But I had completely lost control of the conversation at this point. M interrupted me: “I think I speak for all of us on the Board when I say that this simply isn’t good enough. Let’s reschedule this meeting for two weeks from now, and Dawson, I hope you’ll have some valuable insights by then.”
I spent the rest of the day staring at the raw data from the survey, feeling overwhelming despair. I had no idea what the Board wanted from me. Try to remember, I wanted to tell them, I’m only pretending to be a data scientist—in reality, I’m a writer! But I knew such a plea would fall on deaf ears. Orbis Tertius is a complex organ and the Board is completely focused on the things that are most important to its functioning.
When I woke the next morning, still at my desk, I groaned and tried to go back to sleep. But as I lay there, trying to avoid thinking about my situation, a possibility revealed itself: maybe what the Board was looking for was not insight, but foresight.
Suddenly I saw a path ahead. I got to work immediately, feeding the survey data into state-of-the-art machine learning models. What was I trying to predict? Anything. Nothing. In my statistical ecstasy I felt that it did not matter what, exactly, one could foresee; rather, it was the simple power of foresight that was compelling to the Board. There was no specific question that the Board wanted to ask an oracle; they just wanted an oracle. If I could offer that power to them, it would satiate their lust, I was sure of it.
I worked day and night to build an oracle inside of two weeks. Eventually it became clear that the survey data, miraculous though they were, would not be sufficient for the power I needed to manifest. Even though the models I was training were achieving over 100% accuracy in cross-validation, with the survey dataset alone the only variables they could predict were variables from within the survey dataset itself… even in my ecstasy I saw how useless it was. What good is an oracle that can only tell you things you already know? But, I realized, I could combine the survey data with the Orbis Tertius user activity data—the millions upon millions of observations of user activity, from likes and comments to simple page views. I collated these data into tabular form, aggregated them, and joined them with the survey data.
Endless prophetical possibilities were suddenly visible. Reinvigorated, I resumed my frenzied search for oracular power. I probably spawned a thousand useless oracles in the next few days. Some were more interesting than others, but all of them were disappointing. The augmented survey data were still not enough (or, maybe, I was beginning to lose my grip on outcomes and had been consumed by lust myself). I spent a night without sleep struggling over what to do. The rescheduled meeting would take place in a few days, and I still had nothing to show to the Board.
Then, another revelation, one I feel even now was a stroke of genius: one could combine the survey data and the user activity data with the actual writings of Orbis Tertius, merging the audience behavior and survey attributes with the very linguistic spirit of the publication. This I did, distilling the essays and stories into pure numerical representations by way of transformer embeddings (each writing represented by a matrix of staggering dimensions, the whole corpus comprising a vast rank-three tensor), and joining them up with the activity data and the survey data to form a truly awe-inspiring statistical sphinx.
It was not six hours later that I held a true oracle in my hands. I slept soundly that night, dreaming numeric-prophetical visions.
The meeting began and I joined once again confident and relatively relaxed. The Board of Directors joined the meeting a little extra late, as if to show their impatience. I launched into my demonstration of the power of foresight without introduction. For fifteen minutes I waxed lyrical to the Board about the power I had forged.
Afterwards, there was silence. At first I thought I could detect the experience of awe behind the mute squares; but then I was certain I could hear them whispering:
“You poor fool…”
“Obscene, isn’t it…?”
“How disappointing…”
“Foul, foul, foul…”
“Gods be with ye…”
“We are far too busy to have our time wasted like this,” M finally said, and the whispers stopped. “The third quarter Board meeting is in two months.” Before I could say anything, the meeting was ended.
In the following silence I was overcome with fear. Everyone knows the fate that befalls the man who gives the wrong answer to the sphinx’s riddle. I crawled out of my office, along the floor of the hallway, and into my bed, taking all of my clothes off along the way, and lay there waiting for my judgement to be delivered. I did not eat or drink or go to the bathroom. I went in and out of sleep and dreamed useless numerical dreams and lost track of how much time had passed until I realized that I was still alive. The sphinx had given me another chance. To this day, I have no idea why I was spared. I carried myself back to my desk and, naked, starved, and unaware of how much time I had left, began to ponder the survey data again.
It seemed that the harder I tried and the further I looked, the more my despair at the investigation multiplied. The insight that lay behind the survey data seemed more and more elusive and illusory.
This is the thing about statistics. A statistic is actually a reduction of information: the average, for example, is an irreversible reduction of the information contained in a dataset. You can reduce the information in a dataset in infinitely many ways, some useful and some not, but you can never increase the amount of information. This is why we speak about the “entropy” of data. Statistics is very much analogous to the thermodynamic evolution of the universe and its distant fate as a cold, unmoving blackness (in fact, if you really get down to it, the evolution of the universe is, at bottom, a statistical process), but, where matter and energy can be neither created nor destroyed, information can never be created but it can be destroyed. You are destroying information every time you compute a statistic.
Even machine learning models, which are nothing more than elaborate statistical objects, do not create new information. The predictions they generate are also statistics, reductions of the data you feed them; mere numerical waste, information that was already contained in the data from whence you began.
This is the information-theoretical certainty which forever limits the abilities of statistical devices. The creation of new information, even a single bit, is the Holy Grail, the Philosopher’s Stone, the one true statistic, transcending variance… but such artifice would be magic.
I reflected on this as I gazed upon the sphinx, all its individual feathers and claws and teeth, that is, the individual rows of data, the un-reduced information. After so much careful attention and manipulation, I understood the data completely… the insights I was pursuing were, in fact, superfluous to me, because I saw things as they really were, un-reduced, un-altered, in all of their individuated richness. Maximum information is static; the sphinx was like a sea of static that I was able to comprehend in its full resolution, noticing each colorful pixel each fractional second… but this did not help me. I could not make the Board see this way. They could not, or would not, comprehend the sphinx.
Weeks of contemplation passed. I continued to fast and dedicate all of my energies to comprehending the sphinx, trying to divine the answer to its riddle through methods well outside the traditions detailed in my textbooks. I rested only to dream my numerical dreams, which became increasingly weird—visions of odd, unreal, even irrational numbers—but also seemed to inform my contemplation more and more. Eventually my waking contemplation became indistinguishable from my nocturnal reflection, and there would be times that I would realize that I was straining to deploy complex statistical methods against fluctuating nonsense data in my sleep; or that I was applying impossible dream-logical analysis to the real data while awake.
I could go on for ages detailing the texture of my contemplation and the intimacy with the data which I gained, but I have already gone on far too long. The Board meeting is supposed to begin in mere hours. The purpose of this note is to share my final revelation, the true answer to the sphinx’s riddle. In retrospect it is obvious… I should have been able to see it from the beginning. All along, the only truths that lay behind the data were more questions. I realized this six hours ago, in a moment of utter calm, like the moment a candle is blown out. The sphinx was not urging me towards an answer, but to more questions—another survey, begotten by the survey data themselves. A statistical ouroboros. I have extracted these new questions, which are the true answers, from the dataset—using methods that, to be honest, I cannot quite remember—and created another survey:
You can respond to it, or not—it doesn’t entirely matter, as it is the survey itself the Board is looking for, not the answers, I’m sure of it… or perhaps this survey will beget another, and so on, an eternal return of question and answer.