Datafication, Phantasmagoria of the 21st Century

Algorithmic Bias in Education: A Case Study From The Markup

The Markup (an investigative publication focusing on tech) has released an investigation into the algorithm the state of Wisconsin uses to predict which middle school students will drop out before graduating from high school.

Read the whole story here.

The algorithm is called the Dropout Early Warning System (DEWS). Student dropout is an important issue that needs to be addressed, and improving students’ chances of staying in school and graduating from high school is a laudable goal. The question is: are algorithms reliable tools for doing so? As it happens, it seems that they are not.

DEWS has been in use for over a decade. The data used to generate its scores include test scores, disciplinary records, and race. Students scoring below 78.5% are marked as High Risk (and a red mark appears next to their name). The Markup reports that comparisons between DEWS predictions and state records of actual graduations show the system is wrong three quarters of the time, especially for Black and Hispanic students. In other words, the system defeats the very purpose it was built to serve.
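To make the kind of comparison described above concrete, here is a minimal sketch of how one might check "High Risk" flags against actual graduations, group by group. The only thing taken from the reporting is the 78.5 threshold. The records, the group labels and the function names are all invented for illustration, and this is not how DEWS or The Markup actually computed anything.

```python
from collections import defaultdict

HIGH_RISK_THRESHOLD = 78.5  # the cut-off quoted above; everything else is hypothetical


def is_high_risk(score):
    """A student scoring below the threshold gets the red mark."""
    return score < HIGH_RISK_THRESHOLD


def false_alarm_rate_by_group(records):
    """For each group: the share of flagged students who graduated anyway."""
    flagged = defaultdict(int)
    wrong = defaultdict(int)
    for record in records:
        if is_high_risk(record["score"]):
            flagged[record["group"]] += 1
            if record["graduated"]:
                wrong[record["group"]] += 1
    return {group: wrong[group] / flagged[group] for group in flagged}


# Made-up toy records, NOT real student data.
records = [
    {"group": "A", "score": 70.0, "graduated": True},
    {"group": "A", "score": 75.0, "graduated": True},
    {"group": "A", "score": 60.0, "graduated": False},
    {"group": "A", "score": 90.0, "graduated": True},
    {"group": "B", "score": 72.0, "graduated": False},
    {"group": "B", "score": 77.0, "graduated": True},
    {"group": "B", "score": 85.0, "graduated": True},
]

rates = false_alarm_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} of High Risk flags were wrong")
```

Even in a toy like this, a single threshold produces different error rates for different groups, which is the kind of disparity the investigation points to.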

Even more telling: the Department of Public Instruction (DPI) ran its own investigation into DEWS and concluded that the system was unfair. That was in 2021. In 2023, DEWS is still in use. Does this mean that our over-reliance on algorithmic systems has created a situation where we know they fail us, but we have no credible alternative, so we keep using them?

I am reminded of Neil Postman’s seminal book “Technopoly”. He says that in a Technopoly, the purpose of technology is NOT to serve people or life. It justifies its own existence merely by existing. In a Technopoly, technology is not a tool; “people are the tools of their tools” (p. 68). More importantly, and more problematically, “once a technology is admitted, it plays out its hand; it does what it is designed to do. Our task is to understand what that design is” (p. 128). It is safe to say that digital technologies have been admitted, but do we really have any understanding of what their design is?

Algorithmic Technology, Knowledge Production (And A Few Comments In Between)

So, digital technologies are going to save the world.

Or are they?

Let’s have a no-nonsense look at how things really work.

A few comments first.

I am not a Luddite.

[Just a side comment here: Luddites were English textile workers in the early 19th century who reacted strongly against the mechanisation of their trade, which put them out of work and left them unable to support their families. Today, they have become the poster child for anti-progress, anti-technology grumpy old bores, and “you’re a Luddite” is a common insult directed at techno-sceptics of all sorts. But Luddites were actually behaving quite rationally. Many people in the world today react in a similar fashion in the face of the economic uncertainty brought about by technological change.]

That being said, I am not anti-technology. I am extremely grateful for the applications of digital technology that help make the world a better place in many ways. I am fascinated by the ingenuity and the creativity displayed in the development of technologies to solve puzzling problems. I also welcome the fact that major technological shifts have brought major changes in how we live in the world. This is unavoidable; it is part of the impermanent nature of our worlds. Emergence of the new is to be welcomed rather than fought against.

But I am also a strong believer in using discernment to try to make sense of new technologies, and to critically assess their systemic impact, especially when they have become the object of such hype. The history of humanity is paved with examples of collective blindness. We can’t consider ourselves immune to it.

The focus of my research (and of this post) is Datafication, i.e., the algorithmic quantification of purely qualitative aspects of life. I mention this because there are many other domains that comfortably lend themselves to quantification.

I am using a simple vocabulary in this post. This is on purpose, because words can be deceiving. Names such as Artificial Intelligence (AI) or Natural Language Processing (NLP) are highly evocative and misleading, suggesting human-like abilities. There is so much excitement and fanfare around them that it’s worth going back to the basics and calling a cat a cat (or a machine a machine). There is a lot of hype around whether AI is sentient or could become sentient, but as of today, there are many simple actions that AI cannot perform satisfactorily (recognise a non-white-male face for one), not to mention the deeper issues that plague it (bias in data used to feed algorithms, the illusory belief that algorithms are neutral, the lack of accountability, the data surveillance architectures… just to name a few). It is just too easy to dismiss these technical, political and social issues in the belief that they will “soon” be overcome.

But hype time is not a time for deep reflection. If the incredible excitement around ChatGPT (despite the repeated calls for caution from its founder) is any indication, we are living through another round of collective euphoria. A few years ago, the object of this collective rapture was social media. Today, knowing what we know about the harms they create, it is becoming more difficult to feel deliciously aroused by Facebook and co., but AI has grabbed the intoxication baton. The most grandiose claims are claims of sentience, including from AI engineers who undoubtedly have the ability to make the machines, but whose expertise in assessing their sentience is highly debatable. But in the digital age, extravagant assertions sell newspapers, make stocks shoot up, or bring fame, so it may not all be so surprising.

But I digress…

How does algorithmic technology create “knowledge” about qualitative aspects of life?

First, it collects and processes existing data from the social realm to create “knowledge”. It is important to understand that the original data collected is frequently incomplete, and often reflects the existing biases of the social milieu from which it is extracted. The idea that algorithms are neutral is sexy but false. Algorithms are a set of instructions that control the processing of data. They are only as good as the data they work with. So, I put the word “knowledge” in quotation marks to show that we have to scrutinise its meaning in this context, and use discernment to examine what type of knowledge is created, what function it carries out, and whose interests it serves.

Algorithmic technology relies on computer-ready, quantified data. Computers are not equipped to handle the fuzziness of qualitative, relational, embodied, experiential data. But a lot of the data produced in the world every day is warm data. (Nora Bateson coined that term, by the way. Check The Bateson Institute website to know more; it is well worth a read.) It is fuzzy, changing, qualitative, not clearly defined, and certainly not reducible to discrete quantities. But computers can only deal with quantities, discrete data bits. So, in order to be read by computers, the data collected needs to be cleaned and turned into “structured data”. What does “structured” mean? It means that it has to be transformed into data that can be read by computers; it needs to be turned into bits; it needs to be quantified.

So this raises the question: how is unquantified data turned into quantified data? Essentially, through two processes.

The first one is called “proxying”. The logic is: “I can’t use X, so I will use a proxy for X, an equivalent”. While this sounds great in theory, it has two important implications. Firstly, a suitable proxy may or may not exist, so the relationship of similarity between X and its proxy may be thin. Secondly, someone has to decide which quantifiable equivalent will be used. I insist on the word “someone”, because it means that “someone” has to make that decision, a decision that is far from neutral, highly political and potentially carries many (unintended) social consequences. In many instances, those decisions are made not by the stakeholders who have a lived understanding of the context where the algorithmic technology will be applied, but by the developers of the technology who lack such understanding.

Some examples of proxied data: assessing teachers’ effectiveness through their students’ test results; ranking “education excellence” at universities using SAT scores, student-teacher ratios, and acceptance rates (that’s what the editors at US News did when they started their university ranking project); evaluating an influencer’s trustworthiness by the number of followers she has (thereby creating unintended consequences as described in this New York Times investigative piece “The Follower Factory”); using creditworthiness to screen potential new corporate hires. And more… Those examples come from a fantastic book by math-PhD-data-scientist turned activist Cathy O’Neil called “Weapons of Math Destruction”. If you don’t have the time or the inclination to read the book, Cathy also distills the essence of her argument in a TED talk, “The era of blind faith in big data must end”.

While all of the above sounds like a lot of work, there is data that is just too fuzzy to be structured and too complex to be proxied. So the second way to treat unstructured data is quite simple: abandon it. Forget about it! It never existed. Job done, problem solved. While this is convenient, of course, it becomes clear that this leaves out A LOT of important information about the social, especially because a major part of the qualitative data produced in the social realm falls into this category. It also leaves out the delicate but essential qualitative relational data that weaves the fabric of living ecosystems. So in essence, after the proxying and the pruning of qualitative data, it is easy to see how the so-called “knowledge” that algorithms produce is a rather poor reflection of social reality.
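The two processes described above, proxying and pruning, can be sketched in a few lines. This is a toy under stated assumptions, not any real evaluation system: the teacher record, its field names and the helper functions are all hypothetical, and the proxy (average student test score standing in for "effectiveness") is simply the first example from the list above.

```python
# A hypothetical record about one teacher: one quantified field, three "warm" ones.
teacher = {
    "student_test_scores": [62, 71, 58, 80],
    "mentors_struggling_students": "stays after class twice a week",
    "classroom_atmosphere": "students say they feel safe asking questions",
    "relationships_with_parents": "trusted, especially by newly arrived families",
}


def proxy_effectiveness(record):
    """Proxying: 'someone' decided that effectiveness = mean student test score."""
    scores = record["student_test_scores"]
    return sum(scores) / len(scores)


def structure(record):
    """Return the computer-ready version of the record and the fields left behind."""
    structured = {"effectiveness": proxy_effectiveness(record)}
    # Pruning: free-text fields cannot be quantified here, so they are dropped.
    discarded = [key for key, value in record.items() if isinstance(value, str)]
    return structured, discarded


structured, discarded = structure(teacher)
print("what the algorithm sees:", structured)
print("what it never sees:", discarded)
```

One number survives and three observations disappear. The choice of which number survives was made in `proxy_effectiveness`, by whoever wrote it.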

But (and that’s a big but), algorithmic technology is very attractive, because it makes decision-making convenient. How so? By removing uncertainty (of course I should say by giving the illusion of removing uncertainty). How so? Because it predicts the future (of course I should say by giving the illusion of predicting the future). Algorithmic technology applied to the social is essentially a technology of prediction. Shoshana Zuboff describes this at length in her seminal book published in 2019, “The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power”. If you do not have the stomach to read through the 500+ pages, just search “Zuboff Surveillance Capitalism”; you will find a plethora of interviews, articles and seminars she has given since the publication. (Just do me a favour and don’t use Google and Chrome to search, but switch to cleaner browsers like Firefox and search engines like DuckDuckGo.) She clearly and masterfully elucidates how Google’s and Facebook’s money machines rely on packaging “prediction products” that are traded on “behavioural futures markets” which aim to erase the uncertainty of human behaviour.

There is a lot more to say on this (and I may do so in a later post), but for now, suffice it to say that just as the regenerative processes of nature are being damaged by mechanistic human activity, life-enhancing tacit ways of knowing are being submerged by the datafied production of knowledge. While algorithmic knowledge creation has a place and usefulness, its widespread use overshadows and overwhelms more tacit, warm, qualitative, embodied, experiential, human ways of knowing and being. The algorithmisation of human experience is creating a false knowledge of the world (see my 3-minute presentation at TEDx in 2021).

This increasing lopsidedness is problematic and dangerous. Problematic because while prediction seems to make decision-making more convenient and efficient, convenience and efficiency are not life-enhancing values. Furthermore, prediction is not understanding, and understanding (or meaning-giving) is an important part of how we orient ourselves in the world. It is also problematically unfair because it creates massive asymmetries of knowledge and therefore a massive imbalance of power.

It is dangerous because while the algorithmic medium is indeed revolutionary, the ideology underlying it is dated and hazardous. The global issues and the potential for planetary annihilation that we are facing today arose from a reductionist mindset that sees living beings as machines and a positivist ideology that fundamentally distrusts tacit aspects of the human mind.

We urgently need a pendulum shift to rebalance algorithmically-produced knowledge with warm ways of knowing in order to create an ecology of knowledge that is conducive to the thriving of life on our planet.

Algorithmic Sociality

I had a discussion about cell membranes and boundaries with a friend. The discussion arose from a quote by Fritjof Capra in his course The Systems View Of Life: “Boundaries in the biological realm are not boundaries of separation but boundaries of identity”. My friend’s question was ‘What is the function of a membrane in social dynamics?’

This discussion about social membranes creating social identity reminds me of the phenomenon of “filter bubbles” created by the algorithms of social media platforms (for those unfamiliar with the concept, Eli Pariser’s TED talk is a good entry point). Basically, by editing what information we get access to (through search or in our newsfeed), online algorithms create a membrane around us that narrowly defines our identity, and this is reinforced by constantly feeding us more of the same.
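The "more of the same" loop can be shown with a deliberately crude toy. Nothing here reflects how any real platform ranks content: the topic list, the ranking rule (show what was clicked most) and the imaginary user (always clicks the first item shown) are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical set of topics a platform could show.
TOPICS = ["politics", "cooking", "sport", "science", "music"]


def recommend(history, k=3):
    """Show the k topics clicked most so far (ties keep the order of TOPICS)."""
    counts = Counter(history)
    return sorted(TOPICS, key=lambda topic: -counts[topic])[:k]


def simulate(first_click, rounds=5):
    """Hypothetical user who always clicks the first item in the feed."""
    history = [first_click]
    for _ in range(rounds):
        feed = recommend(history)
        history.append(feed[0])
    return history


history = simulate("cooking")
print(history)
print("topics never shown at the top:", [t for t in TOPICS if t not in history])
```

A single initial click is enough for this toy feed to lock onto one topic and keep it at the top in every later round. Real systems are far more complex, but the feedback structure (past clicks decide what is shown, and what is shown decides future clicks) is the point of the sketch.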


In 2017, I carried out an exploratory, investigative study on Facebook to interrogate the algorithmic black box. I created two fake profiles, Samsara Marks (female) and Bertrand Wooster (male). Samsara’s profile was richly fleshed out (highly educated professional with feminist interests), but I gave FB only minimal info about Bertrand: his age (late 40s), a (random) Hong Kong mobile number, his residency (Hong Kong) and his country of citizenship (UK). (Of course, FB also had pieces of digital info, even though I created the profile behind the university’s firewall from a random computer at uni.)


With this limited info, FB suggested 150 friends for Bertrand at the first login (interestingly, most of them outside of Hong Kong). I accepted all FB suggestions (and subsequent suggestions as well). I am not going to bore you with details, but to make a long story short, Bertrand found himself transported to the bowels of FB: explicit sexual content, prostitution and what I suspected could be pedophile networks, and “how to” videos on making weapons to shoot down missiles (I am not making this up), amongst others. Friendless but highly accomplished Samsara, on the other hand, kept receiving ads for kitchen appliances and dresses.


My purpose in posting this is to bring attention to the active role of social platforms in shaping sociability and creating social membranes around us. One of the conclusions of the experiment was that, once the algorithm has established the membrane, it takes conscious effort, extreme determination and a very consistent strategy to change what the membrane lets in and out.