Okay, it’s time to roll up our sleeves for the big guns: ethics and data.
It’s probably not a surprising piece of information for me to share with you that much of what you engage with online — from Google to Facebook and back again — is mining your data, using analytics to tailor advertising for your personal eyeballs. Yes, your phone is listening to you; yes, your browser is tracking you; yes, your store loyalty card follows your buying habits like they’re the most fascinating telenovela ever made. You know all this. And you know that every week there seems to be a major data breach at some major corporation, and you know that Facebook is peddling disinformation, and you know that every online service you use for free is selling your data to someone else for big money.
And yet, what exactly are you supposed to do? I recently left Facebook, partly because I find its ethics so upsetting and partly because so too are the politics of my distant relatives, and it’s actually really hard to function in a small centre without Facebook, where very few businesses have any web presence beyond a Facebook page. This, of course, is Facebook’s intention — it wants to be your walled garden, the only path to connecting online. And I guess you could live without store loyalty cards, but it’s a hard pill to swallow when you have to pay higher prices for the privilege, and this price disparity also targets people who can’t afford to make a different choice. You could never use a Google product — unless you have kids in the K-12 system and you’ve already signed off on their use of ChromeBooks and GSuite.
These systems were not designed to be opted-out from.
I’m not interested in answers to the question of what we do to protect our data that rely on unplugging and disconnecting, because I don’t think those things are realistic expectations. And I struggle with the expectation that the end user is exclusively responsible for their own data privacy when Terms of Service agreements are inscrutable unless you have a law degree, and when appropriate corporate regulation would be so much more effective (like, what is government for, exactly?). That’s not to say it’s not wise to be as data-aware as you can be (check out this phenomenal Data Detox that walks you through the steps of being more mindful about who knows what about you), because knowledge is power and it’s important to know what data you’re giving away, as best you can. But I am more interested in talking about the larger ethics of the companies involved in data collection, and by extension the choices we do make around data, particularly when we engage with the data of other people. And I want to talk about the data mining we may be less aware of: EdTech’s penchant for swiping data is every bit as menacing as Facebook’s or Google’s.
“But students don’t care about privacy!”
Here in BC, we have some of the strictest privacy laws in North America, and there’s no question that they limit some of what we can do with technology in the classroom. The shorthand for our privacy laws is basically that student data (names, grades, student numbers, assignments) cannot be stored on servers subject to the PATRIOT Act — e.g., servers located in the USA — which reduces the companies we can engage with. There are workarounds (we can have students sign waivers, for example), and many companies have servers in Canada now to accommodate this need. And BCCampus leads the charge in making sure we have, where possible, made-in-BC solutions to many problems. But I still spend a lot of time talking to people about what is and is not FIPPA-compliant… and finding out about a lot of sketchy things instructors are doing (don’t worry, I’m not outing any of you here — let’s call it technologist-client confidentiality — but I do hope you keep reading).
In some cases, non-compliance is just because people don’t know. I’ve been teaching in BC for a decade now, and everything I know about FIPPA is self-taught; most of us don’t seem to be well-briefed on our obligations around data privacy when we are hired, and I speak from the privileged perspective of being hired directly into a full-time position. I’m certain the training for precarious and non-permanent employees is even less thorough. In some cases, instructors have a vague idea of FIPPA, but don’t know how to find out if a tool is compliant or what questions they’re expected to ask, and so when pressed for time and without other options, they use the tool anyway. And surprisingly often, I hear from instructors that they aren’t worried about FIPPA compliance, because students don’t care about privacy, anyway.
In the first case, I provide information; in the second, I provide alternatives; in the third case, I push back. I’ve never seen evidence that students (or that amorphous group we call “young people”) care less about data privacy than other groups. And even if that were true, I increasingly believe that insofar as post-secondary education prepares students for life, it’s our job to make students aware of policies like FIPPA and the reasons for regulating access to their data. Because the truth is, I don’t think any of us truly get the implications of poor data practices until someone explains them to us. Or until it’s too late.
The Shocker: Big EdTech is Mining Your Data, Too
Okay, maybe not a shocker. But it took me a long time to accept that educational technology is more a Silicon Valley space than an academic one — in my previous life as a literature scholar, I never anticipated having strong opinions about venture capitalism, but here we are. For-profit educational technology companies can make money in a limited number of ways: they can sell you or your institutions one-shot products, they can offer a subscription model that is institution-paid or student-paid, or they can sell the data they collect. Or some combination of the above. Many of the agreements our institutions sign with these companies give explicit rights to use student data for things like “targeted marketing,” and opt-outs are complicated and Byzantine. I want to take a minute to look at some of the data practices of the best-known players in this space, and spend some time thinking about how we might choose differently when we choose technologies for ourselves (and what we ask others to opt into).
If you’ve ever had more than a fifteen minute conversation with me, you probably already know my thoughts on Turnitin; I have absolutely no chill: I loathe Turnitin. (I’m not the only one.) If you’re unfamiliar with this particular piece of tech, Turnitin is a plagiarism detection program — students opt (or in many jurisdictions, are required) to run their assignments through Turnitin to “prove” their academic integrity prior to submission. Philosophically, I think it’s a pretty flawed system: nothing says, “I value academic integrity!” like assuming students are acting in bad faith and so therefore handing their intellectual property over to a for-profit business. I know there are folks who use it and appreciate what it offers, but beyond my general frustration with the idea of Turnitin as a tool, its data practices need careful consideration.
Turnitin has access to wide swaths of student data in the form of essays and assignments, which they mine in order to be able to compare submissions to their database to assess whether student work has been copied. Their business model relies on receiving student intellectual property for free — students, of course, are not compensated for providing the content for their database — and has expanded to include a Revision Assistant tool for students that is also built from this massive amount of student data. Revision Assistant is, in essence, a machine-taught tool to improve writing based on the vast swaths of student writing Turnitin can analyze. Are students fully informed about where their data is going, in this context, and who is profiting from it? Increasingly, we’re seeing student groups advocate for more transparency in the use of Turnitin, and for opt-out policies to be made more explicit. Instructors can ask to have the work their students submit deleted from the Turnitin database… but they have to know to ask. Most don’t.
Turnitin has always downplayed the data mining they do, but it is the backbone of their ability to offer their service. It’s also what makes them attractive to venture capitalists. In March, Turnitin was acquired by a VC firm for $1.75B, which gives you a sense of what all that uncompensated student intellectual property and mined data is worth.
Polling software is just fun. It’s also really pedagogically useful — instructors can check in on student understanding of key points or collect questions to address later in class, and quick quizzes can give students a formative opportunity to self-check. This isn’t a new desire: “clickers” for polling were the first piece of cutting-edge classroom technology I ever got the opportunity to pilot, and that was back in 2003. TopHat is software that does a little more than polling, but that is its core functionality (it also can be used to monitor attendance, which is something else I have feelings about, but I’ll spare you them today).
You may have noticed in the comments to the last post that TopHat’s predatory business practices came up: they boast a free version for instructors to use, but once engaged with the software it can become difficult to tell which services are free and which are paid, preying on anxious and stressed students who may then pay when they don’t actually have to.
But TopHat also gets to acquire lots and lots of student data through its classroom resources, and like any private player in this space, it is loath to disclose what it does with it. The CEO of TopHat likes to talk about how much data they have access to, and about how they can drill down into it enough to analyze individual student study habits. That’s not “exciting,” that’s alarming. And Jason Rhinelander has done the work of reading through the End User License Agreement for TopHat, which includes gems like these: students cannot link to TopHat in an article critical of its use, students are responsible for any data breaches that occur, and there are no opt-outs for the collection of personal data beyond opting out of the service altogether. Yikes. What position does that put a student in if an instructor decides to make its use mandatory?
Pearson really wants student data. Student data is the One Ring, and Pearson is Gollum. Me, I’m more like the Samwise Gamgee to your Frodo in this conversation, and we’re going to take a little stroll to Mount Doom. Watch your fingers. Have I tortured this Lord of the Rings bit enough?
Pearson is a textbook company, sure. As we talked about last time, it’s also a creator of homework systems or courseware, a layer of learning tool that gets between instructors and students (and absorbs a massive amount of data at the same time). It also owns many of the major standardized testing suites, and it builds entire online degree programs. In all honesty, Pearson could feature in every single Digital Detox post and we’d never cover all the content they manage and mine. And Pearson has exclusive contracts with universities and colleges all over the world, sometimes achieved without a competitive bid process, as Pearson uses the goodwill and name recognition it developed in the textbook space to move into the big business of student data. In 2012, Pearson executives boasted that they had more access to student data in K-12 than anyone else in the world.
In the higher education space, Pearson is the biggest player, and they have some incredible access to student data, including everything from financial aid applications to interim and final grades. They say they don’t sell student data, but they also publicly refused to sign the Student Privacy Pledge. And last year, the inevitable happened: a data breach, exposing data from 13,000 institutions and one million college students. The attack occurred in November of 2018, but Pearson waited until the following March to inform the FBI, and end users were not notified until August. While Pearson asserts that the breach was “limited” to first and last name, date of birth, and email address — enough to do a fair amount of damage! — it impacted data collected as early as 2001. The rollout of the disclosure (and the disappearance of the statement from their website) suggests that the top priority in this instance was never student data, but brand management.
I was so excited when I learned about Academia.edu in the first year of my professional life. I even sometimes remembered to update it! Pitched as the social network for academics, it’s a place where grad students and academics alike can maintain a profile, upload articles and conference papers, and search and follow work in related areas. And if that’s all it was, it would be a dream.
Unfortunately, like any other social media you engage with, the product is you and your data. In addition to their free service, they offer a paid premium package that acts a bit like a high school gossip rag: click here to see which academics are talking about you! And predatory conferences and phoney journals trawl the uploaded content on Academia.edu for people to pitch their wares to (sometimes with hilarious results: an editorial I once wrote called “The Unbearable Blind Spots of Comics Scholarship” netted me regular invitations to publish in “top” fake ophthalmology journals, because I guess bots don’t really do metaphor). Academia.edu is a for-profit company backed by venture capital, and the only thing it really has to sell is your data.
The saddest part of Academia.edu’s rise as a content repository is that most institutions now operate some kind of digital repository, often through the library, where faculty and graduate students can freely showcase their work without fear of how that work is being stripped, stored, and sold. It doesn’t have the social networking function, but it is another way of thinking about a low-effort solution for distributing research more widely.
And then what?
So what’s the moral here? I guess it’s that I think we all need to be more careful not only with our data, but with what we ask other people to do with their own data — particularly when there’s a power imbalance. Can you really opt out of submitting your paper to Turnitin, or will your professor assume that means you’re guilty? If your doctoral supervisor is really pushing you to maximize your Academia.edu profile, are you in a position to say no? We need more information about what companies are doing with our data if we’re going to be able to make good decisions about their use. It’s increasingly difficult for me to suggest that anyone should trust what a for-profit EdTech (or any tech!) company offers them. And yet, I acknowledge that it’s hard to disentangle ourselves from these systems.
I’m interested to know your thoughts. Here are today’s prompts:
- Did you learn anything today about how data is used that will change your own practice?
- What questions do you have about the tools you’re required to use for work or school? Does a tool being mandated change your perceptions of it?
- What do you do to protect your data?
And of course, please comment on these ideas or anything else that got you thinking today.