Seminar: Social Media, Data & Society

Social Media, Manipulation & Data Repurposing

Insights from Kane et al. (2025) and Parsons et al. (2026)

Objective: Explain how contemporary social media architectures and data repurposing practices enable misinformation and manipulation, and explore what kinds of design and governance choices could strengthen information integrity.

Guiding question

In a world of algorithmic feeds and massive data repurposing, who (or what) really shapes what we see and believe online, and what would it take to protect information integrity?

Why this topic now?

The information environment

Social media has become the primary way many people receive news, entertainment, and social interaction, effectively functioning as the media for large parts of society.

At the same time, massive amounts of digital traces are collected and repurposed for analytics and AI, often for purposes never imagined when the data were first created.

Why it matters

Misinformation and disinformation can spread quickly in algorithmically curated feeds, while high‑quality information may struggle to be seen.

Repurposed datasets and models based on social media traces can appear objective, yet embed biases that further erode trust.

Session roadmap

1. Core concepts: Information disorder and key data concepts.

2. Kane et al. (2025): How social media networks and algorithms have evolved.

3. Parsons et al. (2026): How data repurposing works and why it matters.

4. Synthesis: Platforms and data practices as a single system.

5. Discussion: What it would mean to design for information integrity.

Key concepts: Information disorder & data use

Information disorder

Misinformation is false or misleading content shared without intent to deceive, such as forwarding an inaccurate post that seems true.

Disinformation is deliberately false or manipulative content, strategically produced to deceive or to gain political or economic advantage.

Malinformation is true information that is used in misleading or harmful ways, for example by doxxing someone or selectively leaking information in order to incite harassment.

Information integrity refers to the overall health of an information environment, including accuracy, context, provenance, and resistance to systematic distortion.

Data use vs reuse vs repurposing

Original data use refers to data created for a specific, planned purpose with a schema designed to support that task.

Data reuse involves asking new questions of the same data without changing the underlying schema.

Data repurposing means adapting existing data for a new purpose that requires changing or augmenting its schema, for example by adding attributes or linking with other datasets.
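The three notions above can be sketched in a few lines of plain Python. Everything here (the billing records, the price index) is a hypothetical illustration, not from either reading; the point is only where the schema changes.

```python
# Illustrative sketch of original use, reuse, and repurposing.
# All data below are hypothetical.

# Original use: records created for a known task (hospital billing),
# with a schema designed for that task.
billing = [
    {"patient": "A", "procedure": "x-ray", "charge": 120.0},
    {"patient": "B", "procedure": "mri",   "charge": 850.0},
]

# Reuse: a new question asked of the same data, schema unchanged:
# total revenue per procedure.
revenue = {}
for row in billing:
    revenue[row["procedure"]] = revenue.get(row["procedure"], 0.0) + row["charge"]

# Repurposing: the schema is augmented by linking with another
# (hypothetical) dataset to answer an unanticipated question.
price_index = {"x-ray": 1.0, "mri": 1.2}
repurposed = [
    {**row, "adjusted_charge": row["charge"] * price_index[row["procedure"]]}
    for row in billing
]
```

Reuse leaves each record's shape intact; repurposing adds a new attribute (`adjusted_charge`) by joining with a second source, which is exactly where new risks of misalignment and bias enter.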

Kane et al.: What’s different about social media networks?

Four defining features (still relevant):

Digital profiles are curated identities that act as focal points for interaction, and they are now moving toward programmable “interfaces”.

Relational ties are explicitly articulated connections (friends, followers) that are increasingly shaped by algorithms.

Search and privacy mechanisms provide powerful search and visibility controls that change who can see what and when.

Network transparency makes connections and histories visible, which allows both strategic navigation and surveillance.

How these features evolved since 2014:

Profiles are becoming more like APIs: users set conditions on how others and platforms can interact with them.

There is less emphasis on maintaining static social graphs and more on orchestrating short‑lived interactions.

Search, privacy, and transparency are now entangled with opaque recommendation systems and data monetisation.

Implication for manipulation

These features make it easier to engineer who sees what and when, creating new levers for targeted influence.

From social graphs to algorithmic networks

Relational events, not just ties

Earlier social media focused on persistent relational states such as friends, connections and followers.

Today, fleeting relational events (views, likes, shares, short interactions) drive visibility and value.

Many interactions now occur between people who are not directly connected (e.g., TikTok’s “For You”).

Algorithmic amplification & engagement

Recommendation systems decide what most users see, often independent of their explicit network.

These algorithmic networks mean influence depends on algorithmic ranking, not just follower counts.

Platform goals (engagement, watch‑time, ad revenue) can prioritise emotionally charged, polarising, or misleading content.

High engagement ≠ high truth
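The ranking logic can be reduced to a toy sketch. The item names and scores below are invented for illustration; real recommendation systems are far more complex, but the incentive structure is the same: items are ordered by predicted engagement, not accuracy.

```python
# Toy sketch of engagement-weighted feed ranking (hypothetical scores).
# An emotionally charged but misleading post can outrank an accurate one
# because the sort key is predicted engagement, not truth.

posts = [
    {"id": "sober_report",   "predicted_engagement": 0.21, "accurate": True},
    {"id": "outrage_rumour", "predicted_engagement": 0.87, "accurate": False},
]

feed = sorted(posts, key=lambda p: p["predicted_engagement"], reverse=True)
# The inaccurate post ranks first; accuracy never enters the sort key.
```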

Surveillance capitalism & hybrid human–bot ecosystems

Surveillance capitalism

Platforms commodify user behaviour (clicks, pauses, locations) to predict and influence future actions.

Behavioural targeting and A/B testing optimise experiences for engagement and conversion, not necessarily for truth or wellbeing.

Users are watched, but companies and states can also be “watched” by networked publics, creating bidirectional surveillance.

Bots, AI agents & synthetic influencers

Non‑human actors generate content and like, share, and comment in ways that are sometimes indistinguishable from human behaviour.

Coordinated bot networks can create artificial consensus, inflate popularity metrics, and amplify disinformation.

AI companions and synthetic influencers blur lines between authentic and artificial presence, raising new questions of trust and accountability.
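A back-of-the-envelope sketch (with made-up numbers) shows how cheaply a coordinated bot network can manufacture apparent consensus:

```python
# Hypothetical arithmetic: bots inflating a popularity metric.

human_likes = 40
bots = 200            # coordinated accounts
likes_per_bot = 5

observed_likes = human_likes + bots * likes_per_bot  # what users actually see
bot_share = (observed_likes - human_likes) / observed_likes
# Most of the visible "consensus" is synthetic.
```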

Connection to manipulation & integrity

When economic incentives favour engagement and influence, and non‑human actors can scale messaging cheaply, the system as a whole can drift toward manipulation even without a single malicious “villain”.

Parsons et al.: Understanding data repurposing

Original data use

Data is intentionally designed, collected and stored to support known tasks (e.g., hospital billing, transaction processing).

Quality is judged by “fitness for purpose” for those predefined tasks.

Data reuse

New questions are asked of the same data without changing the schema (e.g., extra reports, new metrics).

The data still “fits” the original conceptual frame.

Data repurposing

Data is augmented, reshaped, and often combined with other sources to answer questions that were not anticipated when it was collected.

Central to contemporary analytics and AI, but also introduces new risks for bias, privacy and misuse.

Framework & use‑agnostic properties

Repurposing activities (Parsons et al.)

Feasibility assessment considers whether it is better to repurpose existing data or to collect new data.

Task (re)conceptualisation involves defining the ideal schema and then adjusting the question or the data as needed.

Data and task alignment requires acquiring data, exposing or understanding its context and transforming it to approximate the ideal.

Three use‑agnostic properties

Accessibility describes how easily data can be found, accessed and searched, taking into account technical and legal constraints.

Transparency concerns how clear it is what the data represent, and how they were collected, cleaned and transformed.

Elasticity is the degree to which data can be “stretched” into new schemas and tasks, depending on granularity, storage and governance.

Link to information integrity

High accessibility and elasticity without matching transparency and governance can make it very easy to build impressive‑looking analytics or AI systems that rest on misunderstood, biased, or ethically problematic data.

Citizen science & healthcare: what the cases reveal

Citizen science bird study

The original plan to collect new data on light pollution and bird health collapsed because of COVID‑19.

Researchers repurposed years of citizen‑reported bird sightings and combined them with noise and light pollution maps.

Alignment required handling different spatial resolutions and accepting some loss of detail at coarser levels.

Result: nuanced insights (some species harmed, some benefitting) and new policy‑relevant questions.
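The alignment step in this case, reconciling fine-grained sightings with a coarser pollution-map grid, can be sketched as follows. The coordinates, grid size, and pollution values are all hypothetical; the point is that coarsening trades spatial detail for compatibility between datasets.

```python
# Hypothetical sketch of "data and task alignment": fine-grained bird
# sightings are coarsened to the grid of a pollution map, accepting
# some loss of spatial detail.

CELL = 10  # coarse grid cell size (hypothetical units)

def to_cell(x, y):
    """Map a fine-grained coordinate to its coarse grid cell."""
    return (x // CELL, y // CELL)

sightings = [(3, 4), (7, 2), (14, 18)]        # fine-grained coordinates
light_pollution = {(0, 0): 0.9, (1, 1): 0.3}  # one value per coarse cell

# Align: count sightings per coarse cell, then join with pollution values.
counts = {}
for x, y in sightings:
    cell = to_cell(x, y)
    counts[cell] = counts.get(cell, 0) + 1

aligned = {cell: (n, light_pollution.get(cell)) for cell, n in counts.items()}
```

Two sightings that were distinguishable at fine resolution collapse into the same coarse cell, which is the kind of detail loss the case study describes.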

Florida hospital discharge data (AHCA)

Designed for public health planning and policy, but extensively repurposed for research and analytics.

Strengths: good documentation, stable identifiers for hospitals and physicians, relatively high elasticity.

Limitations: lack of stable patient identifiers (no long‑term trajectories) and paywalls that restrict who can repurpose the data.

Illustrates how design and governance decisions shape which questions can be answered and by whom.

Bringing it together: platforms, data & integrity

Joint picture from both readings:

Social media architectures (algorithmic feeds, bots, dark social) create powerful channels for shaping attention and belief.

Data repurposing pipelines transform those traces into dashboards, risk scores, and AI systems that influence decisions.

Without careful design and governance, this combination can normalise manipulation and erode trust in information more broadly.

Key tensions to keep in mind:

Engagement versus truth: business models reward clicks, while societies need reliable knowledge.

Openness versus privacy: open, repurposable data enable innovation but also raise consent and surveillance risks.

Innovation versus equity and the environment: advanced analytics benefit those with resources and carry environmental costs.

Conclusion & discussion

Three big takeaways

Social media platforms are infrastructures that can systematically shape attention and belief, not just passive channels.

Data repurposing is essential for modern analytics and AI, but also a major source of hidden risk if accessibility and elasticity outpace transparency and ethics.

Protecting information integrity requires aligning incentives, platform design and data governance, and cannot rely on fact‑checking individual posts alone.

Questions for the group

Who should bear primary responsibility for protecting information integrity (platforms, regulators, data scientists, or users), and why?

Are there types of data or contexts where repurposing should not be allowed, even if it generates useful insights?

Which concrete design changes (in feeds, recommendation systems, or data governance) would most immediately reduce manipulation without destroying what people value about social media?