Modularity as an Aesthetic Category
by Lev Manovich

In its usual meaning, “modularity” refers to modules, i.e., standardized parts or units employed in design and architecture. Famous 20th-century examples of modular architecture include Buckminster Fuller’s designs of the 1930s, Plug-in City by Archigram (1964), Moshe Safdie’s Habitat 67, and Nakagin Capsule Tower by Kisho Kurokawa (1972)[1]. A more vernacular example of modular design today is IKEA furniture.

I first used the term “modularity” in my book The Language of New Media (2001). The book proposed five “principles” (or tendencies) of computational media; “modularity” was listed as one of them. I noticed that digital artifacts are often made up of multiple components. When combined together in one artifact, these components keep their individual identities and can be assembled into other artifacts.

The prominent examples of such digital modularity in the 1990s included hypertext literature, three-dimensional computer graphics (i.e., virtual worlds made from distinct objects that are themselves made from separate flat shapes), and still and moving images composed from multiple layers in programs such as Photoshop, Premiere, and After Effects[2]. Previously only selected media technologies such as movable type, Islamic mosaics, or stained glass windows in Gothic cathedrals had a modular structure. The adoption of computers made modularity a necessary condition for all digital media.

Combined with other technologies such as networks, digital modularity enabled the emergence of some truly revolutionary forms of new media and communication. One such example is the World Wide Web. Defined by HTML, the markup language proposed by Tim Berners-Lee in 1990, the web consists of numerous web pages. Each page is defined as a collection of distinct media objects and text blocks delineated by tags. These elements can be located anywhere on the web. When a web page includes links to remotely located images, video, or maps, a web browser running on the user’s computing device retrieves them and renders the page that includes these media elements. Most people in the 1990s accessed the web via slow dial-up modems, and you saw how the page was gradually constructed from its modular elements on your screen, often taking 30–60 seconds. The text elements of the page were rendered first, and the images arrived later. In this way, the modularity of the web was “performed” for you every time you went to a new web page.[3]
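The way a page names its separately stored parts can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the page text and its URLs are invented placeholders, and the class name ModuleLister is hypothetical. The sketch reads the tags of a page and lists the media objects that a browser would have to fetch from elsewhere before it could render the whole.

```python
from html.parser import HTMLParser

# A made-up page; the URLs are placeholders, not real resources.
PAGE = """
<html><body>
<h1>Title</h1>
<p>First text block.</p>
<img src="https://images.example.org/photo.jpg">
<p>Second text block.</p>
<video src="https://media.example.net/clip.mp4"></video>
</body></html>
"""

class ModuleLister(HTMLParser):
    """Collects the separately stored media objects that a page references."""

    def __init__(self):
        super().__init__()
        self.remote_media = []

    def handle_starttag(self, tag, attrs):
        # Each media tag points to an element that lives at its own address.
        if tag in ("img", "video", "audio", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.remote_media.append((tag, src))

lister = ModuleLister()
lister.feed(PAGE)
for tag, src in lister.remote_media:
    print(tag, src)
```

The text blocks arrive with the page itself, while each listed media object is a separate request. This is why, on a slow connection, the text appeared first and the images later.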

In this essay, I would like to extend the concept of modularity beyond digital media. This broader perspective should make this concept useful for artists working in any medium, as well as art historians and critics who write about these artists. To test the usefulness of this reworked concept, I will use it to look at a few projects created by the artist Anton Ginzburg during his 2021 residency at the Technical University of Dresden’s Schaufler Lab and shown in the Altana Gallery exhibition curated by Gwendolin Kremer.

Parallel to broadening the concept's scope, I also want to define different types of modularity. First I will distinguish between internal modularity and content modularity. Internal (or technical) modularity refers to an artifact’s internal construction – for example, a web page created in HTML language. This internal modularity may contribute to or define an artifact’s aesthetics, meanings, and uses – or it may not.

Alternatively, we can ignore an artifact’s internal construction and instead focus on its appearance and our aesthetic, semantic, and affective experiences of this artifact. Let us imagine two images which are rendered using the same computer program. The first shows a few smooth shapes without any visible edges that blur into the background. In the second image, the same shapes are rendered differently with their geometry clearly delineated. To render these different versions, a designer only has to change the values of a couple of parameters in the code. Both images have internal modularity: they are generated by code consisting of a series of function calls. However, this modular code is not visible to a viewer, and from their perspective the two images are quite different visually. If we are concerned not with the artifact's internal construction but with its appearance and user experience, we can use a different term to characterize such artifacts if they have clearly articulated visible parts: content modularity.
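A minimal Python sketch can make this thought experiment concrete. Everything in it is an assumption made for illustration: the function render_disc, its softness parameter, and the grid size are hypothetical and do not describe any particular graphics program. The same function produces both versions of a shape, and only one parameter value differs between them.

```python
import math

def render_disc(size=21, radius=6.0, softness=0.0):
    """Return a size x size grid of brightness values in [0, 1] for one disc.

    softness == 0 gives a hard, clearly delineated edge;
    softness > 0 lets the shape fade gradually into the background.
    """
    center = (size - 1) / 2
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            d = math.hypot(x - center, y - center)
            if softness == 0:
                value = 1.0 if d <= radius else 0.0
            else:
                # Linear falloff across a band of width `softness` around the edge.
                value = min(1.0, max(0.0, 0.5 + (radius - d) / softness))
            row.append(value)
        grid.append(row)
    return grid

def distinct_levels(grid):
    """Set of brightness levels that occur anywhere in the image."""
    return {round(v, 3) for row in grid for v in row}

sharp = render_disc(softness=0.0)    # second image: delineated geometry
blurred = render_disc(softness=8.0)  # first image: edges dissolve into the background
print(len(distinct_levels(sharp)), len(distinct_levels(blurred)))
```

The sharp version contains only two brightness levels, shape and background, while the blurred version contains many intermediate ones. The internal construction is identical in both cases; only the second image has visibly articulated parts.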

I will also introduce a second distinction between free and constrained modularity. If the modular parts of an artifact can be arbitrarily rearranged by its designer or its users – i.e., you can change their positions, add new parts, or remove some without seriously affecting the artifact’s functionality, appearance, or user experience – I will call this condition free modularity. But if only limited rearrangements are possible, I will call this condition constrained modularity.

How shall we refer to a condition that is the opposite of modularity? It is logical to call it continuity. To illustrate the opposition between modularity and continuity, let us consider two sets of well-known modern artworks. In the first group, we can place modernist paintings such as Electric Prisms by Sonia Delaunay (1914) and The Breakfast by Fernand Léger (1921), films and videos with strong montage aesthetics such as Battleship Potemkin by Sergei Eisenstein (1925) and Dumb Dumb by K-pop group Red Velvet (directed by Beomjin J, 2015), early computer plots such as the marvelous Untitled series by Vera Molnar (1972), and manga series such as Naruto (Masashi Kishimoto, 1999–). In the second group, we will put “sound mass” music compositions by György Ligeti, Pauline Oliveros, and Iannis Xenakis; the Water Pavilion by architects Lars Spuybroek and Kas Oosterhuis (1997), the novel Mrs Dalloway by Virginia Woolf (1925), the film D’Est by Chantal Akerman (1993), and computer animation Jardins d’Été by Quayola (2016)[4]. Note that I could have used a variety of other works instead. The above examples were chosen to show that modular concepts can be applied to works in any medium, and also to both high and popular culture.

Despite their many differences, the works in the first group share one characteristic that distinguishes them from the works in the second group. They are all modular. The works in the second group are also different from each other, and they also have one common characteristic – continuity. The gradual movement of a sound “cloud” in sound mass compositions by Ligeti and Xenakis, the development of a spatial form in the Water Pavilion, the uninterrupted “data stream” of observing, thinking, and reacting throughout a single day in Mrs Dalloway, the extremely long panning shots of D’Est, and the persistent fluctuation of summer garden imagery in Jardins d’Été stand out because modularity is more common in culture than continuity. And precisely because modularity is commonplace, exploring and pushing to the limits its opposite – i.e., continuity – was often the strategy of avant-garde arts in the 20th century.

Since the 1990s, the combination of computation and the internet led to a number of other new media with their own forms of modularity. One of the most important of such forms is Instagram (2010–). To maintain their Instagram account, a creator adds separate images and/or short videos over time to either their page or to their “Stories” (introduced in 2016). On the page, images can be accompanied by a text caption and multiple hashtags. Instagram automatically captures the date and time when a post is made and displays this information in a special format: as the time separating this moment from today (e.g., “Two days ago,” “28 weeks ago,” etc.). All posts added by the creator are organized in this reverse chronological order on their page. This episodic modular structure is shown either as one column of posts, or as a grid of posts.
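The two conventions described here, elapsed-time labels and reverse chronological order, can be sketched in a few lines of Python. This is an illustration only, not Instagram’s code: the function relative_label, the sample posts, and the fixed date used as “today” are all assumptions.

```python
from datetime import datetime, timedelta

def relative_label(posted, now):
    """Express a post's age as the time separating it from `now`."""
    delta = now - posted
    if delta.days >= 7:
        count, unit = delta.days // 7, "week"
    elif delta.days >= 1:
        count, unit = delta.days, "day"
    else:
        count, unit = delta.seconds // 3600, "hour"
    return f"{count} {unit}{'' if count == 1 else 's'} ago"

now = datetime(2022, 3, 18, 12, 0)  # an arbitrary "today" for the example
posts = [
    {"caption": "first", "posted": now - timedelta(weeks=28)},
    {"caption": "second", "posted": now - timedelta(days=2)},
    {"caption": "third", "posted": now - timedelta(hours=5)},
]

# The creator's page: newest module first.
page = sorted(posts, key=lambda p: p["posted"], reverse=True)
labels = [(p["caption"], relative_label(p["posted"], now)) for p in page]
print(labels)
```

Each post remains a self-contained module with its own timestamp. The page is simply these modules sorted by time, which is why the same posts can be shown equally well as a column or as a grid.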

In your feed, where you see new posts by people you follow, a different modularity is at work. Before 2016, your feed displayed the new posts of the users you follow in chronological order, but in March 2016 Instagram switched to algorithmic feed curation.[5] (Facebook had already switched to such algorithmic curation in 2009, and Twitter adopted it in 2016.) This creates even stronger modularity, since you see people’s posts out of chronological order.

Instagram and other social media platforms also have another type of modularity. Each platform limits the maximum length of text in posts. So even if you wanted to create a very long narrative in a single post, you cannot. In fact, the very concept of a “post” affirms the modularity of social media. Currently Twitter has a limit of 280 characters, while Instagram has 2200, and Facebook 63,206. However, the majority of posts are much shorter. Studies found that Facebook posts that contain fewer than 80 characters receive more engagement, while on Instagram the ideal caption length is around 150 characters. (Note that the maximum post length on Weibo is 2000 characters, and since every word in Chinese is either one or two characters, such a post can be very long.)

So far my examples have illustrated either modularity or continuity. However, the same medium or a single artwork can have both. For example, video game worlds can be modular (e.g., a game may have many “levels,” and you need to go through each level to get to the next one) or continuous. Good examples of the latter are open-world games, which include some of the most popular games of the last two decades, such as the Grand Theft Auto series, World of Warcraft, the Assassin's Creed series, Minecraft, Ghost of Tsushima, Microsoft Flight Simulator, and so on. Grand Theft Auto (2001–) is seen as a pioneer of this paradigm. According to one of its designers:

Up until that point, with very few exceptions, the game world was passive, or at least offered only minimal interaction with the player such as falling blocks, rising spikes or, if you were very lucky, swinging ropes. GTA turned this on its head and made the player just another character in a whole world going about its daily business. The environment is not merely the setting for the action, but is an active part of the overall gameplay, which affects and reacts to the player as they progress. [6]

In some open-world games, the world is generated procedurally (i.e., algorithmically) and potentially it can be infinite. The most extreme example of this so far is No Man's Sky (2016). The developers generated over 18 quintillion planets including flora, fauna, and other features. The founder of the UK company that developed the game said that “if a new planet was discovered every second after the game comes out [...] it would take 584 billion years to visit every one just for a second.” [7]
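The idea of a world that is computed rather than stored, and the arithmetic behind the quoted figure, can both be sketched in Python. The sketch rests on assumptions that are not in the sources cited here: that the number of planets is 2**64 (which is consistent with “over 18 quintillion”), and that a planet’s features can be derived from its index by hashing. The function planet and the seed string are hypothetical and do not describe the game’s actual algorithm.

```python
import hashlib

# Assumption: one planet per 64-bit index, about 18.4 quintillion in total.
PLANET_COUNT = 2 ** 64

def planet(index, world_seed="demo-seed"):
    """Derive a planet's attributes from its index alone; nothing is stored."""
    digest = hashlib.sha256(f"{world_seed}:{index}".encode()).digest()
    return {
        "radius_km": 2000 + int.from_bytes(digest[0:2], "big") % 8000,
        "has_flora": digest[2] % 2 == 0,
        "has_fauna": digest[3] % 2 == 0,
    }

# The same index always yields the same planet, so every visitor sees the same world.
print(planet(42) == planet(42))

# Visiting one planet per second: how many years for all of them?
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_to_visit_all = PLANET_COUNT / SECONDS_PER_YEAR
billions_of_years = int(years_to_visit_all / 1e9)
print(billions_of_years)
```

Under the 2**64 assumption, the calculation gives roughly 584 billion years, which matches the figure in the quotation. The design choice worth noticing is that the world needs no database of planets: each one exists only as the result of a function call.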

Every cultural artifact is created within some historical context. It is perceived against the background of the common conventions for the given media and/or genre of the given period and culture. Because of this, both strong modularity and absence of modularity can be perceived by audiences as original, highly creative, or impossible to understand. In the 1920s, the average shot in feature films was two or three times longer than the average shot in films by Eisenstein and Vertov. This contributed to the perception of their films as avant-garde and unusual, but also as hard for general audiences to understand. Examples of the opposite strategy are a small number of films which do not even have a single cut, such as Empire by Andy Warhol (1964), Wavelength by Michael Snow (1967 – it actually has a few cuts but they are not visible), or Russian Ark by Alexander Sokurov (2002). Both the films with very quick cuts and films without cuts stand out against the conventions of narrative cinema, and the particular range of modularity the audiences are used to.

For an example of an artist who skillfully uses both strong modularity and continuity in different bodies of work, we can look at sculptures by Alberto Giacometti. In some of his sculptures, such as Forest (1950), a few very thin and elongated human figures are placed on a large pedestal. The large spaces between the figures are kept empty, which makes these figures appear to be completely disconnected from each other. However, Giacometti also employs the opposite strategy – greatly reducing the amount of modularity where we would otherwise expect it. In many of his sculptures, a human head or a figure becomes an almost undifferentiated mass, with face or body parts blending together.

Anton Ginzburg’s projects, created during his residency at the Technical University of Dresden, can be seen as imaginative comments on modularity and continuity in various artistic media such as cinema and painting. More importantly, they demonstrate how computational media techniques – sentiment analysis, supervised machine learning, and algorithmic transformations of real-world data (used in Film Forms, ML CRSH, and Airport respectively) – can create works that use modularity and continuity in original ways.

Film Forms adopts data science methods to analyze a number of classic 20th-century feature films. The results of the analysis are visualized as three-dimensional sculptural forms made from carbon polyamide. The project transforms the most canonical modern modular medium – i.e., cinema – into perfectly continuous forms. In its original format, a feature film is modular on multiple levels: each second contains 24 discrete frames, each scene contains multiple shots, and the scenes make up larger narrative parts. The construction of a film from hundreds or thousands of shots is an example of how required technical modularity can become content modularity. Film cameras could only hold a limited length of film stock, and this led to a modular shot-based construction of a film’s narrative and a specific language of cinema where camera position and/or location changes from shot to shot. For example, a conversation between characters is presented as a series of shots with the camera changing position, as opposed to one continuous take.

In Film Forms, twelve classic films spanning decades of 20th-century cinema, from The Passion of Joan of Arc (Carl Theodor Dreyer, 1928) to Trainspotting (Danny Boyle, 1996), are “summarized” as three-dimensional continuous complex shapes. Each film becomes a single three-dimensional gesture – a single uninterrupted movement in an imaginary metaverse containing all the films ever made. The advantage of such compression, which transforms 90 minutes of film narrative and cinematography into a single shape, is that we can now compare many films and other time-based works more easily. Given the quantitative growth of consumer video, vlogs on YouTube, and various genres of professional video such as K-pop music videos in the last 15 years, we need new mechanisms for observing this exploding media universe, and the particular method employed in Film Forms offers one such technique.

ML CRSH connects 20th-century cinema and video with a new medium of the early 21st century – supervised machine learning using deep neural networks and large amounts of training data. The project makes reference to a 1975 video by Ant Farm, a San Francisco avant-garde collective, where we see a big American Cadillac car driven into a wall of TV sets. We can also recall the famous 1973 novel Crash by J. G. Ballard where a group of former crash victims in London become fetishists of cars and crashes, reenacting car accidents that involved celebrities. Here a normally singular, statistically rare, and unpredictable event is turned into a systematically and rationally planned and arranged series. It is also very appropriate to recall Paul Virilio, who developed the idea that every technology creates its own accidents, from train derailments to global stock market crashes. In his 1999 book Politics of the Very Worst he wrote:

When you invent the ship, you also invent the shipwreck; when you invent the plane you also invent the plane crash; and when you invent electricity, you invent electrocution [...] Every technology carries its own negativity, which is invented at the same time as technical progress.[8]

There is also a third reference important for understanding ML CRSH. One of the most popular genres of video games is car racing games. As you play these games as the driver of a simulated car, racing against other cars controlled by the game’s algorithms, you repeatedly experience crashes. Many games replay these crashes in spectacular fashion, but then you miraculously continue competing in the race in the same undamaged car. This is one of the conventions of such games, which sacrifice physical realism in order to offer uninterrupted gameplay.

The final reference for this project is the currently-dominant methodology of supervised machine learning (often abbreviated as ML). To train a neural network to, for example, classify real-life objects that are likely to appear in a video simulating the viewpoint of a moving car, tens of thousands of hours of video footage with already-labeled objects are run through the network until it learns to detect and classify them automatically. This repetitive process is currently necessary to teach computers to classify new data. The particular research work that ML CRSH creatively uses is about how to teach a computer to take a three-dimensional geometric model simulating streets and objects in a real city like Dresden (an approach we often see in video games) and render it to look like a video of that city.

ML CRSH mixes all these references and technologies to create a new unique situation. While the goal of self-driving car systems is to avoid any crashes, in this project crashes happen repeatedly and predictably. Rather than showing the normal – or rather ideal – functioning of technology such as cars, the project takes as its subject the accidents which according to Virilio’s analysis are equally important for an understanding of every technology. The uninterrupted human experience of looking at a world – one without any cinematic cuts, simulated by the first-person perspective in video games – here becomes modularized by the regular rhythm of crashes that restart the simulation.

The three other projects by Ginzburg created during his residency also all engage with modularity and its various uses in art and design. In Dresden Series, algorithmically-generated geometric compositions made from differently-sized rectangles become the starting points for gouache paintings. Modular Composition 10x10 is inspired by the Formstein modular system developed by two Dresden artists (Karl-Heinz Adler and Friedrich Kracht) in the 1960s for urban design. Finally, the ongoing Airport project proposes a grid of sculptural three-dimensional modules which will all move and morph in response to data coming from the trajectories, times, and locations of airplanes as well as air traffic control information. Altogether, Ginzburg’s works demonstrate how different types of modularity and the contrast between modularity and continuity can act as powerful systems for new meanings, aesthetics, and artistic experiences. These systems emerge when different types of modularity are juxtaposed with each other, along with their media, historical, social, and technological contexts, leading to surprising tensions and new aesthetic experiences.

[1] Kate Wagner, “The Modularity is Here: A Modern History of Modular Mass Housing Schemes,” 99% Invisible, undated, URL: (last accessed 18 March 2022).

[2] For a detailed analysis of the effects of this new format on the aesthetics of visual media, see my book Software Takes Command (New York: Bloomsbury Academic, 2013).

[3] Today many countries deliberately slow down the speed of social media sites during a crisis or a war, as happened, for example, in March 2022 in Russia when it attacked Ukraine. During such periods, people in these countries have the 1990s experience of the web, because pages load very slowly.

[4] Quayola, “Jardins d’Été,” series of 4K videos, 2016, URL: (last accessed 18 March 2022).

[5] Mike Isaac, “Instagram May Change Your Feed, Personalizing It With an Algorithm,” New York Times, 15 March 2016, URL: (last accessed 18 March 2022).

[6] “The complete history of open-world games (part 2),” PC Zone, 25 May 2008, URL: (last accessed 18 March 2022).

[7] Ravi Hiranand, “18 quintillion planets: The video game that imagines an entire galaxy,” CNN, 18 June 2015, URL: (last accessed 18 March 2022).

[8] Paul Virilio, Politics of the Very Worst (New York: Semiotext(e), 1999), p. 89.
