Infinite Scribble: proposal of an endless AI-powered collage sandbox game

In my previous post, I discussed some of the advantages of integrating real-world systems into game design. I believe that many recent advances in AI technology open up exciting new avenues in this area. In particular, natural language processing and generative AI make it much easier to create games that accept and respond to complex text-based input, allowing for a far more direct connection between the player's thoughts and real-world experience and the game itself.

Many emerging features of AI image generators lend themselves well to creating content for games.
Models fine-tuned with techniques like LoRA can generate characters that maintain visual consistency across multiple images, and many models can generate images with transparent backgrounds. However, AI image generation consumes a lot of time and energy, and I feel that AI image generators are currently used to produce a lot of throwaway content that never gets put to an interesting use.

For the past week or so, I've been brainstorming game ideas that could take advantage of AI image generation while addressing some of the issues with how the technology has been used so far. Eventually, I came up with a concept for a sandbox scene-creation game, which I'm tentatively calling Infinite Scribble. What follows is a brief overview of my overall vision for the game.

Infinite Scribble: Design Overview

Infinite Scribble is a web-based sandbox game centered around creating collages of AI-generated images. Players issue natural-language commands, which are processed by an LLM and used to create AI-generated images. Any game elements that players create this way are uploaded to a shared community resource pool, and can then be reused by other players. The core design targets I want this game to hit are as follows:
  1. Allow players to realize and express their creativity by creating new characters and poses using natural-language prompts
  2. Foster a sense of community by placing emphasis on sharing and repurposing of resources created by other players
  3. Allow players to quickly and easily create complex scenes involving interactions between many characters without incurring the high time and energy cost associated with generating the entire scene as a single image

Game Structure

The game is built around entities, which possess states.

Entities

The most important game objects are entities. Entities consist of the following components:
  • Summary
  • Detailed description
The summary is a short piece of text that helps the AI decide when it is appropriate to select the entity based on the user's prompt.

The description is a longer block of text that describes, in detail, what the entity looks like. Its purpose is to give the AI a design that remains consistent across multiple images. The description does not include information such as pose and expression, since those are handled by the entity's states.

States

Every entity has states. Every state has the following components:
  • Summary
  • Detailed description
  • Sprite
The summary is a short piece of text that helps the AI decide when it is appropriate to select the state based on the user's prompt.

The description is a longer block of text that describes, in detail, what the entity looks like during this state. This is appended to the description of the entity in order to form the prompt that generates the sprites.

Each state has a single "sprite" associated with it: a static AI-generated image produced from the combination of the entity's description and the state's description. Sprites have transparent backgrounds, allowing them to overlap with one another in scenes.
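As a rough sketch of the data model described above (all names here are my own placeholders, not an existing implementation), the entity/state split and the sprite-prompt assembly might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class State:
    summary: str          # short text used to match user prompts to this state
    description: str      # detailed appearance of the entity while in this state
    sprite_url: str = ""  # transparent-background image generated for this state


@dataclass
class Entity:
    summary: str          # short text used to match user prompts to this entity
    description: str      # pose-independent design, kept consistent across images
    states: dict[str, State] = field(default_factory=dict)

    def sprite_prompt(self, state_name: str) -> str:
        """The entity description plus the state description forms the image prompt."""
        state = self.states[state_name]
        return f"{self.description} {state.description}"
```

The key design point is that the pose-independent description lives on the entity, so every state's sprite prompt shares the same base text and the character stays visually consistent.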

Playing the Game

The main play area consists of a blank canvas and a command window. Entities appear on the canvas, where they can be selected, moved around, resized, rotated, deleted, and so on, much like in an image-editing program.
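One way the canvas layer might represent placed entities (a hypothetical sketch; the field and class names are my own assumptions):

```python
from dataclasses import dataclass


@dataclass
class PlacedEntity:
    entity_id: str
    state_name: str = "idle"
    x: float = 0.0         # canvas position of the sprite
    y: float = 0.0
    scale: float = 1.0     # uniform resize factor
    rotation: float = 0.0  # degrees, clockwise
    z_index: int = 0       # draw order; higher values overlap lower ones


class Canvas:
    def __init__(self) -> None:
        self.items: list[PlacedEntity] = []

    def add(self, item: PlacedEntity) -> None:
        item.z_index = len(self.items)  # newly added items draw on top
        self.items.append(item)

    def move(self, item: PlacedEntity, dx: float, dy: float) -> None:
        item.x += dx
        item.y += dy

    def delete(self, item: PlacedEntity) -> None:
        self.items.remove(item)
```

Because sprites have transparent backgrounds, overlap is purely a matter of draw order, which is why a simple z-index is enough here.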

The command window accepts prompts in natural language. The user has two commands they can execute via the command window: "add entity" or "set state." The "add entity" option will add a new entity to the scene based on the prompt in a default "idle" state, while the "set state" option will change the state of a selected entity based on the prompt.
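The two-command structure could route to handlers roughly like this (a sketch with the LLM and image-generation calls stubbed out; the function names are hypothetical):

```python
def create_or_reuse_entity(prompt: str) -> dict:
    # Stub: in the real game this would query the community pool or the LLM.
    return {"summary": prompt, "states": {"idle": "default idle sprite"}}


def resolve_state(entity: dict, prompt: str) -> str:
    # Stub: the LLM would match the prompt to an existing state or create one.
    entity["states"].setdefault(prompt, f"sprite for '{prompt}'")
    return prompt


def handle_command(command: str, prompt: str, scene: list, selected=None):
    """Route the two command-window verbs to their handlers.

    `scene` is a list of [entity, state_name] pairs.
    """
    if command == "add entity":
        entity = create_or_reuse_entity(prompt)
        scene.append([entity, "idle"])  # new entities start in the idle state
        return entity
    if command == "set state":
        if selected is None:
            raise ValueError("'set state' requires a selected entity")
        selected[1] = resolve_state(selected[0], prompt)
        return selected
    raise ValueError(f"unknown command: {command}")
```

Keeping the vocabulary to just two verbs means the natural-language prompt itself carries all the creative detail, while the command selection stays unambiguous.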

Creating and Reusing Resources

When a user wants to add a new entity to a scene or set an entity's state, they have three options, which they can select using a setting in the command window. They can:
  1. Have the LLM create a completely new element
  2. Reuse an element that has been created by another user and uploaded to the community resource pool
  3. Have the LLM decide whether to reuse another user's element or create an entirely new one, based on how similar the prompt is to the summaries of other users' entities (recommended option)
If the user chooses to define their own object, or if the LLM determines that a new object is needed, the LLM will respond with follow-up messages asking the user to describe the new object in greater detail, in order to fill out its description and summary.
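The reuse-or-create decision in option 3 could be sketched as a similarity check against pool summaries. A real system would likely use text embeddings or an LLM call for similarity; the crude word-overlap measure below is only a self-contained stand-in, and the threshold value is an arbitrary assumption:

```python
def jaccard(a: str, b: str) -> float:
    """Crude word-overlap similarity; a real system would use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def reuse_or_create(prompt: str, pool: list[dict], threshold: float = 0.5):
    """Return the best-matching pool entry, or None to signal 'create new'."""
    best = max(pool, key=lambda e: jaccard(prompt, e["summary"]), default=None)
    if best and jaccard(prompt, best["summary"]) >= threshold:
        return best
    return None
```

Returning None rather than forcing a match is what triggers the follow-up dialogue in which the LLM asks the user to flesh out a brand-new object.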

Whenever a user creates a new entity or state for an entity, they may choose to either "accept" the results and add it to their collage, or "reject" it. If the new resource is "accepted," it will be uploaded to a shared community resource pool, and be available for use by other users.

Community Voting

Whenever a user retrieves an entity or state that another player has generated, they can vote that object up or down. Objects with a higher score are more likely to be served to players by the algorithm when they are looking for objects to add to their scene.
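The score-weighted serving could work as a weighted random sample over matching objects. This is a minimal sketch under my own assumptions about the data shape (a list of dicts with a `score` field); since downvotes can push scores negative, the weights are shifted to stay positive:

```python
import random


def pick_candidates(matches: list[dict], k: int = 3) -> list[dict]:
    """Sample up to k objects, weighting higher-scored objects more heavily."""
    if not matches:
        return []
    low = min(m["score"] for m in matches)
    weights = [m["score"] - low + 1 for m in matches]  # always >= 1
    return random.choices(matches, weights=weights, k=min(k, len(matches)))
```

Note that `random.choices` samples with replacement, so a production version would probably deduplicate or sample without replacement; randomness (rather than always serving the top-scored object) keeps lower-ranked community creations discoverable.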

Players may choose to have their usernames attached to objects they create, or to remain anonymous.

Potential Applications

The goal of this game is to offer a large-scale community-based sandbox, and I imagine that there are many directions in which players could take this game. Here are a few of the cool things that I expect could emerge from this system:
  • Game developers could create UI mockups or preliminary visualizations of sprites and other visuals
  • Animators could create storyboards and animatics by duplicating scenes and modifying the states of the entities between them
  • Online tabletop roleplaying groups could create character dioramas on the fly during sessions
  • The community could come together to create original characters and collaborate on creating new states for them, creating a reusable, recognizable "memetic character language" unique to this game

Next Steps for Realizing This Vision

From what I can tell, all of the individual technologies needed to create this game already exist out there. It's just a matter of figuring out how to get them all to communicate with each other, and how to optimize them in order to produce the best play experience.

Right now, I think the first priority is to investigate tools for 2D character-based image creation and assess whether they are applicable to this game concept. Ideally, such a tool would be able to create consistent character images solely from text prompts. For the purposes of this game, I think it is less important that the AI match the user's vision with high accuracy, and more important that it can create something it is able to maintain consistently across multiple images.
