Enter the story with Storysight

Pairing GenAI and spatial computing for a truly unique and immersive reading experience

# spatial-computing # artifical-intellegence

Project overview

Technologies

Generative AI
Virtual reality

Tools

Skybox AI
GPT-4
Xcode - RealityKit, SwiftUI

Key features

Fully immersive books
A new dimension for content consuption
AI assistance with understanding writing techniques and the story

Contributors

Why did we build Storysight?

Find out what was possible in combining two cutting-edge technologies in an innovative way (GenAI x spatial computing)
Creating a novel and new creative VR experience that enriched the reading experience, with huge potential for entertainment and education
Create a glimpse for what content consumption could look like in the near future with the recent advancements in machine learning and spatial computing

Methodology

Creating the AI Worlds

Most people know that GenAI can be used to generate images, but what about 360 degree panoramas? That's where Blockade Labs' Skybox AI comes in. It's a tool that can generate a 360 degree panorama of a scene from a prompt.

Looking to optimise for the best possible results, we used GPT-4 to condense the text from the chapter into a short but descriptive summary of the setting. We then fed this summary into Skybox AI to generate a skybox image of the setting.

Skybox AI allows for various parameters to be set such as the style, if you want the image to be realistic, or a water painting, or fantasy based, or even psychedelic!

Ensuring quality generative content

We do this after using GPT-4 to provide a descriptive summary of the setting. This allowed us to ensure that the Skybox image generated had the most detailed prompts possible, minimising loss of any key details from the scene in the book.

Skybox AI has tricks like placing parentheses around keywords to emphasis them, so we primed GPT to do this. Leveraging this kind of prompt engineering enabled us to ensure that the most important details were not lost in the generative process.

Placing the user in the content

In Xcode, I used RealityKit to create a sphere mesh to place the Skybox image onto. I then added the sphere to the VR scene with the user at the centre.

This was done by instantiating a `ModelComponent` with the sphere mesh being 1000m in radius (insane distance but it works!) and a `Material` with the Skybox image. I then added the `ModelComponent` to the scene and set the camera to the user's position. This ends up being what the user sees in app.

AI powered reading assistant

To add even more GenAI into the mix, I am using GPT-4 to provide a reading assistant. This was done by feeding the text of the chapter into GPT-4 and allowing the user to ask questions about the book.

The assistant avoids story spoilers and going off topic with a system prompt via the OpenAI API. This allows the user to ask questions like 'Give my a brief recap of events leading up to this chapter' or 'Provide details for each of the characters in this chapter' and get a response without revealing too much about the book as to give anything away.

This was done leveraging the "system" prompt in OpenAI's API to prime the model into its role as a reading assistant.

Tech stack

GPT-4

Condense the text from the chapter into a descriptive summary of the setting.

Skybox AI

Feed the description into Skybox AI to generate a Skybox image of the setting.

visionOS

Project the Skybox image onto a static sphere programmatically and place it in the scene with the user at the centre.

The prototype

Beyond books

Spatial journalism

Imagine being transported to the scene of a news article or blog post. You could be in the middle of a protest or at the scene of a natural disaster. This could be a new way to consume news.

This can give a new perspective on the news and allow the reader to feel more connected to the story. It could also be used to provide a more immersive experience for readers who are visually impaired.

Immersive documentaries

Documentaries could be taken to the next level with a Storysight-like spatial app. Picture being in the middle of a historical event or at the scene of a scientific discovery or standing among pre-historic dinosaurs.

Apple themselves have already begun to execute on an early version of this vision with their Prehistoric Planet Immersive show on Apple TV+.

This would offer a new dimension to the learning experience and could help people to better understand the subject matter especially for readers with learning difficulties such as dyslexia.

Try Storysight for yourself!

We have the full version of Storysight available on the App Store today. Jump into the story and see what it's like to be inside the book.