Enter the story with Storysight
Pairing GenAI and spatial computing for a truly unique and immersive reading experience
Project overview
Technologies
- Generative AI
- Virtual reality
Tools
- Skybox AI
- GPT-4
- Xcode - RealityKit, SwiftUI
Key features
- Fully immersive books
- A new dimension for content consumption
- AI assistance with understanding writing techniques and the story
Contributors
Why did we build Storysight?
- Find out what is possible when two cutting-edge technologies (GenAI x spatial computing) are combined in an innovative way
- Create a novel VR experience that enriches reading, with huge potential for entertainment and education
- Offer a glimpse of what content consumption could look like in the near future, given the recent advancements in machine learning and spatial computing
Methodology
Creating the AI Worlds
Most people know that GenAI can be used to generate images, but what about 360-degree panoramas? That's where Blockade Labs' Skybox AI comes in. It's a tool that can generate a 360-degree panorama of a scene from a text prompt.
Looking to optimise for the best possible results, we used GPT-4 to condense the text from the chapter into a short but descriptive summary of the setting. We then fed this summary into Skybox AI to generate a skybox image of the setting.
Skybox AI lets you set various parameters such as the style: the image can be realistic, a watercolour painting, fantasy-based, or even psychedelic!
Ensuring quality generative content
We generated each skybox only after GPT-4 had produced a descriptive summary of the setting. This gave Skybox AI the most detailed prompt possible and minimised the loss of key details from the scene in the book.
Skybox AI supports tricks such as placing parentheses around keywords to emphasise them, so we primed GPT-4 to do this. This kind of prompt engineering helped ensure that the most important details were not lost in the generative process.
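As a rough illustration of this step, the sketch below builds a chat-completions request that asks GPT-4 for a setting summary with emphasised keywords. The system prompt wording, the type names and the `makeSettingRequest` helper are all hypothetical and are not taken from the Storysight code; only the endpoint, headers and `model`/`messages` body shape follow OpenAI's public API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // Needed for URLRequest on Linux only.
#endif

// One message in an OpenAI chat-completions conversation.
struct ChatMessage: Codable, Equatable {
    let role: String      // "system", "user" or "assistant"
    let content: String
}

// JSON body for POST https://api.openai.com/v1/chat/completions.
struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

// Hypothetical system prompt: the wording is an assumption, not the project's.
let settingPrompt = """
You describe book settings for a 360-degree panorama generator. \
Summarise the setting of the chapter in two or three sentences. \
Wrap the most important visual keywords in (parentheses) to emphasise them. \
Describe only the place, not the plot.
"""

// Builds (but does not send) the request that asks GPT-4 for a setting summary.
func makeSettingRequest(chapterText: String, apiKey: String) throws -> URLRequest {
    let body = ChatRequest(
        model: "gpt-4",
        messages: [
            ChatMessage(role: "system", content: settingPrompt),
            ChatMessage(role: "user", content: chapterText),
        ]
    )
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

The summary returned by the model would then be used as the text prompt for Skybox AI. Keeping the emphasis rule in the system prompt means the chapter text itself never needs to be edited by hand.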
Placing the user in the content
In Xcode, I used RealityKit to create a sphere mesh to map the Skybox image onto. I then added the sphere to the VR scene with the user at the centre.
This was done by instantiating a `ModelComponent` with a sphere mesh 1000m in radius (insane distance but it works!) and a `Material` carrying the Skybox image. I then attached the `ModelComponent` to an entity, added it to the scene and centred it on the user's position. The inside of the sphere is what the user sees in the app.
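A minimal sketch of this setup on visionOS might look like the following. The view name `SkyboxView`, the `imageName` parameter, the choice of `UnlitMaterial` and the x-axis flip that turns the texture inward are assumptions on our part; the source only states that a `ModelComponent` with a 1000m sphere mesh and a material was used.

```swift
import SwiftUI
import RealityKit

// Hypothetical view, intended to be shown inside an ImmersiveSpace.
struct SkyboxView: View {
    // Assumed name of a panorama image bundled with the app.
    let imageName: String

    var body: some View {
        RealityView { content in
            // Load the panorama as a texture; leave the scene empty if it is missing.
            guard let texture = try? await TextureResource(named: imageName) else {
                return
            }

            // An unlit material shows the image as-is, ignoring scene lighting.
            var material = UnlitMaterial()
            material.color = .init(texture: .init(texture))

            // Attach the material to a sphere with a 1000 m radius.
            let sphere = Entity()
            sphere.components.set(ModelComponent(
                mesh: .generateSphere(radius: 1000),
                materials: [material]
            ))

            // Mirror the sphere on x so the texture faces inward, toward the viewer.
            sphere.scale *= .init(x: -1, y: 1, z: 1)

            // The entity stays at the origin, so the sphere surrounds the user.
            content.add(sphere)
        }
    }
}
```

Without the negative x scale, the image would be drawn on the outside of the sphere and would not be visible from the centre.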
AI powered reading assistant
To add even more GenAI into the mix, I used GPT-4 to provide a reading assistant. This was done by feeding the text of the chapter into GPT-4 and allowing the user to ask questions about the book.
A system prompt, sent via the OpenAI API, keeps the assistant from spoiling the story or going off topic. The user can ask questions like 'Give me a brief recap of events leading up to this chapter' or 'Provide details for each of the characters in this chapter' and get a response that does not give anything away.
The "system" prompt in OpenAI's API is what primes the model for its role as a reading assistant.
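The message list for such an assistant could be assembled roughly as below. This is a sketch under assumptions: the prompt wording, the `AssistantMessage` type and the `makeAssistantMessages` helper are hypothetical, and the actual Storysight prompt is not shown in the source.

```swift
import Foundation

// One chat message; the roles follow OpenAI's chat format.
struct AssistantMessage: Equatable {
    let role: String
    let content: String
}

// Hypothetical system prompt that primes the model as a reading assistant.
let assistantPrompt = """
You are a reading assistant for the book the user is currently reading. \
Answer questions about the story and its writing techniques. \
Never reveal spoilers beyond the chapter text you are given. \
Politely decline questions that are unrelated to the book.
"""

// Builds the messages for one question: role priming, chapter context, then the question.
func makeAssistantMessages(chapterText: String, question: String) -> [AssistantMessage] {
    return [
        AssistantMessage(role: "system", content: assistantPrompt),
        AssistantMessage(role: "user", content: "Current chapter:\n\(chapterText)"),
        AssistantMessage(role: "user", content: question),
    ]
}
```

Putting the rules in the system message rather than in each question keeps them in force however the user phrases a request.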
Tech stack
GPT-4
Condense the text from the chapter into a descriptive summary of the setting.
Skybox AI
Feed the description into Skybox AI to generate a Skybox image of the setting.
visionOS
Project the Skybox image onto a static sphere programmatically and place it in the scene with the user at the centre.
The prototype
Beyond books
Spatial journalism
Imagine being transported to the scene of a news article or blog post. You could be in the middle of a protest or at the scene of a natural disaster. This could be a new way to consume news.
This can give a new perspective on the news and allow the reader to feel more connected to the story. It could also be used to provide a more immersive experience for readers who are visually impaired.
Immersive documentaries
Documentaries could be taken to the next level with a Storysight-like spatial app. Picture being in the middle of a historical event, at the scene of a scientific discovery, or standing among prehistoric dinosaurs.
Apple themselves have already begun to execute on an early version of this vision with their Prehistoric Planet Immersive show on Apple TV+.
This would offer a new dimension to the learning experience and could help people to better understand the subject matter, especially those with learning difficulties such as dyslexia.