
AI for Accessibility - Design Technologist

March 26, 2024
Presentation

Role: Design Technologist
Tech: React, OpenAI API, Cloudflare, Accessibility
Year: 2023 - 2024

I have been exploring AI to solve real-world challenges, focusing on accessibility and user-centric solutions. Below are key projects where I've used cutting-edge AI technologies to enhance efficiency and inclusivity in digital environments.

A microchip featuring the letters "AI" and a stylized human figure at its center.



Transcription Tool Development

Tech: React, Whisper API, GPT-3.5
Background:

I identified a significant problem in my mother's work as a meeting secretary. She had traditionally relied on pen and paper for note-taking, creating detailed protocols from complex meetings with union representatives, work that demands accuracy and thoroughness.

Challenge:
The challenge arose when she attempted to transition to digital note-taking. This shift was cumbersome due to the inefficiency of existing dictation tools, which produced poorly structured and difficult-to-follow text.

Screenshot of a user interface with buttons for transcription and summarization, and a prompt to upload a file.

Solution:
In the summer of 2023, I delved into AI to develop a transcription tool using the Whisper API and GPT-3.5. The goal was not only to learn more about the potential of AI but also to improve text structure and speaker identification. The prototype significantly streamlined her workflow, enhancing efficiency in note-taking and protocol development.
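A rough sketch of that flow, Whisper for speech-to-text followed by GPT-3.5 to restructure the raw transcript. The endpoints are OpenAI's public REST API, but the prompt wording and helper names here are illustrative, not the production code:

```typescript
// Minimal sketch: Whisper transcribes the audio, GPT-3.5 restructures it.
// Prompt wording and function names are illustrative assumptions.

const OPENAI_KEY = process.env.OPENAI_API_KEY ?? "";

// Pure helper: wrap a raw transcript in a protocol-structuring prompt.
export function buildProtocolPrompt(transcript: string): string {
  return [
    "Restructure this meeting transcript into a clear protocol:",
    "- group statements by topic and label speakers where identifiable",
    "- list decisions and action items separately",
    "",
    transcript,
  ].join("\n");
}

// Step 1: speech-to-text with the Whisper API.
export async function transcribe(audio: Blob, filename: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-1");
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${OPENAI_KEY}` },
    body: form,
  });
  const data = await res.json();
  return data.text;
}

// Step 2: restructure the raw text with GPT-3.5.
export async function structure(rawTranscript: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: buildProtocolPrompt(rawTranscript) }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Keeping the prompt in a separate helper made it easy to iterate on the protocol structure without touching the API plumbing.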

Impact:
The tool not only benefited my mother by reducing her post-meeting workload but also proved useful in user testing sessions, speeding up insight gathering and presentation preparation.

Assistant API Development

Tech: React, Assistant API, Vision API, GPT-4
Background:
Inspired by the release of GPT-4 and the possibilities it opened up, I began building with the Assistant API in the fall of 2023. Starting from a GitHub boilerplate template, I aimed to explore the capabilities of custom GPTs and their applications in enhancing content accessibility.

Challenge:
The challenge was to understand the Assistant API and fully utilize its potential. GPT-4 is a much more powerful model, and OpenAI introduced a code interpreter and the ability to analyze large chunks of text, enabling faster summarization and analysis of documents, PDFs, and Word files. During the project, I realized I could leverage this for accessibility: one of the biggest pain points in the accessibility community is the lack of alternative text on images. By combining the Assistant API with the Vision API and GPT-4, I could make content more accessible and contextually relevant.

Solution:
The solution evolved from merely syncing APIs into a Chrome extension that analyzes images and suggests or generates alt text. The extension was designed to help content editors efficiently manage and enhance the accessibility of web content.
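The extension's core logic can be sketched as two pieces: detecting images with missing alt text, and asking a vision-capable model for a suggestion. This is an illustration under my own assumptions (the prompt wording and helper names are invented, and I use the vision-capable chat completions endpoint rather than the project's exact Assistant API setup):

```typescript
// Sketch of the alt-text flow. A content script would collect <img> elements
// whose alt attribute fails isMissingAlt, then call suggestAltText for each.

// Pure helper: does this alt attribute count as missing?
export function isMissingAlt(alt: string | null | undefined): boolean {
  return !alt || alt.trim().length === 0;
}

// Ask a vision-capable model for a suggested alt text for one image URL.
export async function suggestAltText(imageUrl: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ""}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o", // any vision-capable model; the project used GPT-4 with vision
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Write a concise alt text (under 125 characters) for this image. Do not start with 'Image of'.",
            },
            { type: "image_url", image_url: { url: imageUrl } },
          ],
        },
      ],
      max_tokens: 100,
    }),
  });
  const data = await res.json();
  return (data.choices[0].message.content ?? "").trim();
}
```

Treating whitespace-only alt attributes as missing matters in practice: editors often paste `alt=" "` as a placeholder, which screen readers handle no better than no alt at all.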

Impact:
The extension has become valuable for content editors, enabling them to identify images lacking alt text and generate suitable descriptions. It not only improves the accessibility of web content but also streamlines the content creation and review process. Its success has led to considerations for broader application and a potential scale-up using a corporate OpenAI account.

Transforming Visual Content into Narrated Experiences with AI

Tech: Flutter, Dart, Vision API, GPT-4
Background:

During the winter of 2023 and early 2024, I saw the potential to create a real-time vision AI that helps visually impaired individuals see the world through their phones. Using Flutter and the Vision API, I developed an application that lets users understand their surroundings, bridging the gap between visual content and users who rely on accessibility features.

Challenge:
The main hurdle was to convert visual information from images into descriptive text that could be easily understood and narrated, ensuring the content was accessible to a wider audience.

Solution:
The developed application captures images for analysis by OpenAI, which then translates the visual data into descriptive text. This text is further processed using ElevenLabs' speech synthesis, incorporating a voice clone for a personalized narration experience.
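The pipeline has two API hops: a vision model turns the captured image into a description, and ElevenLabs turns the description into speech. The app itself is built in Flutter/Dart; this TypeScript sketch only illustrates the hops, and the prompts, helper names, and voice handling are my own assumptions:

```typescript
// Sketch of the image-to-narration pipeline: a vision model describes the
// captured frame, then ElevenLabs synthesizes speech with a cloned voice.

const OPENAI_KEY = process.env.OPENAI_API_KEY ?? "";
const ELEVEN_KEY = process.env.ELEVENLABS_API_KEY ?? "";

// Pure helper: keep narration short, cutting at a sentence boundary.
export function truncateForSpeech(text: string, maxChars = 300): string {
  if (text.length <= maxChars) return text;
  const cut = text.slice(0, maxChars);
  const lastStop = cut.lastIndexOf(".");
  return lastStop > 0 ? cut.slice(0, lastStop + 1) : cut;
}

// Describe a captured image (base64-encoded JPEG) with a vision-capable model.
export async function describeImage(base64Jpeg: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Describe this scene for a blind user in two or three sentences." },
            { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64Jpeg}` } },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content ?? "";
}

// Narrate the description with ElevenLabs text-to-speech (cloned voice ID).
export async function narrate(description: string, voiceId: string): Promise<ArrayBuffer> {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: { "xi-api-key": ELEVEN_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({ text: truncateForSpeech(description) }),
  });
  return res.arrayBuffer(); // raw audio bytes for playback in the app
}
```

Truncating at a sentence boundary keeps spoken responses snappy in a real-time setting, where a long narration would lag behind what the camera is pointed at.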

Impact:
This approach demonstrates the potential of AI in digital accessibility, converting visual data into narrated text for individuals with visual impairments. While still in progress, the prototype highlights the transformative possibilities AI holds for accessibility.

Update May 2024 - GPT-4o

With the introduction of GPT-4o and OpenAI's collaboration with "Be My Eyes," similar functionality is coming to a much wider audience. It's exciting that I was experimenting with similar solutions just a few months before the release.