Using the Microsoft Seeing AI App

The Microsoft Seeing AI App allows users with visual impairments to listen to descriptions of the world around them. It is not new, but I had not come across it until fairly recently, so I decided to have a look at what it can do.

Firstly, some background: this application was born out of an internal project at Microsoft. Around 2014, a data scientist who was having Skype calls with an ageing grandparent realised that the grandparent, who was losing their vision, was unable to recognise him. At that time, the success rate of computer vision image classification wasn’t great, but it was steadily improving. The employee built a prototype recognition app on a phone to try to help identify nearby objects. It was not massively successful, but advances over the next year in vision-to-language technology made the idea more feasible, and at the Microsoft 2015 Hackathon “Deep Vision” (the first name of the Seeing AI app) was born. So think big: something that starts off as an “I wonder if...” can turn into something massive.

At the time of writing (December 2020), the app is free and only available on iOS. As you may not have seen it before either, let’s go through the channels, with some short videos to demonstrate.

Short text

This channel allows us to quickly read text in the environment, with the app instantly reading it out loud. It relies on the user moving the phone around, and works well for small amounts of text such as writing on envelopes, signs or labels.

Document

Whereas short text lets us read a few words or a line instantly, this channel allows the user to read a whole page of text. The application detects the document edges to ensure the page is positioned correctly, and it can determine the formatting (it understands the difference between a heading and normal text). When the page is positioned optimally, the app takes a photo, performs optical character recognition (OCR) on it and reads out the contents.

Product

Seeing AI can scan a barcode to give an indication of what a product is. Objects may be boxed or come in other shapes of packaging, and the barcode could be anywhere. As the user rotates the object, the app indicates the proximity of a barcode with a series of beeps: the faster the beeps, the closer the barcode. This allows the user to home in on the barcode so that the app can scan it and determine what the user is holding.
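I have no idea how Seeing AI actually implements the beeps, but the idea of "faster beeps as you get closer" is easy to sketch. The snippet below is purely my own illustration: the function name `beep_interval`, the normalised distance (0 means the barcode is in view, 1 means it is far away) and the interval limits are all assumptions, not anything taken from the app.

```python
def beep_interval(distance, min_interval=0.1, max_interval=1.0):
    """Seconds between beeps for a normalised barcode distance in [0, 1].

    Hypothetical helper: 0.0 means the barcode is right in view,
    1.0 means it is as far away as we can estimate.
    """
    # Clamp so out-of-range estimates still give a sensible interval.
    d = max(0.0, min(1.0, distance))
    # Linear interpolation: closer barcode -> shorter gap -> faster beeps.
    return min_interval + d * (max_interval - min_interval)


# As the user rotates the product, the gap between beeps shrinks.
for d in (1.0, 0.5, 0.0):
    print(f"distance {d:.1f}: beep every {beep_interval(d):.2f}s")
```

A linear mapping is the simplest choice; a real app could just as easily use a curve that speeds up more sharply near the barcode.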

Person

The app can give a description of people when a photo is taken, and can also be trained to instantly recognise familiar people. If a photo is taken of a person, Seeing AI will attempt to estimate their age and identify their expression. This is useful to a visually impaired (VI) user on a number of levels; for example, when taking a photo for social media, the user can tell whether the subjects are smiling.

Answers on a postcard as to whether the app has got the age right!

...but let this be proof that frowning makes you look old!

We can add photos of relatives and friends into Seeing AI and give them a name, so that when the app is being used the Person channel will instantly identify them.

Currency

Despite GBP being present in the list of currencies within the app, it does not currently appear to recognise British currency; I expect this, along with the other listed currencies, will work in due course. In the UK, notes are different sizes, which helps a VI person with identification, but in the US all notes are the same size and look very similar, so this channel is useful for telling apart the different values of US bills. Note that, as you can see, the currency doesn’t have to be legitimate to be recognised!

Scene

This is very much an experimental feature, but the app will attempt to describe a scene based on the artificial intelligence it has. It did really well on the various attempts I made around the room. Here is an example:

Colour

Fairly standard, and not always totally accurate if a page is in shadow, but nonetheless useful for allowing the user to differentiate between colours.
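To see why shadow can throw a colour reading off, here is a toy sketch of one common approach: match the measured colour to the nearest entry in a list of named colours. This is my own guess at the general technique, not how Seeing AI does it; the tiny `PALETTE`, the `nearest_colour` function and the `in_shadow` dimming factor are all made up for illustration.

```python
# Hypothetical six-colour palette; a real app would use a far richer one.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_colour(rgb):
    """Name of the palette colour closest to rgb (squared RGB distance)."""
    def dist_sq(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=dist_sq)


def in_shadow(rgb, factor=0.3):
    """Crude model of shadow: scale every channel down by the same factor."""
    return tuple(round(c * factor) for c in rgb)


swatch = (200, 30, 30)                    # a red page in good light
print(nearest_colour(swatch))             # named from the well-lit reading
print(nearest_colour(in_shadow(swatch)))  # named from the shaded reading
```

With these made-up numbers, the same swatch is named red in good light and black in shadow, which is the kind of mistake you would expect from any approach that only looks at the raw pixel values.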

Handwriting

This is a tricky one: I challenge anyone, human or artificial, to decipher my spidery handwriting! The app did well to get it.

Light

This allows the user to detect how much light there is around them: the more light there is, the higher the pitch, going through to lower tones for low light.
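The mapping from light level to pitch can be sketched in a few lines. Again, this is only my illustration of the idea: the name `light_to_pitch`, the 0 to 1 brightness scale and the 220 Hz to 880 Hz range are assumptions, and I have chosen a geometric scale so that equal steps in brightness sound like equal steps in pitch.

```python
def light_to_pitch(brightness, low_hz=220.0, high_hz=880.0):
    """Tone frequency in Hz for a normalised brightness in [0, 1].

    Hypothetical mapping: 0.0 (dark) gives low_hz, 1.0 (bright) gives high_hz.
    """
    # Clamp so odd sensor readings stay within the audible range chosen here.
    b = max(0.0, min(1.0, brightness))
    # Geometric interpolation: pitch rises by the same musical interval
    # for each equal step in brightness.
    return low_hz * (high_hz / low_hz) ** b


# Walking from a dark corner towards a window, the tone rises.
for b in (0.0, 0.5, 1.0):
    print(f"brightness {b:.1f}: {light_to_pitch(b):.0f} Hz")
```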

Photos from other apps

When using the camera roll or social media apps, a VI user may want to know what a photo inside that particular app contains. To do this, select the photo in that app and press the button to share it. Rather than actually sharing it, choose “Recognize with Seeing AI”; the Seeing AI app will process the photo and state what it thinks it shows. We can also choose to explore the photo, where the user can move their finger around the screen and be alerted when it moves over a particular object. This example is looking at something on a camera roll:

That is all for now. I think this app is useful in so many different ways; the intelligence behind it and its features are improving all the time, and I look forward to seeing what future releases bring.
