technology Uncategorized

Fixing Skype’s “eye contact” problem

My extended family lives all over the country, from Vermont to Pennsylvania to Colorado to Oregon to Hawaii. That makes visiting relatives expensive and, with a nine-month-old, nearly impossible. Video chat services like Skype and FaceTime are a godsend. Our daughter learns that she has a wider family who loves her; aunts, uncles, and grandparents can witness her cuteness first-hand.

Although we’re grateful that Skype exists, it’s still a poor substitute for sitting face-to-face. Physical touch can’t be digitized, and the stuttery, low-res video feed filters out key non-verbal communication.

But Skype feels impersonal for another, less obvious reason: there’s no eye contact. Because the front-facing camera is mounted above the screen, you can’t look at the camera and at your loved ones at the same time. Instead, you look past them, never quite meeting their gaze.

That’s a tricky technological problem to solve. One approach would be to place the camera behind the screen—laying the remote video feed on top of the image sensor. Back in 2007, Apple filed a patent application for a device that worked this way. It’s a clever idea that never became an actual product.

Apple’s recent acquisition of FaceShift got me thinking: could you solve the “look me in the eyes” problem with software instead of hardware? FaceShift enables “markerless” motion capture, in which a user’s facial movements drive a cartoon avatar’s live performance. The demos showcase everyday people transformed into mutant warriors, killer clowns, and pug puppies, all in real time.

That’s a fun parlor trick, but what if this tech were deployed more subtly, to achieve a more profound goal: meeting your loved one’s gaze? The software would still alter your image in real time, but instead of dressing you up like Shrek, it would shift your perceived eyeline away from the screen and into the camera. FaceShift would detect your eye color, paint over your “real” eye with white, then redraw your pupil and iris so that they seemed to look at the camera. Imagine a video chat app that included the “fun” options (making you a pirate or a zombie), but whose default mode made just this slight tweak.
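How big would that tweak need to be? Here is a back-of-the-envelope sketch, not anything FaceShift is known to do. It assumes the camera sits a known distance above the on-screen face, that the viewer sits a known distance from the screen, and that an eyeball has a radius of roughly 1.2 cm. The function names and the `pixels_per_cm` scale factor are hypothetical, chosen only for illustration.

```python
import math

# Assumption: a typical adult eyeball radius of about 1.2 cm.
EYEBALL_RADIUS_CM = 1.2


def gaze_error_degrees(camera_offset_cm, viewing_distance_cm):
    """Angle between looking at the on-screen face and looking at the camera."""
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))


def iris_shift_pixels(camera_offset_cm, viewing_distance_cm, pixels_per_cm):
    """Distance to move the redrawn iris in the image, toward the camera.

    Rotating the eye by the gaze error moves the iris center by roughly
    radius * sin(angle) as seen from the camera.
    """
    angle = math.atan2(camera_offset_cm, viewing_distance_cm)
    shift_cm = EYEBALL_RADIUS_CM * math.sin(angle)
    return shift_cm * pixels_per_cm


if __name__ == "__main__":
    # Camera 10 cm above the on-screen face, viewer 50 cm from the screen.
    print(gaze_error_degrees(10, 50))
    print(iris_shift_pixels(10, 50, pixels_per_cm=20))
```

With a camera 10 cm above the face and a viewer 50 cm away, the gaze is off by about 11 degrees. Under these assumptions the redrawn iris would move only a few pixels, which is why the tweak might pass unnoticed.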

If the feature were implemented well,[1] users might not even notice that their own video feed was being altered. They’d simply sense a deeper, more personal connection to the person with whom they’re chatting. Ironically, the image would be artificially manipulated, but the conversation would feel more authentic.[2]

Pair some silly name (“FaceTime Presence” or “Skype Gaze”) with some schmaltzy copy (“Look into each other’s eyes, from anywhere”), and you’ve got a very marketable feature.

  1. Done poorly, this feature could prove hilarious—or terrifying. What would Grandma think if Little Susie suddenly became an eyeless demon child?  ↩

  2. This raises all sorts of philosophical questions. What makes a person’s representation “real”? Can an image be less accurate, yet more authentic? Is it problematic to “fix” an avatar that purportedly represents the “real you”? What if video chat software could remove your blemishes or take off ten pounds? Would you enable it?  ↩

The podcast machine

If your favorite podcast is hosted by more than one person in more than one place, Skype probably makes that possible. Microsoft’s cross-platform communications service makes it easy to conduct high-fidelity conversations with people around the country or the globe.

But Skype doesn’t make recording those conversations easy. Right now, to capture Skype audio, podcast hosts rely on third-party Skype plugins and utilities. This software often proves tricky to use and, because it’s not built by Microsoft, doesn’t integrate reliably when Skype gets upgraded. Plus, on sandboxed mobile platforms like iOS, third-party apps can’t access Skype’s audio output, which chains every podcaster to a traditional PC or Mac.

Microsoft could eliminate these janky add-ons altogether by adding record functionality to Skype itself. Users would no longer have to futz around with output channels, alternate mic inputs, or finicky background processes. And if the recording feature made its way to Skype’s mobile apps, phone and tablet users could join in on podcast sessions, too.

Even this dead-simple record feature would be welcome, but Microsoft could also do much more. For example, consider the “double-ender.” Right now, many podcasters eschew recording the Skype conversation itself, which may be riddled with compression artifacts or occasional audio drop-outs. Instead, each of the hosts records her own voice locally, then sends this isolated track to the show’s editor. That editor then re-assembles the component files into the final product and publishes the podcast episode. That’s a time-consuming workflow for amateur podcasters to follow week after week—and for many non-technical users, it’s almost impossibly complex.

What if Skype handled the double-ender automatically? For each podcast conversation, Skype could initiate local recordings on every machine on the call. Once recording stops, Skype would automatically upload each host’s track to a shared folder in the cloud (integration with Microsoft OneDrive makes sense here). Or, if Microsoft were really ambitious, Skype could sync up the tracks and spit out a multi-track file, ready for refinement and final publishing.
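The “sync up the tracks” step might look something like the sketch below. It is a simplification under loud assumptions: every machine stamps its recording with a start time from a shared, accurate clock, and all tracks use the same sample rate. Real double-enders also have to deal with clock drift, which this ignores. The `align_tracks` name and its data layout are hypothetical, not part of any Skype API.

```python
def align_tracks(tracks, sample_rate):
    """Pad local recordings with silence so they share one timeline.

    `tracks` maps a host name to (start_time_seconds, samples).
    Assumes all start times come from a synchronized clock.
    """
    earliest = min(start for start, _ in tracks.values())

    # Add leading silence to any track that started after the earliest one.
    padded = {}
    for host, (start, samples) in tracks.items():
        lead = round((start - earliest) * sample_rate)
        padded[host] = [0.0] * lead + list(samples)

    # Add trailing silence so every track has the same length.
    length = max(len(samples) for samples in padded.values())
    return {
        host: samples + [0.0] * (length - len(samples))
        for host, samples in padded.items()
    }


if __name__ == "__main__":
    session = {
        "ann": (10.0, [1.0, 1.0, 1.0]),
        "bo": (10.5, [2.0, 2.0]),
    }
    print(align_tracks(session, sample_rate=4))
```

Once every track starts at the same instant and runs the same length, writing them out as channels of a single multi-track file is the easy part.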

For those podcasters who don’t need meticulous control over the final edit, Skype could automate still more of this process. Imagine if Microsoft did the heavy auto-editing in the cloud, compressing, EQing, and leveling each track. Skype could tack on the podcasters’ preferred intro music and drop the final MP3 file into the podcast’s RSS feed. Suddenly, podcasting would be as simple as starting a Skype call.
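To give a flavor of the “leveling” step, here is a deliberately crude stand-in: simple peak normalization, which scales each track so its loudest sample hits a chosen target. Proper leveling would measure perceived loudness and apply compression, so treat this as an illustration only. The `level_track` name and the 0.9 default target are my own assumptions.

```python
def level_track(samples, target_peak=0.9):
    """Scale a track so its loudest sample reaches `target_peak`.

    Samples are floats in the range -1.0 to 1.0. A silent or empty
    track is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet_host = [0.1, -0.5, 0.25]
    print(level_track(quiet_host, target_peak=1.0))
```

Run on every host’s track, this at least keeps a soft-spoken co-host from being buried under a louder one.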

I’d argue that this sort of all-in-one podcasting solution, done right, could at least break even, if not earn an outright profit. Microsoft could require Skype Premium (its paid option) for access to the podcast-friendly features. And OneDrive could serve as the podcast-hosting backend—again, for those users willing to pay. More importantly, such features would attract those creative, high-income users that Microsoft so covets.

So why doesn’t Microsoft go after the podcast market? Why haven’t they leveraged Skype’s central role in podcasting? I can think of at least two explanations. First, Microsoft might be worried about liability. Many countries have strict laws about recording phone conversations. In many US states, for example, all involved parties must be clearly notified when a call will be recorded—think of the automated disclaimer that you hear on most customer service hotlines. If Microsoft fears legal repercussions, Skype could implement a similar “auto-alert” function—notifying every user who joins a recorded call.[1] Or Skype could surface a consent dialog to every user before any recording could start.
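The consent-dialog idea boils down to a small piece of bookkeeping: recording stays locked until every person on the call has said yes. The sketch below is hypothetical (the `RecordingConsent` class and its methods are invented for illustration, not taken from Skype), and it assumes a late joiner must also agree before recording can continue.

```python
class RecordingConsent:
    """Tracks who has agreed to be recorded on a call."""

    def __init__(self, participants):
        self.pending = set(participants)  # have not answered the dialog yet
        self.declined = set()             # answered "no"

    def join(self, participant):
        """A new caller must answer the dialog before recording is allowed."""
        self.pending.add(participant)

    def respond(self, participant, agreed):
        """Record a participant's answer to the consent dialog."""
        self.pending.discard(participant)
        if agreed:
            self.declined.discard(participant)
        else:
            self.declined.add(participant)

    def may_record(self):
        """True only when everyone has answered and nobody has declined."""
        return not self.pending and not self.declined


if __name__ == "__main__":
    call = RecordingConsent(["ann", "bo"])
    call.respond("ann", True)
    call.respond("bo", True)
    print(call.may_record())
```

The same structure covers the “auto-alert” variant: instead of waiting for answers, the app would simply notify each name as it is added.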

There’s another potential reason Microsoft hasn’t augmented Skype for podcasters: maybe it thinks the market is too small to bother with. Compared with Skype’s vast user base, podcasting represents an infinitesimally tiny niche. But that’s a chicken-and-egg problem—which came first, the user-friendly tools or the customer base? Currently, technological hurdles make it difficult for non-technical users to leap into podcast production. Skype is uniquely positioned to lower that bar, grow the medium, and stake its claim at the center of the podcast universe.

  1. Skype for Business, Microsoft’s enterprise communication solution, already includes this feature.  ↩