Recommendation engines and the uniqueness of dislike

Twitter decided recently that it was going to change its fundamental paradigm and start putting content in your feed that it thinks you want to see, making it the last social site I personally use to try to guess what you want, instead of letting you tell it. It’s an obvious trend; it’s what Facebook has been doing with the News Feed for ages now. Google Plus refuses to let you hide your friends’ +1s, not to mention it’s still trying to find me more friends and teach me how to use it — which is to say, remind me about features it doesn’t think I’m using enough, like those aforementioned +1s.

What’s interesting to me is that even though this approach is so common, it’s still very hard to do well. This came up in a recent lunch conversation at work, unrelated to Twitter, about iTunes. My coworker wanted iTunes Discovery to play music that he didn’t have, but it didn’t seem to be able to do that. It either played music he already knew and liked, or music he didn’t like.

Even Amazon and Netflix, which are widely acknowledged to be relatively good recommendation engines, doing something relatively simple (recommending media), have trouble with edge cases. Another coworker shares a Netflix account with someone who enjoys “chick flick” movies, and watches them on Netflix, but the same person doesn’t like it when Netflix recommends that type of movie to her. Why not? After all, she likes them, so the recommendation engine is doing what it’s designed for. But it’s not doing what she actually wants, which is encouraging her to watch things she would like, but would otherwise have trouble finding. Chick flicks are easy to find. She wants help finding the difficult-to-find.

In the end, this is why automated recommendation systems fail: humans have preferences that are too diverse to code algorithmically, even with algorithms that learn from data. To Netflix’s recommendation system, “I watched this, and liked it” has only one meaning: show me more things like this! To a human, it can mean “I liked this, but I know it was kind of a waste of time to watch, and I’d really like to watch something more interesting next time”.

Take friend-suggestion as the simple case for social networks. Social networks are predicated on the idea that if we like Amy and Andrew, we probably also know and like their friends Bernard and Bailey. In general this is common; I do know and like many of my friends’ friends, and if I haven’t met them, I’d like to, because there’s a good chance we’ll get along. But a good chance isn’t a certainty. If I like my friend Mitch because we both do linguistics, but his friend Chad likes him because they play basketball together, and I don’t like basketball, I might not like Chad that much either, even if he’s a fine guy. But social networks don’t have any way of coding that. All they know is that I like Mitch, so I might like Chad too, right? Or maybe after many meetings you just haven’t cottoned to a particular person in your larger circle. Facebook insists that you must know each other, so of course you want to be friends. Right?! But you don’t. You already assessed the situation in person, and decided that you don’t.
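To make the Mitch-and-Chad problem concrete, here is a minimal sketch of friend-of-friend suggestion. It assumes (my assumption, not anything a real network has published) that the network is stored as a plain dictionary mapping each person to the set of people they are friends with; `suggest_friends` and the sample graph are hypothetical names for illustration only.

```python
from collections import Counter


def suggest_friends(graph, user):
    """Rank people `user` is not yet friends with by number of mutual friends.

    `graph` is a hypothetical structure: a dict mapping a name to the set of
    that person's friends. Nothing in it records *why* two people are friends.
    """
    friends = graph.get(user, set())
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    return [name for name, _ in mutual_counts.most_common()]


graph = {
    "me": {"Mitch"},
    "Mitch": {"me", "Chad"},
    "Chad": {"Mitch"},
}
print(suggest_friends(graph, "me"))  # ['Chad']
```

Chad gets suggested because the only thing this kind of graph holds is who is connected to whom. Linguistics versus basketball never enters into it.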

In the best-case situation, the social network lets you code that information in some way. On Facebook, click the X button and the suggestion is gone (forever? I’m not sure anymore). Or block the person, if you really don’t want to see anything from them. But while blocking and Xing convey some sort of information back to the system, it’s relatively coarse-grained. I just ignore Facebook’s friend suggestions at this point; like my friend and iTunes, it’s found me all it can find; the rest of its suggestions will never be useful, and I don’t care to X out all of them one by one.

Content is even harder to curate cleverly. Facebook has been trying it for years now with the News Feed, and although they’ve clearly had success in terms of engagement, it’s still an ongoing battle, the latest sally in which is reducing click-bait. Wait, didn’t we start out talking about content that people like, but don’t want to see more of? Oh, those silly humans, they just can’t stop themselves from sending mixed signals! Facebook is struggling with the same problem that my coworker’s friend finds in Netflix: a click currently can only code “I like this, it engages people, show me more of it.” Trying to make a click on clickbait not mean that, but a click on another kind of article keep its original meaning, is challenging to say the least.

I’m fascinated with learning algorithms (in case anyone reading this hasn’t noticed, I studied computational linguistics, and now work for a company that’s all about data) but if you spend much time at all working with them, you start to see their shortcomings very quickly. Humans are really remarkable creatures. Although we’re predictable in many ways, we also all have unique preferences that someone at Facebook, Netflix, or Twitter didn’t think to code in. Don’t want to see tweets from someone you love who passed away? Oops, someone at Twitter forgot to put that in as a criterion…wait, we don’t even have information on your relationship to this person or whether they’re alive? Oh crap. We forgot we aren’t Facebook. Wait, someone else does want to see that kind of tweet? Wait, what? Make up your minds!

While recommendation and curation systems are pretty darn cool as adjuncts to human judgment, intended to assist us in getting what we want, they’re not replacements for it. The data they collect is always incomplete, and their coding of it is always limited, and both of these are informed by their creators’ biases (does anyone remember Google Buzz? Yeah, those biases). Where these systems go wrong is where they assume they know better than the humans using them. There’s a big difference between adding a little box showing me people I might want to follow and insisting that I dismiss such a box before I can see my stream. There’s an even bigger difference between me getting to decide what’s in my feed and Twitter, Facebook, or Google Plus deciding at least some of it. Twitter has just crossed that line, and accordingly I expect it to become more noisy, less useful, and less pleasant to engage with over time, because however well it may think it knows me, its model of my preferences can never fully comprehend my complexity. That’s the uniqueness of dislike.

UI things.

I care a lot about design things. It’s part of caring about details, to me, and also caring about user experience. I’m not formally trained, but you don’t need two years of school to figure out that doors that say push, but have a pull handle, are confusing and annoying. (On the other hand, this is funny.)

Most of my Design Things experiences these days come about on the web. Some recent things:

Websites requesting usernames or emails where validation fails if there’s a single space character at the end of my entry. This just makes me livid. Email validation is hard (so I’m not quite so angry when you deny me a plus character, although it is in fact valid, BlueShieldofCaliforniaareyoulisteningtome?), but trimming trailing spaces? That’s easy. Why is there a trailing space character, you ask? Why do you care? Just trim the damn thing. But in case you just care because you actually care, it’s because when you do an autocorrect insert on a mobile phone, you often get an extra space.

Facebook videos. You think I’m about to say something mean, but I’m not. I am super impressed by Facebook’s video behavior. The video is muted by default, and stops playing when you scroll it off the screen. Someone thought about that one for a while. Good job, that person/people.

Google Calendar looks up locations in Maps (I think it’s in Maps — maybe also in your previous events?) when you start typing them in. It didn’t use to do this, and I wished it did, and now it does, and I absolutely freaking love it. I love it so, so much. It saves me so much tedious typing.

Flickr. I can’t even. Embedding a simple JPG is now practically impossible; the embed code is pretty much a little app. God, it is so terrible. Do they know that people use devices that don’t do those things? And I haven’t even used the new website enough to figure out how much I hate it, but I definitely hate it (although less than some people do? maybe?).
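Back to the trailing-space complaint for a second, because the fix really is small. Here is a minimal sketch, assuming a server-side handler that receives the raw form value as a string; `clean_email` and `EMAIL_PATTERN` are hypothetical names, and the pattern is deliberately loose rather than a real email validator.

```python
import re

# Illustrative pattern only: something@something.something, no whitespace.
# It accepts "+" in the local part; it is NOT a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_email(raw):
    """Trim surrounding whitespace first, then validate.

    Returns the cleaned address, or None if it still doesn't look like one.
    """
    value = raw.strip()
    return value if EMAIL_PATTERN.fullmatch(value) else None


print(clean_email("me+tag@example.com "))  # me+tag@example.com
print(clean_email("not an email"))         # None
```

The point is the order: strip before you validate, so the autocorrect space from a phone keyboard never reaches the validation step at all.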

Two Google Reader annoyances

Two things Google Reader does wrong (in my opinion):

If two of your friends have shared the same post, it appears twice in your shared items.
If you follow a blog that one of your friends has shared a post from, you see both the blog post and the shared post.

I can see why there might be reasons for this, but on the face of it, this is just plain stupid, and even if it’s not plain stupid (e.g. if the comments are different on each post), there has to be a smarter way to handle this. I don’t want to read the same post twice. I have a hard enough time being patient enough to keep up with my feeds as it is.
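One possible "smarter way", as a rough sketch: collapse items that point at the same post and keep a list of who shared it. This is not how Google Reader works internally (I have no idea how it does); the item format with `url`, `title`, and `shared_by` keys is an assumption made up for the example, as is the `merge_duplicates` function.

```python
def merge_duplicates(items):
    """Collapse feed items that point at the same post URL.

    Each item is a dict with hypothetical keys "url", "title", and
    "shared_by" (a friend's name, or None when the item comes straight
    from a blog you follow). Order of first appearance is preserved.
    """
    merged = {}
    for item in items:
        entry = merged.setdefault(
            item["url"],
            {
                "url": item["url"],
                "title": item["title"],
                "shared_by": [],
                "subscribed": False,
            },
        )
        if item["shared_by"] is None:
            entry["subscribed"] = True
        elif item["shared_by"] not in entry["shared_by"]:
            entry["shared_by"].append(item["shared_by"])
    return list(merged.values())


items = [
    {"url": "http://example.com/post", "title": "A post", "shared_by": "Amy"},
    {"url": "http://example.com/post", "title": "A post", "shared_by": "Andrew"},
    {"url": "http://example.com/post", "title": "A post", "shared_by": None},
]
feed = merge_duplicates(items)
print(len(feed))             # 1
print(feed[0]["shared_by"])  # ['Amy', 'Andrew']
```

Both annoyances collapse into one entry here: the post shows up once, tagged with everyone who shared it and with whether I already follow the blog. Different comment threads per share would need to hang off that single entry somehow, which this sketch doesn't attempt.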

Trust Google Maps

[Didn’t post this because I kept thinking I would add pictures, but it might as well be accessible while I fail to do so.]

A few weeks ago after work I was going to Merit Vegetarian Restaurant (548 Lawrence Expwy, Sunnyvale, CA). I asked Google Maps how to get there, since I hadn’t been there before and I’m not very familiar with the roads in Sunnyvale south of my office.

It told me to take Maude to Fair Oaks. Maude is okay, but I wanted to take Arques, which is quieter and which I had to be on after Fair Oaks anyway, but Google was (sort of) correct: Arques is apparently not continuous, since two blocks of it serve as an offramp from Central Expwy. Which is, incidentally, a fantastic example of the way our road system is designed to favor cars. Arques would be an excellent through route for cyclists, except that it isn’t through because it’s repurposed as an offramp so that people can get to work faster because their roadway is limited-access. Fantastic.

I took Arques anyway; there’s less traffic than Maude and Fair Oaks. So I detoured by a block, managed to get into the left-turn lane on Fair Oaks to get back on Arques, and all was well. Until I got to Wolfe, crossed over, and discovered that there’s no bike lane on Arques for one block, presumably because of the way the road configuration at the intersection with Wolfe is set up. Fantastic, again.

When I finally got to Lawrence, I made another left turn and then discovered that Lawrence isn’t just an expressway, but a particularly sucky one. Unlike true limited-access, where there are only a few merge lanes at major intersections/exits, Lawrence in that section has a bunch of little roads that intersect it at “quasi-T” intersections: you can get off or on, but not cross the expressway. (Apparently Central in that same area has the same issue. Yuck.)

These are no fun for cyclists, to say the least. But luckily I wasn’t going far. I arrived at Merit with no further aggravations.

Leaving again, after a very good dinner of soup and tea, I recalled the quick search I had done earlier to find out how to get to Lawrence Caltrain (closer to the restaurant than my usual destination of (downtown) Sunnyvale Caltrain). Not surprisingly, Lawrence Caltrain is off of Lawrence, but the directions Google Maps gave me were strange, instructing me to do what looked like: exit right, make a U-turn, and go right back out to the original road. What? But both at the time I looked at it, and the time I left the restaurant, I was in a hurry and thought “Whatever. It can’t be that hard.”

As I left it started to rain, first lightly and then with increasing intensity. I got to the intersection of Lawrence and Kifer, which I recognized as the place to turn, but I saw a sign that I thought said to turn left for the train. That was wrong, it quickly transpired, but by that time I had already wasted precious moments waiting for a light to turn green for me (it didn’t), going down the wrong road, turning around, and coming back, and knew the train would have left without me.

Still, I wanted to find the darn place so that I could regroup and decide how to get home. So I went back the other way. I saw a sign that said “Caltrain Station San Zeno Way” but that didn’t tell me anything because I didn’t know where San Zeno Way was or how to get there. Little did I know that was actually the street that Google wanted me to turn on.

It turns out that what Google indicated in the first place was this:
At Kifer, exit right.
Go to the closest point where you can turn left legally, and make a U-turn.
Turn right on San Zeno Way, just before you arrive again at Lawrence.
Take San Zeno Way to the train tracks (a few blocks), and there you find the station.

Now, as it happens, as a cyclist there is something more clever you can do.
At Kifer, cross the intersection and stop in the pedestrian island.
Dismount your bike, cross the right-turn area, and walk around the little curve in the sidewalk.
Cross the next pedestrian crosswalk to the triangular island. On the other side of the island, get on your bike and start riding, heading in your original direction, but on San Zeno, not Lawrence.

What I ended up doing was giving up, taking Kifer back to Fair Oaks, and then California to downtown Sunnyvale (to get on California I had to run a non-sensored red arrow, so that was an adventure). There, I discovered the public restroom in the parking garage by the train was actually, miraculously, open.

And then, crazy person that I am, I decided to ride all the way home. Even knowing I would be completely soaked when I got back, and probably would only barely beat the next-hour train. Because the cool thing about cycling is that I am basically self-reliant when I do it, even in the dark and rain.

And it was dark and rainy, and people were driving crazily. I had someone turn left in front of me, blatantly, on purpose, when I had the right of way. People were going way too fast for the conditions. I was really glad when I got home. And much more inclined to trust Google Maps rather than my own opinions.

Privacy, Accessibility, and Notability

As a result of some long-ago and more recent conversations with smart friends of mine, I came up with some interesting thoughts about privacy.

I don’t fully understand the legal umbrella of privacy, but it seems to me that there are a few distinct concepts that it would be useful to introduce into quasi-legal/common-sense discussions of privacy, and potentially to the legal arena too, in the long run.

First, a brief rundown of the concepts, before we get into their interactions and complications.

Privacy. Things that are private are things that you do on private property not visible from a public space, or public spaces where you have “a reasonable expectation of privacy”, and that you don’t speak or publish about in publicly-accessible forums — or if you do, those forums are specifically unconnected to your “real identity”. Also, things are private which are defined by law to be private, but that’s less important here than the nontechnical definition.

Accessibility (or Ease of Access). Things that are accessible are things that are easy for the average person or user to find. This is not a great term because “accessible” also has a technical binary definition related to privacy: if information is not at all accessible, it is private. But bear with me for a while, and suggest a better word if you have one.

Notability. Things that are notable are things that a substantial percentage of people (in the whole population or some subgroup) is interested in knowing about.

Anonymity. Being out in public without being notable.

The complexities of online “privacy” often come up when something besides privacy is involved, namely accessibility or notability. In my old journal, I wrote an entry about Google Street View (and Facebook News Feed, to some extent) in which I used the terms “theoretical privacy” and “actual privacy” rather than using the word “accessibility”, although I did notice, on re-reading the comments, that I start to talk about information being “(easily) accessible”.

GSV and FNF are iconic examples of things that “raised privacy concerns” without actually doing anything to change whether information was private or not. All the information on GSV and FNF was always available (to anyone who set foot in a place, in the case of GSV, and to anyone who previously had access to the info, in the case of FNF). What they did do was make it incredibly easy to find things out that previously had required a lot of effort to find out: what a place looks like at ground level, and what your friends are doing on Facebook. So the information became accessible (in the sense defined above) where before it had been inaccessible.

Notability is implicated in most problems where accessibility becomes an issue. If information is not notable (no one is really interested in knowing it), it doesn’t matter if it is easily accessible or not: no one cares, either way. Dave sent me a link today (which spawned this whole thought process on my part) about a guy whose information suddenly became notable. The guy didn’t mind, but it gave him pause for thought, as I’m sure it would most of us.

In the FNF and GSV cases, nothing became differently notable, just differently (more easily) accessible. This is closer to a form of privacy loss, because it makes something notable easier to find, and if something notable is found, you have much easier access to it. BoingBoing readers had many things to say about it, some of them wondering if we need new laws, or a new area of law, to deal with accessibility of information, since it isn’t covered by traditional privacy law.

Personal conduct in public, combined with YouTube and other video-upload services, illustrates a different set of circumstances. Most of us who live in largish urban areas, most of the time we’re in public, are anonymous: out in public without anyone particularly caring who we are. We feel restricted in our activities by our visibility, but don’t need to worry very much about anyone caring what we’re up to, even if we’re eating cookies when we’re supposed to be on a diet, or smoking when we said we quit. The situation isn’t the same in smaller communities, of course. In small communities, it’s hard to be out in public without being known.

Even in larger communities, recording and uploading a person’s behavior to a video site like YouTube makes it more accessible, but doesn’t necessarily make it more notable (consider all the incredibly boring YouTube videos that no one watches). Likewise, a person’s behavior becoming an object of attention/controversy would make it more notable but not more accessible: you’d still have to actually find the person to see what they were doing. When you get the simultaneous combination of accessibility and notability, you get something like the recent BART shooting video + controversy or the Caltrain cyclist arrest. But another worrying situation is when something goes up earlier, and then later becomes notable (like the guy’s photos as linked above, or like Facebook photos of undergrads drinking which get them in trouble).

How do we live our lives in a world that is increasingly a participatory panopticon? How do we act in public? What do we publicize and what do we keep private when things could become far more accessible or notable in the future than we ever imagined?

Why does everyone love Gmail themes?

I hate them. Why does everyone talking about them on the internet seem to love them, except one guy who twittered that he hates them?

Oh, and someone who thinks the “older version” solves it. No it doesn’t; the older version doesn’t have chat!

Dear Google,

Please give me back my old Gmail (with chat, thanks), where every element blended nicely into every other, instead of my messages being white while my inbox border is blue, and my chat search box being white while the top and edge are blue (or whatever color). And where my chat windows had nicely coordinating icon colors for minimize/pop-out, and blinked a nicely contrasting, if kind of obnoxious, orange.

Your new “default” theme is not the same as the old Gmail and you know it.

And your new themes are almost entirely ugly, and most of them are impractical as well.

Don’t do this to me. Make a theme that really makes it the same as it was before. Please? Pretty please?

By the way, I hate the iGoogle themes too. Can I have my old iGoogle page back while you’re at it?

…Okay, except the Terminal theme is the geekiest, coolest annoying thing ever. You are forgiven. But give me back my normal Gmail anyway.

{City} Caltrain on Google

This is super cool.

Type in “{City} Caltrain” in Google Maps, where {City} is a city name with a Caltrain station, and you get a map with a pic of the station, the times of the next 6 trains, their type (Bullet, Limited, Local), and their final destination (SF, SJ, Tamien, or Gilroy). Pretty goshdarn awesome in a world that needs more awesomeness.

Some of the non-city names work too, like 22nd St or San Antonio, but others don’t, like California Ave and Tamien, which do bring up the station location and snapshot but only bring up bus and light rail info, respectively. Still awesome!