Shedding More Light on the Amazon Controversy

Debbie says:

I was tired and outraged last night, and I didn’t say that the removal of Women En Large and Familiar Men, along with thousands of other titles with erotic or GLBT content, from amazon.com’s indirect search functions was unlike their overall corporate policies and could have been something that was done to them.

Thanks to one of our commenters, I now see that it could have been a hack. I’m not qualified to judge the tech in this link, but a friend who is says she finds the post at the link highly plausible in some ways and dubious in others. If it is true, the motive was “let’s watch everyone scurry around and get incensed,” rather than “let’s keep those filthy gays out of children’s eyes.”

In any event, even if Amazon is eventually shown to have been hacked, I’m not about to let Amazon off the hook for leaving itself open for malicious users to do this so easily. And I note that it has not yet been fixed.

User content is one of the best things about Amazon. However, letting users flag content as “inappropriate” for the whole customer base, rather than for themselves, is probably a mistake. It’s certainly a mistake to let a few flags like that control the entire search apparatus for a gigantic operation. Perhaps more to the point, where did that customer service email come from?

Like it or not, understand it or not (and I’m working hard to understand it), the big-picture high-profile Internet prank is here to stay for a long time. The big commercial sites need to take the real world into account, try to think like the pranksters, and, most important, make public statements if they get caught out. I’d feel a lot better about Amazon if the issue was fixed by now, and if their front page, or their front books page, had a public statement describing the issue and repudiating the pranksters.

8 thoughts on “Shedding More Light on the Amazon Controversy”

  1. Jonquil’s LiveJournal has some good links, and the comments include something about the credibility of Brutal Honesty, I believe.

    I am, myself, highly dubious that this is a hack. You’d have to be supersophisticated to get past Amazon’s security and you’d have to know an awful lot about how Amazon’s site search and ranking functions work to hack them. Ranking algorithms in source code are not something you can figure out or change on the fly. I just don’t believe that anti-queer/anti-sex people would try something this difficult and illegal with no gain to themselves. It seems much more likely to me that it’s a very bad business decision OR an internal crank at Amazon OR badly tested code changes.

  2. Flagging systems are hard– really hard. I’ve worked on them in the past, and part of my job was to examine the system and figure out how someone with malicious intent might game it. Most content flagging systems that I’ve seen are relatively easy to reverse engineer if you have a bit of time and ingenuity, and the inclination to do so.

    It’s really hard to build a system that actually works to remove “bad” content while being difficult to manipulate maliciously. It’s usually the case that either the flagging system is completely ineffective, or it’s easy to game. The best systems I’m aware of hold flagged content for human review rather than blindly removing it, but that can be an expensive proposition.

    The hack story seems reasonably plausible to me. I haven’t carefully examined the script that was posted, but just thinking through the problem makes it seem like it wouldn’t be terribly hard to implement something like this.

    Having said that, I don’t really know how Amazon’s system works and whether the hack story is true. I just know that, as someone who has worked on this sort of system, it passes the smell test.
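
To make the "hold for human review" idea above concrete, here is a minimal sketch. It is an illustration only, not a description of Amazon's system: the `Item` class, the `REVIEW_THRESHOLD` value, and the function names are all hypothetical. The point it shows is that user flags can queue an item for a person to look at without changing its visibility on their own.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: how many distinct users must flag an item
# before it is queued for a human reviewer.
REVIEW_THRESHOLD = 3


@dataclass
class Item:
    title: str
    flaggers: set = field(default_factory=set)
    status: str = "visible"  # "visible", "pending_review", or "hidden"


def flag(item: Item, user_id: str) -> str:
    """Record a flag. Repeat flags from the same user count only once."""
    item.flaggers.add(user_id)
    if item.status == "visible" and len(item.flaggers) >= REVIEW_THRESHOLD:
        # Flags alone never hide anything; they only request a review.
        item.status = "pending_review"
    return item.status


def review(item: Item, inappropriate: bool) -> str:
    """A human reviewer makes the final call on a queued item."""
    if item.status != "pending_review":
        raise ValueError("item is not awaiting review")
    item.status = "hidden" if inappropriate else "visible"
    item.flaggers.clear()
    return item.status


def is_searchable(item: Item) -> bool:
    """Items stay searchable until a reviewer explicitly hides them."""
    return item.status != "hidden"
```

In this sketch a coordinated burst of flags can at most fill the review queue, which is where the expense mentioned above comes from: someone has to staff that queue.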

  3. I read the speculation only after I’d posted my earlier comment – my bad because I made a completely incorrect assumption of what “hacking” meant. I was debating how anyone could get into Amazon’s source code control system from the outside to change their algorithms.

    Yes, gaming the system by having a large number of people flag or rate content could swing certain things, but what about the part where the GLBT and sex-related books got eliminated from sales ranking? It sounds as though Amazon has to have done something internally to make THAT change happen. Did no one notice the impact on GLBT and other books?

  4. I can easily imagine having designed a rating system that way. If I understand the descriptions correctly, removing the sales rank could be an easy way to get something invisible by removing it both from searches and from bestseller lists.

    In other words, it’s possible that in Amazon’s systems removing the sales rank is an easy way to make something invisible. If so, that may be the mechanism that their flagging system used to pull things that crossed a flagging threshold. So removing the sales rank may be the tool to remove it from lists and searches, rather than an end unto itself.

    I’m merely speculating here, because I’ve not seen the internals of Amazon’s systems, but all of that seems technically plausible. If an Amazon engineer told me that’s how it all worked, I’d have no reason to disbelieve them.
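
The speculation above can be sketched in a few lines. This is a toy model under stated assumptions, not Amazon's actual design: the `Book` class, the `threshold` value, and the rule that search and bestseller lists both skip unranked items are hypothetical. It shows how clearing one field could remove an item from several places at once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    sales_rank: Optional[int]  # None means "no sales rank"


def bestseller_list(catalog: List[Book], limit: int = 10) -> List[str]:
    """Only ranked books can appear; lower rank numbers come first."""
    ranked = [b for b in catalog if b.sales_rank is not None]
    ranked.sort(key=lambda b: b.sales_rank)
    return [b.title for b in ranked[:limit]]


def search(catalog: List[Book], term: str) -> List[str]:
    """Assumed behavior: search results are ordered by sales rank,
    so a book with no rank silently drops out of the results."""
    hits = [
        b for b in catalog
        if b.sales_rank is not None and term.lower() in b.title.lower()
    ]
    hits.sort(key=lambda b: b.sales_rank)
    return [b.title for b in hits]


def apply_flags(book: Book, flag_count: int, threshold: int = 5) -> None:
    """Hypothetical flagging hook: crossing the threshold clears the rank."""
    if flag_count >= threshold:
        book.sales_rank = None
```

If a system were built this way, de-ranking would be the mechanism and invisibility the effect, which matches the reading that removing the sales rank was a tool rather than the goal.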

  5. Debbie – turns out – it was a really stupid error on the part of a non-English speaking person.

    See more at Galley Cat:
    http://tinyurl.com/dey9t3

    I work with data all the time and yeah, something like this is REALLY easy to do if you mis-write a query or mis-tag something in the database.

    I was incensed at first, then realized after reading more that it could have been a data/metadata/catalog error – and sure enough, that’s what it was.
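
As a rough illustration of how easily a mis-written query can do this, here is a small sketch using an in-memory SQLite database. The table layout, category names, and both `UPDATE` statements are invented for the example and are not taken from any report about what actually happened. It only shows that one overly broad `WHERE` clause can de-rank far more rows than intended.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create a tiny, made-up catalog in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (title TEXT, category TEXT, sales_rank INTEGER)"
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [
            ("Explicit Title", "adult", 10),
            ("Health Guide", "sexuality", 20),
            ("Memoir", "gay & lesbian", 30),
            ("Cookbook", "cooking", 40),
        ],
    )
    return conn


# What the operator (hypothetically) meant to run.
INTENDED = "UPDATE books SET sales_rank = NULL WHERE category = 'adult'"

# The same statement with extra categories swept in by mistake.
MISTAKEN = (
    "UPDATE books SET sales_rank = NULL "
    "WHERE category IN ('adult', 'sexuality', 'gay & lesbian')"
)


def deranked_titles(update_sql: str) -> list:
    """Run one UPDATE on a fresh catalog and list the titles left unranked."""
    conn = build_catalog()
    conn.execute(update_sql)
    rows = conn.execute(
        "SELECT title FROM books WHERE sales_rank IS NULL ORDER BY title"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Calling `deranked_titles(INTENDED)` touches one row, while `deranked_titles(MISTAKEN)` touches three; the two statements differ by only a few words.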

  6. Thanks to everybody for their information and input.

    Maria, I had just seen some of those reports about the error within the last couple of hours. I do want to say, however, that “really stupid errors” can be done both by English-speaking and non-English-speaking people. I’ve certainly made my share. I think that’s what you’re saying in the rest of your comment; I just want to be crystal clear.

  7. I was incensed at first, then realized after reading more that it could have been a data/metadata/catalog error – and sure enough, that’s what it was.

    But the technical error revealed a significant problem with the underlying classification. Either mainstream publishers are gaming the system, so that (het) porn stars’ biographies, Playboy centerfolds, etc., aren’t characterized as “explicit”, or there is some kind of more rigorous or isolating filtering system for gay/lesbian/bi/trans/queer/intersexed content, even if it’s not explicitly sexual.

    That’s messed up.

    Also, books about sexuality by disability activists, by feminists, and by glbtqi folks, were de-ranked in the “glitch”, while much more explicit books were not. Again, either the publishers of the non-de-ranked books were gaming the system by supplying sanitized metadata, or there is a problem with the current structure of the filtering system.

    There is no reason why “The Joy of Gay Sex” is “adult” while “The Joy of Sex” is not. Other than homophobia, that is.
