Gotta be a table
Jul. 11th, 2012 12:37 pm
*drums fingers*
The more I think about the idea of a display-name/alias for canonical tags, the more I think: it's going to have to be a new table.
I'm pretty sure that it would be faster to store any given story's aliases in a single field of the Story table as a string of key-value pairs. But those pairs would need delimiters that would not ever show up in the aliases themselves and that... that's not something I really want to bet on, when it comes to fandom, names, and the evolution of pairing syntax.
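To make the delimiter worry concrete, here is a minimal sketch. The packed format (`tag_id=alias` pairs joined with `;`) and the `pack`/`unpack` helpers are made up for illustration; nothing here is the archive's real code.

```python
# Hypothetical packed format: "tag_id=alias" pairs joined with ";".
def pack(aliases):
    return ";".join(f"{tag_id}={alias}" for tag_id, alias in aliases.items())


def unpack(field):
    pairs = (chunk.split("=", 1) for chunk in field.split(";"))
    return {int(tag_id): alias for tag_id, alias in pairs}


safe = {101: "Sasuke/Naruto", 102: "Team 7"}
risky = {101: "Sasuke/Naruto; slow burn"}  # an alias that contains the delimiter

print(unpack(pack(safe)) == safe)  # the well-behaved aliases round-trip
try:
    unpack(pack(risky))
except ValueError as err:
    # The ";" inside the alias splits it into a chunk with no "=" in it.
    print("packed field broke:", err)
```

Any delimiter can be escaped, of course, but then every reader and writer of that field has to get the escaping right, forever.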
It would also be more accident prone during the canonizing of the tags. (I'm starting to feel like I should capitalize that phrase. Like the Running of the Bulls or something.)
So. Separate table, nice and simple, with columns for story id, canon-tag id, and alias. I'm thinking story id should be the primary key. The most frequent use is going to be producing story-blurbs, and I'm guessing this sucker is going to have to be partitioned. So, partition by story id range and crank that into the one extra query this will create per story (or at least prepare for it; it might not be necessary yet).
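A rough sketch of that table, using Python's built-in sqlite3 purely as a stand-in for whatever the archive actually runs. The table and column names are my own placeholders. One assumption to flag loudly: a story can carry several aliased tags, so this sketch uses a composite key of story id plus canon-tag id, with story id leading so the per-story lookup stays cheap. Partitioning isn't shown, since SQLite doesn't do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE story_tag_alias (
        story_id     INTEGER NOT NULL,
        canon_tag_id INTEGER NOT NULL,
        alias        TEXT    NOT NULL,
        -- Assumption: one alias per story per canonical tag.
        PRIMARY KEY (story_id, canon_tag_id)
    )
    """
)
conn.executemany(
    "INSERT INTO story_tag_alias VALUES (?, ?, ?)",
    [
        (1, 101, "Sasuke/Naruto"),
        (1, 205, "Team 7 shenanigans"),
        (2, 101, "Naruto/Sasuke"),
    ],
)


def aliases_for_story(story_id):
    """The one extra query per story blurb: canon tag id -> display alias."""
    rows = conn.execute(
        "SELECT canon_tag_id, alias FROM story_tag_alias WHERE story_id = ?",
        (story_id,),
    )
    return dict(rows.fetchall())


print(aliases_for_story(1))
```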
Putting aliases in their own table will also make them far more easily searchable.
petronia pointed out that some fan cultures, for example the Chinese-language fans, will want and expect to be able to search for things like "Sasuke/Naruto" as distinct from "Naruto/Sasuke", and also search for things like "*/Sasuke" (that is, anyone/Sasuke). Putting a specific "search display names" option on the Advanced search, and putting the aliases into a table of their own with one alias per field, will make that a lot more feasible. It won't be perfect, because it will rely on authors to alias, but there should be some significant overlap between authors who will alias like that and authors writing the kind of fic members of that fan culture want to see.
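Here is roughly what a "search display names" query could look like against a one-alias-per-row table. This is a sketch under assumptions: the table layout and the `search_display_names` helper are hypothetical, and it leans on SQL `LIKE`, which in SQLite is case-insensitive for ASCII by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE story_tag_alias (story_id INTEGER, canon_tag_id INTEGER, alias TEXT)"
)
conn.executemany(
    "INSERT INTO story_tag_alias VALUES (?, ?, ?)",
    [
        (1, 101, "Sasuke/Naruto"),
        (2, 101, "Naruto/Sasuke"),  # same canonical tag, different display order
        (3, 102, "Kakashi/Sasuke"),
    ],
)


def search_display_names(pattern):
    """Match a user pattern such as '*/Sasuke' against the stored aliases."""
    # Escape LIKE's own wildcards first, then map the user's '*' onto '%'.
    like = (
        pattern.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
        .replace("*", "%")
    )
    rows = conn.execute(
        "SELECT story_id FROM story_tag_alias "
        "WHERE alias LIKE ? ESCAPE '\\' ORDER BY story_id",
        (like,),
    )
    return [story_id for (story_id,) in rows]


print(search_display_names("Sasuke/Naruto"))  # order-sensitive match
print(search_display_names("*/Sasuke"))       # anyone/Sasuke
```

Stories 1 and 2 share a canonical tag id here, so the regular tag search would return both; only the alias search can tell them apart.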
Now the first step of the Canonizing of the Tags is a lot simpler. Relatively speaking. For each story, get the current tags; get the names and ids of those tags; write them to the alias table. That can run in the background as long as it takes, since it won't affect anything yet.
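That background pass might look something like this. The `tags` and `story_tags` tables are guesses at the existing schema (I haven't seen the real one), and `backfill_aliases` is a hypothetical name. The sketch commits per story and ignores rows it has already written, so the job can be stopped and restarted without harm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed stand-ins for the existing tables.
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE story_tags (story_id INTEGER, tag_id INTEGER);
    CREATE TABLE story_tag_alias (
        story_id INTEGER, canon_tag_id INTEGER, alias TEXT,
        PRIMARY KEY (story_id, canon_tag_id)
    );
    INSERT INTO tags VALUES (101, 'Naruto/Sasuke'), (102, 'SasuNaru');
    INSERT INTO story_tags VALUES (1, 101), (2, 102);
    """
)


def backfill_aliases():
    """For each story: get its current tags, then write ids and names as aliases."""
    story_ids = [
        sid for (sid,) in conn.execute("SELECT DISTINCT story_id FROM story_tags")
    ]
    for story_id in story_ids:
        rows = conn.execute(
            "SELECT t.id, t.name FROM story_tags AS st "
            "JOIN tags AS t ON t.id = st.tag_id WHERE st.story_id = ?",
            (story_id,),
        ).fetchall()
        conn.executemany(
            "INSERT OR IGNORE INTO story_tag_alias VALUES (?, ?, ?)",
            [(story_id, tag_id, name) for tag_id, name in rows],
        )
        conn.commit()  # per-story commit keeps the job resumable


backfill_aliases()
print(conn.execute("SELECT * FROM story_tag_alias ORDER BY story_id").fetchall())
```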
The new posting form has to be prepared next, so that it's ready to present canonicals and "talk" to the alias table. The new code to display story blurbs and pages will need to be done up, but that should be nice and straightforward. Incidentally, I quite like busaikko's idea of putting Additional tags, possibly labeled Author Tags, under the summary to make them clearly separated from the menu tags and more clearly part of the author's own meta-information about their work. That would prepare the way to possibly show Reader Tags, and make them clearly distinct from anything the author zirself put on the story.
And now the canonization query should run smoothly, as each child-tag id is replaced with the id of its final parent tag, including the ids in the alias table. Which will, after the query runs, match up with the changed ids in the Story table. And without needing any dangerous, and slow, additions like "match any number that comes before X delimiter in this long string". *dusts hands*
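A sketch of that id swap, again with placeholder table names and a made-up child-to-parent map. The `final_parent` helper is hypothetical; it walks up the tag tree so that a child of a child still lands on the top-level canonical.

```python
import sqlite3


def final_parent(tag_id, parent_of):
    """Follow child -> parent links until reaching a tag with no parent."""
    seen = set()
    while tag_id in parent_of:
        if tag_id in seen:
            raise ValueError(f"cycle in tag tree at {tag_id}")
        seen.add(tag_id)
        tag_id = parent_of[tag_id]
    return tag_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE story_tags (story_id INTEGER, tag_id INTEGER);
    CREATE TABLE story_tag_alias (story_id INTEGER, canon_tag_id INTEGER, alias TEXT);
    INSERT INTO story_tags VALUES (1, 103), (2, 101);
    INSERT INTO story_tag_alias VALUES (1, 103, 'SasuNaru'), (2, 101, 'Naruto/Sasuke');
    """
)
parent_of = {103: 102, 102: 101}  # 103 is a child of 102, which is a child of 101

with conn:  # one transaction: both tables change together or not at all
    for child in parent_of:
        canon = final_parent(child, parent_of)
        conn.execute(
            "UPDATE story_tags SET tag_id = ? WHERE tag_id = ?", (canon, child)
        )
        conn.execute(
            "UPDATE story_tag_alias SET canon_tag_id = ? WHERE canon_tag_id = ?",
            (canon, child),
        )

print(conn.execute("SELECT * FROM story_tag_alias ORDER BY story_id").fetchall())
```

The alias text is never touched; only the integer ids move. One case the sketch ignores: a story tagged with both a child and its parent would end up with two rows for the same canonical, which a real run would have to merge.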
Best of all, as niqaeli points out, this can be considered an improvement in user control of their own content. Users would now be able to absolutely control what canonicals are associated with their stories, instead of being forced to leave that up to the wranglers. (Which must surely be less nerve-wracking for the wranglers too...) The user will still have control of exactly how all the text of their content appears and, since no user actually has control of how the navigation structure appears right now, no one will lose any control they had.
So there is an increase in user control, plus an improvement in searchability, a considerable improvement in stability and performance, and a huge improvement in the efficient use of wrangler time that might let them be more pro-active in populating new fandoms with suitable canonicals ready for author use. This should even, much as it disgusts me, let the OTW leadership avoid Step One from my previous post. So get cracking, people.
no subject
Date: 2012-07-11 06:27 pm (UTC)
That wasn't me. (I forget who it was, tho...)
no subject
Date: 2012-07-11 06:30 pm (UTC)
no subject
Date: 2012-07-11 09:10 pm (UTC)
Especially when you take that combined with the fact that it would mean actually disengaging control over the users and their tagging (because, frankly, wrangling as it currently exists is actually a bizarre and rather controlling functionality with 'anyone can do whatever with their tags! \o/!' merely a facade layered over it), means I think it will be very difficult to get the leadership to even agree that this is the way forward.
Because I do find it far more controlling and insidious and matronising to insist that oh no, we will figure out for the users what they actually meant and we'll do it behind the scenes where they can't even see us make the soup, than it is to say: here's the tags you can use, use or don't use what you want, and hey you can rename them if the formatting don't suit you. It's a very subtle sort of control, but it's definitely a form of control and power over the users.
And that sort of thing is rather the hallmark of the OTW's culture... :/
no subject
Date: 2012-07-11 09:18 pm (UTC)
no subject
Date: 2012-07-12 12:42 am (UTC)
-Silv
no subject
Date: 2012-07-12 12:49 am (UTC)
no subject
Date: 2012-07-12 01:56 am (UTC)
no subject
Date: 2012-07-12 01:59 am (UTC)
no subject
Date: 2012-07-12 01:56 am (UTC)
And it strikes me afresh that the current wrangling ethic is to just respond to the tags users make to tag their stories with, but if a tag like that was primarily a navigation utility, you wouldn't expect any given user to tag their story with it even if they wanted it (and could figure out that they wanted it without access to the tag tree).
Of course, the idea might be totally crazy, and even if it's not, it wouldn't really address the stuff
no subject
Date: 2012-07-12 02:07 am (UTC)
What might be useful, and do the same thing without changing the tree, is to make a couple differently formed search forms. Including one for "input/select character name" which would then search for that character in all the relationship tags. I mean, that's possible already, on the advanced search form, but I honestly think a lot of people just do not understand about searching different fields. Especially users who don't post, and never see the post-form. So a couple of pre-set options, where you just have to enter the name and hit "search" without wondering which field is which could help a lot.
*darkly* Especially since the standard archive skin does not show any distinction between tag fields. God, that is just /such/ bad flattening of information. And then they wonder why no one knows how the hell to use the advanced search form or form useful boolean searches.
no subject
Date: 2012-07-12 03:10 am (UTC)
Granted, this assumes that the filters are a) working and b) able to grab all the info. (setting aside the fact that the way they were working was uber!clunky and slow)
I don't know what the new filter design is going to look like and if it will still do this - have to wait and see.
sidenote of tag wrangling: Did someone suggest using 'Character/Any' as a metatag? I don't remember - if not, I *think* that discussion is still on-going and you can toss that idea into the ring. It could get a little 'heavy' with the subtags, but it could work. The issue, so far, is figuring out what the 'general consensus' is for that tag type.
no subject
Date: 2012-07-12 02:15 am (UTC)
I am not a Chinese-speaking fan, but I have many times wished for this option! "Find [Name] in the pairing field" searches would be SO AWESOME.
(I blame this on my acquisition of a large, 16(soon to be 20)-main-cast-member fandom! Authors tend to list everyone who has a speaking role, however minor, in the "characters" list, so when I search for "Eridan Ampora" I turn up a whole lot of fics where he's not part of the pairing/s at all, and often doesn't even appear except in a flashback in chapter five or something. So frustrating! And I bet it's something that most large-cast fandoms run into.)
no subject
Date: 2012-07-12 02:22 am (UTC)
Oh, hey, maybe that's what else the alias table would be good for! A fourth column for primary or secondary, and it would be retrieved all in the same query as any aliases. So it could be a multipurpose "tag metadata" table.
no subject
Date: 2012-07-12 03:12 am (UTC)
All the ideas are logged and tracked by the Support team. It helps the OTW/AO3 determine which features users want implemented.
If you're not comfortable submitting it - for any reason - let me know and I can send it to them if you'd like. =)
no subject
Date: 2012-07-12 04:05 am (UTC)
First off, instead of 'canonical', I'd call them 'main' or 'formal' because 'canonical' carries a connotation of "this is what canon says is proper" and then we're back in the 1x2 and 2x1 and 1x2x1 and 2x1x2 pairing war crap. Let's not go there. It was a scary place, in case you've forgotten.
For canonicals, you need a pattern. Because it's an English-language db, we went with alphabetical, regardless of story content -- and excepting one-name characters (or characters who are never given a surname), it's always alphabetical.
There was one other type I took to calling wildcard-tags, which are where an author wants to indicate part but not all of a pairing. There's character/unknown, which means a pairing between two named (canonical) characters but the second is a surprise (aka the Frank+??? tag); there's also character/various (to indicate menages with the named character being the focus and/or story focus). Last, there's character/oc, for when the pairing is with a non-story character. Obviously. But those three have covered every single instance I've had to deal with, and with Mikkeneko alone, she certainly hit every possible pairing and then some. I can only imagine an entire series of fandoms of clone Mikkenekos. (Basically: design for making Mikke happy, and you'd have the perfect db for every possible wacky permutation EVER.)
Once writers/readers are used to the fact that formal tags are neutral -- meaning they make no statements in any direction about the story's pairing status (in terms of who tops or leads or whatever), people relax mightily about it. Any specifics are for keywords, or one-time use tags, or author's notes preceding the chapter or story. Given the choice when presented as such, all my authors chose "let people find my story" over "let me tag in any which way as I prefer" when the latter is presented as "and then only two out of ten people will ever actually find the story to read it".
Second, I'd say you're actually talking about two tables, not one. The first table is merely a lookup: the table's own index, then the ID for the fandom, then the text for the pairing or character. It'd be a massive table but simple for the db to handle, because it's not complex, and it's indexed. Just rows upon rows that designate character or pairing to fandom, such that if I search for "Elric" after selecting the "FMA" drop-down, it'll only give me stories with a member of the Elric family, rather than throw in various fanfic for Moorcock's series. Or I can get both, if no fandom is identified.
The second table is equally simple: an index, the story ID, and the ID of the formal tag. Row upon row upon row, of any/all characters and pairings attached as formal tags. Again, the simplicity of the table means the joins are easier, because that's really all you're tracking. If you wanted to identify primary/secondary, even tertiary, then add a third integer column of 0 for primary, 1 for secondary, 2 for tertiary, and so on. That way, if an author just tags and doesn't bother with who's primary and not, the automatic default is that all formal tags are primary. (That might be the one place tag wranglers would come in handy, to address stories with 27 primaries listed.) It means you could also limit the number of primaries, secondaries, etc, for those random authors who want to list every blooming character ever as tags. For whatever reason.
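The two tables described above could be sketched like this, with sqlite3 standing in for the real database. Every name here is a placeholder, the fandom ids are invented, and the column I've called `tier` is the 0/1/2 primary/secondary/tertiary integer, defaulting to 0 so that untouched tags count as primary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: which character or pairing text belongs to which fandom.
    CREATE TABLE formal_tags (
        id        INTEGER PRIMARY KEY,
        fandom_id INTEGER NOT NULL,
        label     TEXT    NOT NULL
    );
    CREATE INDEX idx_formal_tags_fandom ON formal_tags (fandom_id, label);

    -- Link table: which formal tags are attached to which story.
    CREATE TABLE story_formal_tags (
        id            INTEGER PRIMARY KEY,
        story_id      INTEGER NOT NULL,
        formal_tag_id INTEGER NOT NULL REFERENCES formal_tags (id),
        tier          INTEGER NOT NULL DEFAULT 0  -- 0 primary, 1 secondary, 2 tertiary
    );

    INSERT INTO formal_tags VALUES (1, 10, 'Edward Elric'), (2, 20, 'Elric of Melniboné');
    INSERT INTO story_formal_tags (story_id, formal_tag_id) VALUES (100, 1), (200, 2);
    """
)


def stories_matching(text, fandom_id=None):
    """Find stories by tag text, optionally narrowed to a single fandom."""
    sql = (
        "SELECT s.story_id FROM story_formal_tags AS s "
        "JOIN formal_tags AS f ON f.id = s.formal_tag_id "
        "WHERE f.label LIKE ?"
    )
    params = [f"%{text}%"]
    if fandom_id is not None:
        sql += " AND f.fandom_id = ?"
        params.append(fandom_id)
    sql += " ORDER BY s.story_id"
    return [story_id for (story_id,) in conn.execute(sql, params)]


print(stories_matching("Elric", fandom_id=10))  # one fandom only
print(stories_matching("Elric"))                # no fandom selected: both
```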
The wrench in this is one I don't see mentioned here, so either it's not been raised or I've missed it: cross-fandom tagging. Ie, Edward Elric paired with, idk, Pikachu. So, come to think of it, I'd add two more wildcard combinations. Here's how I'd work it, as long as I'm meandering.
1. Any given fandom is probably going to be starting off (at build) with a certain number of common pairings. It's also going to have a certain number of named, known, common characters. By that, I mean that only the oldest and best-established fandoms are likely to have expanded to the point of having entire stories about characters with only a surname who only show up in episode 17. Take those common pairings, and let the rest be character/other to indicate "character pairing not listed in the Common Formal Pairings for this fandom". Used consistently, readers will learn that "character/other" means "look in the keywords to see it's Jill Protagonist plus Random Gal From Ep17". Set a threshold for how many times a fandom uses character/other (say, 20 instances), and that's when you send in tag-wranglers to compare the keywords. If an uncommon pairing has become popular, elevate it to formal and edit accordingly. Otherwise, it remains less common, and "character/other" is for people seeking weird combinations.
2. For cross-fandom pairings, I'd set up the posting page to require character/pairing tags for EACH fandom. IOW, there are the genre/warning/etc tags that work across the story (apply regardless of source). Then, for each fandom selected, the author can select characters from that specific fandom who appear. Again, consistency is key, for readers to learn the patterns, that "character/external" means "paired with an ex-canon character". So in a crossover between Escaflowne and Moretsu Pirates, "Chiaki/external" and "Hitomi/external" gives you a hint (and again, keywords to specify). But it means that if I want stories about Chiaki, the crossover would show up, instead of be left out.
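The threshold idea from point 1 is easy to sketch. The input shape (pairs of fandom and tag) and the function name are assumptions for illustration; the figure of 20 is the "say, 20 instances" from the text.

```python
from collections import Counter

REVIEW_THRESHOLD = 20  # the "say, 20 instances" figure from point 1


def fandoms_needing_review(story_tags, threshold=REVIEW_THRESHOLD):
    """Return fandoms whose use of 'character/other' tags has hit the threshold.

    story_tags is an iterable of (fandom, tag) pairs, one per tagged story.
    """
    counts = Counter(
        fandom for fandom, tag in story_tags if tag.endswith("/other")
    )
    return sorted(fandom for fandom, n in counts.items() if n >= threshold)


sample = (
    [("FMA", "Edward Elric/other")] * 20
    + [("Naruto", "Sasuke/other")] * 19
    + [("Naruto", "Naruto/Sasuke")] * 50  # formal pairings don't count
)
print(fandoms_needing_review(sample))
```

That list is what the wranglers would get: fandoms where it's time to compare keywords and see whether some "other" pairing has earned a formal tag.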
As for searching, I personally dislike when the search is split into simple versus the link-to-advanced. I much prefer the ux of search + drop-down to narrow results, much like LJ/DW do on their backend pages. I'd even break it into several options, maybe like this (totally off the top of my head), where it's checkboxes and radio buttons -- I can find a patterny example if you want, think I have one around here in my collection. Basically, there's a yes/no for "in current fandom" (if the reader is on a page with identified fandom, this becomes an option to apply search to where you are now) versus "any fandom", and then a series of options, all radio (separate from the first two):
-- character unpaired
-- paired with named character
-- paired with unknown
-- paired with other
-- paired with various
-- paired with external
-- paired with OC
Awkward there, but you get the idea. Which probably wouldn't make much difference in a teeny fandom like, say, Natsume Yuujinchou, but could make all the difference in the world in a mad-crazy complex fandom like FMA or Naruto or GWing. Point is, putting it in a dropdown within the search function is a pattern most users have gotten used to by now, so I'd use it here, to let people narrow down (if they so choose). Hell, make those checkboxes, just in case the person wants several different options. (Wait, has AO3 figured out the "and" function yet, on its searches? Hmm. Some of this might be too hard for them, then... but I'll attempt to refrain from snark about bananas and innertubes.)
The last thing before I subside, without pointing fingers, just stating my case: to call such a developer-driven line-in-the-sand 'controlling' or 'matronizing' (hello! check yourself! since when did 'being like a mom' become a loaded negative term!) is sort of like... Well, it's like building a house and getting upset because the architect and construction team all insist on putting studs every sixteen inches on center, for load-bearing walls. And the house owners just think this doesn't fit their 'style' at all, and is so!restrictive! and are all upset at this, oh, controlling attitude on the part of the people who are TRYING TO MAKE SURE THE GODDAMN ROOF DOES NOT FALL IN.
I mean, seriously, for crying out loud.
no subject
Date: 2012-07-12 04:31 am (UTC)
But on the topic! I have not yet been crazed enough to do an installation of the archive just to see the database schema, so I have no idea where, on the scale of "sensible" to "rat's nest" it falls. I /hope/ that the tags table looks about like the first one you describe, but I'm a lot less sure about the second. It certainly looks like a sound suggestion, though (supposing anyone involved knows how to /do/ joins).
*thinks* I /believe/ that Character/OC is already a canonical/formal (that'll take a while to shift since "canonical" is an established usage) tag in the tree. I don't know about the "unknown", though, and that could certainly be a useful one.
(God, do not get me started on the search forms. Just... argh.)
*shrugs a shoulder* Not speaking for anyone else, I rather like the adaptation of "matronizing" from "patronizing" as a description of gender-inflected assumption of authority, with the contemporary connotation that the assumption is unjustified. There's a big difference between "patronizing" and "fatherly", and I don't think assuming a similar one between "matronizing" and "motherly" is a stretch. As far as reasonable dev-behavior, there's "dev doing zir job" and then there's "control freak who cannot let /anyone else/ do their jobs". I'd say there are too many higher-ups in the otw structure who are acting like the latter.
no subject
Date: 2012-07-12 04:53 am (UTC)
I get what you're saying about 'matronizing'... it just read kind of strange. Maybe because 'patronizing' has the sense of 'condescending' (as in, I know better than you) while 'matronizing' made me think of motherly nagging. Which isn't really an "I know better" so much as a "I might not know anything at all but I'm still going to get on your case about it, over and over and over, until you agree". Heh. Which is cultural gender-bias in that interpretation, admittedly.
As for the controlling behavior... it's a bizarre little subculture that's grown up, hasn't it? Seems to me that the original or stated intention -- you can do what you want! with your database! -- has caused a kneejerk kind of reaction amongst the higherups. Like, in trying to allow too much chaos in one area, or trying to let people interact without set boundaries, they ended up going in the total opposite direction to counteract that chaos. Sort of acts like an entire object lesson in how firm boundaries and consistent patterns are required in many things, including interpersonal relationships, archive management, and database tables. Let one area go totally crazy with lack o' boundaries and lack o' consistency, and you end up overcompensating elsewhere. So I guess the way I see it is that if the archive and its structure were more, well, structured, then the management could relax, because things would be clearer. They wouldn't be locking down out of some unspoken emotional need for consistency in the face of so much ambiguity.
The problem (above and beyond the problem of what rules to set) is breaking out of that existing rut of too much one way, not enough another way, to find the balance. Especially since that means reducing a fuckload of drama about the imbalance and the inconsistencies and the control, and that kind of imbalance tends to attract people who are imbalanced themselves -- which just creates a feedback loop between the systems (the archive itself) and the people running/using it.
Which is to say: ultimately, adding or revising tables in the database isn't a simple act of just adding/revising tables. It's something that could, and probably would, have a far-reaching effect on the overall archival-cultural system as a whole, because the database structure is endemic to the entire system's culture and mindset. And thus, my dear, even a seemingly minor change could have ripples far beyond whether we call it 'character/???' or 'character/unknown'.
no subject
Date: 2012-07-12 05:02 am (UTC)
Which is really a shame, because the "???" is so much more evocative. I wonder if using entities instead would work...
if the archive and its structure were more, well, structured, then the management could relax, because things would be clearer
*nodnod* I do think that would help a lot. And the same in other areas of the organization, too. There seems to be so much people are trying to do "by hand", as it were, that should be automated while critical decisions get "I don't know, what do /you/ think" to death, so many places where bylaws should exist and don't but there are umpteen gajillion meetings being held over the tiniest details that the volunteers could really be trusted with. I think you're right on the money about the control having gotten into the wrong places!
no subject
Date: 2012-07-12 05:15 am (UTC)
(learned that as well working with using a symbol instead of a slash -- I think we used a square bullet -- as a compromise between / and x and + and the various fandom connotations for each. Unfortunately, within a few months, half the bullets weren't square but were random bizarre nonsense strings, and I was writing entire functions just to find and change them back after any save. grrrrr.)
no subject
Date: 2012-07-12 05:23 am (UTC)