submitted 4 months ago (last edited 4 months ago) by danielquinn@lemmy.ca to c/startrek@startrek.website

It would seem that I have far too much time on my hands. After the post about a Star Trek "test", I started wondering if there could be any data to back it up and... well here we go:

Those Old Scientists

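| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| KIRK | 8257 | 32.89 |
| SPOCK | 3985 | 15.87 |
| MCCOY | 2334 | 9.3 |
| SCOTT | 912 | 3.63 |
| SULU | 634 | 2.53 |
| UHURA | 575 | 2.29 |
| CHEKOV | 417 | 1.66 |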

The Next Generation

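| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| PICARD | 11175 | 20.16 |
| RIKER | 6453 | 11.64 |
| DATA | 5599 | 10.1 |
| LAFORGE | 3843 | 6.93 |
| WORF | 3402 | 6.14 |
| TROI | 2992 | 5.4 |
| CRUSHER | 2833 | 5.11 |
| WESLEY | 1285 | 2.32 |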

Deep Space Nine

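| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| SISKO | 8073 | 13.0 |
| KIRA | 5112 | 8.23 |
| BASHIR | 4836 | 7.79 |
| O'BRIEN | 4540 | 7.31 |
| ODO | 4509 | 7.26 |
| QUARK | 4331 | 6.98 |
| DAX | 3559 | 5.73 |
| WORF | 1976 | 3.18 |
| JAKE | 1434 | 2.31 |
| GARAK | 1420 | 2.29 |
| NOG | 1247 | 2.01 |
| ROM | 1172 | 1.89 |
| DUKAT | 1091 | 1.76 |
| EZRI | 953 | 1.53 |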

Voyager

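| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| JANEWAY | 10238 | 17.7 |
| CHAKOTAY | 5066 | 8.76 |
| EMH | 4823 | 8.34 |
| PARIS | 4416 | 7.63 |
| TUVOK | 3993 | 6.9 |
| KIM | 3801 | 6.57 |
| TORRES | 3733 | 6.45 |
| SEVEN | 3527 | 6.1 |
| NEELIX | 2887 | 4.99 |
| KES | 1189 | 2.06 |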

Enterprise

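| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| ARCHER | 6959 | 24.52 |
| T'POL | 3715 | 13.09 |
| TUCKER | 3610 | 12.72 |
| REED | 2083 | 7.34 |
| PHLOX | 1621 | 5.71 |
| HOSHI | 1313 | 4.63 |
| TRAVIS | 1087 | 3.83 |
| SHRAN | 358 | 1.26 |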

Discovery

Important Note: As the source material is incomplete for Discovery, the following table only includes line counts from seasons 1 and 4 along with a single episode of season 2.

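| Name | Total Lines | Percentage of Lines |
| :--- | ----------: | ------------------: |
| BURNHAM | 2162 | 22.92 |
| SARU | 773 | 8.2 |
| BOOK | 586 | 6.21 |
| STAMETS | 513 | 5.44 |
| TILLY | 488 | 5.17 |
| LORCA | 471 | 4.99 |
| TARKA | 313 | 3.32 |
| TYLER | 300 | 3.18 |
| GEORGIOU | 279 | 2.96 |
| CULBER | 267 | 2.83 |
| RILLAK | 205 | 2.17 |
| DETMER | 186 | 1.97 |
| OWOSEKUN | 169 | 1.79 |
| ADIRA | 154 | 1.63 |
| COMPUTER | 152 | 1.61 |
| ZORA | 151 | 1.6 |
| VANCE | 101 | 1.07 |
| CORNWELL | 101 | 1.07 |
| SAREK | 100 | 1.06 |
| T'RINA | 96 | 1.02 |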

If anyone is interested, here's the (rather hurried, don't judge me) Python used:

```python
#!/usr/bin/env python3

#
# This script assumes that you've already downloaded all the episode lines from
# the fantastic chakoteya.net:
#
# wget --accept=html,htm --relative --wait=2 --include-directories=/STDisco17/ http://www.chakoteya.net/STDisco17/episodes.html -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/Enterprise/ http://www.chakoteya.net/Enterprise/episodes.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/Voyager/ http://www.chakoteya.net/Voyager/episode_listing.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/DS9/ http://www.chakoteya.net/DS9/episodes.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/NextGen/ http://www.chakoteya.net/NextGen/episodes.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/StarTrek/ http://www.chakoteya.net/StarTrek/episodes.htm -m
#
# Then you'll probably have to convert the following files to UTF-8 as they
# differ from the rest:
#
# * Voyager/709.htm
# * Voyager/515.htm
# * Voyager/416.htm
# * Enterprise/41.htm
#
# Requires Python 3.9+ (walrus operator and built-in generic type hints).
#

import re
from collections import defaultdict
from pathlib import Path

# Transcript pages are numbered, e.g. "709.htm"; index pages are skipped.
EPISODE_REGEX = re.compile(r"^\d+\.html?$")
# A line of dialogue starts with the speaker in capitals, e.g. "KIRK: ".
LINE_REGEX = re.compile(r"^(?P<name>[A-Z']+): ")

EPISODES = Path("www.chakoteya.net")
DISCO = EPISODES / "STDisco17"
ENT = EPISODES / "Enterprise"
TNG = EPISODES / "NextGen"
TOS = EPISODES / "StarTrek"
DS9 = EPISODES / "DS9"
VOY = EPISODES / "Voyager"

NAMES = {
    TOS.name: "Those Old Scientists",
    TNG.name: "The Next Generation",
    DS9.name: "Deep Space Nine",
    VOY.name: "Voyager",
    ENT.name: "Enterprise",
    DISCO.name: "Discovery",
}


class CharacterLines:
    """Counts lines of dialogue per character for one series directory."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.line_count = defaultdict(int)

    def collect(self) -> None:
        for episode in self.path.glob("*.htm*"):
            if EPISODE_REGEX.match(episode.name):
                # Be explicit about the encoding instead of relying on the OS default
                for line in episode.read_text(encoding="utf-8").split("\n"):
                    if m := LINE_REGEX.match(line):
                        self.line_count[m.group("name")] += 1

    @property
    def as_tabular_data(self) -> tuple[tuple[str, int, float], ...]:
        total = sum(self.line_count.values())
        if not total:  # nothing collected, so avoid dividing by zero
            return ()
        rows = []
        for name, count in self.line_count.items():
            percentage = round(count * 100 / total, 2)
            if percentage > 1:  # only list characters with more than 1% of lines
                rows.append((name, count, percentage))
        # Characters with the most lines first
        return tuple(sorted(rows, key=lambda row: row[1], reverse=True))

    def render(self) -> None:
        # Print the results as a Markdown table
        print(f"\n\n# {NAMES[self.path.name]}\n")
        print("| Name             | Total Lines | Percentage of Lines |")
        print("| ---------------- | :---------: | ------------------: |")
        for character, total, pct in self.as_tabular_data:
            print(f"| {character:16} | {total:11} | {pct:19} |")


if __name__ == "__main__":
    for series in (TOS, TNG, DS9, VOY, ENT, DISCO):
        counter = CharacterLines(series)
        counter.collect()
        counter.render()
```
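The comments at the top of the script say a few transcript files need converting to UTF-8 by hand. A minimal sketch of doing that in Python instead, assuming (not verified) that the odd files are Windows-1252, a common encoding for older web pages. `to_utf8` is a hypothetical helper, not part of the script above, and any byte that isn't valid in the fallback encoding is replaced rather than raising an error.

```python
from pathlib import Path


def to_utf8(path: Path, fallback: str = "cp1252") -> bool:
    """Rewrite `path` as UTF-8 in place. Returns True if the file was changed."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        # Assumption: the file is in `fallback`; undecodable bytes become U+FFFD
        text = raw.decode(fallback, errors="replace")
    # Write bytes so the original line endings are kept exactly as they were
    path.write_bytes(text.encode("utf-8"))
    return True


if __name__ == "__main__":
    episodes = Path("www.chakoteya.net")
    for name in ("Voyager/709.htm", "Voyager/515.htm", "Voyager/416.htm", "Enterprise/41.htm"):
        target = episodes / name
        if target.exists():
            print(name, "converted" if to_utf8(target) else "already UTF-8")
```

Checking for valid UTF-8 first makes the helper safe to run more than once: a file that has already been converted is left untouched.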
top 13 comments
Corgana@startrek.website 21 points 4 months ago

Fascinating stuff! I love that you did this. I'm surprised Morn didn't rank higher considering how chatty he is in every scene.

ericjmorey@discuss.online 5 points 4 months ago

Number of lines vs. number of words spoken vs. length of time speaking would probably give quite different results.
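A minimal sketch of how the same parsing could count words as well as lines, assuming each line of dialogue sits on a single physical line after the speaker's name (a speech wrapped onto following lines would only have its first line counted, and any stage directions on that line would count as words). The `lines_and_words` function and the sample text are made up for illustration. Speaking time would need timing data that the transcripts don't appear to contain.

```python
import re
from collections import Counter

# Like LINE_REGEX in the script, but also captures the spoken text
LINE_REGEX = re.compile(r"^(?P<name>[A-Z']+): (?P<text>.*)$")


def lines_and_words(transcript: str) -> tuple[Counter, Counter]:
    """Count lines and whitespace-separated words per speaker."""
    lines, words = Counter(), Counter()
    for line in transcript.split("\n"):
        if m := LINE_REGEX.match(line):
            name = m.group("name")
            lines[name] += 1
            words[name] += len(m.group("text").split())
    return lines, words


# Made-up sample, not a real transcript
sample = "SPOCK: Fascinating.\nMCCOY: I'm a doctor, not an engineer, Jim!\nSPOCK: Indeed."
lines, words = lines_and_words(sample)
print(lines, words)
```

In this sample Spock has more lines but McCoy has more words, which is exactly the kind of difference the comment is pointing at.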

deegeese@sopuli.xyz 9 points 4 months ago

Thanks for sharing. I notice chakoteya.net has TOS scripts. Is there any reason they weren’t included in the analysis?

danielquinn@lemmy.ca 12 points 4 months ago

Honestly, it's 'cause I forgot to include it! I'll see if I can add it tonight. Check back in 24hrs :-)

deegeese@sopuli.xyz 4 points 4 months ago (last edited 4 months ago)

Thanks for the update.

Poor Chekov has almost no lines, but Koenig was great as Bester on B5.

milkisklim@lemm.ee 8 points 4 months ago (last edited 4 months ago)

This is really cool stuff! Thanks for posting the code!

This definitely shows why people felt Discovery was the Michael Burnham show. It's not that she had an unusual number of lines, but that no one else spoke even half as much as she did, with the rest of the lines split among more characters than in the other series.
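A quick way to put numbers on that observation, using only the figures from the tables in the post (so Discovery reflects the partial transcripts). For each series it computes how many times more lines the top speaker has than the runner-up, and counts how many characters made the script's 1% cut-off. `TABLES` and `lead_ratio` are names made up for this sketch.

```python
# Copied from the tables above:
# (top speaker's lines, runner-up's lines, characters listed above the 1% cut-off)
TABLES = {
    "Those Old Scientists": (8257, 3985, 7),
    "The Next Generation": (11175, 6453, 8),
    "Deep Space Nine": (8073, 5112, 14),
    "Voyager": (10238, 5066, 10),
    "Enterprise": (6959, 3715, 8),
    "Discovery": (2162, 773, 20),
}


def lead_ratio(top: int, runner_up: int) -> float:
    """How many times more lines the top speaker has than the runner-up."""
    return top / runner_up


for series, (top, runner_up, listed) in TABLES.items():
    ratio = lead_ratio(top, runner_up)
    print(f"{series:22} lead ratio {ratio:.2f}, {listed} characters above 1%")
```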

Also does GEORGIOU count for both prime and mirror versions of the character?

danielquinn@lemmy.ca 6 points 4 months ago

That was my takeaway as well. I just wish I had data for the other seasons. It'd be interesting to see how that might change the percentages as they are.

As for GEORGIOU, I'm reasonably sure that this refers to both versions of her.

rob_t_firefly@lemmy.world 1 points 4 months ago (last edited 4 months ago)

As the prime version of Georgiou's lines basically amounted to "Hi!" "Oh crap!" "Bye!" the overall math shouldn't be too affected.

exocrinous@startrek.website 0 points 4 months ago

Georgiou also got fridged for Michael's character development. And then we follow Michael over the timeskip. Right out of the gate, the universe exists to tell a story about Michael.

ValueSubtracted@startrek.website 8 points 4 months ago

Wow, Tarka was a chatty sonofagun.

Indy@startrek.website 7 points 4 months ago

This is beautiful! I love data and I'm delighted you were inspired by my post to gather the data.

Thank you for doing this!

usernamefactory@lemmy.ca 5 points 4 months ago

Fascinating! It would be illuminating to see this broken up by season as well. Seven of Nine's relatively low share, for instance, can definitely be attributed to her late arrival in the series. In the later seasons, I suspect her percentage could rival Janeway's.
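A rough sketch of a per-season breakdown. It assumes, hypothetically, that a transcript file such as Voyager/709.htm is season 7, episode 9, i.e. everything before the last two digits is the season. That is only a guess from the file names in the script's comments and should be checked against the site's episode listings; it plainly doesn't fit Enterprise/41.htm, which this sketch simply skips. `season_of`, `count_by_season` and `share` are hypothetical helpers, not part of the original script.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path
from typing import Optional

EPISODE_REGEX = re.compile(r"^(?P<number>\d+)\.html?$")
LINE_REGEX = re.compile(r"^(?P<name>[A-Z']+): ")


def season_of(filename: str) -> Optional[int]:
    """Assumed scheme: '709.htm' -> season 7. Returns None if it doesn't fit."""
    m = EPISODE_REGEX.match(filename)
    if not m or len(m.group("number")) < 3:
        return None
    return int(m.group("number")[:-2])


def count_by_season(path: Path) -> dict[int, Counter]:
    """Count lines per character, grouped by the (assumed) season number."""
    seasons: defaultdict[int, Counter] = defaultdict(Counter)
    for episode in path.glob("*.htm*"):
        season = season_of(episode.name)
        if season is None:
            continue  # index pages, or files that don't follow the assumed scheme
        for line in episode.read_text(encoding="utf-8").split("\n"):
            if m := LINE_REGEX.match(line):
                seasons[season][m.group("name")] += 1
    return dict(seasons)


def share(counts: Counter, name: str) -> float:
    """Percentage of a season's lines spoken by `name`, rounded like the script."""
    total = sum(counts.values())
    return round(counts[name] * 100 / total, 2) if total else 0.0
```

Something like `count_by_season(Path("www.chakoteya.net") / "Voyager")` would then give one `Counter` per season, and `share(...)` could compare Seven's percentage against Janeway's season by season.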

Conversely, it's impressive Lorca ranks as highly as he does, given he was gone by the end of Disco season one. But since he was simultaneously captain and antagonist while he was around, I guess it isn't that surprising.

clay_pidgin@sh.itjust.works 3 points 4 months ago

Maybe the two Dax hosts on DS9 should be combined, as they didn't overlap.
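A small sketch of that idea: fold one speaker label into another before building the table. The counts are the Deep Space Nine figures from the post; the `merge_aliases` helper and the `{"EZRI": "DAX"}` mapping are hypothetical. Because the total number of lines doesn't change, only the combined row moves.

```python
from collections import Counter


def merge_aliases(counts: Counter, aliases: dict[str, str]) -> Counter:
    """Return a new Counter with each alias's lines added to its canonical name."""
    merged = Counter()
    for name, lines in counts.items():
        merged[aliases.get(name, name)] += lines
    return merged


# Deep Space Nine line counts from the table above
ds9 = Counter({
    "SISKO": 8073, "KIRA": 5112, "BASHIR": 4836, "O'BRIEN": 4540,
    "ODO": 4509, "QUARK": 4331, "DAX": 3559, "WORF": 1976, "JAKE": 1434,
    "GARAK": 1420, "NOG": 1247, "ROM": 1172, "DUKAT": 1091, "EZRI": 953,
})

# Hypothetical mapping: count Ezri's lines under the shared "DAX" label
combined = merge_aliases(ds9, {"EZRI": "DAX"})
for rank, (name, lines) in enumerate(combined.most_common(6), start=1):
    print(rank, name, lines)
```

With these figures the combined Dax has 3559 + 953 = 4512 lines, which would move her from seventh to fifth, just behind O'Brien (4540) and ahead of Odo (4509).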

this post was submitted on 03 Jul 2024
48 points (98.0% liked)



