The Comparative Meme Analyser

09 Nov

Just a short update on the Meme Analyser. One of the most frequently requested features was the ability to compare two memes in a single graph, to get a simple overview of what’s happening with both memes over time. Good news: I just implemented it. In the latest version of the Meme Analyser you can opt in to a second meme that will be plotted into the same graph. You may want to use this as an analogue to the well-known Google Fights. Or you can use it to see how two memes running for the same topic try to outrun each other. To run the analyser, please make sure that your browser allows JavaScript.

What’s next on my wishlist for the tool:

  • Finally start harvesting live data, so new memes can be analysed as well.
  • Add a zoom function that analyses only a specific date range instead of the whole dataset.
  • Optimise the performance: as the tool is written in Python, runs on a huge MySQL database and uses the Bottle web framework, it is slow as hell (yup, I know the problem, but there’s no way around it right now). If you have ideas on how to optimise the setup, please let me know!

    The Next Meme Analyser Version

    08 Oct


    What you can see above is a result of the latest version of the Meme Analyser. I’ve changed the API that produces the images. The old version used the Google Chart API, which basically works by embedding a plain image with an img tag. This works fine and is easy to do. On the downside, the date axis had no labels for specific dates, and little additional information (like the total number of memes for a given day) could be shown with this approach.

    The new version uses the Google Visualization API, which relies on JavaScript. As you can test with the chart above, the new version not only labels the date axis correctly but also shows the date and the number of memes for every day on mouse-over. In this blog post the chart is fixed to a width of 600 px and shows only a limited range of data. The Meme Analyser itself draws charts with a width of 1200 px and shows the whole data range. So enjoy the new version of the analyser.
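
    A minimal sketch of the data side, assuming the daily counts are already available as Python dicts: the hypothetical function below builds the JSON literal format that a Visualization API DataTable can be constructed from, with dates written as "Date(year, month, day)" strings and a zero-based month. Passing two hashtags yields two number columns on a shared date axis, which is the kind of table a two-meme comparison chart needs. This is an illustration, not the analyser’s actual code, and the counts in the example are made up.

```python
import json
from datetime import date


def to_gviz_datatable(series):
    """Build DataTable-style JSON for one or more memes on a shared date axis.

    `series` maps a hashtag to a {date: count} dict. Rows cover every date on
    which at least one meme occurs; a meme missing on such a date gets 0.
    """
    tags = list(series)
    days = sorted(set().union(*(counts.keys() for counts in series.values())))
    cols = [{"id": "day", "label": "Day", "type": "date"}]
    cols += [{"id": f"m{i}", "label": tag, "type": "number"} for i, tag in enumerate(tags)]
    rows = []
    for day in days:
        # Date strings use a zero-based month, like JavaScript's Date object.
        cells = [{"v": f"Date({day.year}, {day.month - 1}, {day.day})"}]
        cells += [{"v": series[tag].get(day, 0)} for tag in tags]
        rows.append({"c": cells})
    return json.dumps({"cols": cols, "rows": rows})


# Made-up counts for two memes that are plotted into the same chart.
example = to_gviz_datatable({
    "#fail": {date(2010, 7, 2): 310, date(2010, 7, 3): 295},
    "#fb": {date(2010, 7, 3): 280},
})
```

    On the JavaScript side, the resulting string would then be handed to the DataTable constructor before drawing the chart.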

    Missing Trends: Memetic Populations

    05 Oct

    I’ve found the time for some more memetics stuff. The image shows the total number of memes for each day from 07/02/2010 to 09/18/2010. On average there are 31,131 (±4,360) memes per day. I also did this for the whole range of available data and found no major differences; no clear trend is visible. Instead, this looks like a typical graph for a population at its limit: it grows larger than the environment can sustain, declines a bit, grows again, and so on.
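
    The daily figure boils down to counting entries per day and summarising those counts. A minimal sketch, assuming the entries come as (day, hashtag) pairs and that the ± value is a sample standard deviation (both assumptions); the function names are hypothetical:

```python
from collections import Counter
from statistics import mean, stdev


def daily_totals(entries):
    """Count all hashtag entries per day; `entries` yields (day, hashtag) pairs.

    Days without any entry do not appear in the result.
    """
    return Counter(day for day, _tag in entries)


def summarise(totals):
    """Return the mean and sample standard deviation of the daily totals."""
    values = list(totals.values())
    return mean(values), stdev(values)
```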

    Although it’s nice to see that the sum of hashtags on Twitter seems to follow the basics of population growth, this does not lend support to the theory that the attention span is a factor that distinguishes r strategists from K strategists. The graph includes #blumenkuebel, one of those possible r-strategist memes. But so far I can’t see any connection between the total meme population on a given day and those r-strategist memes.

    Still, I’m not convinced that there is no real connection between them. Rather, the quality of the raw data and of the analysis does not seem good enough. First of all, a daily analysis is too coarse to capture the growth of hashtags. The r strategists in particular grow really fast and stay in the overall population for only days or hours. The analysis still needs to be extended to an hourly resolution.

    But a better analysis is not really possible with the raw data at hand. Remember: each user is counted only once per day and per hashtag. This means that the total size of the meme population is biased. The bias matters when looking at single hours: a meme that starts really fast and reaches a lot of users in a short time will show a peak lasting one hour, while all the tweets those users make later that day are missing.
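
    To make the bias concrete, here is a small sketch that bins tweets by hour with and without the one-entry-per-user-per-day rule. The (user, hashtag, timestamp) layout, the function name and the example tweets are assumptions for illustration, not the structure of the actual database dump. A user who uses the tag again later in the day simply vanishes from the later hours once the filter is applied:

```python
from collections import Counter
from datetime import datetime


def hourly_counts(tweets, dedupe_per_day=True):
    """Bin (user, hashtag, timestamp) tuples into hourly counts.

    With dedupe_per_day=True only a user's first tweet per hashtag and
    calendar day is counted, mimicking the filter used for the dataset.
    """
    seen = set()
    counts = Counter()
    for user, tag, ts in sorted(tweets, key=lambda tweet: tweet[2]):
        key = (user, tag, ts.date())
        if dedupe_per_day and key in seen:
            continue  # a later tweet by the same user that day is dropped
        seen.add(key)
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


# Made-up example: user "a" uses the tag again in the afternoon.
tweets = [
    ("a", "#x", datetime(2010, 7, 2, 10, 5)),
    ("b", "#x", datetime(2010, 7, 2, 10, 20)),
    ("a", "#x", datetime(2010, 7, 2, 14, 30)),
]
raw = hourly_counts(tweets, dedupe_per_day=False)
filtered = hourly_counts(tweets)
```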

    So the ideal raw data would be all tags from all users without any filtering. If you have access to a data source that provides this, please contact me.

    I can haz graphs plz?

    25 Sep

    The last blog post contained an introduction to memetics, some biological background and even some math. I tried to give a brief introduction to the field and a first insight into the actual data. That was not at all how memes should behave, and most of you gave me the “tl;dr” look. So I tried to improve the graph-making approach a bit and decided to give you the chance to produce your own graphs. So, experience the Twitter meme analyser!

    The analyser uses the same data I analysed before and creates graphs from it using the Google Chart API. I have also fixed the scaling of the data, so each graph now shows every single day from 06/24/2009 to 09/18/2010. Just enter a hashtag you’d like to have plotted and the maximum value for the y-axis (this scaling can be useful for comparing memes). Under heavy load you may need to be patient: searching 10.5 million entries in a database may take a moment.
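
    With the image-based Chart API, one plausible way to get a fixed y-axis maximum is through URL parameters. The sketch below uses parameter names from Google’s Image Charts API (cht, chs, chd, chds, chxt, chxr), which Google has since deprecated; the function name and default size are assumptions, and this is not necessarily how the analyser built its URLs:

```python
from urllib.parse import urlencode

CHART_BASE = "https://chart.googleapis.com/chart"


def chart_url(daily_counts, y_max, width=600, height=200):
    """Build an Image Charts line-chart URL with a fixed y-axis maximum.

    Rendering two memes with the same y_max puts them on the same scale.
    """
    params = {
        "cht": "lc",                                           # line chart
        "chs": f"{width}x{height}",                            # image size in pixels
        "chd": "t:" + ",".join(str(c) for c in daily_counts),  # text-encoded data
        "chds": f"0,{y_max}",                                  # scale data to 0..y_max
        "chxt": "y",                                           # draw a labelled y-axis
        "chxr": f"0,0,{y_max}",                                # axis 0 runs from 0 to y_max
    }
    return CHART_BASE + "?" + urlencode(params)
```

    The returned URL would go into the src attribute of an img tag; urlencode percent-encodes the colons and commas, which the server decodes again.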

    Did you find memes while toying around that are clearly r or K strategists? Or memes showing that selection theory cannot be applied to memes? Please let me know!

    Image: Word & Image

    A Memetic Selection Theory? On Twitter?

    22 Sep

    An Introduction to Memetics

    The theory of memes as discrete units of cultural selection was introduced to a wider audience by the evolutionary biologist Richard Dawkins in 1976. His book “The Selfish Gene” focused on genes as the primary motor of evolution and the basic unit that selection acts on. As an analogy to the gene, he coined the term “meme” for a basic unit on which cultural or intellectual selection can act.

    According to Dawkins, three distinct conditions need to be met for evolution to occur:

    1. Variation: something must introduce changes to existing things.
    2. Replication and/or heredity: information must be passed on in some way.
    3. Differences in fitness: in biology this is usually measured as reproductive success, but it can be generalised to how successfully something spawns copies of itself.

    Those general conditions can be applied to organisms, to genes, or, as Dawkins did, to information in general. Information can be passed on by copying DNA, copying digital files, writing it down in books, singing it, printing it on t-shirts or just retelling it to others.

    When the key to unlock copy-protected HD DVDs became available in 2007, copyright lawyers tried to prevent it from spreading any further. In response, an international group of internet users stood up and shared the key. To this end, they used all of the ways mentioned above to make sure the information did not get lost. This can be seen as an example of how hard it is to get rid of a meme that provides a fitness benefit, in this case the option to copy HD DVDs.

    Nowadays the term meme has become a synonym for “internet meme”: small bits of information that are replicated all over the internet by many people. Terms like “facepalm” or “fail”, pictures like “lolcatz” or video mashups like “Hitler finds out…” are known to many people around the planet and have become part of a subculture. As memes are passed on, their original context can get lost, and websites like “Know Your Meme” try to reconstruct the origins of memes.

    Criticism

    The theory of memes has been criticised for several reasons. First of all, it is hard to define what exactly constitutes a unit of selection in memetics. Susan Blackmore cites the example of Beethoven’s Fifth: the first notes (the widely known “da-da-da-da”) can be viewed as a meme, but so can the whole symphony. Similar problems arise in evolutionary biology, where the level at which selection acts has been placed anywhere between populations and the genome.

    Another point of criticism is the mode of heredity of memes. Blackmore distinguishes between “copying the instructions” (what traditional biology expects) and “copying the product”. The latter works in a more Lamarckian fashion: Lamarck’s theory holds that traits changed through frequent use are passed on to the next generation in their altered form.

    The familiar example for Lamarckian evolution: Giraffes acquired their long necks by using them extensively to reach the tops of trees. While biological evolution does not work like that (as long as we do not dig deeply into the emerging field of epigenetics), the heredity of memes can and will work this way. Everyone who has ever played a game of “Chinese whispers” will remember how this works.

    Selection Theory in Biology

    Despite these points, taking a closer look at the parallels between traditional biology and memes might be worthwhile. In ecology there is the theory of r/K selection by Robert MacArthur and E. O. Wilson, which characterises the reproduction and growth of species and their populations. The theory distinguishes between the quantity of offspring produced and their “quality”, which in this context means the investment in each offspring and thus its chance of survival.

    The name of the theory is derived from the Verhulst equation (also known as the logistic equation) used in population dynamics, where N is the number of individuals in the population, t is time, r is the growth rate and K is the carrying capacity of the environment:

    $$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$

    If the parameters r and K as well as the starting population are known, it is possible to calculate how the number of individuals in a population changes over time. In general, species selected for the r parameter (r strategists) produce many offspring with a low chance of survival to adulthood. They typically live in unstable, uncrowded environments. Examples of this type of population growth can be found among bacteria, plants, insects, rodents and rabbits.
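
    For the logistic equation this calculation has a closed-form solution, N(t) = K / (1 + ((K - N0) / N0) * e^(-r*t)), where N0 is the starting population. A minimal sketch (the function name is hypothetical) that evaluates it, here to contrast a fast-growing and a slow-growing population with the same carrying capacity:

```python
import math


def logistic(t, n0, r, k):
    """Solution of the Verhulst equation dN/dt = r*N*(1 - N/K) with N(0) = n0."""
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))


# A fast grower (large r) and a slow grower (small r) with the same capacity K.
for t in range(0, 31, 5):
    print(t, round(logistic(t, 10, 1.0, 1000)), round(logistic(t, 10, 0.2, 1000)))
```

    Both curves start at N0 and level off at K; r only controls how quickly the population gets there.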

    Species selected for the K parameter usually live in environments that are crowded by many different species and where resources are limited. In those environments populations are almost always at the carrying capacity, and population sizes usually stay constant. To ensure reproductive success it is beneficial to produce fewer offspring while investing more parental care. Examples of K strategists are elephants, whales, and humans.

    A good comparison is that of beetles found in agricultural fields and beetles found in woods. Species that populate fields are often smaller, shorter-lived, and more reproductive, with low chances of survival for the offspring. This makes sense, as fields are a highly unstable environment due to human interference through farming. Woods are heavily populated by many different species, and resources are limited. Many beetles there tend to be larger, longer-lived, and less reproductive, with higher chances of survival for the offspring. This is not a binary classification, though, but more of a continuous spectrum, and not all species can be classified as r or K strategists.

    The terms “strategy” and “strategists” are common in biology but can be a bit misleading, as it may seem that those species actively choose their type of reproductive behaviour. This is not true: reproductive behaviour is itself subject to evolutionary change over time and mediated through genetic variation. Natural selection drives genetic change toward the kinds of behaviour that work best in the environment a species lives in.

    Memetic Selection Theory?

    It may be a bit of a stretch to apply this theoretical model to memes, as selection theory was modelled for species and populations and not for genes, but there are several reasons for adopting a similar theory for (internet) memes:

    1. Memes live through human beings who receive and share them, so human attention itself is an environment that can be populated by memes.
    2. Attention is a limited resource, and the amount of attention that is available to be populated can change. So the attention span or the capacity of human brains can be seen as the carrying capacity K.
    3. Not only species form populations; so do genes, and in theory memes should as well.

    So the hypothesis is that some memes can be classified as r strategists that become big on a very short timescale. German readers may remember the #Blumenkübel meme, which took off in less than an hour and made it into the worldwide trending topics, only to disappear after a few days.

    Other memes could be classified as K strategists without such an exponential increase. Instead, those memes would grow slowly (and might stay active for longer).

    Here are some possible considerations regarding memes:

    1. Maybe memes can be seen as r or K strategists, which means that their population either rises and falls quickly or stays in a more or less stable state?
    2. Maybe which kind of selection acts on memes depends at least partially on the environment/the free attention span that is available?
    3. Some memes can be specialists that are only active in a specific context while other memes (think of #fail) are used in a generalised way. Does this affect their spreading?
    4. Do generalists show population growth like traditional K strategists, while specialists often grow like r strategists?

    So I decided to look at meme data obtained from Twitter, where #hashtags form groups of memes.

    The Data

    The data, and help with it, was kindly provided by Marco Verch. He maintains the www.twitter-trends.de website, which compiles the German trending topics on a daily basis, because Twitter itself does not yet provide a German listing. He collects the data using Twitter’s Search API, searching for German tweets with the help of a set of frequently used German words. Not all German tweets are found with this method, but as the API does not allow searching for all German tweets, this can be seen as a good approximation, and potential biases affect all memes in the same way.

    The database dump Marco gave me provides the username, the hashtags used, the time of the tweet, the date and the tweet ID for each entry. To prevent hashtag spamming, each user contributes at most one entry per day for each hashtag. So if one person tweets the same tag 10 times a day, only one entry makes it into the database. Again, this may not be an optimal data source, but as memes are supposed to spread from person to person it should be possible to work around these limitations.

    The database dump contains about 10.5 million usable entries and spans from late June 2009 (2009-06-24) to mid-September 2010 (2010-09-18). This range should be sufficient to get some insights. To compare memes, it seemed reasonable to pick memes with similar total population sizes, so in a first step I counted the usage of each meme. These are the top 10 memes by population size (totals in brackets):

    1. #fail (152940)
    2. #fb (146875)
    3. #ff (85056)
    4. #iphone (53840)
    5. #piraten (47865)
    6. #berlin (39371)
    7. #twitter (37239)
    8. #ger (32636)
    9. #apple (29706)
    10. #piraten+ (28761)
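
    Counting usage and finding memes of similar size could be sketched as follows. The pairs in this post were picked by hand; the code is a hypothetical way to automate that step, assuming (user, hashtag, day) tuples that already follow the one-entry-per-user-per-day rule. The pairing sorts hashtags by total and pairs neighbours whose totals differ by at most a chosen fraction (5% here, an arbitrary choice). Applied to a few of the totals above, it pairs #fail with #fb and #apple with #piraten+:

```python
from collections import Counter


def top_memes(entries, n=10):
    """Rank hashtags by their number of entries.

    `entries` is an iterable of (user, hashtag, day) tuples that already
    follow the one-entry-per-user-per-day rule. n=None returns all hashtags.
    """
    return Counter(tag for _user, tag, _day in entries).most_common(n)


def similar_pairs(totals, tolerance=0.05):
    """Pair hashtags whose totals differ by at most `tolerance` (a fraction).

    Hashtags are sorted by total, and each one is only compared with the
    next-smaller one in that order.
    """
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    pairs = []
    for (tag_a, a), (tag_b, b) in zip(ranked, ranked[1:]):
        if a > 0 and (a - b) / a <= tolerance:
            pairs.append((tag_a, tag_b))
    return pairs


# Totals taken from the top 10 list above.
totals = {"#fail": 152940, "#fb": 146875, "#ff": 85056, "#apple": 29706, "#piraten+": 28761}
pairs = similar_pairs(totals)
```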

    Using this sorted list, I picked some memes with nearly the same overall population size to compare their growth over the whole time span. Filtering and sorting of the data was done with some scripts written in Python; the visualisation was done by hand using Numbers. First, here is a comparison of the two most popular memes, #fail and #fb:


    Both memes seem to keep a more or less stable population size on all dates where they show up. No regression model shows a significant trend, and because of the resolution (memes are binned by day) no further insight can be gained at this level of detail. But let’s take a look at three other memes that have similar population sizes: #loveparade, #werbung and #blog.

    A large difference can be seen in these graphs. While #werbung and #blog show a uniform distribution over all 451 days, the graph of #loveparade differs: at first glance it is obvious that the meme surges almost exactly on the date the Loveparade ended in a tragic disaster, and then its population drops again quite fast.

    The other difference is the number of days the meme populates the twittersphere: while #werbung and #blog are present on every day in the dataset, the #loveparade meme can only be found on 157 different days, mostly on the day of the disaster and the days after.


    A similar picture can be seen for the memes #google, #wm, and #facebook. While #google and #facebook can be found on every single day and seem to stay at a constant population level, the #wm meme is barely present on most days. With the start of the World Cup, however, the meme shows huge population growth, and shortly after the event it nearly disappears from the overall population again.

    Here are some more memes: Each pair of memes can be found nearly equally often in the whole dataset. While memes like #unsereuni, #wave, or #s21 have distinct peaks and show some rapid growth, other memes seem to show a uniform distribution over the whole dataset.

    So, yes, some memes seem to show population growth that could be compared to that of K strategists, while other memes could be compared to r strategists. These pairs were chosen by randomly picking two memes with nearly the same overall population. However, these comparisons are not yet supported by any statistical tests.

    The same is true for the idea of generalist and specialist memes: general memes like #fail, #google, #facebook, #blog, etc. tend to show a more uniform distribution over the whole dataset than specialist memes, which are often tied to a unique event, like #loveparade, #wave, #s21 or #wm.
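
    A first statistical step could be a trend test on each meme’s daily counts. The model behind the regression mentioned earlier is not specified, so this is an illustration only: a standard-library sketch of an ordinary least-squares trend that returns the slope and its t-statistic. Treating |t| above roughly 2 as a hint of a trend is a rule of thumb; a proper p-value needs the t-distribution with n - 2 degrees of freedom (scipy.stats.linregress reports one, for example). The function name is hypothetical.

```python
import math


def trend(counts):
    """Ordinary least-squares trend of daily counts against the day index.

    Returns (slope, t_statistic). Needs at least three days of data.
    """
    n = len(counts)
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(counts))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    if std_err == 0:  # counts lie exactly on a straight line
        return slope, 0.0 if slope == 0 else math.copysign(math.inf, slope)
    return slope, slope / std_err
```

    For meaningful results this should run on a complete daily series, with zeros on days the meme does not appear.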

    Problems to Solve

    While this was a short, first look into the data, there are still some problems to solve, and many questions remain unanswered. So far, the comparisons have only been done by hand, by looking at the graphs, so this is mainly descriptive work. For real results it is necessary to define rules for what counts as a K strategist and an r strategist. Then there is the problem of defining the memes: different hashtags can be used to refer to the same thing.

    Think of #worldcup and #wm, or of #wave and #googlewave. Should those tags be merged into one meme, or should they be treated as different memes? So far I have gone with the latter, because strong memes should be able to outrun the weaker ones.
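
    As for rules to tell K strategists from r strategists, one hypothetical starting point uses two numbers that already came up in this post: the fraction of days on which a meme appears (157 of 451 for #loveparade) and the share of all its entries that fall on its busiest day. The thresholds below are placeholders, not validated criteria, and the function name is made up:

```python
def classify(daily_counts, total_days, k_min_coverage=0.9, r_max_coverage=0.5,
             peak_threshold=0.05):
    """Label a meme 'K-like', 'r-like' or 'unclear' (hypothetical thresholds).

    daily_counts maps day -> count for the days on which the meme occurs and
    must not be empty; total_days is the length of the observation period.
    """
    total = sum(daily_counts.values())
    coverage = len(daily_counts) / total_days        # fraction of days present
    peak_share = max(daily_counts.values()) / total  # weight of the busiest day
    if coverage >= k_min_coverage and peak_share <= peak_threshold:
        return "K-like"
    if coverage <= r_max_coverage and peak_share > peak_threshold:
        return "r-like"
    return "unclear"
```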

    Then there are some problems with the graphs. First of all, the graphs are incomplete: they only show those data points where at least one occurrence of the meme is found. This can be fixed in further analysis by filling in the missing dates, which enables real comparisons. The graphs should also be normalised by the number of active users on each day: as Twitter is still growing, the number of memes can be expected to rise along with the number of active accounts. And for some memes the daily resolution may not be good enough. Twitter is fast, and memes may rise and fall in the course of hours.
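
    Filling in missing dates and normalising by active users could look like the sketch below. The number of active users per day is not part of the dataset described above, so active_users is an assumed input from some other source, and the function name is hypothetical:

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def complete_series(
    counts: Dict[date, int],
    start: date,
    end: date,
    active_users: Optional[Dict[date, int]] = None,
) -> List[Tuple[date, float]]:
    """Return one (day, value) pair per day from start to end inclusive.

    Days missing from `counts` get 0. If `active_users` (day -> number of
    active accounts, an assumed external input) is given, each count is
    divided by that day's number of active users.
    """
    series = []
    day = start
    while day <= end:
        value = counts.get(day, 0)
        if active_users is not None:
            value = value / active_users[day]
        series.append((day, value))
        day += timedelta(days=1)
    return series
```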

    Finally, I did not check the total number of memes for each day. Whether the success of single memes is partially due to a lack of other memes can only be tested by looking at the whole population of a day.

    Yep, there are still enough problems to solve, and I hope that some more knowledge can be obtained from the data by taking a closer look. Although I attended some lectures on population dynamics and ecology in the past, I am not an ecologist by any means. The same is true for the whole field of memetics. So if you can contribute some knowledge in the field, please send me some papers on the topic, leave a comment or contact me. If you think you can help in any other way or have alternative ideas, I would like to invite you to do the same and be part of a peer review of some kind. Thanks go to Oliver Tacke and Julia Reda for providing valuable feedback, sharing thoughts and some interdisciplinary exchange, to Marco Verch for providing data and help, and to Silke Suck for proofreading all this. This “research” is funded by everyone who feels like Flattr-ing this post.

    Welcome

    14 Sep

    Welcome to the Phylomemetic Tree. If you’d like to read a bit about the motivation, you may want to visit the About page. To make it short: I’d like to gather some empirical data on memetics to get an insight into how well the comparison of memes and genes holds up. At the top you can see the world’s first phylogenetic tree, drawn by Charles Darwin in one of his notebooks. Might it be possible to create similar trees, phylomemetic trees, for the emergence and relationships of memes?

    Currently I’m working on a first analysis of memes found in the German twittersphere, and I’m looking forward to presenting preliminary results in the near future.

    Image: Darwin Online
    

    The Phylomemetic Tree

    …and some horizontal meme transfer