Why data science is eating social media agencies
The rise of data science
Marc Andreessen famously said that software is eating the world, and it might equally be said that data science is eating social media agencies. I suppose “software” is easier to understand than “data science”, a term which has rapidly risen to mean as much or as little as “innovation” or “disruption”.
Data science is not business intelligence in the sense of slicing and dicing well-formed data for reporting. Rather, it is the exploration of data of various types with a view to understanding the underlying relationships and explaining the important ones through visualisations which tell the story.
In 2012, the Harvard Business Review described data science as “The Sexiest Job of the 21st Century”, and demand has exploded since then as organisations embrace the power of analytics and the drop in cost of collecting and analysing big data.
This rise in data science, and the expectations around it, is also playing out in social media data analytics.
For example, although it is not very useful at the moment, IBM has cognitive analytics which can analyse a person’s social stream and produce a relatively robust personality profile including values and needs. The challenge is in making this insight actionable.
However, in regular social media monitoring and social media analytics, data science is already having an impact. Agencies need to keep up with it in order to retain their ability to deliver insights from social data.
Data science & social media research
Data science is eating agencies because strategy for social media and content starts with analysis and research. The research tools are those which can analyse social media data, e.g. most, but not all, of the social media monitoring tools.
The first generation of these tools gave us “influencers”, “most popular” lists and word-clouds. This is no longer enough.
We’ve all had plenty of experience now in the use of such tools, and we’re aware of how difficult it can be to find the real influencers. We’ve learnt that certain “influencers” appear across a wide range of searches for a large number of topics. This happens because the tools use methods which give a high ranking to these people based on “context free” data e.g. the number of followers and the ranks of the followers themselves.
For example, if Lady Gaga tweets once about “footy” then every search over a period containing that tweet is going to come up with Lady Gaga as a top “influencer”. However, we know that is a false result. Lady Gaga is not only unlikely to be an “influencer” about footy on the basis of one tweet, but is also extremely unlikely to be reachable, or amenable to acting as an influencer for a campaign.
The calculations and presentation of the “most popular” tweets, for example, also suffer from the same problem. If a minute fraction of Lady Gaga’s 54.1 million followers retweet her “footy” tweet then it will easily rise to be the “most popular”. But really, it’s hardly likely to be relevant to your analysis or campaign planning.
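To make the difference concrete, here is a minimal sketch that compares a “context free” ranking with one that takes the topic into account. The account data, the field names and the scoring formula are all hypothetical and chosen purely for illustration. They are not how any particular monitoring tool calculates influence.

```python
from math import log10

def topical_share(tweets, topic_terms):
    """Fraction of an account's tweets that mention any topic term."""
    if not tweets:
        return 0.0
    on_topic = sum(
        1 for text in tweets
        # Literal substring match, kept deliberately simple for the sketch.
        if any(term in text.lower() for term in topic_terms)
    )
    return on_topic / len(tweets)

def context_free_score(account):
    """Score based on audience size only (log scale), ignoring the topic."""
    return log10(account["followers"] + 1)

def topic_aware_score(account, topic_terms):
    """Audience size discounted by how much the account talks about the topic."""
    return context_free_score(account) * topical_share(account["tweets"], topic_terms)

# Hypothetical accounts, for illustration only.
accounts = [
    {"name": "global_celebrity", "followers": 54_100_000,
     "tweets": ["New album out now", "Loved the footy tonight",
                "Tour dates soon", "Thank you fans"]},
    {"name": "club_fan", "followers": 8_000,
     "tweets": ["Footy preview for round 5", "AFL trade rumours",
                "Great footy match", "Coffee time"]},
]

topic = ["footy", "afl"]
by_followers = sorted(accounts, key=context_free_score, reverse=True)
by_topic = sorted(accounts, key=lambda a: topic_aware_score(a, topic), reverse=True)

print([a["name"] for a in by_followers])
print([a["name"] for a in by_topic])
```

In this toy data the celebrity tops the follower-based ranking, while the fan who tweets about footy most of the time comes out on top once the topic is taken into account.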
Word-clouds are ubiquitous in analysing social media conversations and in presenting results to clients about what topics are being discussed. What we’ve learnt is that although we might occasionally stumble upon a real insight, in general word-clouds are very blunt instruments which need to be carefully edited and nurtured in order to add any value to the PowerPoint presentation of the analysis. We have to have a high volume of activity, we need to be able to edit out irrelevant words and rerun the word-cloud analysis, and we need to be able to discern the biases caused by the search terms themselves.
In fact, the problems with word-clouds stem from the primitive nature of how we have to search, purely with textual input. The search of social media is driven by pure syntax: the words and structure of the search term produce a pattern which is matched literally, “word by word”, against the social media streams.
This type of searching is very primitive, and lacks any “content” or “semantic” capability. Even though clients know perfectly well what they are asking us to research, conceptually, they often find it hard to describe how people would talk about those topics in social media. We apply our skills to “best guess” the words and phrases and then refine the search iteratively. Semantic analysis of our search needs makes this process more effective.
Now, enter data science!
What this means for agencies
Agencies need to be using social media monitoring and analytics tools which give more effective insights, enabled by underlying data science.
Example 1: From word-clouds to BuzzGraphs and Entity Analysis
By applying Natural Language Processing and data science, social media analysis tools are now providing insights into not just word frequency but how words are used in context.
A BuzzGraph shows how words are linked together and which words are most frequently used together in the social data. This gives a very different picture than a word-cloud, and more insights.
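The idea of counting which words appear together can be sketched in a few lines. This is only an illustration of word co-occurrence, not the method behind any vendor’s BuzzGraph. The sample posts, the stopword list and the function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# A tiny, hypothetical stopword list; real tools use far larger ones.
STOPWORDS = {"the", "a", "is", "at", "for", "and", "to", "of", "in", "on"}

def tokenize(text):
    """Lower-case a post, strip simple punctuation and drop stopwords."""
    words = (w.strip(".,!?\"'#@").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def cooccurrence_counts(posts):
    """Count how many posts each pair of words appears in together."""
    pairs = Counter()
    for post in posts:
        # Sorting gives each pair one canonical (alphabetical) ordering.
        for a, b in combinations(sorted(tokenize(post)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical posts, for illustration only.
posts = [
    "Footy final tickets sold out",
    "Got tickets for the footy final!",
    "Footy training at the oval",
]

edges = cooccurrence_counts(posts)
for (a, b), count in edges.most_common(3):
    print(a, b, count)
```

A word-cloud of these posts would simply show “footy” as the biggest word. The pair counts show that “footy”, “final” and “tickets” travel together, which is the kind of link a word-cloud cannot show.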
Entity Analysis associates words and small groups of words with their likely semantic type, e.g. a person, a sports team, a geographic entity, a website. This also provides new insights. Usually both the BuzzGraph and Entity Analysis reports contain clickable links, so the context can be examined in detail by clicking through.
Example 2: Community groupings in Twitter
Cluster Analysis, a tool of data science, can be applied to the people participating in Twitter conversations about specific topics of interest to identify how those people are associated. It groups together people who are most strongly connected with each other, and seeks to separate groups who are most weakly connected.
Using the Lady Gaga example from above, if the topic was “footy” and the AFL then Lady Gaga would undoubtedly be clustered into a group (called a community) of celebrities. Another likely group would be news organisations, and others would be groups of “social media active” fans associated with each club. This allows us to (a) eliminate the celebrities from our potential influencer targets, (b) identify who is most passionate about each club, and (c) determine who is the most influential “non celebrity” who links various clubs and is not a journalist or working for a news organisation. None of this is possible without data science.
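As a rough sketch of the grouping idea, the example below keeps only the strong links between accounts and treats each connected group that remains as a community. This is a deliberately simplified stand-in, and real tools may well use more sophisticated community detection. The account names, the interaction counts and the `min_weight` threshold are all hypothetical.

```python
from collections import defaultdict

def communities(interactions, min_weight=2):
    """Group accounts that are linked by at least `min_weight` interactions."""
    graph = defaultdict(set)
    nodes = set()
    for (a, b), weight in interactions.items():
        nodes.update((a, b))
        if weight >= min_weight:  # weak links are ignored
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Walk every account reachable from `start` through strong links.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical counts of retweets and mentions between pairs of accounts.
interactions = {
    ("celeb_a", "celeb_b"): 5,
    ("fan_1", "fan_2"): 4,
    ("fan_2", "fan_3"): 3,
    ("celeb_a", "fan_1"): 1,   # a one-off mention: a weak link
    ("news_1", "news_2"): 6,
    ("news_1", "fan_3"): 1,    # another weak link
}

groups = communities(interactions)
for group in groups:
    print(sorted(group))
```

With these made-up numbers the celebrities, the fans and the news accounts fall into three separate groups, because the one-off mentions between them are too weak to count as links.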
And finally, with respect to keyword construction and searches, data science’s cousin, cognitive analysis, will bring semantic interpretation. We’ll type, or say, “For Australia, show the sentiment trends for the last year associated with top ten travel destinations to the US including cities, theme destinations, and geographic destinations”. After a few rounds of training by scoring the results, the answers would be relevant and accurate.
In the meantime, if you are not starting to use social media monitoring and analytic tools supported by data science then you will be losing your ability to offer real insights to clients.
I recommend that you ask your social media vendor to demonstrate how they are using or intend to use data science to support your research and insight development.
Image courtesy of Game Cupid