Here’s a piece that I wrote for The Media Briefing about the use of data in media (they serialised it and ran it over two days). I thought it might be about “big data” when I started out (see a couple of my previous posts) but the “big data over-hype” line has been done a lot. The Reuters Institute’s Big Data in Media conference was the perfect opportunity to log some practical examples of the use of data. It was an excellent one-day conference. What struck me was that it wasn’t perfect, complete “big data” being discussed (the overhype associated with that phrase does those working with data a disservice); the work being done is painstaking, expensive, supplemented by other data sets and research, and worthwhile. Ending this article by listing the problems encountered by publishers implementing data strategies isn’t nay-saying; rather it is a nod to the difficulties data teams are putting up with and finding ways around.
At the Digital Media Strategies 2014 event in February two Fleet Street Generals, Mike Darcey (News UK CEO) and Andrew Miller (Guardian CEO) agreed that data is a vital part of the future success of newsbrands. They vociferously disagreed on much else but on data they were united.
In pointing to the drawbacks of each other’s battle plans (paywall vs. “Open”) and other newsbrand models (metered paywalls and “clickbait”) they listed the industry’s major problems. If data is a big part of the solution, these are the problems to solve:
- Low digital ad rates
- Getting people to pay for content and/or give their data
- Over-reliance on sensational “clickbait” content
- Having the reach and investment in journalism to hold power to account (by attracting whistleblowers and funding serious investigative work)
The recent “Harnessing the Power of Big Data for Media” conference (May 8th) run by the Reuters Institute for the Study of Journalism was an opportunity to see the breadth of data-driven initiatives being undertaken by newsbrand companies. To what degree did the work in the trenches address the huge issues listed above?
In the main, the papers focussed on the ways that data can be used to maximise the potency of content. Initiatives were as follows:
- Identifying how content works on different platforms. Jimmy Maymann, CEO of the Huffington Post, talked about using data to understand when certain groups of people were reading and on what platform. Parents turn to their mobiles after their kids have (eventually) gone to sleep and will read about parenting (in desperation) then. Tom Betts of the FT talked about the value of a single customer view of each reader’s interest in certain topics on each platform. Someone might not want to read lifestyle content on their laptop but be happy to read it on their mobile in the evening. Ky Harlin, Director of Data Research and Development at Buzzfeed, talked about understanding how content works differently on different social media platforms.
- Passive personalisation. This is the use of algorithms, fed by data gleaned from people’s online behaviour, to serve content to individuals based on what is known about their preferences. Speakers from both The Huffington Post and the Sacramento Bee spoke about their initiatives in that direction. Sanjeevan Bala of Channel 4 spoke about using “anti-algorithms” (technically stochastic algorithms, but they might as well be called serendipity algorithms) to ensure their 4OD registrants hear about new content from genres other than those they regularly view.
- Operating in “The Stream” of social media. Buzzfeed have a lab dedicated to understanding how content goes viral (much talk of the A/B testing of headline wording) and which topics go viral. They have built benchmark norms that let them predict, a few hours after a story goes live, how viral it can become. From there they can make adjustments (headline changes, appropriately timed use of social media) to keep it on track.
- Not over-valuing big page view figures. Matt Keylock, Global Head of Data at Dunnhumby, discussed how retailers have learned not to overvalue items bought by customers who never returned to shop at the same store (this can be the case with electrical items, apparently). The parallel with content is obvious. Keylock’s solution was to delve deeper into the data and look at more measures before deciding on strategy. Tom Betts arms FT writers with simple dashboards containing several metrics by which to judge the success of their stories. Quality content cannot be judged on page views alone.
- Knowing where readers are up to with a story. Arsenio Santos of Circa 1605 explained how the app allows readers to sign up to follow a story as it develops. This creates an alternative measure of engagement and shows which stories to invest in. It also seems like a step towards the vision of Richard Gingras, head of news at Google, who predicted a couple of years ago that data and technology (he talked about face recognition technology) will be used to remember where someone is with a news story and continue from where they left off.
- The use of data by journalists. The opportunity is investigative, eye-catching journalism created by journalists unafraid of delving into data sets. Peter Bale, General Manager of CNN, talked about CNN’s maps of military deaths in Afghanistan, plotted both across that country and across the USA (based on the home town of each soldier killed).
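Buzzfeed’s A/B testing of headline wording, mentioned above, boils down to comparing click-through rates between variants and only switching when the difference is unlikely to be noise. A minimal sketch of the idea (the numbers and function names are mine, not Buzzfeed’s actual tooling) using a two-proportion z-test:

```python
import math
from collections import namedtuple

Variant = namedtuple("Variant", ["headline", "impressions", "clicks"])

def click_rate(v):
    """Observed click-through rate for a headline variant."""
    return v.clicks / v.impressions

def z_score(a, b):
    """Two-proportion z-test: is variant b's CTR genuinely higher than a's?"""
    p_pool = (a.clicks + b.clicks) / (a.impressions + b.impressions)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / a.impressions + 1 / b.impressions))
    return (click_rate(b) - click_rate(a)) / se

# Hypothetical results after a few hours live
a = Variant("Original headline", impressions=5000, clicks=150)   # 3.0% CTR
b = Variant("Reworded headline", impressions=5000, clicks=200)   # 4.0% CTR

z = z_score(a, b)
if z > 1.96:  # roughly 95% confidence the uplift is real
    print(f"Switch to: {b.headline} (z = {z:.2f})")
```

In practice the lab work described at the conference layers much more on top (timing, platform, topic), but the statistical core of any headline test looks something like this.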
There were also examples of how newsbrands were using data to work with advertisers. Examples were:
- Behavioural targeting. Passive personalisation has an obvious appeal for advertisers too.
- Not sinking in The Stream. Publishers can entice advertisers and ad agencies with their data-driven knowledge of how to make content stand out in the social media stream. The knowledge gained in Buzzfeed’s lab, The Guardian’s Lab and HuffPo’s Partner Studio will be valuable to advertisers, who are becoming storytellers themselves these days.
- Helping advertisers meet new regulations. Digital ad audit tools like Nielsen’s OCR will highlight advertising served in inappropriate environments. The Huffington Post offers technology that allows advertisers to link keywords to a campaign to ensure their ad is not served next to content containing those words (a slimming product next to a story about bulimia, for example).
- Data visualisation. At the Sacramento Bee they load several years’ worth of “boring” car dealer sales data spreadsheets onto a mapping tool that allows their salespeople to engage car dealers with visualisations of shifting trends in their own industry. One dealer hadn’t realised he was underselling a particular vehicle in a certain region.
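The keyword-linking technology the Huffington Post described reduces to a simple idea: check an advertiser’s blocklist against an article’s text before the ad is served. A hypothetical sketch of that core check (the function names are mine, not HuffPo’s):

```python
import re

def build_blocklist_matcher(keywords):
    """Compile a case-insensitive, whole-word pattern from an advertiser's keyword list."""
    pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

def safe_to_serve(article_text, matcher):
    """Return True only if the article contains none of the blocked keywords."""
    return matcher.search(article_text) is None

# A slimming product's campaign might block sensitive terms like these
slimming_campaign = build_blocklist_matcher(["bulimia", "anorexia", "eating disorder"])

print(safe_to_serve("New research on eating disorder recovery", slimming_campaign))  # False
print(safe_to_serve("Ten spring salads to try this week", slimming_campaign))        # True
```

A production system would of course need stemming, multilingual support and semantic matching (“slimming” next to “bulimia” is about meaning, not just strings), but the contract with the advertiser is essentially this predicate.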
So there was plenty of activity addressing the problems listed by the newsbrand CEOs at DMS14. Data is being used to address the challenges of advertising. It is being used to make content so relevant (to people and platforms) that it might draw in large, engaged audiences (who might pay with cash and data). It is also being used to create investigative, eye-catching journalism (and steer editors away from over-reliance on clickbait).
That is good news, right? What is especially encouraging is that data is being used at the heart of newsbrands: enhancing the content (including the serious journalism) and spreading it further.
Publisher Generals will have plenty of positives to talk up at next year’s Digital Media Strategies conference. Yes, but it isn’t all plain sailing. The following are very real issues standing in the way of progress.
- Privacy issues. Initiatives like passive personalisation can spook people unless they know what data is being used and for what purposes. Growing debate about, and awareness of, privacy will be an increasing problem for publishers with data ambitions.
- How big is big data? There’s a lot of talk of overhype, based on the fact that publishers don’t have one large set of data but multiple sets of small data. Privacy concerns will lead to more opt-outs, so data sets will shrink further. This is not census data by any means, which makes it harder to understand and to draw meaningful findings from. As Tom Betts says, it often outlines the “what” but not the “why”; for the latter he uses conventional market research.
- Insight takes time, and money. Amazing insights rarely sit on the surface; you need to pan for gold. Tom Betts talked about the trap of confusing cause and effect. It also takes time to get data into a format from which insights can be drawn. The Sacramento Bee’s car dealer tool will have taken a long time to produce and probably uncovered only a handful of useful insights (they didn’t say whether it led to more ad revenue). The CNN maps will also have taken time. Eight out of ten CNN infographics are sponsored; without sponsorship they could not afford to produce them. Finding a sponsor for a map of dead Americans cannot be easy. No wonder their other example was a map of the locations where people are happiest.
- The power of the data hobbyist. Whilst investing in data journalism is a huge cost for under-resourced publishers, it is a small cost for individuals spending their own time on it. Bertrand Pecquerie of the Global Editors Network spoke about the large proportion of award-winning data journalists who said they had to do the work in their own time. Amateur data journalists are also setting up respected websites that began as hobbies. Peter Bale of CNN said traditional journalists who aired opinions based on very little proof were being embarrassed by a new breed, often outside the core profession, drawing more solid conclusions from data.
- The high volume of content needed to feed social media and create targeted content. Jimmy Maymann, CEO of the Huffington Post, said they create 1,600 new articles a day because “social media has changed the pace of the news site”. At last year’s NewsWorks Shift conference, Tony Gallagher, then editor of The Daily Telegraph, said the demands of the web meant he produced 600 articles a day, of which a third made it to print. Understanding how to use The Stream of social media is one thing; having the resources to generate content at the rate it consumes it is quite another. At Digital Media Strategies 2014, Duncan Painter, CEO of Top Right, said it was madness that journalists were the first to be made redundant when the need for quality content is so high.
- Solving new problems but not the original, underlying ones. Behavioural targeting, native advertising crafted for social media and the avoidance of inappropriate contexts might prevent digital ad rates being driven down further, but they won’t return scarcity to media inventory. Without scarcity, ad agencies often need to invest too much time and money per media property for these customisations. Group M’s Interaction report 2014 concludes, “It is for now at least possible that the dividend of reduced wastage is being matched or offset by the cost of customisation and personalisation”.
- Deeply mined, data-driven insights can seem counter-intuitive and mean missing out on instant gratification (think of the culture shift involved in getting publishers to NOT focus on huge page view numbers). Dunnhumby stressed the importance of getting data into the eco-system of the business and making it part of the language used between CMO, CFO and CEO. If that was a challenge for retailers it will be a bigger challenge for publishers, for whom stock and revenue are less clearly linked. Darrel Kunken, Director of Market Analysis at the Sacramento Bee, said that of all the issues he faces, the cultural one is the biggest.
Nobody said it was easy. Data is certainly part of the solution but it isn’t a silver bullet. Rather it will offer small advances, hard fought and expensive. Darcey and Miller might be the Generals of Fleet Street but this is still trench warfare. Their data teams will rely on investment in their operations but also in the creation of the sort (and volume) of content that their data tells them is required to make those advances. Even ad revenues that are data driven will not cover that investment anytime soon.