SUMO: You are Head of Open Data at “Thomson Reuters”. But what is open data in simple terms?
Dan Meisner: Open data is any data that is available for use under an open licence. People can use it to do whatever they want, at least within the constraints of that licence. Most often this is government data. There are thousands of US government agencies alone that are all collecting data, and this data is being collected on behalf of taxpayers, so it should be available to taxpayers as well. It’s about the right to get to it, and it’s also about its availability. So to me open data is content that is licensed for open use as well as made available and easy for people to consume.
SUMO: What is your daily routine or is there even a daily routine as Head of Open Data at “Thomson Reuters”?
Meisner: I wouldn’t say there is a daily routine. I think it stays pretty varied. A lot of my time is consumed in the management of the actual website permID.org. It is about how we disseminate this open information and then how that connects to the underlying information architecture within “Thomson Reuters”.
It is also about talking with the partners whose data we want to leverage, and talking with clients to let them know that this is available to them and that they can make use of it to help them achieve their goals. It is also about working with representatives of the government to make our data solutions known and available for their purposes as well.
SUMO: Why is open data getting more and more important in business matters?
Meisner: As recently as five or ten years ago, a professional client of ours, say in the financial asset management space, would purchase their information products from us. It might be a handful of databases, and what we were giving them might be 95% of the information they were looking at. Today they are trying to look at a much more realistic picture: they are bringing in information from the other systems within their organisation, where their research databases might not be connected to their customer management systems, and they are bringing that together.
And then you have things like news flow and social media like “Twitter”, and open data as well, and we are starting to see businesses evolve that are really based on open data.
I don’t know if they have a presence here, but in the US we have “Zillow”, which is a massive source of information for real estate. You go on there and look at real estate listings, and it is using publicly available information about schools, crime and transportation that can help you find the right home to purchase. So the commercial component of the knowledge that is being leveraged is becoming smaller and smaller, and we want to be able to play in that space rather than become obsolete. Not that we are really in danger of becoming obsolete, as there is always a need for reference data. But to the extent that we can facilitate this network effect, our customers benefit, because they can get a much richer picture of what is going on in the world and ultimately make better decisions driven by data.
SUMO: How is “Thomson Reuters” dealing with big data?
Meisner: We were very early adopters of big data. We have been engaging with big data since before we called it big data. One of the key issues that we are noticing with our clients, who are almost all engaged in some form of significant big data project, is that it is not purely a technology issue. Big data removes the scarcity in data storage we had previously, when we might be limited by the physical capacity of a database in terms of what we could put together.
Now I have essentially built this box of infinite scale that I can throw everything into, but putting it all into one place is not the same as integrating it. There is just a lot more of it. Imagine if you took every file in this building and threw it into one big box. Yes, it is in one big box, but that doesn’t make it any easier to use.
The ability to manage the structure of that information, the metadata and the organisational aspects of that information, becomes far more important, because the promise of big data is that I can leverage everything that my organisation knows. But that is only really true if I can bridge the gaps in meaning between data set A and data set B when they have different notions of the same entity. If one says Microsoft Inc. and the other says Microsoft Incorporated and we can’t draw that line between them, we can’t connect them together. Those definitions differ depending on the content set, on the specifics of the entities being described and on the professional domain. To a lawyer, a subsidiary of a company means something different than it does to an asset manager.
Being able to bridge those gaps in a computer-readable way becomes much more important. What we have noticed is that we have been dealing with these issues for decades, because we might have lots of different products that distribute that data, and our clients will take multiple products. And if they have five different “Thomson Reuters” products with five different definitions of Microsoft, we get angry phone calls. Or they have to do the work.
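The name-matching problem described in this answer can be illustrated with a minimal Python sketch. It is not the method used at “Thomson Reuters”; the suffix table and the function names `normalise_name` and `same_entity` are hypothetical and exist only for illustration. The sketch expands abbreviated legal forms so that two spellings of one company reduce to the same key.

```python
import re

# Hypothetical, deliberately tiny table of legal-form abbreviations.
# A real reference-data system would need far more than this.
LEGAL_FORMS = {
    "inc": "incorporated",
    "corp": "corporation",
    "ltd": "limited",
    "co": "company",
}


def normalise_name(name: str) -> str:
    """Lower-case the name, drop punctuation, and expand legal-form abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(LEGAL_FORMS.get(token, token) for token in tokens)


def same_entity(a: str, b: str) -> bool:
    """Treat two names as the same entity if their normalised forms match."""
    return normalise_name(a) == normalise_name(b)


if __name__ == "__main__":
    print(same_entity("Microsoft Inc.", "Microsoft Incorporated"))
    print(same_entity("Microsoft Inc.", "Apple Inc."))
```

String normalisation only covers spelling variants. It does not address the harder point raised above, that a term such as subsidiary can mean different things in different professional domains.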
SUMO: People often mention big data with a sceptical undertone. How much is big data influencing our private life?
Meisner: Big data is just a tool. It is about how this tool is used. Certainly, aspects of this have enabled our national security apparatus to basically look into the lives of Americans in a way that we have not enjoyed and that has become a big scandal. The ability to look at all of this data about our own citizens in a way that’s quite frankly very creepy is enabled by these technologies, but technology is amoral; it is neither good nor bad. It is all a question of how you use it. A gun in the hand of a police officer is probably a good thing. A gun in the hand of a criminal is a bad thing.
SUMO: Are there any suitable solutions or programmes at the moment to handle large amounts of data in various business fields?
Meisner: As far as data storage goes, it is mainly about Apache Hadoop, an open-source software framework. Broadly speaking, semantic web technologies are very important as well. The standards are limited in certain ways, especially for us: we need to understand the time dimension as well, how things are changing, which the standards don’t really address right now. We also have incredibly large volumes of unstructured data, things like news articles, research or filings. They are documents; they are not meaningful to a machine. Bridging that gap with natural language processing enables us to embed some structure into unstructured documents and gain some understanding. Then you are able to do things like form an implicit graph of relationships based on companies that are mentioned together.
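The idea of an implicit graph built from companies mentioned together can be sketched in a few lines of Python. This is an illustrative assumption, not a description of any “Thomson Reuters” system: the function name `co_mention_graph` is hypothetical, and plain substring matching stands in for the natural language processing step that would identify company mentions in practice.

```python
from collections import Counter
from itertools import combinations


def co_mention_graph(documents, known_companies):
    """Count how often each pair of known companies appears in the same document.

    Returns a Counter keyed by alphabetically ordered (company, company) pairs.
    Substring matching is a stand-in for real entity extraction.
    """
    edges = Counter()
    for text in documents:
        lowered = text.lower()
        mentioned = sorted(c for c in known_companies if c.lower() in lowered)
        for pair in combinations(mentioned, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    docs = [
        "Microsoft and Apple announced a joint standard.",
        "Apple filed a suit against Samsung.",
        "Microsoft partnered with Apple again.",
    ]
    graph = co_mention_graph(docs, ["Microsoft", "Apple", "Samsung"])
    for (a, b), weight in graph.items():
        print(a, b, weight)
```

The edge weights are simple co-occurrence counts. They say that two companies appear together, not what the relationship between them is.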
SUMO: What do you think about the future of big data? Will there be only advantages, or do you also see risks?
Meisner: It helps us to understand the world around us a lot better. It’s a force multiplier on the human brain, because I can look at massive volumes of data, assimilate that information and make decisions based on that data.
The flipside is that if I am making all my decisions algorithmically, we start to lose some of what innovation comes from. You look at something like the music genius Pandora or Netflix original content, and a lot of that is partially driven algorithmically, based on all the data that they are collecting and on figuring out which aspects we need to hit in designing media to appeal to the widest demographic group and ultimately to make the most money. That also limits our ability to take risks that we have not calculated because we don’t know any better. We just try something, and if we fail, whatever. But if we have that feral understanding of risk, and we know that this piece of music that I am composing or this movie that I am trying to put together has this tiny fraction of a 1 percent chance of succeeding, who is going to finance that project? There is no sort of emotional element that creates an innovation. So I think that is a risk.
But there is always that danger whenever you’re trying to break everything down into metrics. You are creating models, and those models are approximations of the phenomenon that we are trying to explain, not necessarily a detailed understanding of the phenomenon, which is maybe impossible. But it makes risk much more measurable, which is good and bad. And again, it could be good or bad depending on how you are using it.
Interview for SUMO by Paris Zinner. The next issue will be released in the middle of March 2016.