Balancing ‘as open as possible’ and ‘as closed as necessary’

A highlight of the recent event Openness and commercialisation was a panel of experts and representatives from academia and industry, brought together to discuss tensions and synergies, and the way forward in balancing commercial advantage with open science.
14th December 2020

On 3 and 4 December, around 200 delegates gathered for the online event ‘Openness and commercialisation: How the two can go together’, which featured contributions from our Task Force Innovation and Task Force Open Science.

A graphical summary of the full event was prepared by Connie Claire (Community Manager, 4TU.ResearchData).

On 4 December, a panel discussion on the topic ‘as open as possible and as closed as necessary’ addressed the balance between openness and commercialisation. The panelists, representing large pharmaceutical and technology businesses, an AI start-up and academia, had been invited to consider the commercial aspects and potential tensions with openness. The discussion, however, also encompassed the increasingly important debate in open science and open innovation on how to balance ‘knowledge safety’ and ‘export control’ (particularly for dual-use technologies). This is especially relevant given the growing discussions around ‘technological sovereignty’ and ‘open strategic autonomy’ among EU institutions and across Europe.

Concrete examples of these tensions include the following quotes from the audience during the event:

  • Geopolitical tension: “publish papers and data as openly as possible, except if it has to do with AI then don’t share with certain countries”
  • Industry tension: “open science is valuable to increase reproducibility and rigour, but detrimental when there is an expectation to make valuable digital assets available for free”

With that introduction and overall context, we are pleased to share this report from the panel discussion. The statements below are edited excerpts from the conversation. For the full statements and context, please check the video recording of the panel discussion.

Summary

The main points of the discussion can be summarised in a series of further questions and statements, which also provide an outline for how this discussion can develop beyond the one-hour panel session:

  • What is the definition of openness? (How can we best define openness?)
  • The changing nature of Intellectual Property (IP) across sectors, across companies of different sizes and types, and in relation to the social good. How does this affect openness?
  • Collaboration and sharing data often guides the principles of openness, or the degree to which open science principles are implemented.
  • How can the FAIR data principles lead to enhanced open science practices?
  • Can contracts and flexibility of approaches aid openness and lead to more effective innovation practices? This can be supported by university good practice and examples.

Full report from the panel discussion

The Chair Tim Bedford (Chair of Task Force Innovation and Associate Principal at the University of Strathclyde) introduced the panel discussion: “When implementing FAIR data principles, it is often said that the data should be as open as possible, and as closed as necessary. In the discussion today we shall debate how this is operationalised when building innovation partnerships between universities and commercial entities. If data is as open as possible and as closed as necessary, then we need to know who decides what is possible and what is necessary, and on what grounds the decision is made. There are implications for the academic community and for the company partners, as well as for funders who want to create a research and development system that both supports new science and supports our knowledge-based economy on the world stage. How do we balance off the different interests and perspectives when deciding ‘as open as possible, as closed as necessary’?”

Tim then invited each panelist to give an opening statement to reflect on this.

Sophie Bailes (Director Digital Strategy of Pharmaceutical Technology & Development at AstraZeneca) highlighted her team’s work with digital transformation and its role in taking molecules from phase 1 all the way through clinical trials and into commercial manufacturing. Central to this is how they use digital tools such as modelling, simulations and algorithms to accelerate that process. They are especially looking at product robustness to deliver 100% quality for all of their medicines once they are in manufacturing. As we move into this digital era, what is considered IP is changing: rather than just looking at the chemical structure and processes, we also think about the data and digital tools that underpin the work. So when we hear statements such as ‘as open as possible’ we have to consider what that means, as science is a business, and in order to protect valuable IP we have to think about what that openness is and should be. However, Sophie highlighted that there is a great opportunity for the industry in what is classified as the pre-competitive space, and that there are some great examples where cross-pharma collaborations have allowed data to be used across the industry, yielding greater knowledge from a pooled data set than from the data available internally in a single company.

The question it really comes down to, according to Sophie, is: what is the social good of having open data in this area? Both within industry and academia, there is a great wealth of knowledge and understanding of the context in which specific data is generated. Especially for scientific data, the context is vital to ensure the data is used in a robust way. If we move into a fully open setting and that context is lost, then there are substantial risks in that process. Sophie recalled that the importance of data stewards and data curators in guiding this process had been discussed earlier in the conference, which she agreed with. For her this feeds directly into how we apply data standards and control data, including how it can be used and re-used in the future.

Shalini Kurapati (Co-Founder & CEO of Clearbox AI Solutions) leads an AI start-up which helps companies assess, monitor and improve their AI models with the aim to provide trustworthy AI in companies. AI touches upon all facets of today’s innovation landscape. However, this is a relatively new technology and quite cutting edge, and there is a big culture of industry-academia collaboration in this field. Shalini emphasised that the lines between research labs and business units practicing AI are really blurry. In terms of research, companies that are developing AI are expected to publish papers, often to show off their competitive edge. The immense excitement and buzz around AI is justifiable as it is a powerful tool to improve how we work and live.

However, at the same time, there is a level of apprehension associated with AI. Many AI models are black boxes: as long as you give them tons of data, they can be great at predictions, decision making or giving recommendations. However, importantly, Shalini emphasised that we should not just blindly trust these models at face value and the outcomes they produce as we have limited methods to understand exactly what goes on inside the black box. There are a number of ethics, fairness and regulatory issues that come along with this, but the core problem is trust. Shalini highlighted that 90% of companies today want to deploy AI, but only 15% are able to implement AI approaches in their business operations. The lack of trust is inhibiting progress or commercialisation. Part of this is that the research that is being published is suffering from low reproducibility (i.e. researchers often have difficulties in reproducing the findings of other researchers) which further lowers trust. To help address this, Shalini’s proposition is that AI can benefit a lot from openness, but we need to define what openness is. She was very inspired by this morning’s talk by Alan Hansbury who talked about openness with closed data, as AI often falls in that domain. So we need to look at how we can leverage this to increase trust and reproducibility, so that companies can actually put AI models into production in a trustworthy and responsible way.

Norbert Lütke-Entrup (Head of Corporate Technology & Innovation Management of Siemens) shared that Siemens is one of the most active private sector participants in European framework programmes for research and innovation. In this context, Siemens engages a lot in the debate around open science and open research data. He emphasised that Siemens acknowledges the value in pooling and sharing research data, and fully subscribes to efforts by the European Commission to make research data more open. However, he also stressed that open research data is only a means to an end, i.e. the well-being of Europe’s citizens and the competitiveness of Europe’s industry, which raises some tricky questions. For example, in sensitive research areas, such as quantum computing and quantum encryption, Europe may be ill-advised to share research outputs with the entire world in the context of growing geopolitical competition. The other dimension he sees is that much research data produced through company involvement represents a commercial value, the global sharing of which would be harmful to European industrial competitiveness. In order to answer questions such as ‘what can be open?’, ‘what should be open?’, ‘at what time?’ in such contexts, the simple statement ‘as open as possible, as closed as necessary’ is a good starting point, but not the detailed guidance needed. The Working Group on Research and Innovation at BusinessEurope, which Norbert chairs, has proposed more detailed guiding principles from the perspective of companies, which represent roughly two thirds of all European R&D efforts.

To start with, open science policies (e.g. in EU programmes) need to acknowledge that companies have a commercial rationale to follow. Many companies voluntarily make part of their know-how and data public, as illustrated by the large amount of open source software contributed by companies with the aim of enriching their ecosystems and creating a win-win for all ecosystem members. But companies need the freedom to decide on the openness of their data and cannot accept mandatory sharing of data from research co-funded by them.

Secondly, different shades of openness are needed, i.e. the current ‘black or white’ approach to open research data should be replaced with different shades of grey, such as free access to metadata with a more controlled access to the actual data. For each research output, a conscious decision about the degree of openness is needed, formulated as part of a data strategy.

Thirdly, there should not be any ‘holy cows’. Neither should all research data always be open, nor should researchers assume that data with a commercial value always have to be closed.

As a fourth principle, Norbert underlined the importance of making decisions about the openness of research data without any bias. For example, a company applying for a EU-funded project and opting out of an open data regime should not be at a disadvantage.

Finally, care should be taken about how reward systems for scientists are designed. Today, the main yardstick for measuring the work of scientists is the number and quality of publications they produce. Norbert stressed the risk of adding an open research data dimension to that reward system, which would put industrial researchers from private sector organisations at a severe disadvantage.

Laura MacDonald (Chief Executive of the Association of European Science and Technology Transfer Professionals (ASTP)) started by reflecting that the whole concept of open science generally sits well with researchers who ‘live and breathe’ research and who love sharing their results. She shared that when ASTP drilled down to really understand the impact of open science policy recommendations and associated drivers, and how these sit with the community, they found a huge need for people to understand what is actually meant by data, research data and outputs in this openness context. One example of ongoing activity in this area is making sure that the research community fully understands what, practically, open science policies and structures around open data mean for their daily practice. This applies to all disciplines, not just the STEM disciplines which are more used to dealing with industry collaboration. She highlighted that it is important that we, as a research and innovation community, continue with our awareness and understanding campaigns.

Laura then reflected on the vital point of this panel discussion: what impact will open science have on researchers and their partners in industry? As we are all aware, Laura recalled, the world of solution-finding, innovation and development is increasingly about co-creation and working within partnerships. Hearing Norbert and Sophie’s reflections about the challenges for their corporations, she sees two main additions from her perspective. The first is that the relationship with the outside world is not going to change immediately due to open science; we are already happy and willing to put much data into the public domain. What is changing are the steps needed to turn newer developments (e.g. advancements around the FAIR principles) into daily practice. She emphasised that for this, it is vital to ensure that the relationships built over many decades between industrial players and research centres can continue to thrive in a real spirit of partnership. This underpins the great science that results from many consortia. We need to further define and be clear about how these relationships can be regulated, in a contractual way. In addition, she emphasised that we need to further define openness, supported by better understanding, and not be black and white about it. She agreed that the principles Norbert set out are very useful. She asked: ‘How do we capture methods and practical steps to make sure researchers are comfortable going into collaborative projects?’ Key questions to be answered upfront include: which results will fall into the general public domain? Which results will go through a managed process, for example curation during the project before release into the public domain? And which results will be covered by conventional intellectual property rights, with contractual arrangements between the consortium partners on the usage of the data and how the rights can be transferred?
She underlined that we have a lot of defining and interpreting to do around the words ‘FAIR’, ‘possible’ and ‘necessary’.

Laura finished by going back to first principles, which she stated are the drivers behind the collaboration for the research community and the industrial partners. She recalled that we are all facing the same challenges and want the same thing in the end. We want to see more effective, efficient and impactful innovations in the end. To achieve this, she emphasised the importance of making definitions and practical steps for implementation more visible.

Tim thanked all panelists and reiterated that it is indeed about relationships between individuals in the partner organisations, and that we as a community should work towards giving them structures to collaborate successfully for a social good, while acknowledging the cultural and legal differences involved.

The panel discussed the notion of ‘pre-competitive space’ which is where competing companies are interested in working together, where there is no money to be made immediately.

Shalini shared that pre-competitive space in AI is quite different from some more conventional business areas, for example as shared by Sophie for AstraZeneca. She underlined this needs to be considered moving forward.

Shalini highlighted that most of the business units that work with AI are composed of scientists, and that working on open source software in AI is the norm. But there is a lot of reluctance around sharing data sets. So for the software and the processes around developing algorithms and methods there is a lot of openness and collaboration, as this is seen as a pre-competitive space; in contrast, the data is seen as highly valuable and connected to business operations and competitive edge, and is therefore not shared. She does not see this changing any time soon.

Sophie recognised this as a challenge in her consortia as well. Digital methods and models are developed using ‘perfect’ or idealised data sets, which are very helpful for understanding how the methods and models work and for fine-tuning them; this is often a collaborative and open process seen as contributing to the ‘ecosystem’, a win-win for everyone. But adapting and validating these methods and models to work with real-world ‘dirty’ data and settings is challenging, falls firmly inside the competitive space, and is therefore well-guarded internally and kept outside of open and collaborative arenas.

The panel discussed shades of openness and whether the power imbalance with industry (often providing substantial financial resources to collaborative projects) pushes towards the ‘darkest shade’ of grey. Norbert reflected that Siemens has tons of data from customers, e.g. operators of high-speed trains or power grids. However, these are not Siemens’ data; they are ultimately owned by Siemens’ customers and used by Siemens on the basis of agreements strictly limiting their use. Generally speaking, partners in industry share data only with a clear perspective of creating value for both partners. In this example, data collected during the operation of high-speed trains helps the train manufacturer to build better trains, and also allows the manufacturer to provide superior customer services to the train operator. But neither the operator nor the manufacturer of the train is keen on sharing this information broadly, as it contains very detailed information about both of their internal processes and technologies, which form part of their competitive edge. What companies want here is the flexibility to share data under conditions that suit their mutual interests.

Norbert reflected that data has been called ‘the gold of the 21st century’. This has created extra hesitancy among commercial entities to share data freely. At the same time, they have seen that interesting and unexpected things can happen if and when data is shared with the community. Companies are very aware of that and are therefore exploring more and more ways to share and open up information. In short, keeping data hidden that doesn’t need to be hidden may actually represent loss of unexpected insights and therefore is a missed opportunity. Many companies are becoming aware of this.

Laura emphasised that there are many different types of data: going from raw data to anonymised data to processed data to finding patterns to drawing conclusions. She highlighted that we need to look at the context at each of these steps as there may be many uncontroversial areas for sharing but which are not well-known. Many actors have already done this in ‘pre-open science’ days. Even without the open science context many actors know that some insights or findings should be accessible to everyone as it advances the community and ecosystem in a way that is useful to everyone.

Tim asked whether we can define data sets in a clear way to prevent the default from driving results towards being ‘as closed as possible’.

Laura responded that sector-based examples and developments to prepare and present concrete examples would be useful. She highlighted the urgency with “We don’t have a choice, we have to make this happen.”

Norbert shared that there are several areas we can identify. Core know-how (e.g. under what conditions will a train break down?) is certainly excluded from any data sharing. On the other hand, research into, for example, how to make AI algorithms explainable is driven by a common interest and is thus a candidate for data sharing. He agreed that today companies may be a bit too quick to decide not to share anything, but cautioned that we should not let the pendulum swing completely the other way and force everyone to share everything.

Tim asked if there are things we can do as an academic community and business community to show that openness can be useful? To help advance understanding and awareness?

Norbert shared that examples and good practices are useful. He described how Siemens shared non-critical turbine data in an open challenge, and an external team addressed the question in a completely new way that Siemens had not thought of internally. This can help demonstrate that sharing is better. He also described the example of a Canadian goldmine company that put its seismic data online openly; an external team developed and shared with the company a method for finding and digging for gold much more efficiently, saving the company, which was on the verge of bankruptcy. He reflected that these types of stories are very powerful in showcasing that openness is valuable.

Shalini reflected that in industry the word open can ‘put you in the corner’. Openness does not mean free. Raw data is often very dirty, and processing, curating and maintaining data sets is therefore very costly, so there must be ways to recoup the investments and resources that have been put into making and maintaining high-quality data sets. For algorithms, the expression ‘garbage in, garbage out’ is very true, so preparing and maintaining high-quality data sets is vital. This often means that data sets cannot sustainably be made available to everyone for free.

Another incentive for data companies towards increasing openness is to ensure that their data is described and maintained in such a way that it works in many contexts (and not solely in the internal workflow). One example shared was a data set used for diagnosing diseases which worked beautifully in one hospital but failed completely when deployed in another. A likely reason is minor differences in the equipment and settings used, which is hard to predict beforehand but crucial for real-life usage. Sharing data in this case and deploying it in various contexts gave valuable insights into the robustness of the method.

Tim relayed an audience question: should we define the type of openness we want at the start of a project, or agree on it during the project as results become known?

Norbert responded that we should talk about it at the beginning, but it is naive to believe we can have a firm agreement at the beginning, when the outcome is usually not perfectly well known. What is needed is a basic agreement at the beginning of a research project, which is however flexible and can be adapted during the project if needed.

Laura added that we should define principles for the decision-making upfront at the very beginning, and then the actual decision making can be done during the project.

Sophie agreed that we need to have agreements up front, but with flexibility. She shared that she has never worked on a project which went completely according to the first plan, so we need processes to review, evaluate and adapt as the project changes. She noted that AstraZeneca is used to working with postdocs and PhD candidates who want to publish, so processes are already in place to ensure fruitful and clear discussions on when and how publishing can take place. Much of this can be adapted to data discussions as well, which has already started to happen.

Shalini added that in her field it is often agreed up front that code and software will be published open source on GitHub and similar platforms, as there is a common understanding of the value that comes from this.

Tim thanked the audience for all the questions and all the panelists for their contributions. He invited each panelist to make a closing remark.

Sophie reiterated that there is a genuine social good to sharing data. Individual companies often do not produce enough data in the development phase to truly understand processes and mechanisms, so for that pooling data is very valuable. Breaking down barriers for sharing is therefore valuable for companies and for the worldwide scientific community.

Shalini recalled her earlier statement that we need to approach openness in terms of trust, reproducibility and interoperability when we talk about industry-academia collaborations. That is when we have the most productive discussions.

Norbert’s take-away message was that there is a case for sharing more research data in Europe, but it starts with a more conscious discussion about how best to use a given set of data in the interest of Europe. If the commercial dimensions are most important, then we should be restrictive with sharing; but if the data sets will add true value to the ecosystem (a win-win for everyone, including the company), then we should be more open. These discussions need to happen without ideological bias; openness is not always good. He emphasised the need for flexibility in order to find more and better ways of sharing data intelligently, and to get out of the current black-and-white discussion of openness versus closedness.

Laura agreed that the panelists are on the same page and the general principles are aligned. She emphasised that the concepts of openness and open science need to be more practical. She called on all of us to help look at sector level to identify and share successful examples of collaboration models and agreements to help advance daily practice.

Tim thanked all panelists and the audience again, and closed the panel session.

Model paragraph

During its meeting in June 2020, the Task Force Innovation brainstormed and discussed the following model paragraph for academia-industry collaboration agreements. It is intended as a starting point for the type of wording that can be included in collaboration agreements which promote FAIR principles and the sharing of data, while providing flexibility and protection wherever necessary.

“All research data produced in this project should be handled and stored following the FAIR principles with the guiding principle ‘as open as possible, as closed as necessary’. Research data not specifically designated as protected or private can be made openly available after an embargo period. Unless otherwise agreed, the default embargo period in this project is 6 months, starting from the generation of each unit of the research data: i.e., 6 months into the project, research data generated on day 1 and not designated as protected or private can be made openly available. For research data designated as protected or private, authentication and authorisation requirements must be specified.”
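To make the rolling-embargo rule in the model paragraph concrete, here is a minimal sketch of how a project could compute the release date for each unit of research data. The function name, the string designations and the 182-day approximation of six months are hypothetical choices for illustration only; they are not part of the agreement text.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical sketch of the model clause: each unit of research data
# may be made openly available six months after it was generated,
# unless it is designated 'protected' or 'private'.
EMBARGO = timedelta(days=182)  # six months, approximated as 182 days


def open_release_date(generated_on: date, designation: str) -> Optional[date]:
    """Return the earliest date a data unit can be made openly available,
    or None for 'protected'/'private' data, for which authentication and
    authorisation requirements must be specified instead."""
    if designation in ("protected", "private"):
        return None
    return generated_on + EMBARGO


# Data generated on day 1 of a project starting 1 January 2021
# becomes releasable roughly six months into the project.
print(open_release_date(date(2021, 1, 1), "open"))       # 2021-07-02
print(open_release_date(date(2021, 1, 1), "protected"))  # None
```

In a real agreement, the designation of each data unit would of course come from the project’s data management plan rather than a free-text string, and the exact length of the embargo period would be whatever the partners agree contractually.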

If you have any questions or comments, please do not hesitate to contact us.

Yvonne Kinnard, Secretary of Task Force Innovation, and KE Policy & Outreach Manager at the University of Strathclyde

Mattias Björnmalm, Advisor for Research & Innovation, CESAER
