
Jupyter notebooks and organisational data science

Leverage the power of Jupyter notebooks to supercharge the way your business does data science

 

What are Jupyter notebooks?

Data science is exploding: the amount of data available to businesses is growing rapidly, and more and more organisations are using that data to power, well, everything. The de facto notebook framework in the data science community is the Jupyter Notebook, now widely used for data exploration, visualisation and narration. But what exactly is a Jupyter notebook?

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning, and much more.

Jupyter notebooks and R Markdown notebooks are by far the two most popular types of data science notebook. Both are open-source formats created by the community, and there are over 7 million Jupyter notebooks on GitHub alone right now, with usage growing exponentially.

And these notebooks are already widely used in many large organisations. But why has this tool become so popular above all others?

 

The power of Jupyter notebooks

1. Exploratory Analysis and Data Visualisation

Jupyter notebooks have risen rapidly in popularity to become the go-to standard for quick prototyping and exploratory data analysis. Within the Jupyter interface, users can execute cells one at a time and observe the results immediately, independently of the rest of the script.

This means one can interactively run code, explore output and visualise data seamlessly. Jupyter also supports the rendering of interactive plotting libraries such as Plotly, Bokeh and Altair, alongside the sharing of code and data sets, enabling others to interact directly with the generated insights.
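To make the cell-by-cell workflow concrete, here is a minimal sketch of an exploratory session using pandas. The data set and column names are hypothetical; in a real notebook each step below would live in its own cell, so you can inspect every intermediate result before writing the next line of code.

```python
import pandas as pd

# Hypothetical sales data -- in practice this would be loaded from a
# file or database in its own cell.
df = pd.DataFrame({
    "region": ["EU", "EU", "US", "US"],
    "revenue": [120, 80, 200, 150],
})

# Cell 1: take a quick look at the raw data
print(df.head())

# Cell 2: aggregate, then inspect the result before going further
summary = df.groupby("region")["revenue"].sum()
print(summary)
```

Each `print` here stands in for the rich output a notebook renders inline beneath the cell, which is exactly what makes this style of iterative exploration so fast.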


2. Computational Narration

The final goal of any data exploration and analysis is not to generate interesting but arbitrary information from the company’s data; it is to uncover actionable insights, communicated effectively in reports that the rest of the business can use to make more data-informed decisions.

Jupyter notebooks are, by default, perfect for generating data-based reports. It is all about storytelling: in a notebook you can narrate the visualisations you create, so that the data and the computations run to process and visualise that data are all embedded in a narrative - a computational narrative - that tells a story for its intended audience and context. This makes presenting and communicating the generated insights effortless.

 

3. Parameterisation and Scheduling

So far we've seen the importance of Jupyter notebooks in the process of generating and presenting insights from data. But the community is also continuously working on new add-ons to the Jupyter ecosystem that make the lives of data scientists and engineers that much easier.

A big game changer has been papermill, a library that makes large-scale execution of multiple notebooks possible, with configurable parameters and production ecosystems in mind. Parametrised notebooks let you designate a cell of default parameters and inject new input values at execution time.

The opportunities this opens up are endless. The data insights, visualisations and reports referred to above can now be automated and scheduled. Imagine, for example, a monthly report that a data scientist would traditionally have to compute by hand at the end of each month. Using papermill, with parameters for the values that change month to month, that same report can be scheduled to execute automatically on the 1st of every month.
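The monthly-report workflow above can be sketched as follows. The notebook paths, parameter names and report logic here are all hypothetical; papermill's real entry point is `papermill.execute_notebook`, shown in the comment, while the plain-Python function below just illustrates the parameter-injection idea.

```python
# With papermill installed, a parametrised notebook is executed like this
# (file names and parameter values are hypothetical):
#
#   import papermill as pm
#   pm.execute_notebook(
#       "monthly_report.ipynb",          # input notebook with a "parameters" cell
#       "monthly_report_2020_09.ipynb",  # executed copy with injected values
#       parameters={"month": "2020-09"},
#   )
#
# Inside the notebook, a cell tagged "parameters" holds the defaults, and
# papermill injects a new cell right after it that overrides them. The
# function below mimics that behaviour in plain Python.

def run_report(injected_parameters=None):
    # "parameters" cell: defaults used during interactive development
    params = {"month": "2020-01", "min_revenue": 0}
    # papermill-style injected cell: overrides for this scheduled run
    params.update(injected_parameters or {})
    return f"Report for {params['month']} (revenue >= {params['min_revenue']})"

# An interactive run uses the defaults; a scheduled run injects new values.
print(run_report())
print(run_report({"month": "2020-09", "min_revenue": 100}))
```

A scheduler such as cron or Airflow would then call the papermill command once a month with the appropriate parameter values, producing a fresh, fully executed notebook each time.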

With this and other tools, data teams can begin to set up full pipelines for generating reports in production, based on Jupyter notebooks.

Kyso's bet on Jupyter notebooks

With the introduction of tools like papermill, as well as others aimed, for example, at improving version control for notebooks, Jupyter notebooks are no longer confined to local development but are also used in production. In addition, Jupyter notebooks are more and more often used as reports and dashboards to be presented to business teams and executives.

And this is where Kyso comes in.

Internal data teams use tools like GitHub for project management, access to which is typically restricted to the data scientists and engineers, so their work is not shared with the non-technical people in the company. These teams use notebooks for scheduling, data exploration and analysis, but the presence of code, terminal output and so on means notebooks are not the best communication tool for non-technical audiences. All of this locks up a lot of business value, because not everyone in the company learns from the generated insights.

Kyso solves this problem with our GitHub integration and an elegant blogging platform for rendering data science notebooks, making information that was previously possessed by a select few accessible to everyone. Kyso lets technical teams publish and share their analyses in a more readable format by rendering notebooks as web articles. Think of it as your team’s own internal Medium for data science: you can publish Jupyter and R notebooks, and any code is hidden by default and can be toggled, so your post is digestible for technical and non-technical audiences alike. You can also write articles from scratch using our in-app Markdown editor.

We created Kyso to optimize the computation-to-communication workflow, and to bring non-technical stakeholders into data-based conversations that happen daily around a business.

 

The future of Jupyter notebooks

We look at huge companies like Netflix already doubling down on Jupyter notebooks, making them integral parts of their data science infrastructure, and we imagine a future in which the Jupyter notebook is also the single interface data scientists use to communicate and deliver results organisation-wide.

Jupyter notebooks and similar tools are becoming the new Excel. In order to reap the full benefit of this phenomenon, the knowledge within these documents needs to be communicated effectively across entire organizations, so everyone — I mean everyone — can learn & apply data insights to their respective roles to drive business value. At Kyso we want to help you do just that.

Kyle O'Shea
With a strong background in economics, Kyle started his career in financial analytics. He now heads up the Kyso Community and our internal data science team. He likes to visualize data that helps us better understand economic and social inequality issues.

Kyle O'Shea · Sep 14, 2020