José, data team lead at a tech giant, plans to transition from an in-house data platform to a scalable vendor solution.
The DataLead

by Datalore

Hi there,

José leads a 100-strong squad of data scientists, machine learning engineers, researchers, and data engineers. His leadership philosophy is rooted in collaboration, innovation, and continuous learning – principles that keep his team at the cutting edge of data science technologies.

When José attended the ODSC West event in November 2023, he was impressed by how much data science tooling had evolved in such a short period of time.

At the same time, his team was still reliant on an in-house data science platform centered around open-source Jupyter notebooks and JupyterHub running in their Kubernetes cluster.

After a recent round of layoffs, José lost the members of his team who were responsible for maintaining the platform, which meant there was no potential for further customization or improvements – even basic support would no longer be provided. 

Just yesterday, for example, a group of data scientists were unable to train their machine learning model because there were insufficient computing resources in the cluster. And nobody was there to fix it until today, when a colleague from another department volunteered to help.

José took this as a sign that it was time to move on and migrate from the in-house platform to a vendor solution – one that is more scalable and robust.

But such a move would require significant investment. To justify the expenditure to his boss, José prepares a comparative table of the benefits the team would enjoy after switching to a vendor solution.

Comparison: in-house maintained JupyterHub + Jupyter notebooks vs. vendor data science platform

Engineering resource allocation
In-house: 3 FTE
Vendor: 0.3 FTE

Licensing and support
In-house: Free under BSD license; no support included
Vendor: Flexible pricing tiers; customer support included

Built-in data connections to SQL databases and cloud storage
In-house: Via Python APIs that require internal development; credentials are exposed to the computational environment
Vendor: No-code connectors to SQL databases and cloud storage, ability to combine SQL and Python in one notebook; credentials are securely stored

Real-time collaboration in Jupyter notebooks
In-house: Available in Beta
Vendor: An industry standard

Coding assistance
In-house: Via plugins; potential security risks
Vendor: Integrated code completion

Interactive shareable reports
In-house: No such functionality; requirement to pay for another tool to turn notebooks into stakeholder reports
Vendor: Platforms provide this feature as part of their offering

Computational resources scalability and monitoring
In-house: Requires deep technical expertise; not transparent
Vendor: More straightforward and could be done by the team lead alone

Environment configuration
In-house: Host environment for all notebooks by default, which is cumbersome to configure and manage
Vendor: Custom environments available for each notebook

AI code generation
In-house: Available outside JupyterHub, hence security risks for users submitting private data to third-party providers
Vendor: Available as part of vendor offerings; clear terms of service

Versioning
In-house: Available only via the Git plugin
Vendor: Built-in version control designed for team collaboration and reproducibility

Low-code and no-code widgets
In-house: Via plugins only
Vendor: Integrated within Jupyter notebooks

Notebook APIs
In-house: Not available
Vendor: Available

It is evident to José that the productivity gains would be massive and more than enough to cover all the tooling and migration expenses, so he spends the night calculating the ROI for the next five years. Now he’s ready to present the research to his boss and get the green light.
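
For readers who want to follow along with José's late-night arithmetic, here is a minimal sketch of what a five-year ROI calculation might look like in Python. Only the 3 FTE and 0.3 FTE figures come from the table above. Every other number (cost per FTE, license fee, migration cost, yearly productivity gain) is a hypothetical placeholder, and the names `PlatformCosts`, `total_cost`, and `roi` are made up for illustration. The ROI definition used here, (benefit - investment) / investment, is one common convention and not necessarily the one José applied.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Cost assumptions for one platform option (illustrative only)."""
    maintenance_fte: float     # engineers needed to keep the platform running
    license_per_year: float    # licensing and support fees per year
    one_time_migration: float  # paid once, at the start


def total_cost(costs: PlatformCosts, fte_cost_per_year: float, years: int) -> float:
    """Total cost of running a platform over the given number of years."""
    yearly = costs.maintenance_fte * fte_cost_per_year + costs.license_per_year
    return costs.one_time_migration + yearly * years


def roi(in_house: PlatformCosts, vendor: PlatformCosts,
        fte_cost_per_year: float, yearly_productivity_gain: float,
        years: int = 5) -> float:
    """ROI of switching = (benefit - investment) / investment.

    Benefit: in-house costs avoided plus the assumed productivity gains.
    Investment: everything spent on the vendor platform over the period.
    """
    investment = total_cost(vendor, fte_cost_per_year, years)
    benefit = (total_cost(in_house, fte_cost_per_year, years)
               + yearly_productivity_gain * years)
    return (benefit - investment) / investment


# FTE counts are from the comparison table; all money values are placeholders.
in_house = PlatformCosts(maintenance_fte=3.0, license_per_year=0.0,
                         one_time_migration=0.0)
vendor = PlatformCosts(maintenance_fte=0.3, license_per_year=200_000.0,
                       one_time_migration=100_000.0)

five_year_roi = roi(in_house, vendor,
                    fte_cost_per_year=150_000.0,
                    yearly_productivity_gain=300_000.0,
                    years=5)
print(f"Five-year ROI: {five_year_roi:.0%}")
```

The result is very sensitive to the productivity-gain assumption, which is also the hardest input to defend in front of a boss, so it is worth rerunning the calculation with a pessimistic value and with zero.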

Is José missing something important? Do you think he will succeed? Let us know by replying to this email.


Our mailing address: JetBrains s.r.o., Na Hřebenech II 1718/8, 14000 Prague 4, Czech Republic