
The DataLead
by Datalore

Hi,

It’s Jodie here, your host for Season 2 of DataLead – an email series where we talk about AI for DataLeaders. A few weeks ago, while browsing LinkedIn, I felt like most of my feed was either talking about or generated by AI. In fact, I get a similar feeling wherever I go online: social media, blogs, and even news websites. Do you feel the same way? Let me know by replying to this email.

But why is everyone so fascinated by AI models? And how did we get to the point where the whole world talks more about “AI” than about “technology”?

What exactly is AI?

AI, or artificial intelligence, has actually been a field since the 1950s, and it encompasses all attempts to teach computers parts of human cognition, such as reasoning, problem-solving, and perception. It contains a number of subfields, including:

  • Natural language processing (NLP), which works with problems involving human languages, such as text summarization and classification (see the short sketch after this list).
  • Computer vision, which works with images and videos, doing tasks like image classification and object recognition.
  • Signal processing, which involves processing and interpreting a diverse range of signals, including music and speech.
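
If you’d like a feel for what these tasks look like in practice, here’s a minimal sketch using the open-source Hugging Face transformers library – the choice of library and its default models are my own assumptions for illustration, not something this series depends on:

    # Two NLP tasks in a few lines with Hugging Face transformers
    # (pip install transformers torch). Each pipeline downloads a
    # default pre-trained model automatically on first use.
    from transformers import pipeline

    # Text classification: label a sentence as positive or negative.
    classifier = pipeline("sentiment-analysis")
    print(classifier("I love working with data."))
    # -> [{'label': 'POSITIVE', 'score': 0.999...}]

    # Text summarization: condense a longer passage.
    summarizer = pipeline("summarization")
    article = (
        "Artificial intelligence has been a field since the 1950s and "
        "encompasses all attempts to teach computers parts of human "
        "cognition, such as reasoning, problem-solving, and perception."
    )
    print(summarizer(article, max_length=25, min_length=5)[0]["summary_text"])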

So, as you can see, the term AI has always been very broad – it’s not just the latest marketing buzzword!

Generative AI cuts across all of these fields and refers to models capable of creating original outputs like text, images, music, and speech – generative large language models (LLMs) being the best-known example.

While these models have gained significant attention recently, they remain a small part of the overall AI landscape, which still largely focuses on solving more established challenges.

AI landscape

Where did generative AI come from?

Although generative AI models may have appeared to come from nowhere, they are in fact based on decades of research within the above fields. The biggest breakthroughs that led to the development of these models include:

  • The development of neural networks, a type of machine learning algorithm designed to mimic how the human brain works (from the 1940s onwards).
  • The introduction of CUDA in 2006, which opened GPUs up to general-purpose programming and made it practical to meet the heavy computational demands of training these models.
  • The development of massive text and image datasets, like ImageNet and Common Crawl, starting from 2008.
  • The development of a specific type of neural network called the transformer network in 2017. This type of network scales very easily and forms the foundation of LLMs, as well as a number of image and music generation models.


How do I make sense of the current AI models?

Now that we understand where these models come from, we can start to decipher the current landscape of generative models. While this can feel overwhelming – a new model seems to be released almost weekly – we can simplify things significantly by separating generative AI models into proprietary and open-source varieties.

  • Proprietary models, like Gemini and DALL·E, are powerful and user-friendly, and they are usually provided through well-documented APIs hosted by their vendors. These models are convenient but come with potential downsides, such as cost, the risk of vendor lock-in, and concerns about the transparency of the data used for training.
  • Open-source models, like Llama 3.1 and DeepFloyd IF, offer greater flexibility, customization, and transparency, but they require a higher level of technical expertise to implement and manage effectively (see the sketch after this list).
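
To make that trade-off concrete, here’s a rough sketch of what each route can look like in Python. The package names and model IDs are assumptions for illustration – check your vendor’s documentation and the model’s licence before relying on them:

    # Proprietary: call a vendor-hosted API.
    # Sketch using Google's google-generativeai SDK
    # (pip install google-generativeai); assumes a GOOGLE_API_KEY
    # environment variable is set.
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model ID
    print(model.generate_content("Explain what an LLM is in one sentence.").text)

    # Open source: download the weights and run the model yourself.
    # Sketch using transformers; Llama weights are gated, so you need
    # to accept Meta's licence on Hugging Face first, and an 8B model
    # realistically needs a capable GPU.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative model ID
    )
    print(generator("Explain what an LLM is in one sentence.",
                    max_new_tokens=60)[0]["generated_text"])

Notice the asymmetry: the proprietary route is a handful of lines running against someone else’s infrastructure, while the open-source route puts the model – and the hardware bill – in your hands.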

The second thing we can look at when deciding between models is leaderboards, which rank models against each other by performance. The two most famous compare LLMs: the Open LLM Leaderboard scores open-source LLMs on a number of specific benchmark tasks, while the LMSYS Chatbot Arena leaderboard ranks LLMs of both kinds based on human votes for which model’s answers people prefer. Similar leaderboard projects are emerging for image generation models.

What are successful applications of AI?

Let’s end this episode by looking at where LLMs have been successfully used so far. One of the strongest applications to date is LLMs as assistants. For example, LLMs have been readily adopted as coding assistants, with around a third of developers saying they use these products regularly to complete routine tasks 20–50% faster.[1]

At JetBrains, we have a number of successful integrations of LLMs into our products that enable developers and data scientists to work faster. Within JetBrains Datalore, we've integrated AI seamlessly into your workflow with Jupyter notebooks, allowing data scientists to quickly generate code from natural language, find and fix errors, and even get suggestions on how to analyze a dataset.

JetBrains Datalore

You can try this out yourself, free for 14 days, with Datalore Cloud. You’ll also get to experience Datalore’s other features, such as intelligent coding assistance, real-time collaboration, data integrations, and seamless data management, which make it ideal for team-based and production-ready data science projects.

Try Datalore AI for Data Analytics

Wrapping up

I hope this overview has helped decode what people might mean when they say “AI” and given you a foothold in this complex area. Stay tuned for the next episode, where we’ll get into the numbers as we discuss the potential costs of running your own LLM-based application.

Best,
Jodie

JetBrains
The Drive to Develop

[1] McKinsey Report, A coding boost from AI

Previous episodes

Datalore
Privacy policy

Our mailing address: JetBrains s.r.o., Na Hřebenech II 1718/8, 14000 Prague 4, Czech Republic