The weekly podcast about the Python programming language, its ecosystem, and its community. Tune in for engaging, educational, and technical discussions about the broad range of industries, individuals, and applications that rely on Python.
Computers are excellent at following detailed instructions, but they have no capacity for understanding the information that they work with. Knowledge graphs are a way to approximate that capability by building connections between elements of data that allow us to discover new connections among disparate information sources that were previously uknown. In our day-to-day work we encounter many instances of knowledge graphs, but building them has long been a difficult endeavor. In order to make this technology more accessible Tom Grek built Zincbase. In this episode he explains his motivations for starting the project, how he uses it in his daily work, and how you can use it to create your own knowledge engine and begin discovering new insights of your own.
Docker is a useful technology for packaging and deploying software to production environments, but it also introduces a different set of complexities that need to be understood. In this episode Itamar Turner-Trauring shares best practices for running Python workloads in production using Docker. He also explains some of the security implications to be aware of and digs into ways that you can optimize your build process to cut down on wasted developer time. If you are using Docker, thinking about using it, or just heard of it recently then it is worth your time to listen and learn about some of the cases you might not have considered.
The Python language has seen exponential growth in popularity and usage over the past decade. This has been driven by industry trends such as the rise of data science and the continued growth of complex web applications. It is easy to think that there is no threat to the continued health of Python, its ecosystem, and its community, but there are always outside factors that may pose a threat in the long term. In this episode Russell Keith-Magee reprises his keynote from PyCon US in 2019 and shares his thoughts on potential black swan events and what we can do as engineers and as a community to guard against them.
Project management is a discipline that has been through many incarnations, spawning an entire industry of businesses and tools. The challenge is to build a platform that is sufficiently powerful and adaptable to fit the workflow of your teams, while remaining opinionated enough to be useful. It also helps to have an open and extensible platform that can be customized as needed. In this episode Pablo Ruiz Múzquiz explains the motivation for creating the open source tool Taiga, how it compares to the other options in the market, and how you can use it for your own projects. He also discusses the challenges inherent to project management tools, his philosophies on what makes a project successful, and how to manage your team workflows to be most effective. It was helpful learning from Pablo's long experience in the software industry and managing teams of various sizes.
When your software projects start to scale it becomes a greater challenge to understand and maintain all of the pieces. In this episode Henry Percival shares his experiences working with domain driven design in large Python projects. He explains how it is helpful, and how you can start using it for your own applications. This was an informative conversation about software architecture patterns for large organizations and how they can be used by Python developers.
Machine learning is growing in popularity and capability, but for a majority of people it is still a black box that we don't fully understand. The team at MindsDB is working to change this state of affairs by creating an open source tool that is easy to use without a background in data science. By simplifying the training and use of neural networks, and making their logic explainable, they hope to bring AI capabilities to more people and organizations. In this interview George Hosu and Jorge Torres explain how MindsDB is built, how to use it for your own purposes, and how they view the current landscape of AI technologies. This is a great episode for anyone who is interested in experimenting with machine learning and artificial intelligence. Give it a listen and then try MindsDB for yourself.
One of the secrets of the success of Python the language is the tireless efforts of the people who work with and for the Python Software Foundation. They have made it their mission to ensure the continued growth and success of the language and its community. In this episode Ewa Jodlowska, the executive director of the PSF, discusses the history of the foundation, the services and support that they provide to the community and language, and how you can help them succeed in their mission.
Algorithmic trading is a field that has grown in recent years due to the availability of cheap computing and platforms that grant access to historical financial data. QuantConnect is a business that has focused on community engagement and open data access to grant opportunities for learning and growth to their users. In this episode CEO Jared Broad and senior engineer Alex Catarino explain how they have built an open source engine for testing and running algorithmic trading strategies in multiple languages, the challenges of collecting and serving currrent and historical financial data, and how they provide training and opportunity to their community members. If you are curious about the financial industry and want to try it out for yourself then be sure to listen to this episode and experiment with the QuantConnect platform for free.
The knowledge and effort required for building a fully functional web application has grown at an accelerated rate over the past several years. This introduces a barrier to entry that excludes large numbers of people who could otherwise be producing valuable and interesting services. To make the onramp easier Meredydd Luff and Ian Davies created Anvil, a platform for full stack web development in pure Python. In this episode Meredydd explains how the Anvil platform is built and how you can use it to build and deploy your own projects. He also shares some examples of people who were able to create profitable businesses themselves because of the reduced complexity. It was interesting to get Meredydd's perspective on the state of the industry for web development and hear his vision of how Anvil is working to make it available for everyone.
Serverless computing is a recent category of cloud service that provides new options for how we build and deploy applications. In this episode Raghu Murthy, founder of DataCoral, explains how he has built his entire business on these platforms. He explains how he approaches system architecture in a serverless world, the challenges that it introduces for local development and continuous integration, and how the landscape has grown and matured in recent years. If you are wondering how to incorporate serverless platforms in your projects then this is definitely worth your time to listen to.
One of the biggest pain points when working with data is getting is dealing with the boilerplate code to load it into a usable format. Intake encapsulates all of that and puts it behind a single API. In this episode Martin Durant explains how to use the Intake data catalogs for encapsulating source information, how it simplifies data science workflows, and how to incorporate it into your projects. It is a lightweight way to enable collaboration between data engineers and data scientists in the PyData ecosystem.
Learning to program can be a frustrating process, because even the simplest code relies on a complex stack of other moving pieces to function. When working with a microcontroller you are in full control of everything so there are fewer concepts that need to be understood in order to build a functioning project. CircuitPython is a platform for beginner developers that provides easy to use abstractions for working with hardware devices. In this episode Scott Shawcroft explains how the project got started, how it relates to MicroPython, some of the cool ways that it is being used, and how you can get started with it today. If you are interested in playing with low cost devices without having to learn and use C then give this a listen and start tinkering!
Being able to control a computer with your voice has rapidly moved from science fiction to science fact. Unfortunately, the majority of platforms that have been made available to consumers are controlled by large organizations with little incentive to respect users' privacy. The team at Snips are building a platform that runs entirely off-line and on-device so that your information is always in your control. In this episode Adrien Ball explains how the Snips architecture works, the challenges of building a speech recognition and natural language understanding toolchain that works on limited resources, and how they are tackling issues around usability for casual consumers. If you have been interested in taking advantage of personal voice assistants, but wary of using commercially available options, this is definitely worth a listen.
The U.S. government has a vast quantity of software projects across the various agencies, and many of them would benefit from a modern approach to development and deployment. The U.S. Digital Services Agency has been tasked with making that happen. In this episode the current director of engineering for the USDS, David Holmes, explains how the agency operates, how they are using Python in their efforts to provide the greatest good to the largest number of people, and why you might want to get involved. Even if you don't live in the U.S.A. this conversation is worth listening to so you can see an interesting model of how to improve government services for everyone.
Most programming is deterministic, relying on concrete logic to determine the way that it operates. However, there are problems that require a way to work with uncertainty. PyMC3 is a library designed for building models to predict the likelihood of certain outcomes. In this episode Thomas Wiecki explains the use cases where Bayesian statistics are necessary, how PyMC3 is designed and implemented, and some great examples of how it is being used in real projects.
Managing an event is rife with inherent complexity that scales as you move from scheduling a meeting to organizing a conference. Indico is a platform built at CERN to handle their efforts to organize events such as the Computing in High Energy Physics (CHEP) conference, and now it has grown to manage booking of meeting rooms. In this episode Adrian Mönnich, core developer on the Indico project, explains how it is architected to facilitate this use case, how it has evolved since its first incarnation two decades ago, and what he has learned while working on it. The Indico platform is definitely a feature rich and mature platform that is worth considering if you are responsible for organizing a conference or need a room booking system for your office.
The CPython interpreter has been the primary implementation of the Python runtime for over 20 years. In that time other options have been made available for different use cases. The most recent entry to that list is RustPython, written in the memory safe language Rust. One of the added benefits is the option to compile to WebAssembly, offering a browser-native Python runtime. In this episode core maintainers Windel Bouwman and Adam Kelly explain how the project got started, their experience working on it, and the plans for the future. Definitely worth a listen if you are curious about the inner workings of Python and how you can get involved in a relatively new project that is contributing to new options for running your code.
Version control has become table stakes for any software team, but for machine learning projects there has been no good answer for tracking all of the data that goes into building and training models, and the output of the models themselves. To address that need Dmitry Petrov built the Data Version Control project known as DVC. In this episode he explains how it simplifies communication between data scientists, reduces duplicated effort, and simplifies concerns around reproducing and rebuilding models at different stages of the projects lifecycle. If you work as part of a team that is building machine learning models or other data intensive analysis then make sure to give this a listen and then start using DVC today.
Ecommerce is an industry that has largely faded into the background due to its ubiquity in recent years. Despite that, there are new trends emerging and room for innovation, which is what the team at Mirumee focuses on. To support their efforts, they build and maintain the open source Saleor framework for Django as a way to make the core concerns of online sales easy and painless. In this episode Mirek Mencel and Patryk Zawadzki discuss the projects that they work on, the current state of the ecommerce industry, how Saleor fits with their technical and business strategy, and their predictions for the near future of digital sales.
Naomi Ceder was fortunate enough to learn Python from Guido himself. Since then she has contributed books, code, and mentorship to the community. Currently she serves as the chair of the board to the Python Software Foundation, leads an engineering team, and has recently completed a new draft of the Quick Python Book. In this episode she shares her story, including a discussion of her experience as a technical author and a detailed account of the role that the PSF plays in supporting and growing the community.
Python has become one of the dominant languages for data science and data analysis. Wes McKinney has been working for a decade to make tools that are easy and powerful, starting with the creation of Pandas, and eventually leading to his current work on Apache Arrow. In this episode he discusses his motivation for this work, what he sees as the current challenges to be overcome, and his hopes for the future of the industry.
The current buzz in data science and big data is around the promise of deep learning, especially when working with unstructured data. One of the most popular frameworks for building deep learning applications is PyTorch, in large part because of their focus on ease of use. In this episode Adam Paszke explains how he started the project, how it compares to other frameworks in the space such as Tensorflow and CNTK, and how it has evolved to support deploying models into production and on mobile devices.
The Redis database recently celebrated its 10th birthday. In that time it has earned a well-earned reputation for speed, reliability, and ease of use. Python developers are fortunate to have a well-built client in the form of redis-py to leverage it in their projects. In this episode Andy McCurdy and Dr. Christoph Zimmerman explain the ways that Redis can be used in your application architecture, how the Python client is built and maintained, and how to use it in your projects.
Any time that your program needs to interact with other systems it will have to deal with serializing and deserializing data. To prevent duplicate code and provide validation of the data structures that your application is consuming Steven Loria created the Marshmallow library. In this episode he explains how it is built, how to use it for rendering data objects to various serialization formats, and some of the interesting and unique ways that it is incorporated into other projects.
Chaos engineering is the practice of injecting failures into your production systems in a controlled manner to identify weaknesses in your applications. In order to build, run, and report on chaos experiments Sylvain Hellegouarch created the Chaos Toolkit. In this episode he explains his motivation for creating the toolkit, how to use it for improving the resiliency of your systems, and his plans for the future. He also discusses best practices for building, running, and learning from your own experiments.
Music is a part of every culture around the world and throughout history. Musicology is the study of that music from a structural and sociological perspective. Traditionally this research has been done in a manual and painstaking manner, but the advent of the computer age has enabled an increase of many orders of magnitude in the scope and scale of analysis that we can perform. The music21 project is a Python library for computer aided musicology that is written and used by MIT professor Michael Scott Cuthbert. In this episode he explains how the project was started, how he is using it personally, professionally, and in his lectures, as well as how you can use it for your own exploration of musical analysis.
Software development is a career that attracts people from all backgrounds, and Python in particular helps to make it an approachable occupation. Because of the variety of paths that can be taken it is becoming increasingly common for practitioners to bypass the traditional computer science education. In this episode David Kopec discusses some of the classic problems that he has found most useful to understand in his work as a professor and practitioner of software engineering. He shares his motivation for writing the book "Classic Computer Science Problems In Python", the practical approach that he took, and an overview of how the contents can be used in your day-to-day work.
As a developer and user of open source code, you interact with software and digital media every day. What is often overlooked are the rights and responsibilities conveyed by the intellectual property that is implicit in all creative works. Software licenses are a complicated legal domain in their own right, and they can often conflict with each other when you factor in the web of dependencies that your project relies on. In this episode Luis Villa, Co-Founder of Tidelift, explains the catagories of software licenses, how to select the right one for your project, and what to be aware of when you contribute to someone else's code.
As we build software projects, complexity and technical debt are bound to creep into our code. To counteract these tendencies it is necessary to calculate and track metrics that highlight areas of improvement so that they can be acted on. To aid in identifying areas of your application that are breeding grounds for incidental complexity Anthony Shaw created Wily. In this episode he explains how Wily traverses the history of your repository and computes code complexity metrics over time and how you can use that information to guide your refactoring efforts.
Computers have found their way into virtually every area of human endeavor, and archaeology is no exception. To aid his students in their exploration of digital archaeology Shawn Graham helped to create an online, digital textbook with accompanying interactive notebooks. In this episode he explains how computational practices are being applied to archaeological research, how the Online Digital Archaeology Textbook was created, and how you can use it to get involved in this fascinating area of research.
Every day there are satellites collecting sensor readings and imagery of our Earth. To help make sense of that information, developers at the meterological institutes of Sweden and Denmark worked together to build a collection of Python packages that simplify the work of downloading and processing the data gathered by satellites. In this episode one of the core developers of PyTroll explains how the project got started, how that data is being used by the scientific community, and how citizen scientists like you are getting involved.
The web has spawned numerous methods for communicating between applications, including protocols such as SOAP, XML-RPC, and REST. One of the newest entrants is GraphQL which promises a simplified approach to client development and reduced network requests. To make implementing these APIs in Python easier, Syrus Akbary created the Graphene project. In this episode he explains the origin story of Graphene, how GraphQL compares to REST, how you can start using it in your applications, and how he is working to make his efforts sustainable.
Real-time communication over the internet is an amazing feat of modern engineering. The protocol that powers a majority of video calling platforms is WebRTC. In this episode Jeremy Lainé explains why he wrote a Python implementation of this protocol in the form of AIORTC. He also discusses how it works, how you can use it in your own projects, and what he has planned for the future.
Using computers to analyze text can produce useful and inspirational insights. However, when working with multiple languages the capabilities of existing models are severely limited. In order to help overcome this limitation Rami Al-Rfou built Polyglot. In this episode he explains his motivation for creating a natural language processing library with support for a vast array of languages, how it works, and how you can start using it for your own projects. He also discusses current research on multi-lingual text analytics, how he plans to improve Polyglot in the future, and how it fits in the Python ecosystem.
Do you know what your servers are doing? If you have a metrics system in place then the answer should be "yes". One critical aspect of that platform is the timeseries database that allows you to store, aggregate, analyze, and query the various signals generated by your software and hardware. As the size and complexity of your systems scale, so does the volume of data that you need to manage which can put a strain on your metrics stack. Julien Danjou built Gnocchi during his time on the OpenStack project to provide a time oriented data store that would scale horizontally and still provide fast queries. In this episode he explains how the project got started, how it works, how it compares to the other options on the market, and how you can start using it today to get better visibility into your operations.
Keeping up with the work being done in the Python community can be a full time job, which is why Dan Bader has made it his! In this episode he discusses how he went from working as a software engineer, to offering training, to now managing both the Real Python and PyCoders properties. He also explains his strategies for tracking and curating the content that he produces and discovers, how he thinks about building products, and what he has learned in the process of running his businesses.
Digital books are convenient and useful ways to have easy access to large volumes of information. Unfortunately, keeping track of them all can be difficult as you gain more books from different sources. Keeping your reading device synchronized with the material that you want to read is also challenging. In this episode Kovid Goyal explains how he created the Calibre digital library manager to solve these problems for himself, how it grew to be the most popular application for organizing ebooks, and how it works under the covers. Calibre is an incredibly useful piece of software with a lot of hidden complexity and a great story behind it.
Investigative reporters have a challenging task of identifying complex networks of people, places, and events gleaned from a mixed collection of sources. Turning those various documents, electronic records, and research into a searchable and actionable collection of facts is an interesting and difficult technical challenge. Friedrich Lindenberg created the Aleph project to address this issue and in this episode he explains how it works, why he built it, and how it is being used. He also discusses his hopes for the future of the project and other ways that the system could be used.
The Python Community is large and growing, however a majority of articles, books, and presentations are still in English. To increase the accessibility for Spanish language speakers, Maricela Sanchez helped to create the Charlas track at PyCon US, and is an organizer for Python Day Mexico. In this episode she shares her motivations for getting involved in community building, her experiences working on Python Day Mexico and PyCon Charlas, and the lessons that she has learned in the process.
As data science becomes more widespread and has a bigger impact on the lives of people, it is important that those projects and products are built with a conscious consideration of ethics. Keeping ethical principles in mind throughout the lifecycle of a data project helps to reduce the overall effort of preventing negative outcomes from the use of the final product. Emily Miller and Peter Bull of Driven Data have created Deon to improve the communication and conversation around ethics among and between data teams. It is a Python project that generates a checklist of common concerns for data oriented projects at the various stages of the lifecycle where they should be considered. In this episode they discuss their motivation for creating the project, the challenges and benefits of maintaining such a checklist, and how you can start using it today.
The breadth of use cases that Python supports, coupled with the level of productivity that it provides through its ease of use have contributed to the incredible popularity of the language. To explore the ways that it can contribute to the success of a young and growing startup two of the lead engineers at Wanderu discuss their experiences in this episode. Matt Warren, the technical operations lead, explains the ways that he is using Python to build and scale the infrastructure that Wanderu relies on, as well as the ways that he deploys and runs the various Python applications that power the business. Chris Kirkos, the lead software architect, describes how the original Django application has grown into a suite of microservices, where they have opted to use a different language and why, and how Python is still being used for critical business needs. This is a great conversation for understanding the business impact of the Python language and ecosystem.
Many people learn to program because of their interest in building their own video games. Once the necessary skills have been acquired, it is often the case that the original idea of creating a game is forgotten in favor of solving the problems we confront at work. Game jams are a great way to get inspired and motivated to finally write a game from scratch. This week Daniel Pope discusses the origin and format for PyWeek, his experience as a participant, and the landscape of options for building a game in Python. He also explains how you can register and compete in the next competition.
Any application that communicates with other systems or services will at some point require a credential or sensitive piece of information to operate properly. The question then becomes how best to securely store, transmit, and use that information. The world of software secrets management is vast and complicated, so in this episode Brian Kelly, engineering manager at Conjur, aims to help you make sense of it. He explains the main factors for protecting sensitive information in your software development and deployment, ways that information might be leaked, and how to get the whole team on the same page.
There are many aspects of learning how to program and at least as many ways to go about it. This is multiplicative with the different problem domains and subject areas where software development is applied. In this episode William Vincent discusses his experiences learning how web development mid-career and then writing a series of books to make the learning curve for Django newcomers shallower. This includes his thoughts on the business aspects of technical writing and teaching, the challenges of keeping content up to date with the current state of software, and the ever-present lack of sufficient information for new programmers.
Maintaining the health and well-being of your software is a never-ending responsibility. Automating away as much of it as possible makes that challenge more achievable. In this episode Anthony Sottile describes his work on the pre-commit framework to simplify the process of writing and distributing functions to make sure that you only commit code that meets your definition of clean. He explains how it supports tools and repositories written in multiple languages, enforces team standards, and how you can start using it today to ship better software.
How secure are your servers? The best way to be sure that your systems aren't being compromised is to do it yourself. In this episode Daniel Goldberg explains how you can use his project Infection Monkey to run a scan of your infrastructure to find and fix the vulnerabilities that can be taken advantage of. He also discusses his reasons for building it in Python, how it compares to other security scanners, and how you can get involved to keep making it better.
The need to process unbounded and continually streaming sources of data has become increasingly common. One of the popular platforms for implementing this is Kafka along with its streams API. Unfortunately, this requires all of your processing or microservice logic to be implemented in Java, so what's a poor Python developer to do? If that developer is Ask Solem of Celery fame then the answer is, help to re-implement the streams API in Python. In this episode Ask describes how Faust got started, how it works under the covers, and how you can start using it today to process your fast moving data in easy to understand Python code. He also discusses ways in which Faust might be able to replace your Celery workers, and all of the pieces that you can replace with your own plugins.
Writing a book is hard work, especially when you are trying to teach such a broad concept as programming. In this episode Ana Bell discusses her recent work in writing Get Programming: Learn To Code With Python, including her views on how to separate the principles from the implementation, making the book evergreen in its appeal, and how her experience as a lecturer at MIT has helped her maintain the perspectives of beginners. She also shares her views on the values of learning about programming, even when you have no intention of doing it as a career and ways to take the next steps if that is your goal.
Masonite is an ambitious new web framework that draws inspiration from many other successful projects in other languages. In this episode Joe Mancuso, the primary author and maintainer, explains his goal of unseating Django from its position of prominence in the Python community. He also discusses his motivation for building it, how it is architected, and how you can start using it for your own projects.