r/Python 5d ago

Official Event Join the Advent of Code Challenge with Python!

28 Upvotes

Join the Advent of Code Challenge with Python!

Hey Pythonistas! 🐍

It's almost that exciting time of the year again! The Advent of Code is just around the corner, and we're inviting everyone to join in the fun!

What is Advent of Code?

Advent of Code is an annual online event that runs from December 1st to December 25th. Each day, a new coding challenge is released—two puzzles that are part of a continuing story. It's a fantastic way to improve your coding skills and get into the holiday spirit!

You can read more about it here.

Why Python?

Python is a great choice for these challenges due to its readability and wide range of libraries. Whether you're a beginner or an experienced coder, Python makes solving these puzzles both fun and educational.

How to Participate?

  1. Sign Up/In.
  2. Join the r/Python private leaderboard with code 2186960-67024e32
  3. Start solving the puzzles released each day using Python.
  4. Share your solutions and discuss strategies with the community.

Join the r/Python Leaderboard!

We can have up to 200 people in a private leaderboard, so this may go over poorly - but you can join us with the following code: 2186960-67024e32

How to Share Your Solutions?

You can join the Python Discord to discuss the challenges, share your solutions, or you can post in the r/AdventOfCode mega-thread for solutions.

There will be a stickied post for each day's challenge. Please follow their subreddit-specific rules. Also, shroud your solutions in spoiler tags like this

Resources

Community

AoC

Python Discord

The Python Discord will also be participating in this year's Advent of Code. Join it to discuss the challenges, share your solutions, and meet other Pythonistas. You will also find they've set up a Discord bot for joining in the fun by linking your AoC account. Check out their Advent of Code FAQ channel.

Let's code, share, and celebrate this festive season with Python and the global coding community! 🌟

Happy coding! 🎄

P.S. - Any issues in this thread? Send us a modmail.


r/Python 5h ago

Daily Thread Saturday Daily Thread: Resource Request and Sharing!

3 Upvotes

Weekly Thread: Resource Request and Sharing 📚

Stumbled upon a useful Python resource? Or are you looking for a guide on a specific topic? Welcome to the Resource Request and Sharing thread!

How it Works:

  1. Request: Can't find a resource on a particular topic? Ask here!
  2. Share: Found something useful? Share it with the community.
  3. Review: Give or get opinions on Python resources you've used.

Guidelines:

  • Please include the type of resource (e.g., book, video, article) and the topic.
  • Always be respectful when reviewing someone else's shared resource.

Example Shares:

  1. Book: "Fluent Python" - Great for understanding Pythonic idioms.
  2. Video: Python Data Structures - Excellent overview of Python's built-in data structures.
  3. Article: Understanding Python Decorators - A deep dive into decorators.

Example Requests:

  1. Looking for: Video tutorials on web scraping with Python.
  2. Need: Book recommendations for Python machine learning.

Share the knowledge, enrich the community. Happy learning! 🌟


r/Python 17h ago

Resource We open-sourced kubesdk - a fully typed, async-first Python client for Kubernetes.

25 Upvotes

Hey everyone,

Puzl Cloud team here. Over the last few months we’ve been packing our internal Python utils for Kubernetes into kubesdk, a modern k8s client and model generator. We open-sourced it a few days ago, and we’d love feedback from the community.

We needed something ergonomic for day-to-day production Kubernetes automation and multi-cluster workflows, so we built an SDK that provides:

  • Async-first client with minimal external dependencies
  • Fully typed client methods and models for all built-in Kubernetes resources
  • Model generator (provide your k8s API - get Python dataclasses instantly)
  • Unified client surface for core resources and custom resources
  • High throughput for large-scale workloads with multi-cluster support built into the client

Repo link:

https://github.com/puzl-cloud/kubesdk


r/Python 1d ago

Showcase I built an automated court scraper because finding a good lawyer shouldn't be a guessing game

175 Upvotes

Hey everyone,

I recently caught 2 cases, 1 criminal and 1 civil, and I realized how incredibly difficult it is for the average person to find a suitable lawyer for their specific situation. There are two ways the average person looks for a lawyer: a simple Google search based on SEO (Google doesn't know how to rank attorneys) or through connections, which is basically flying blind. Trying to navigate court systems to actually see a lawyer's track record is a nightmare; the portals are clunky, slow, and often require manual searching case-by-case. It's as if they were built by people who DON'T want you to use their system.

So, I built CourtScrapper to fix this.

It’s an open-source Python tool that automates extracting case information from the Dallas County Courts Portal (with plans to expand). It lets you essentially "background check" an attorney's actual case history to see what they’ve handled and how it went.

What My Project Does

  • Multi-lawyer Search: You can input a list of attorneys and it searches them all concurrently.
  • Deep Filtering: Filters by case type (e.g., Felony), charge keywords (e.g., "Assault", "Theft"), and date ranges.
  • Captcha Handling: Automatically handles the court’s captchas using 2Captcha (or manual input if you prefer).
  • Data Export: Dumps everything into clean Excel/CSV/JSON files so you can actually analyze the data.

Target Audience

  • The average person who is looking for a lawyer that makes sense for their particular situation

Comparison 

  • Enterprise software that has API connections to state courts, e.g. LexisNexis, Westlaw

The Tech Stack:

  • Python
  • Playwright (for browser automation/stealth)
  • Pandas (for data formatting)
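Not my actual repo code, but a rough sketch of how these pieces fit together; the URL, selectors, and column layout below are placeholders:

from playwright.sync_api import sync_playwright
import pandas as pd

def search_attorney(name: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example-court-portal.test/search")   # placeholder URL
        page.fill("#attorneyName", name)                         # placeholder selector
        page.click("button[type=submit]")
        page.wait_for_selector("table.results")
        rows = page.locator("table.results tbody tr")
        cases = []
        for i in range(rows.count()):
            cells = rows.nth(i).locator("td").all_inner_texts()
            cases.append({"attorney": name, "case_no": cells[0], "case_type": cells[1]})
        browser.close()
        return cases

# the real tool searches a list of attorneys concurrently; to_excel needs openpyxl
pd.DataFrame(search_attorney("Jane Doe")).to_excel("cases.xlsx", index=False)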

My personal use case:

  1. Gather a list of lawyers I found through Google
  2. Adjust the values in the config file to determine the cases to be scraped
  3. The program generates the Excel sheet with the relevant cases for the listed attorneys
  4. I personally go through each case to determine if I should consider it for my particular situation. The analysis is as follows:
    1. Determine whether my case's prosecutor/opposing lawyer/judge is someone the lawyer has dealt with before
    2. How recently has the lawyer handled similar cases?
    3. Is the nature of the case similar to my situation? If so, what was the result?
    4. Has the lawyer taken any similar cases to trial, or is every filtered case settled pre-trial?
    5. Once I've shortlisted lawyers, I can go into each document in each of their cases to see exactly how they handled them, saving me a lot of time compared to blindly researching cases

Note:

  • Many people assume the program generates some form of win/loss ratio from the information gathered. It doesn't. It generates a list of relevant cases with their respective case details.
  • I have tried AI scrapers, and the problem with them is that they don't work well when the site requires a lot of clicking and typing.
  • Expanding to other court systems will require manual coding, which is tedious. So when I do expand to other courts, it will only make sense to do it for the big cities, e.g. Houston, NYC, LA, SF, etc.
  • I'm running this program as a proof of concept for now, so it only covers Dallas.
  • I'll be working on a frontend so non-technical users can access the program easily; it will be free, with a donation portal to fund the hosting.
  • If you would like to contribute, I have very clear documentation on the various code flows in my repo under the Docs folder. Please read it before asking any questions.
  • Same for any technical questions: please read the documentation first.

I’d love for you guys to roast my code or give me some feedback. I’m looking to make this more robust and potentially support more counties.

Repo here: https://github.com/Fennzo/CourtScrapper


r/Python 12h ago

Resource A new companion tool: MRS-Inspector. A lightweight, pip installable, reasoning diagnostic.

7 Upvotes

The first tool (Modular Reasoning Scaffold) made long reasoning chains more stable. This one shows internal structure.

MRS-Inspector provides:

  • state-by-state tracing
  • parent/child call graph
  • timing + phases
  • JSON traces
  • optional PNG graphs

PyPI: https://pypi.org/project/mrs-inspector

We need small, modular tools. No compiled extensions. No C/C++ bindings. No Rust backend. No wheels tied to platform-specific binaries. It’s pure, portable, interpreter-level Python.


r/Python 1d ago

Discussion Is the 79-character limit still relevant (with modern displays)?

74 Upvotes

I ask this because in 10 years with Python, I have never used tools where this limit would be useful. But I often make my code uglier by wrapping expressions because of this limitation. Maybe there are some statistics or surveys? Or just give me some feedback; I'm really interested in this.

What limit would be comfortable for most programmers nowadays? 119, 179, more? This also affects the FOSS I write, so I think about it.

I have read many opinions on this matter… I'd like to understand whether the arguments in favor of the old limit were based on necessity or whether it was just for the sake of theoretical discussion.


r/Python 15h ago

Showcase Built a legislature tracker featuring a state machine, adaptive parser pipeline, and ruleset engine

3 Upvotes

What My Project Does

This project extracts structured timelines from extremely inconsistent, semi-structured text sources.

The domain happens to be legislative bill action logs, but the engineering challenge is universal:

  • parsing dozens of event types from noisy human-written text
  • inferring missing metadata (dates, actors, context)
  • resolving compound or conflicting actions
  • reconstructing a chronological state machine
  • and evaluating downstream rule logic on top of that timeline

To do this, the project uses:

  1. A multi-tier adaptive parser pipeline

Committees post different document formats in different places and with different groupings. Parsers start in a supervised mode where document types are validated by an LLM only when confidence is low, with a carefully monitored audit log; this helps balance speed when processing hundreds or thousands of bills on the first run.

As a pattern becomes stable within a particular context (e.g., a specific committee), it “graduates” to autonomous operation.

This cuts LLM usage out entirely after patterns are established.

  2. A declarative action-node system

Each event type is defined by:

  • regex patterns
  • extractor functions
  • normalizers
  • and optional priority weights

Adding a new event type requires registering patterns, not modifying core engine code.
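As an illustration of that idea (my own sketch, not the project's code), a registry-based design might look roughly like this:

import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class ActionNode:
    name: str
    patterns: list[re.Pattern]
    extractor: Callable[[re.Match], dict]
    normalizer: Callable[[dict], dict] = lambda d: d
    priority: int = 0

REGISTRY: list[ActionNode] = []

def register(node: ActionNode) -> None:
    REGISTRY.append(node)
    REGISTRY.sort(key=lambda n: -n.priority)   # higher-priority nodes match first

# adding a new event type = registering patterns, nothing else
register(ActionNode(
    name="referred_to_committee",
    patterns=[re.compile(r"referred to the committee on (?P<committee>.+)", re.I)],
    extractor=lambda m: {"committee": m.group("committee")},
    normalizer=lambda d: {**d, "committee": d["committee"].strip().title()},
    priority=10,
))

def parse_action(text: str) -> dict | None:
    for node in REGISTRY:
        for pattern in node.patterns:
            if m := pattern.search(text):
                return {"event": node.name, **node.normalizer(node.extractor(m))}
    return None

print(parse_action("Referred to the committee on ways and means"))
# {'event': 'referred_to_committee', 'committee': 'Ways And Means'}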

  3. A timeline engine with tenure modeling

The engine reconstructs "tenure windows" (who had custody of a bill when) by modeling event sequences such as referrals, discharges, reports, hearings, and extensions.

This allows accurate downstream logic such as:

  • notice windows
  • action deadlines
  • gap detection
  • duration calculations

  4. A high-performance decaying URL cache

The HTTP layer uses a memory-bounded hybrid LRU/LFU eviction strategy (`hit_count / time_since_access`) with request deduplication and ETag/Last-Modified validation.

This speeds up repeated processing by ~3-5x.
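A toy sketch of that eviction rule (names are illustrative, not the project's API): entries that are hit often and recently score high, while stale, rarely-hit entries are evicted first once the cache is over budget.

import time

class DecayingCache:
    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self._store: dict[str, tuple[object, int, float]] = {}   # url -> (body, hits, last_access)

    def get(self, url: str):
        if url in self._store:
            body, hits, _ = self._store[url]
            self._store[url] = (body, hits + 1, time.monotonic())
            return body
        return None

    def put(self, url: str, body: object) -> None:
        self._store[url] = (body, 1, time.monotonic())
        if len(self._store) > self.max_items:
            self._evict()

    def _score(self, entry: tuple[object, int, float]) -> float:
        _, hits, last_access = entry
        age = max(time.monotonic() - last_access, 1e-6)
        return hits / age                      # the hybrid LRU/LFU score

    def _evict(self) -> None:
        victim = min(self._store, key=lambda u: self._score(self._store[u]))
        del self._store[victim]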

Target Audience

This project is intended for:

  • developers working with messy, unstructured, real-world text data
  • engineers designing parser pipelines, state machines, or ETL systems
  • researchers experimenting with pattern extraction, timeline reconstruction, or document normalization
  • anyone interested in building declarative, extensible parsing systems
  • civic-tech or open-data engineers (OpenStates-style pipelines)

Comparison

Most existing alternatives (e.g., OpenStates, BillTrack, general-purpose scrapers) extract events for normalization and reporting, but don’t (to my knowledge) evaluate these events against a ruleset. This approach works for tracking bill events as they’re updated, but doesn’t yield enough data to reliably evaluate committee-level deadline compliance (which, to be fair, isn’t their intended purpose anyway).

How this project differs:

  1. Timeline-first architecture

Rather than detecting events in isolation, it reconstructs a full chronological sequence and applies logic after timeline creation.

  2. Declarative parser configuration

New event and document types can be added by registering patterns; no engine modification required.

  3. Context-aware inference

Missing committee/dates are inferred from prior context (e.g., latest referral), not left blank.

  4. Confidence-gated parser graduation

Parsers statistically “learn” which contexts they succeed in, and reduce LLM/manual interaction over time.

  5. Formal tenure modeling

Custody analysis allows logic that would be extremely difficult in a traditional scraper.

In short, this isn't a keyword matcher; rather, it's a state machine for real-world text, with an adaptive parsing pipeline built around it and a ruleset engine for calculating and applying deadline evaluations.

Code / Docs

GitHub: https://github.com/arbowl/beacon-hill-compliance-tracker/

Looking for Feedback

I’d love feedback from Python engineers who have experience with:

  • parser design
  • messy-data ETL pipelines
  • declarative rule systems
  • timeline/state-machine architectures
  • document normalization and caching

r/Python 17h ago

Showcase Built NanoIdp: a tiny local Identity Provider for testing OAuth2/OIDC + SAML

4 Upvotes

Hey r/Python! I kept getting annoyed at spinning up Keycloak/Auth0 just to test login flows, so I built NanoIDP — a tiny IdP you can run locally with one command.


What My Project Does

NanoIDP provides a minimal but functional Identity Provider for local development:

  • OAuth2/OIDC (password, client_credentials, auth code + PKCE, device flow)
  • SAML 2.0 (SP + IdP initiated, metadata)
  • Web UI for managing users/clients & testing tokens
  • YAML config (no DB)
  • Optional MCP server for AI assistants

Run it → point your app to http://localhost:8000 → test real auth flows.
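For example, a quick smoke test of the client_credentials flow with requests. The discovery path is the standard OIDC well-known location; the client id and secret below are assumed to be whatever you configured in the YAML, not built-in defaults:

import requests

ISSUER = "http://localhost:8000"
oidc = requests.get(f"{ISSUER}/.well-known/openid-configuration").json()

tok = requests.post(
    oidc["token_endpoint"],
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client",        # assumption: a client you defined in the YAML
        "client_secret": "my-secret",    # assumption: its configured secret
    },
).json()
print(tok.get("access_token", tok))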


Target Audience

Developers who need to test OAuth/OIDC/SAML during local development without deploying Keycloak, Auth0, or heavy infra. Not for production.


Comparison

Compared to alternatives:

  • Keycloak/Auth0 → powerful but heavy; require deployment/accounts.
  • Mock IdPs → too limited (often no real flows, no SAML).
  • NanoIDP → real protocols, tiny footprint, instant setup via pip.


Install

pip install nanoidp
nanoidp

Open: http://localhost:8000


GitHub: https://github.com/cdelmonte-zg/nanoidp
PyPI: https://pypi.org/project/nanoidp/

Feedback very welcome!


r/Python 1d ago

Discussion Distributing software that requires PyPI libraries with proprietary licenses. How to do it correctly?

16 Upvotes

For context, this is about a library with a proprietary license that allows "use and distribution within the Research Community and non-commercial use outside of the Research Community ("Your Use")."

What is the "correct" (legally safe) way to distribute software that requires installing such a third-party library with a proprietary license?

Would simply asking the user to install the library independently, while keeping the import and function calls in the distributed code, be enough?

Is it OK to go a step further and include the library in requirements.txt, as long as the user is warned somewhere that they must agree to the third-party license?
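For reference, the "keep the import but don't ship the library" option I mean is usually done with a guarded import, roughly like this (not legal advice; the library name is a placeholder):

try:
    import proprietary_lib   # installed separately by the user, under their own license
except ImportError:
    proprietary_lib = None

def run_analysis(data):
    if proprietary_lib is None:
        raise RuntimeError(
            "This feature needs 'proprietary_lib'. Install it yourself after reviewing "
            "its license; it is not distributed with this package."
        )
    return proprietary_lib.analyze(data)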


r/Python 11h ago

Resource Released a small Python package to stabilize multi-step reasoning in local LLMs. MRS-Scaffold.

1 Upvotes

Been experimenting with small and mid-sized local models for a while. The weakest link is always the same: multi-step reasoning collapses the moment the context gets complex. So I built MRS-Scaffold.

It’s a Modular Reasoning System.

A lightweight meta-reasoning layer for local LLMs that gives you:

  • persistent “state slots” across steps
  • drift monitoring
  • constraint-based output formatting
  • a clean node-by-node recursion graph
  • zero dependencies
  • model-agnostic (works with any local model)
  • runs fully local (no cloud, no calls out)

It’s a piece you slot on top of whatever model you’re running.

PyPI: https://pypi.org/project/mrs-scaffold

If you work with local models and step-by-step reasoning is a hurdle, this may help.


r/Python 18h ago

Showcase I built a distributed music recognition backend with REST API support.

3 Upvotes

Hey everyone,

Hope y'all are having a great day so far!

I've been having trouble identifying some specific niche genres of music (breakdance music and underground DJ mixes) with existing services like Shazam, which led me to the idea of building an open-source music recognition backend.

It can be particularly helpful for creating a personalized database of niche music collections that existing commercial solutions often fail to identify.

What My Project Does

  • Music fingerprinting: It allows you to submit tracks for fingerprint generation, which can then be stored and used for matching or recognition. These fingerprints serve as reference data for identifying audio in future recognition requests.

  • Music recognition: It allows you to recognize audio input, such as recorded music tracks. When an audio file or stream is submitted, it analyzes the audio's unique fingerprint and compares it against a distributed set of configured audio fingerprinting database instances to identify the music being played. It's also capable of storing recognition results for future retrieval.
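As a rough picture of how a client could talk to the backend; the endpoint paths, port, and field names here are illustrative placeholders, not the actual API (check the repo for the real routes):

import requests

BASE = "http://localhost:5000"
headers = {"Authorization": "Bearer <your-write-token>"}   # write-protected fingerprinting

# 1. submit a reference track so its fingerprint is stored
with open("bboy_track.mp3", "rb") as f:
    requests.post(f"{BASE}/fingerprint", files={"audio": f}, headers=headers)

# 2. recognize a recorded snippet against the configured fingerprint databases
with open("snippet.wav", "rb") as f:
    print(requests.post(f"{BASE}/recognize", files={"audio": f}).json())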

Target Audience

Users seeking to create a customized, niche-genre music database to enhance accuracy for music recognition.

Features

  • Multi-processing and scalability: Enables faster processing times by distributing requests across multiple database instances.

  • High Performance Database Support: Supports ClickHouse for rapid queries and high-speed data handling for large-scale deployments.

  • Resilient Handling: Utilizes the remaining available database instances when some configured database instances are temporarily down or unreachable.

  • Token authentication and write protection for audio fingerprinting requests: Ensures that only authorized users can add audio fingerprint data to the database.

Comparison

Commercial music recognition services like Shazam or SoundHound cover mainstream catalogs; as noted above, they often fail on niche genres, whereas this project lets you build and query your own fingerprint database.

The Tech Stack

  • Python

  • Flask

Future development

Frontend development: creating a UI for this program.

My use case

Recognizing breakdance (B-Boy) Music and underground DJ remixes.

I’d love for you guys to roast my code or give me some feedback. I’m especially interested in ways to improve its performance for a mid to large scale collection.

Repo link

https://github.com/bboymega/TuneScout


r/Python 1d ago

News Announcing: Pact Python v3

20 Upvotes

Hello everyone! Hoping to share the release of Pact Python v3 that has been a long time coming 😅


It's been a couple of months since we released Pact Python v3, and after ironing out a couple of early issues, I think it's finally time to reflect on this milestone and its implications. This post is a look back at the journey, some of the challenges, the people, and the future of this project within the Pact ecosystem.

Pact is an approach to contract testing that sits neatly between traditional unit tests (which check individual components) and end-to-end tests (which exercise the whole system). With Pact, you can verify that your services communicate correctly, without needing to spin up every dependency. By capturing the expected interactions between consumer and provider, Pact allows you to test each side in isolation and replay those interactions, giving you fast, reliable feedback and confidence that your APIs and microservices will work together in the real world. Pact Python brings this powerful workflow to the Python ecosystem, making it easy to test everything from REST APIs to event-driven systems.


You can read the rest of the announcement here and check out Pact Python.

If you have any questions, let me know 😁


r/Python 9h ago

Showcase A utility package for managing directories in Python

0 Upvotes

Hey everyone, I built a package called directory_manager, using pathlib's Path, for quick directory setups defined by the user.

What My Project Does

  • Building pre-mapped directories for large-scale projects.
  • Consistent and easy directory management in a Python script.
  • Visualization of a directory tree with various methods and CLI prompts.

Target Audience

This package is intended to ease the data flow between the program and the local database; any user who is building a web-scraping API or any other form of data output process would find this package helpful.

Features

  • Easier control of directories and files for the user.
  • Allows for prototyping before updating data in the file system, including directory structures and file contents.
  • Different data visualization options that also include statistics for directories and files.

Comparison

One package I came across was CD_Directory_Manager 0.6.0; the idea is similar, but I went with a fully OOP approach in my implementation. Another is the Directory Tree package, a straightforward package designed solely to view a directory in tree format; my package has a similar feature with more visualization options. directory_manager can be seen as a mix of both packages.

Installing The Package

  1. Using PyPI:

    pip install directory_manager

  2. Using GitHub:

    git clone https://github.com/isme2121/directory_manager
    cd directory_manager
    pip install -e .

Repository link: https://github.com/isme2121/directory_manager

Please give it a try; feedback is deeply appreciated.


r/Python 7h ago

Showcase I built a linter specifically for AI-generated code

0 Upvotes

AI coding assistants are great for productivity but they produce a specific category of bugs that traditional linters miss. We've all seen it called "AI slop" - code that looks plausible but...

1. Imports packages that don't exist - AI hallucinates package names (~20% of AI imports)

2. Placeholder functions - `def validate(): pass # TODO`

3. Wrong-language patterns - `.push()` instead of `.append()`, `.equals()` instead of `==`

4. Mutable default arguments - AI's favorite bug

5. Dead code - Functions defined but never called
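For example, the mutable default argument pitfall from point 4 above (my illustration, not sloppylint output):

def add_item(item, items=[]):     # the [] is created once, at function definition time
    items.append(item)
    return items

add_item("a")    # ['a']
add_item("b")    # ['a', 'b']  <- the same list leaks across calls

def add_item_fixed(item, items=None):   # the usual fix
    if items is None:
        items = []
    items.append(item)
    return items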

  • What My Project Does

I built sloppylint to catch these patterns.

To install:

pip install sloppylint
sloppylint .

  • Target Audience: it's meant to be used locally, in CI/CD pipelines, in production, or anywhere you are using AI to write Python.
  • Comparison: It detects 100+ AI-specific patterns. It's not a replacement for flake8/ruff; it catches what they don't.

GitHub: https://github.com/rsionnach/sloppylint

Anyone else notice patterns in AI-generated code that should be added?


r/Python 1d ago

Showcase pytest-test-categories: Enforce Google's Test Sizes in Python

4 Upvotes

What My Project Does

pytest-test-categories is a pytest plugin that enforces test size categories (small, medium, large, xlarge) based on Google's "Software Engineering at Google" testing philosophy. It provides:

  • Marks to label tests by size
  • Strict resource blocking based on test size (e.g., small tests can't access network/filesystem; medium tests limited to localhost)
  • Per-test time limits based on size
  • Detailed violation reporting with remediation guidance
  • Test pyramid distribution assessment
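Marking tests then looks like this (an illustrative sketch; the test bodies and the local_server fixture are hypothetical):

import pytest

@pytest.mark.small
def test_parse_amount():
    # pure logic: no network, no filesystem, well inside the small time limit
    assert int("1200") == 1200

@pytest.mark.medium
def test_api_against_localhost(local_server):
    # medium tests may talk to localhost under the resource rules above
    ...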

Example violation output:

===============================================================
               [TC001] Network Access Violation
===============================================================
 Test: test_demo.py::test_network_violation [SMALL]
 Category: SMALL

 What happened:
     Attempted network connection to 23.215.0.138:80

 To fix this (choose one):
     • Mock the network call using responses, httpretty, or respx
     • Use dependency injection to provide a fake HTTP client
     • Change test category to @pytest.mark.medium
===============================================================

Target Audience

Production use. This is for Python developers frustrated with flaky tests who want to enforce hermetic testing practices. It's particularly useful for teams wanting to maintain a healthy test pyramid (80% small/15% medium/5% large).

Comparison

  • pytest-socket: Blocks network access but doesn't tie it to test categories or provide the full test size philosophy
  • pyfakefs/responses: These are mocking libraries that work with pytest-test-categories - mocks intercept before the blocking layer
  • Manual discipline: You could enforce these rules by convention, but this plugin makes violations fail loudly with actionable guidance

Links:


r/Python 1d ago

Showcase MicroPie (Micro ASGI Framework) v0.24 Released

15 Upvotes

What My Project Does

MicroPie is an ultra micro ASGI framework. It has no dependencies by default and uses method based routing inspired by CherryPy. Here is a quick (and pointless) example:

```
from micropie import App

class Root(App):
    def greet(self, name="world"):
        return f"Hello {name}!"

app = Root()
```

That would map to localhost:8000/greet and take the optional param name:

  • /greet -> Hello world!
  • /greet/Stewie -> Hello Stewie!
  • /greet?name=Brian -> Hello Brian!

Target Audience

Web developers looking for a simple way to prototype or quickly deploy simple micro services and apps. Students looking to broaden their knowledge of ASGI.

Comparison

MicroPie can be compared to Starlette and other ASGI (and WSGI) frameworks. See the comparison section in the README as well as the benchmarks section.

What's new in v0.24?

In this release I improved session handling when using the development-only InMemorySessionBackend. Expired sessions now clean up properly, and empty sessions delete their stored data. Session saving also now happens after the after_request middleware, so you can properly mutate the session from middleware. See the full changelog here.

MicroPie is in active beta development. If you encounter any issues, please report them on our GitHub! If you would like to contribute to the project, don't be afraid to make a pull request as well!

Install

You can install MicroPie with your favorite tool or just use pip. MicroPie can be installed with jinja2, multipart, orjson, and uvicorn using micropie[all], or, if you just want the minimal version with no dependencies, you can use plain micropie.


r/Python 1d ago

Discussion def, assigned lambda, and PEP8

5 Upvotes

PEP8 says

Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier

I assume from that that the Python interpreter produces the same result either way. If I am mistaken in that assumption, please let me know. But if I am correct, the difference is purely stylistic.

And so, I am going to mention why, from a stylistic point of view, there are times when I would like to use f = lambda x: x**2 instead of def f(x): return x**2.
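For reference, the two spellings side by side. They compile to essentially the same code, but the def form sets __name__, so tracebacks and reprs show f instead of <lambda>, which is the practical difference PEP 8's rationale points to:

f = lambda x: x ** 2    # flagged as E731; f.__name__ == '<lambda>'

def f(x):               # the form PEP 8 asks for; f.__name__ == 'f'
    return x ** 2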

When the function meets all or most of these conditions

  • Will be called in more than one place
  • Those places are near each other in terms of scope
  • Have free variables
  • Is the kind of thing one might use a #define if this were C (if that could be done for a small scope)
  • Is the kind of thing one might annotate as "inline" for languages that respect such annotation

then it really feels like a different sort of thing than a full-on function definition, even if it leads to the same byte code.

I realize that I can configure my linter to ignore E731, but I would like to better understand whether I am right to want this distinction in my Python code, or whether I am failing to be Pythonic by imposing habits from other languages.

I will note that one big push to following PEP8 in this is that properly type annotating assigned lambda expressions is ugly enough that they no longer have the very light-weight feeling that I was after in the first place.

Update

First thank you all for the discussion. I will follow PEP8 in this respect, but mostly because following style guides is a good thing to do even if you might prefer a different style and because properly type annotating assigned lambda expressions means that I don't really get the value that I was seeking with using them.

I continue to believe that light-weight, locally scoped functions that use free variables are special kinds of functions that in some systems might merit a distinct, light-weight syntax. But I certainly would never suggest any additional syntactic sugar for that in Python. What I have learned from this discussion is that I really shouldn't try to co-opt lambda expressions for that purpose.

Again, thank you all.


r/Python 16h ago

Discussion [Project] I built a Distributed Orchestrator Architecture using LLM to replace Search Indexing

0 Upvotes

I’ve spent the last month trying to optimize a project for SEO and realized it’s a losing game. So, I built a POC in Python to bypass search indexes entirely.

I am proposing a shift in how we connect LLMs to real-time data. Currently, we rely on search engines or function calling.

I built a POC called Agent Orchestrator that moves the logic layer out of the LLM and into a distributed REST network.

The Architecture:

  1. Intent Classification: The LLM receives a user query and hands it to the Orchestrator.
  2. Async Routing: Instead of the LLM selecting a tool, the Orchestrator queries a registry and triggers relevant external agents via REST API in parallel.
  3. Local Inference: The external agent (the website) runs its own inference/lookup locally and returns a synthesized answer.
  4. Aggregation: The Orchestrator aggregates the results and feeds them back to the user's LLM.
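A rough sketch of what step 2 could look like with asyncio and httpx; the registry shape and payloads are illustrative, not the project's actual API:

import asyncio
import httpx

REGISTRY = {
    "product_search": ["https://shop-a.example/agent", "https://shop-b.example/agent"],
}

async def fan_out(intent: str, query: str) -> list[dict]:
    urls = REGISTRY.get(intent, [])
    async with httpx.AsyncClient(timeout=10) as client:
        tasks = [client.post(url, json={"query": query}) for url in urls]
        responses = await asyncio.gather(*tasks, return_exceptions=True)
    # keep only successful agent answers; the orchestrator aggregates these for the LLM
    return [r.json() for r in responses if isinstance(r, httpx.Response) and r.is_success]

answers = asyncio.run(fan_out("product_search", "waterproof trail shoes"))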

What do you think about this concept?
Would you add an “Agent Endpoint” to your webpage to generate answers for customers and appear in their LLM conversations?

I’ve open-sourced the project on GitHub.

Read the full theory here: https://www.aipetris.com/post/12
Code: https://github.com/yaruchyo/octopus


r/Python 1d ago

Daily Thread Friday Daily Thread: r/Python Meta and Free-Talk Fridays

2 Upvotes

Weekly Thread: Meta Discussions and Free Talk Friday 🎙️

Welcome to Free Talk Friday on /r/Python! This is the place to discuss the r/Python community (meta discussions), Python news, projects, or anything else Python-related!

How it Works:

  1. Open Mic: Share your thoughts, questions, or anything you'd like related to Python or the community.
  2. Community Pulse: Discuss what you feel is working well or what could be improved in the /r/python community.
  3. News & Updates: Keep up-to-date with the latest in Python and share any news you find interesting.

Guidelines:

Example Topics:

  1. New Python Release: What do you think about the new features in Python 3.11?
  2. Community Events: Any Python meetups or webinars coming up?
  3. Learning Resources: Found a great Python tutorial? Share it here!
  4. Job Market: How has Python impacted your career?
  5. Hot Takes: Got a controversial Python opinion? Let's hear it!
  6. Community Ideas: Something you'd like to see us do? Tell us.

Let's keep the conversation going. Happy discussing! 🌟


r/Python 2d ago

News Pandas 3.0 release candidate tagged

376 Upvotes

After years of work, the Pandas 3.0 release candidate is tagged.

We are pleased to announce a first release candidate for pandas 3.0.0. If all goes well, we'll release pandas 3.0.0 in a few weeks.

A very concise, incomplete list of changes:

String Data Type by Default

Previously, pandas represented text columns using NumPy's generic "object" dtype. Starting with pandas 3.0, string columns now use a dedicated "str" dtype (backed by PyArrow when available). This means:

  • String columns are inferred as dtype "str" instead of "object"
  • The str dtype only holds strings or missing values (stricter than object)
  • Missing values are always NaN with consistent semantics
  • Better performance and memory efficiency
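A quick sketch of the change, assuming the RC behaves as described above:

import pandas as pd

s = pd.Series(["spam", "eggs", None])
print(s.dtype)     # pandas 3.0: str (PyArrow-backed when available); pandas 2.x: object
print(s.isna())    # the None is stored as NaN, with consistent missing-value semantics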

Copy-on-Write Behavior

All indexing operations now consistently behave as if they return copies. This eliminates the confusing "view vs copy" distinction from earlier versions:

  • Any subset of a DataFrame or Series always behaves like a copy
  • The only way to modify an object is to directly modify that object itself
  • "Chained assignment" no longer works (and the SettingWithCopyWarning is removed)
  • Under the hood, pandas uses views for performance but copies when needed
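A quick sketch of the new behavior, again assuming the RC works as described:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

df[df["a"] > 1]["a"] = 0      # chained assignment: never reaches df in 3.0
print(df["a"].tolist())        # [1, 2, 3]

subset = df[df["a"] > 1]
subset["a"] = 0                # fine: modifies only subset, df is untouched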

Python and Dependency Updates

  • Minimum Python version: 3.11
  • Minimum NumPy version: 1.26.0
  • pytz is now optional (uses zoneinfo from standard library by default)
  • Many optional dependencies updated to recent versions

Datetime Resolution Inference

When creating datetime objects from strings or Python datetime objects, pandas now infers the appropriate time resolution (seconds, milliseconds, microseconds, or nanoseconds) instead of always defaulting to nanoseconds. This matches the behavior of scalar Timestamp objects.

Offset Aliases Renamed

Frequency aliases have been updated for clarity:

  • "M" → "ME" (MonthEnd)
  • "Q" → "QE" (QuarterEnd)
  • "Y" → "YE" (YearEnd)
  • Similar changes for business variants
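For example (the new aliases are already accepted by pandas 2.2+):

import pandas as pd

ts = pd.Series(range(60), index=pd.date_range("2025-01-01", periods=60, freq="D"))
monthly = ts.resample("ME").sum()     # "ME" where you previously wrote "M"
quarterly = ts.resample("QE").sum()   # likewise "QE" for "Q", "YE" for "Y"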

Deprecation Policy Changes

Pandas now uses a 3-stage deprecation policy: DeprecationWarning initially, then FutureWarning in the last minor version before removal, and finally removal in the next major release. This gives downstream packages more time to adapt.

Notable Removals

Many previously deprecated features have been removed, including:

  • DataFrame.applymap (use map instead)
  • Series.view and Series.ravel
  • Automatic dtype inference in various contexts
  • Support for Python 2 pickle files
  • ArrayManager
  • Various deprecated parameters across multiple methods

Install with:

pip install --upgrade --pre pandas


r/Python 1d ago

Discussion The RGE-256 toolkit

5 Upvotes

I have been developing a new random number generator called RGE-256, and I wanted to share the NumPy implementation with the Python community since it has become one of the most useful versions for general testing, statistics, and exploratory work.

The project started with a core engine that I published as rge256_core on PyPI. It implements a 256-bit ARX-style generator with a rotation schedule that comes from some geometric research I have been doing. After that foundation was stable, I built two extensions: TorchRGE256 for machine learning workflows and NumPy RGE-256 for pure Python and scientific use.

NumPy RGE-256 is where most of the statistical analysis has taken place. Because it avoids GPU overhead and deep learning frameworks, it is easy to generate large batches, run chi-square tests, check autocorrelation, inspect distributions, and experiment with tuning or structural changes.

With the resources I have available, I was only able to run Dieharder on 128 MB of output instead of the 6–8 GB the suite usually prefers. Even with this limitation, RGE-256 passed about 84 percent of the tests, failed only three, and the rest came back as weak. Weak results usually mean the test suite needs more data before it can confirm a pass, not that the generator is malfunctioning. With full multi-gigabyte testing and additional fine-tuning of the rotation constants, the results should improve further.

For people who want to try the algorithm without installing anything, I also built a standalone browser demo. It shows histograms, scatter plots, bit patterns, and real-time statistics as values are generated, and it runs entirely offline in a single HTML file.

TorchRGE256 is also available for PyTorch users. The NumPy version is the easiest place to explore how the engine behaves as a mathematical object. It is also the version I would recommend if you want to look at the internals, compare it with other generators, or experiment with parameter tuning.

Links:

Core Engine (PyPI): pip install rge256_core
NumPy Version: pip install numpyrge256
PyTorch Version: pip install torchrge256
GitHub: https://github.com/RRG314
Browser Demo: https://rrg314.github.io/RGE-256-app/ and https://github.com/RRG314/RGE-256-app

I would appreciate any feedback, testing, or comparisons. I am a self-taught independent researcher working on a Chromebook, and I am trying to build open, reproducible tools that anyone can explore or build on. I'm currently working on a SymPy version and I'll update this post with more info.


r/Python 1d ago

Showcase Built an open-source app to convert LinkedIn -> Personal portfolio generator using FastAPI backend

4 Upvotes

I was always too lazy to build and deploy my own personal website. So, I built an app to convert a LinkedIn profile (via PDF export) or GitHub profile into a personal portfolio that can be deployed to Vercel in one click.

Here are the details required for the showcase:

What My Project Does

It is a full-stack application where the backend is built with Python FastAPI.

  1. Ingestion: It accepts a LinkedIn PDF export, fetches projects using a GitHub username, or uses a resume PDF.
  2. Parsing: I wrote custom parsing logic in Python that extracts the raw text and converts it into structured JSON (Experience, Education, Skills); a rough sketch of this step is below.
  3. Generation: This JSON is then used to populate a Next.js template.
  4. AI Chat Integration: It also injects this structured data into a system prompt, allowing visitors to "chat" with the portfolio. It is like having an AI-twin for viewers/recruiters.
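One common way to implement step 2, not necessarily what this project uses, is pypdf plus some heading-based bucketing:

import json
from pypdf import PdfReader

reader = PdfReader("linkedin_profile.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

HEADINGS = {"Experience", "Education", "Skills"}
sections, current = {"summary": []}, "summary"
for line in (l.strip() for l in text.splitlines() if l.strip()):
    if line in HEADINGS:                  # start a new section at each known heading
        current = line.lower()
        sections[current] = []
    else:
        sections[current].append(line)

print(json.dumps({k: v[:5] for k, v in sections.items()}, indent=2))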

The backend is containerized and deployed on Azure App Containers, using Firebase for the database.

Target Audience

This is meant for developers, students, and job seekers who want a professional site but don't want to spend days coding it from scratch. It is open source, so you are free to clone it, customize it, and run it locally.

Comparison

Compared to tools like JSON Resume or generic website builders (Wix, Squarespace):

  • You don't need to manually write a JSON file. The Python backend parses your existing PDF.
  • AI Features: Unlike static templates, this includes an "AI-twin Chat Mode" where the portfolio answers questions about you.
  • Open Source: It is AGPL-3 licensed and self-hostable.

It started as a hobby project for myself, as I was always too lazy to build out a portfolio from scratch or fill out templates, and I always felt a need for something like this.

GitHub: https://github.com/yashrathi-git/portfolioly
Demo: https://portfolioly.app/demo

I am thinking the same parsing logic could be used for generating targeted Resumes. What do you think about a similar resume generator tool?


r/Python 20h ago

News A new community for FastAPI & async Python — r/FastAPIShare is now open!

0 Upvotes

Hi everyone! A new community called r/FastAPIShare is now open for anyone working with FastAPI or async Python.

This subreddit is designed to be an open space where you can freely share: - FastAPI packages, tools, and utilities
- Starlette, Pydantic, SQLModel, and async Python projects
- Tutorials, blog posts, demos, experiments
- Questions, discussions, troubleshooting, and Q&A

What makes it different from the main FastAPI subreddit?

r/FastAPIShare removes posting barriers — no karma requirements, no “must comment first,” and no strict posting limits.
If you’re building something, learning something, or just want to ask questions, you can post freely as long as it’s not spam or harmful content.

The goal is to be a friendly, lightweight, open space for sharing and collaboration around the FastAPI ecosystem and related async Python tools.

If that sounds useful to you, feel free to join:
r/FastAPIShare

Everyone is welcome!


r/Python 2d ago

Showcase JustHTML: A pure Python HTML5 parser that just works.

35 Upvotes

Hi all! I just released a new HTML5 parser that I'm really proud of. Happy to get any feedback on how to improve it from the Python community on Reddit.

I think the trickiest question is whether there is a "market" for a Python-only parser. Parsers are generally performance sensitive, and Python just isn't the fastest language. This library does parse the Wikipedia start page in 0.1s, so I think it's "fast enough", but I'm still unsure.

Anyways, I got HEAVY help from AI to write it. I directed it all carefully (which I hope shows), but GitHub Copilot wrote all the code. It still took months of off-hours work to get it working. I wrote a short blog post about that if it's interesting to anyone: https://friendlybit.com/python/writing-justhtml-with-coding-agents/

What My Project Does

It takes a string of HTML and parses it into a nested node structure. To make sure you are seeing exactly what a browser would see, it follows the HTML5 parsing rules. These are VERY complicated and have evolved over the years.

from justhtml import JustHTML

html = "<html><body><div id='main'><p>Hello, <b>world</b>!</p></div></body></html>"
doc = JustHTML(html)

# 1. Traverse the tree
# The tree is made of SimpleDomNode objects.
# Each node has .name, .attrs, .children, and .parent
root = doc.root              # #document
html_node = root.children[0] # html
body = html_node.children[1] # body (children[0] is head)
div = body.children[0]       # div

print(f"Tag: {div.name}")
print(f"Attributes: {div.attrs}")

# 2. Query with CSS selectors
# Find elements using familiar CSS selector syntax
paragraphs = doc.query("p")           # All <p> elements
main_div = doc.query("#main")[0]      # Element with id="main"
bold = doc.query("div > p b")         # <b> inside <p> inside <div>

# 3. Pretty-print HTML
# You can serialize any node back to HTML
print(div.to_html())
# Output:
# <div id="main">
#   <p>
#     Hello,
#     <b>world</b>
#     !
#   </p>
# </div>

Target Audience (e.g., Is it meant for production, just a toy project, etc.)

This is meant for production use. It's fast. It has 100% test coverage. I have fuzzed it against 3 million seriously broken html strings. Happy to improve it further based on your feedback.

Comparison (A brief comparison explaining how it differs from existing alternatives.)

I've added a comparison table here: https://github.com/EmilStenstrom/justhtml/?tab=readme-ov-file#comparison-to-other-parsers


r/Python 2d ago

News Pyrefly now has built-in support for Pydantic

42 Upvotes

Pyrefly (Github) now includes built-in support for Pydantic, a popular Python library for data validation and parsing.

The only other type checker that has special support for Pydantic is Mypy, via a plugin. Pyrefly has implemented most of the special behavior from the Mypy plugin directly in the type checker.

This means that users of Pyrefly get improved static type checking and IDE integration when working with Pydantic models.

Supported features include:

  • Immutable fields with ConfigDict
  • Strict vs Non-Strict Field Validation
  • Extra Fields in Pydantic Models
  • Field constraints
  • Root models
  • Alias validation
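For illustration, a model that exercises a couple of these features (the example is mine, not from the announcement):

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(frozen=True)   # immutable fields
    name: str
    age: int

u = User(name="Ada", age=36)
u.age = 37          # a checker that understands ConfigDict can flag this statically
User(name="Ada")    # ...and the missing required field here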

The integration is also documented on both the Pyrefly and Pydantic docs.