Location>code7788 >text

Don't write Python classes anymore __init__ methods

Popularity:543 ℃/2025-05-04 12:22:23

Cat Words under Flowers:Our WeeklyIssue 98 shared an article, which pointed out__init__The problems with the method and new best practices, and Issue 99 also shared an article that supports the views of the first article. I think they raise a question worth noting and thinking about, so I translated the first article into Chinese.

Original: Glyph

Translator: Cat under the Pea Flower @Python Cat

Original title: Stop Writing__init__ Methods

original:/2025/04/

Historical background

Before Python version 3.7 (released in June 2018), dataclasses were introduced,__init__Special methods have important uses. If you have a class representing a data structure - for example, withxandyAttributes2DCoordinate——If you want to pass2DCoordinate(x=1, y=2)In this way, you need to add axandyParameters__init__method.

The other implementation methods available at that time had serious problems:

  1. You can2DCoordinateRemove from the public API and expose amake_2d_coordinateFunctions and make them non-importable, but how do you reflect the return value or parameter type in the document?
  2. You can recordxandyProperties and let users assign values ​​separately, but this2DCoordinate()An invalid object will be returned.
  3. You can use the class attribute to set the coordinate default value to 0. Although this solves the problem with option 2, this requires all2DCoordinateObjects are not only mutable, but must be modified at each call point.
  4. You can add a new oneabstractClasses to solve the problem of option 1, this abstract class can be exposed in the public API, but this will surge the complexity of each new public class, no matter how simple it is. What's worse is,It didn't appear until Python 3.8, so in versions prior to 3.7, this forces you to use concrete inheritance and declare multiple classes, even for the most basic data structures.

In addition, one is responsible for assigning several attributes__init__There is no obvious problem with the method, so it is a good choice in this case. It became obvious in most cases, given the problem with all the alternatives I just describeddefaultChoice, this makes sense.

However, because "definition is accepted"__init__"Create an object as a userdefaultHow we have developed a habit: put a bunch of them at the beginning of each classCode that can be written as you like, These codes are executed every time they are instantiated.

Wherever there is random code, there will be uncontrollable problems.

The problem lies

Let's imagine a complex point data structure, creating a structure that interacts with external I/O:FileReader

Of course Python hasYour own file object abstraction, but for the sake of demonstration, we ignore it for the time being.

Suppose we have the following function, located in onefileioIn the module:

  • open(path: str) -> int
  • read(fileno: int, length: int)
  • close(fileno: int)

We assumeReturns an integer representing the file descriptor [Note 1],Read from open file descriptorlengthbytes,Close the file descriptor to invalidate it.

According to us, we wrote countless__init__We may define the thinking habits formed by methods in this wayFileReaderkind:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = (path)
    def read(self, length: int) -> bytes:
        return (self._fd, length)
    def close(self) -> None:
        (self._fd)

For our initial use case, this is OK. Client code executes similarFileReader("./")to create aFileReader, it will descriptor the fileintMaintained as a private state. This is exactly what we expect; we do not want user code to see or tamper with_fd, because this may violateFileReaderthe invariance of Effective constructionFileReaderAll the necessary work required - i.e. callopen——AllFileReader.__init__It's handled.

However, as demand increases,FileReader.__init__Becoming more and more awkward.

At first we only care about, but later, we may need to adapt a library that needs to be managed by itself for some reasoncall and want to return aintAs ours_fd, now we have to adopt a strange workaround like this:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

In this way, all the advantages we have gained from the canonical object creation process are lost.reader_from_fdThe type signature receives only a normalint, it can't even suggest to the caller how to pass in the correct oneinttype.

Testing also becomes much more troublesome because when we want to get it in the testFileReaderWhen the instance of the file without doing actual file I/O, you must pile it to replace your ownReplicas, even if we can (for example) for testing purposes on multipleFileReaderShare a file descriptor between.

All of the above examples assumeIt is a synchronous operation. But there are many network resources that can only be obtained through the asynchronous (so: maybe slow, maybe error-prone) API, although this may be oneHypothesisquestion. If you ever wanted to writeasync def __init__(self): ..., then you have encountered this limitation in practice.

To fully describe all the problems of this approach, I am afraid that a monograph on the philosophy of object-oriented design is required. So I briefly summarize: the root causes of all these problems are actually the same - we closely tied the behavior of "creating data structures" to "common side effects of this data structure." Since it is said to be "common", it means that they are not "always" related. And in those unrelated situations, the code becomes bulky and prone to problems

In short, definition__init__It is an anti-pattern and we need an alternative.

This article was translated and first published on [Python Cat]:/posts/2025-05-02-init

Solution

I think the following three designs can solve the above problems:

  • usedataclassDefine attributes,
  • Before replacement__init__The behavior executed in
  • Use exact types to describe a valid instance.

usedataclassProperties to create__init__

First, let'sFileReaderRefactored into onedataclass. It will generate a__init__Method, but this is not something we can define at will, it will be constrained, i.e. only used for assignment properties.

@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return (self._fd, length)
    def close(self) -> None:
        (self._fd)

But... Oops. Repairing customization__init__CallWhen we introduce several problems it solves:

  1. We're lostFileReader("path")simplicity and convenience. Now users have to import the underlying, which makes the most common way of creating objects both verbose and unintuitive. If we want the user to know how to create it in a real scenarioFileReader, you have to add guidance on the use of other modules in the document.
  2. right_fdAs a file descriptor, there is no mandatory check; it is just an integer, and it is easy for the user to pass in incorrect numbers, but there is no error.

Look at it alone, onlydataclass, cannot solve all problems, so we need to add the second technology.

useclassmethodFactory to create objects

We do not want to generate additional imports or require users to view other modules - i.e.FileReaderAnything outside of itself – to figure out how to create what you wantFileReader

Fortunately, we have a tool that can easily solve these problems:@classmethod. Let's define aClass Method:

from typing import Self
@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls((path))

Now your caller canFileReader("path")Replace with("path"), get with__init__Same benefits.

Also, if we need to useawait (...), a signature is required@classmethod async def openThis can be unlimited by__init__As a constraint on a special method.@classmethodIt can be absolutelyasync, it can also modify the return value, such as returning a set of related valuestuple, not just return constructed objects.

useNewTypeSolve the issue of object validity

Next, let's solve the slightly tricky object validity problem.

Our type signature calls this thingint, the underlying , returns an ordinary integer, which we cannot change. But for valid verification, we can useNewTypeTo accurately require:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are several ways to deal with the underlying library problem, but for the sake of simplicity and to show that this method does not bring any runtime overhead, we simply tell Mypy directly:andAlready receivedFileDescriptorInteger of type, not normal integers.

from typing import Callable
_open: Callable[[str], FileDescriptor] =   # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = 
_close: Callable[[FileDescriptor], None] = 

Of course, we have to adjust slightly, tooFileReader, but the changes are small. Combining these modifications, the code becomes:

from typing import Self
@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Please note that the key here is not the useNewType, instead, let objects with "full attributes" naturally become "effective instances".NewTypeJust a convenient tool to help us use itintstrorbytesImplement necessary constraints when such basic types.

Summary - New best practices

From now on, when you define a new Python class:

  • Write it as a data class (or aattrs class, if you like it)
  • Use the default__init__method. 【Note 2】
  • Add to@classmethod, provides callers with convenient and public object construction methods.
  • All dependencies are required to be satisfied by attributes, so that a valid object is always created first.
  • useTo base data types (e.g.intandstr) Add restrictions, especially when these types need to have some special properties, such as having to come from a specific library, having to be randomly generated, etc.

If you define the class this way, you will get a custom one__init__All the benefits of the method:

  • All people who call your data structure can get valid objects, because as long as the attributes are set correctly, the objects will naturally be valid.
  • Your library users can use convenient object creation methods that handle various complex tasks and make use simple. Moreover, users can discover these creation methods by just taking a look at the list of classes.

There are some other benefits:

  • Your code will stand the test of the future and can easily cope with the new needs of users to create objects.
  • If there are multiple ways to instantiate your class, you can give each method a meaningful name; you don't need to use something likedef __init__(self, maybe_a_filename: int | str | None = None):Such a monster.
  • When writing tests, you only need to provide all the required dependencies to construct the object; no monkey patches are needed, because you can call the type constructor directly without any I/O operations or side effects.

Before there was no data class, there was a strange phenomenon in the Python language: it was just a basic thing to fill data with data structures, and it was necessary to rewrite a method with 4 underscores.__init__The method is like an alien. And other magic methods, like__add__or__repr__, essentially dealing with some advanced features of the class.

Today, this historical language flaw has been solved. Have it@dataclass@classmethodandNewType, you can build classes that are easy to use, Python-style, flexible, easy to test and robust.

Notes in the article:

  1. If you are not familiar with it, the "file descriptor" is actually an integer that only makes sense within the program. When you let the operating system open a file, it responds to "I have opened file 7 for you", and after that, whenever you reference the number "7", it represents that file until you execute itclose(7)Close it.
  2. Unless you have very good reasons, of course. For example, for backward compatibility or compatibility with other libraries, these may be reasonable reasons. There are also some data consistency verifications that cannot be expressed through the type system. The most common example is a class that needs to check the relationship between two different fields, such as a "range" object, wherestartMust always be less thanend. There are always exceptions to this kind of rule. However, in__init__Performing any I/O operation in is basically not a good idea, and almost all other operations that may be useful in certain special cases can be passed__post_init__To implement it without having to write it directly__init__