Cat Words under Flowers:Our WeeklyIssue 98 shared an article, which pointed out__init__
The problems with the method and new best practices, and Issue 99 also shared an article that supports the views of the first article. I think they raise a question worth noting and thinking about, so I translated the first article into Chinese.
Original: Glyph
Translator: Cat under the Pea Flower @Python Cat
Original title: Stop Writing
__init__
Methodsoriginal:/2025/04/
Historical background
Before Python version 3.7 (released in June 2018), dataclasses were introduced,__init__
Special methods have important uses. If you have a class representing a data structure - for example, withx
andy
Attributes2DCoordinate
——If you want to pass2DCoordinate(x=1, y=2)
In this way, you need to add ax
andy
Parameters__init__
method.
The other implementation methods available at that time had serious problems:
- You can
2DCoordinate
Remove from the public API and expose amake_2d_coordinate
Functions and make them non-importable, but how do you reflect the return value or parameter type in the document? - You can record
x
andy
Properties and let users assign values separately, but this2DCoordinate()
An invalid object will be returned. - You can use the class attribute to set the coordinate default value to 0. Although this solves the problem with option 2, this requires all
2DCoordinate
Objects are not only mutable, but must be modified at each call point. - You can add a new oneabstractClasses to solve the problem of option 1, this abstract class can be exposed in the public API, but this will surge the complexity of each new public class, no matter how simple it is. What's worse is,
It didn't appear until Python 3.8, so in versions prior to 3.7, this forces you to use concrete inheritance and declare multiple classes, even for the most basic data structures.
In addition, one is responsible for assigning several attributes__init__
There is no obvious problem with the method, so it is a good choice in this case. It became obvious in most cases, given the problem with all the alternatives I just describeddefaultChoice, this makes sense.
However, because "definition is accepted"__init__
"Create an object as a userdefaultHow we have developed a habit: put a bunch of them at the beginning of each classCode that can be written as you like, These codes are executed every time they are instantiated.
Wherever there is random code, there will be uncontrollable problems.
The problem lies
Let's imagine a complex point data structure, creating a structure that interacts with external I/O:FileReader
。
Of course Python hasYour own file object abstraction, but for the sake of demonstration, we ignore it for the time being.
Suppose we have the following function, located in onefileio
In the module:
open(path: str) -> int
read(fileno: int, length: int)
close(fileno: int)
We assumeReturns an integer representing the file descriptor [Note 1],
Read from open file descriptor
length
bytes,Close the file descriptor to invalidate it.
According to us, we wrote countless__init__
We may define the thinking habits formed by methods in this wayFileReader
kind:
class FileReader:
def __init__(self, path: str) -> None:
self._fd = (path)
def read(self, length: int) -> bytes:
return (self._fd, length)
def close(self) -> None:
(self._fd)
For our initial use case, this is OK. Client code executes similarFileReader("./")
to create aFileReader
, it will descriptor the fileint
Maintained as a private state. This is exactly what we expect; we do not want user code to see or tamper with_fd
, because this may violateFileReader
the invariance of Effective constructionFileReader
All the necessary work required - i.e. callopen
——AllFileReader.__init__
It's handled.
However, as demand increases,FileReader.__init__
Becoming more and more awkward.
At first we only care about, but later, we may need to adapt a library that needs to be managed by itself for some reason
call and want to return a
int
As ours_fd
, now we have to adopt a strange workaround like this:
def reader_from_fd(fd: int) -> FileReader:
fr = object.__new__(FileReader)
fr._fd = fd
return fr
In this way, all the advantages we have gained from the canonical object creation process are lost.reader_from_fd
The type signature receives only a normalint
, it can't even suggest to the caller how to pass in the correct oneint
type.
Testing also becomes much more troublesome because when we want to get it in the testFileReader
When the instance of the file without doing actual file I/O, you must pile it to replace your ownReplicas, even if we can (for example) for testing purposes on multiple
FileReader
Share a file descriptor between.
All of the above examples assumeIt is a synchronous operation. But there are many network resources that can only be obtained through the asynchronous (so: maybe slow, maybe error-prone) API, although this may be oneHypothesisquestion. If you ever wanted to write
async def __init__(self): ...
, then you have encountered this limitation in practice.
To fully describe all the problems of this approach, I am afraid that a monograph on the philosophy of object-oriented design is required. So I briefly summarize: the root causes of all these problems are actually the same - we closely tied the behavior of "creating data structures" to "common side effects of this data structure." Since it is said to be "common", it means that they are not "always" related. And in those unrelated situations, the code becomes bulky and prone to problems
In short, definition__init__
It is an anti-pattern and we need an alternative.
This article was translated and first published on [Python Cat]:/posts/2025-05-02-init
Solution
I think the following three designs can solve the above problems:
- use
dataclass
Define attributes, - Before replacement
__init__
The behavior executed in - Use exact types to describe a valid instance.
usedataclass
Properties to create__init__
First, let'sFileReader
Refactored into onedataclass
. It will generate a__init__
Method, but this is not something we can define at will, it will be constrained, i.e. only used for assignment properties.
@dataclass
class FileReader:
_fd: int
def read(self, length: int) -> bytes:
return (self._fd, length)
def close(self) -> None:
(self._fd)
But... Oops. Repairing customization__init__
CallWhen we introduce several problems it solves:
- We're lost
FileReader("path")
simplicity and convenience. Now users have to import the underlying, which makes the most common way of creating objects both verbose and unintuitive. If we want the user to know how to create it in a real scenario
FileReader
, you have to add guidance on the use of other modules in the document. - right
_fd
As a file descriptor, there is no mandatory check; it is just an integer, and it is easy for the user to pass in incorrect numbers, but there is no error.
Look at it alone, onlydataclass
, cannot solve all problems, so we need to add the second technology.
useclassmethod
Factory to create objects
We do not want to generate additional imports or require users to view other modules - i.e.FileReader
Anything outside of itself – to figure out how to create what you wantFileReader
。
Fortunately, we have a tool that can easily solve these problems:@classmethod
. Let's define aClass Method:
from typing import Self
@dataclass
class FileReader:
_fd: int
@classmethod
def open(cls, path: str) -> Self:
return cls((path))
Now your caller canFileReader("path")
Replace with("path")
, get with__init__
Same benefits.
Also, if we need to useawait (...)
, a signature is required@classmethod async def open
This can be unlimited by__init__
As a constraint on a special method.@classmethod
It can be absolutelyasync
, it can also modify the return value, such as returning a set of related valuestuple
, not just return constructed objects.
useNewType
Solve the issue of object validity
Next, let's solve the slightly tricky object validity problem.
Our type signature calls this thingint
, the underlying , returns an ordinary integer, which we cannot change. But for valid verification, we can useNewTypeTo accurately require:
from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)
There are several ways to deal with the underlying library problem, but for the sake of simplicity and to show that this method does not bring any runtime overhead, we simply tell Mypy directly:、
and
Already received
FileDescriptor
Integer of type, not normal integers.
from typing import Callable
_open: Callable[[str], FileDescriptor] = # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] =
_close: Callable[[FileDescriptor], None] =
Of course, we have to adjust slightly, tooFileReader
, but the changes are small. Combining these modifications, the code becomes:
from typing import Self
@dataclass
class FileReader:
_fd: FileDescriptor
@classmethod
def open(cls, path: str) -> Self:
return cls(_open(path))
def read(self, length: int) -> bytes:
return _read(self._fd, length)
def close(self) -> None:
_close(self._fd)
Please note that the key here is not the useNewType
, instead, let objects with "full attributes" naturally become "effective instances".NewType
Just a convenient tool to help us use itint
、str
orbytes
Implement necessary constraints when such basic types.
Summary - New best practices
From now on, when you define a new Python class:
- Write it as a data class (or aattrs class, if you like it)
- Use the default
__init__
method. 【Note 2】 - Add to
@classmethod
, provides callers with convenient and public object construction methods. - All dependencies are required to be satisfied by attributes, so that a valid object is always created first.
- use
To base data types (e.g.
int
andstr
) Add restrictions, especially when these types need to have some special properties, such as having to come from a specific library, having to be randomly generated, etc.
If you define the class this way, you will get a custom one__init__
All the benefits of the method:
- All people who call your data structure can get valid objects, because as long as the attributes are set correctly, the objects will naturally be valid.
- Your library users can use convenient object creation methods that handle various complex tasks and make use simple. Moreover, users can discover these creation methods by just taking a look at the list of classes.
There are some other benefits:
- Your code will stand the test of the future and can easily cope with the new needs of users to create objects.
- If there are multiple ways to instantiate your class, you can give each method a meaningful name; you don't need to use something like
def __init__(self, maybe_a_filename: int | str | None = None):
Such a monster. - When writing tests, you only need to provide all the required dependencies to construct the object; no monkey patches are needed, because you can call the type constructor directly without any I/O operations or side effects.
Before there was no data class, there was a strange phenomenon in the Python language: it was just a basic thing to fill data with data structures, and it was necessary to rewrite a method with 4 underscores.__init__
The method is like an alien. And other magic methods, like__add__
or__repr__
, essentially dealing with some advanced features of the class.
Today, this historical language flaw has been solved. Have it@dataclass
、@classmethod
andNewType
, you can build classes that are easy to use, Python-style, flexible, easy to test and robust.
Notes in the article:
- If you are not familiar with it, the "file descriptor" is actually an integer that only makes sense within the program. When you let the operating system open a file, it responds to "I have opened file 7 for you", and after that, whenever you reference the number "7", it represents that file until you execute it
close(7)
Close it. - Unless you have very good reasons, of course. For example, for backward compatibility or compatibility with other libraries, these may be reasonable reasons. There are also some data consistency verifications that cannot be expressed through the type system. The most common example is a class that needs to check the relationship between two different fields, such as a "range" object, where
start
Must always be less thanend
. There are always exceptions to this kind of rule. However, in__init__
Performing any I/O operation in is basically not a good idea, and almost all other operations that may be useful in certain special cases can be passed__post_init__To implement it without having to write it directly__init__
。