Chi*_*ank 9 python orm design-patterns sqlalchemy
I would like to know what the best practices are for using SQLALchemy declarative models within business logic code. Perhaps stackexchange.codereview may have a been a better place to ask this, but I'm not sure.
Here's some background.
Let's say I have a bunch of classes doing various things. Most of them have little or nothing to do with each other.Each such class has between a hundred to thousand lines of code doing things that have precious little to do with the database. In fact, most of the classes aren't even database aware so far. I've gotten away with storing the actual information in flat files (csv, yaml, so on), and only maintaining a serial number table and a document path - serial number mapping in the database. Each object retrieves the files it needs by getting the correct paths from the database (by serial number) and reconstructs itself from there. This has been exceedingly convenient so far, since my 'models' have been (and admittedly, continue to be) more than fluid.
As I expand the involvement of the database in the codebase I currently have, I seem to have settled on the following model, separating the database bits and the business logic into two completely separate parts, and joining them using specific function calls instead of inheritance or even composition. Here is a basic example of the kind of code I have now (pseudocode-quality):
module/db/models.py:
class Example(Base):
    id = Column(...)
    some_var = Column(...)
module/db/controller.py:
from .models import Example
def get_example_by_id(id, session):
    return session.query(Example).filter_by(id=id).one()
def upsert_example(id=None, some_var=None, session):
    if id is not None:
        try:
            example_obj = get_example_by_id(id, session)
            example_obj.some_var = some_var
            return
        except:
            pass
    example_obj = Example(some_var=some_var)
    session.add(example_obj)
    session.flush()
module/example.py:
from db import controller
class Example(object):
    def __init__(self, id):
        self._id = id
        self._some_var = None
        try:
            self._load_from_db()
            self._defined = True
        except:
            self._defined = False
    def _load_from_db(self, session):
        db_obj = controller.get_example_by_id(self._id, session)
        self._some_var = db_obj.some_var
    def create(some_var, session):
        if self._defined is True:
            raise Exception
        self._some_var = some_var
        self._sync_to_db(session)
    def _sync_to_db(self, session):
        controller.upsert_example(self._some_var, session)
    @property
    def some_var(self):
        return self._some_var
    ... 
I'm not convinced this is the way to go.
I have a few models following this pattern, and many more that I should implement in time. The database is currently only used for persistence and archiving. Once something is in the database, it's more or less read only from there on in. However, querying on it is becoming important.
The reason I'm inclined to migrate from the flatfiles to the database is largely to improve scalability.
Thus far, if I wanted to find all instances (rows) of Example with some_var = 3, I'd have to construct all of the instances from the flat files and iterate through them. This seems like a waste of both processor time and memory. In many cases, some_var is actually a calculated property, and reached by a fairly expensive process using source data contained in the flat file.
With the structure above, what I would do is query on Example, obtain a list of 'id's which satisfy my criterion, and then reconstruct just those module instances.
The ORM approach, however, as I understand it, would use thick models, where the objects returned by the query are themselves the objects I would need. I'm wondering whether it makes sense to try to move to that kind of a structure.
To that end, I have the following 'questions' / thoughts:
My instinct is that the code snippets above are anti-patterns more than they are useful patterns. I can't put my finger on why, exactly, but I'm not very happy with it. Is there a real, tangible disadvantage to the structure as listed above? Would moving to a more ORM-ish design provide advantages in functionality / performance / maintainability over this approach?
I'm paranoid about tying myself down to a database schema. I'm also paranoid about regular DB migrations. The approach listed above gives me a certain peace of mind in knowing that if I do need to do some migration, it'll be limited to the _load_from_db and _sync_to_db functions, and let me mess around willy nilly with all the rest of the code.
If I wanted to integrate Example from module/db/models.py with Example from module/example.py in the example above, what would be the cleanest way to go about it. Alternatively, what is an accepted pattern for handling business-logic heavy models with SQLAlchemy?
In the code above, note that the business logic class keeps all of it's information in 'private' instance variables, while the Model class keeps all of it's information in class variables. How would integrating these two approaches actually work? Theoretically, they should still 'just work' even if put together in a single class definition. In practice, does it?
(The actual codebase is on github, though it's not likely to be very readable)
小智 5
我认为对我们自己的设计持批评态度是很自然的(至少对我来说),即使我们正在研究它们。你这里的结构对我来说似乎很好。他们是否合适的答案取决于您打算做什么。
如果您将代码合并到厚模型中,那么所有这些都将集中在一个地方,并且您的架构将更简单,但是,这也可能意味着您的业务逻辑将紧密绑定到数据库中创建的模式。重新思考数据库意味着重新思考应用程序中其他大部分区域。
遵循此处提供的代码示例意味着分离关注点,这会产生负面影响,例如在更多位置添加更多行代码并增加复杂性,但这也意味着耦合更松散。如果您保持正确,那么当您决定更改数据库模式或迁移到完全不同的存储形式时,您应该会遇到更少的麻烦。由于您的业务逻辑类是一个普通的旧对象,因此它充当一个很好的分离状态容器。如果您转向其他内容,您仍然需要重新设计模型层以及可能的控制器部分,但您的业务逻辑和 UI 层可能基本保持不变。
我认为真正的考验在于询问这个应用程序有多大以及您计划将其投入使用多长时间?如果我们正在考虑一个生命周期较短的小型应用程序,那么松散耦合所增加的复杂性就是一种浪费,除非您是出于教育目的而这样做。如果应用程序预计将增长到相当大或服务多年,那么从长远来看,对复杂性的早期投资应该会以较低的拥有成本得到回报,因为对各种组件进行更改应该更容易。
如果这让您感觉更好的话,出于同样的原因,在使用 ORM(例如实体框架和 hybernate)时看到 POCO 和 POJO 并不罕见。