
Mapping List of Unicode Strings in SQLAlchemy


Problem mapping JSON or List of Strings in SQLAlchemy #

When storing JSON data or “list of strings” data in SQLite using SQLAlchemy, mapping the Python type to a column type is not straightforward.

from sqlalchemy import String
from sqlalchemy.orm import Mapped, mapped_column

class Words(Base):
    __tablename__ = "words"
    id: Mapped[int] = mapped_column(primary_key=True)
    word: Mapped[list[str]] = mapped_column(String)

Mapping word to mapped_column(String) gives an error when inserting ["foo", "bar"] into word:

sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Error binding parameter 2: type 'list' is not supported
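
For example, the error can be reproduced with an insert like the following sketch (the words.db file name, the engine, and the session setup are my own assumptions on top of the Words model above, which also assumes the usual DeclarativeBase setup):

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///words.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Words(word=["foo", "bar"]))
    session.commit()  # raises the ProgrammingError above at flush time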

Solution to Inserting List of Strings #

In Postgres, you can use ARRAY(String) instead, but SQLite has no ARRAY type.
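
For comparison, on Postgres the column could be declared roughly like this (a sketch only; it is not used in the rest of this post):

from sqlalchemy import String
from sqlalchemy.dialects.postgresql import ARRAY

class Words(Base):
    __tablename__ = "words"
    id: Mapped[int] = mapped_column(primary_key=True)
    # ARRAY(String) maps to a native array column, but only on PostgreSQL
    word: Mapped[list[str]] = mapped_column(ARRAY(String))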

So I found out that you can map the column as JSON instead of String.

from sqlalchemy.types import JSON

class Words(Base):
    __tablename__ = "words"
    id: Mapped[int] = mapped_column(primary_key=True)
    word: Mapped[list[str]] = mapped_column(JSON)

Then, the list of strings can be inserted into word.
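
With the JSON type, the same insert goes through, and the value comes back as a Python list when queried. A minimal sketch, reusing the engine and Session assumed earlier and a table created from the updated model:

from sqlalchemy import select

with Session(engine) as session:
    session.add(Words(word=["foo", "bar"]))
    session.commit()  # succeeds: the list is serialized to JSON text

    stored = session.scalars(select(Words)).first()
    print(stored.word)  # ['foo', 'bar'] -- deserialized back to a Python list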

The Next Problem (Escaped Unicode Sequences) #

Because I often work with Korean text, I noticed that when I insert a list of Korean strings, SQLAlchemy escapes the Korean characters when serializing to JSON with the default json.dumps().

So when storing ["ν•œκΈ€", "μ˜μ–΄"], the value is saved as ["\ud55c\uae00", "\uc601\uc5b4"], which is unreadable when querying the database with sqlite3 or litecli.

To store the Unicode as is, you need to make a custom type so that the value is serialized with json.dumps(value, ensure_ascii=False).
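
The difference is easy to see by calling json.dumps directly:

import json

words = ["ν•œκΈ€", "μ˜μ–΄"]

# Default behaviour escapes non-ASCII characters:
print(json.dumps(words))                      # ["\ud55c\uae00", "\uc601\uc5b4"]

# With ensure_ascii=False the characters are kept as is:
print(json.dumps(words, ensure_ascii=False))  # ["ν•œκΈ€", "μ˜μ–΄"]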

Solution to Escaped Unicode Sequences #

from sqlalchemy.types import Text, TypeDecorator
import json

class UnicodeJSON(TypeDecorator):
    """Stores a Python object as JSON text without escaping non-ASCII characters."""

    impl = Text
    cache_ok = True  # the type holds no mutable state, so it is safe to cache

    def process_bind_param(self, value, dialect):
        # Python value -> database: serialize without escaping Unicode
        if value is None:
            return None
        return json.dumps(value, ensure_ascii=False)

    def process_result_value(self, value, dialect):
        # Database -> Python value: parse the JSON text back into a list
        if value is None:
            return None
        return json.loads(value)

class Words(Base):
    __tablename__ = "words"
    id: Mapped[int] = mapped_column(primary_key=True)
    word: Mapped[list[str]] = mapped_column(UnicodeJSON)

Now, when Korean text is inserted into word, the characters are stored as-is.
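
As a quick check, reading the raw column value (the way sqlite3 or litecli would show it) confirms there are no \uXXXX escapes. The raw-SQL query below is my own verification step, reusing the engine and Session assumed earlier:

from sqlalchemy import text

with Session(engine) as session:
    session.add(Words(word=["ν•œκΈ€", "μ˜μ–΄"]))
    session.commit()

    # Read the raw TEXT value stored in the word column
    raw = session.execute(text("SELECT word FROM words LIMIT 1")).scalar()
    print(raw)  # ["ν•œκΈ€", "μ˜μ–΄"] -- stored without Unicode escapes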

Reference #

Custom Types in SQLAlchemy