ro-crate-interoperability-p…/0.2.x/lib/python/lib-ro-crate-schema/TECHNICAL.md
Snowwpanda 60860ebbab Cleanup and adjustment to depreciated features
Added Manifest for publishing
2025-11-09 23:30:51 +01:00


# Technical Documentation
## Architecture Overview
`lib-ro-crate-schema` is a Python library that bridges Pydantic models with RO-Crate metadata standards. It provides a decorator-based approach to create semantically annotated research objects.
## Package Structure
```
lib_ro_crate_schema/
├── crate/
│ ├── decorators.py # Schema decorators for Pydantic models
│ ├── schema_facade.py # Main API for crate management
│ ├── schema_registry.py # Global registry for type templates
│ ├── type.py # Type definition and management
│ ├── type_property.py # Property type definitions
│ ├── property_type.py # Property metadata
│ ├── metadata_entry.py # Metadata entry handling
│ ├── jsonld_utils.py # JSON-LD conversion utilities
│ ├── rdf.py # RDF graph generation
│ ├── restriction.py # OWL restriction handling
│ ├── forward_ref_resolver.py # Forward reference resolution
│ └── literal_type.py # Literal type handling
└── check.py # Validation utilities
```
## Core Components
### 1. Decorators (`decorators.py`)
**Purpose**: Provide a clean decorator-based API for annotating Pydantic models with ontology information.
**Key Functions**:
- `@ro_crate_schema(ontology: str)`: Class decorator that registers a Pydantic model as an RO-Crate schema type
- `Field(json_schema_extra: dict, ...)`: Pydantic Field with RO-Crate metadata in json_schema_extra
**Flow**:
1. Decorator captures class definition
2. Extracts field annotations and ontology mappings from json_schema_extra
3. Creates `TypeTemplate` and registers in `SchemaRegistry`
4. Returns original class unchanged
**Example** (Pydantic 2.x compatible):
```python
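from pydantic import BaseModel, Field
from lib_ro_crate_schema.crate.decorators import ro_crate_schema  # path per the package structure above

@ro_crate_schema(ontology="https://schema.org/Person")
class Person(BaseModel):
    name: str = Field(json_schema_extra={"ontology": "https://schema.org/name"})
    email: str = Field(json_schema_extra={"ontology": "https://schema.org/email"})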
```
### 2. Schema Registry (`schema_registry.py`)
**Purpose**: Global singleton that maintains a registry of all decorated types and their ontological mappings.
**Key Classes**:
- `SchemaRegistry`: Singleton registry pattern
- `TypeTemplate`: Template containing type information and property mappings
**Key Methods**:
- `register_type(model, ontology)`: Register a new type template
- `get_type_template(model_name)`: Retrieve registered type
- `get_all_type_templates()`: Get all registered types
**Flow**:
```
Decorator → TypeTemplate Creation → Registry Storage → Facade Retrieval
```
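As a rough illustration of the singleton registry described above, the following sketch mirrors the three method names with a dictionary-backed store. `RegistrySketch` and `TypeTemplateSketch` are hypothetical names, and the return values and internals are assumptions rather than the library's actual behavior.

```python
from dataclasses import dataclass, field


@dataclass
class TypeTemplateSketch:
    """Hypothetical stand-in for TypeTemplate."""
    name: str
    ontology: str
    properties: dict = field(default_factory=dict)


class RegistrySketch:
    """Singleton: every call to RegistrySketch() returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._templates = {}
        return cls._instance

    def register_type(self, model: type, ontology: str) -> TypeTemplateSketch:
        template = TypeTemplateSketch(name=model.__name__, ontology=ontology)
        self._templates[template.name] = template
        return template

    def get_type_template(self, model_name: str):
        # Returns None for unknown names (an assumption of this sketch).
        return self._templates.get(model_name)

    def get_all_type_templates(self) -> list:
        return list(self._templates.values())


class Person:  # placeholder model for the sketch
    pass


RegistrySketch().register_type(Person, "https://schema.org/Person")
```

Any later `RegistrySketch()` call sees the same templates, which is what lets a facade retrieve types that decorators registered at import time.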
### 3. Schema Facade (`schema_facade.py`)
**Purpose**: Main API for creating and managing RO-Crates. Provides high-level operations for schema and metadata management.
**Key Classes**:
- `SchemaFacade`: Primary interface for RO-Crate operations
**Key Methods**:
- `add_all_registered_models()`: Add all decorator-registered models to schema
- `add_model_instance(instance, id)`: Add a Pydantic instance as metadata
- `to_graph()`: Convert schema + metadata to RDF graph
- `from_ro_crate(path)`: Import existing RO-Crate
- `export(path)`: Export to directory (deprecated; use `add_schema_to_crate` from `jsonld_utils` instead)
**Internal State**:
- `types`: List of `Type` objects (schema definitions)
- `type_properties`: List of `TypeProperty` objects (property definitions)
- `metadata_entries`: List of `MetadataEntry` objects (actual data)
**Flow**:
```
SchemaFacade.add_all_registered_models()
    Retrieve from SchemaRegistry
    Convert TypeTemplate → Type + TypeProperty
SchemaFacade.add_model_instance(person, "alice")
    Convert Pydantic instance → MetadataEntry
SchemaFacade.to_graph()
    Generate RDF triples
```
### 4. Type System (`type.py`, `type_property.py`, `property_type.py`)
**Type (`type.py`)**:
- Represents an RDFS Class (e.g., `schema:Person`)
- Contains type properties and restrictions
- Generates RDF class definitions
**TypeProperty (`type_property.py`)**:
- Represents an RDF Property definition
- Links a property to its domain (type) and range (value type)
- Handles cardinality restrictions (min/max)
**PropertyType (`property_type.py`)**:
- Simple data class for property metadata
- Stores property name and RDF type
**Relationships**:
```
Type (Person)
└── has TypeProperty (name)
    ├── has PropertyType (string)
    └── has Restriction (minCardinality: 1)
```
### 5. Metadata Entry (`metadata_entry.py`)
**Purpose**: Represents an actual instance of data (not the schema).
**Key Attributes**:
- `id`: Unique identifier
- `class_id`: Reference to Type
- `properties`: Dict of property values
- `references`: List of referenced entities
**Flow**:
```
Pydantic Instance (person = Person(...))
    ↓ extract_from_pydantic()
MetadataEntry
    ↓ to_triples()
RDF Graph
```
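To make the last step of this flow concrete, here is a small sketch of an entry object that emits triples as plain tuples. `EntrySketch` is a hypothetical stand-in that mirrors the attributes listed above; the real `MetadataEntry` presumably works with RDFLib terms, and the `(property, target id)` shape used for `references` is an assumption.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class EntrySketch:
    """Hypothetical stand-in mirroring the MetadataEntry attributes."""
    id: str
    class_id: str
    properties: dict = field(default_factory=dict)  # property -> literal value
    references: list = field(default_factory=list)  # (property, target id) pairs

    def to_triples(self):
        # One rdf:type triple links the instance to its schema type.
        yield (self.id, RDF_TYPE, self.class_id)
        # Literal-valued properties.
        for prop, value in self.properties.items():
            yield (self.id, prop, value)
        # Links to other entities in the crate.
        for prop, target in self.references:
            yield (self.id, prop, target)


alice = EntrySketch(
    id="alice",
    class_id="Person",
    properties={"name": "Alice"},
    references=[("knows", "bob")],
)
triples = list(alice.to_triples())
```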
### 6. JSON-LD Utilities (`jsonld_utils.py`)
**Purpose**: Convert between RO-Crate (JSON-LD) and internal representations.
**Key Functions**:
- `add_schema_to_crate(facade, crate)`: Merge schema/metadata into ROCrate object
- Handles context management
- Converts RDF graph to JSON-LD @graph array
**Flow**:
```
SchemaFacade → RDF Graph → JSON-LD @graph → ROCrate.write()
```
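The "RDF Graph → JSON-LD @graph" step can be pictured as grouping triples by subject into one node object each. The sketch below does this with plain tuples and the standard library only; `triples_to_jsonld` is a hypothetical helper, not a function of `jsonld_utils`, and it ignores details such as datatypes and blank nodes.

```python
import json
from collections import defaultdict


def triples_to_jsonld(triples, context):
    """Group (subject, predicate, object) triples into JSON-LD node objects."""
    nodes = defaultdict(dict)
    for subject, predicate, obj in triples:
        node = nodes[subject]
        node["@id"] = subject
        if predicate == "@type":
            node["@type"] = obj
        elif predicate not in node:
            node[predicate] = obj
        elif isinstance(node[predicate], list):
            node[predicate].append(obj)
        else:
            # Second value for the same predicate: promote to a list.
            node[predicate] = [node[predicate], obj]
    return {"@context": context, "@graph": list(nodes.values())}


document = triples_to_jsonld(
    [
        ("#alice", "@type", "Person"),
        ("#alice", "name", "Alice"),
        ("#alice", "email", "alice@example.org"),
        ("#alice", "email", "a.smith@example.org"),
    ],
    context="https://w3id.org/ro/crate/1.1/context",
)
serialized = json.dumps(document, indent=2)
```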
### 7. RDF Generation (`rdf.py`)
**Purpose**: Convert schema and metadata to RDF triples.
**Key Functions**:
- `to_graph()`: Main conversion function
- Creates RDF graph with proper namespaces
- Handles OWL restrictions, class hierarchies, property definitions
**Output**: RDFLib Graph object with schema + data triples
### 8. Forward Reference Resolution (`forward_ref_resolver.py`)
**Purpose**: Handle Pydantic forward references (e.g., `creator: "Person"` when Person is defined later).
**Key Classes**:
- `ForwardRefResolver`: Resolves string type hints to actual classes
**When Used**: During TypeTemplate creation when processing nested models
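A minimal sketch of the idea, assuming resolution is a lookup of the annotation string in a name-to-class mapping. `resolve_forward_ref` and the `known` mapping are hypothetical; the actual `ForwardRefResolver` interface may differ.

```python
from typing import ForwardRef


def resolve_forward_ref(annotation, known_types: dict):
    """Map a string or ForwardRef annotation to a known class, if any."""
    if isinstance(annotation, ForwardRef):
        annotation = annotation.__forward_arg__
    if isinstance(annotation, str):
        # Leave unknown names unresolved instead of raising.
        return known_types.get(annotation, annotation)
    # Already a real type: nothing to resolve.
    return annotation


class Dataset:
    creator: "Person"  # Person is defined further down


class Person:
    name: str


known = {"Dataset": Dataset, "Person": Person}
resolved = resolve_forward_ref(Dataset.__annotations__["creator"], known)
```

Deferring the lookup until all classes are defined is what allows `Dataset` to mention `Person` before `Person` exists.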
### 9. Restriction (`restriction.py`)
**Purpose**: Model OWL restrictions (cardinality constraints).
**Key Classes**:
- `Restriction`: Represents OWL constraints on properties
- `RestrictionType`: Enum of restriction types (min/max cardinality)
**Flow**:
```
Pydantic Field (required vs Optional)
    ↓
Determine cardinality
    ↓
Create Restriction object
    ↓
Generate owl:Restriction triples
```
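The first two steps of this flow can be sketched with the standard `typing` helpers. This is a simplification under stated assumptions: `min_cardinality` is a hypothetical function, it inspects bare class annotations rather than Pydantic field info, and it treats any annotation that admits `None` as optional (minimum cardinality 0), which may not match the library's exact rule.

```python
import types
from typing import Optional, Union, get_args, get_origin, get_type_hints

# typing.Union covers Optional[X]; types.UnionType covers `X | None` (Python 3.10+).
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def min_cardinality(annotation) -> int:
    """Return 0 if the annotation admits None, otherwise 1."""
    if get_origin(annotation) in _UNION_ORIGINS and type(None) in get_args(annotation):
        return 0
    return 1


class Person:
    name: str
    nickname: Optional[str] = None


cardinalities = {
    field_name: min_cardinality(hint)
    for field_name, hint in get_type_hints(Person).items()
}
```

Here `name` would map to a minimum cardinality of 1 and `nickname` to 0; a `Restriction` object would then carry that number into the generated `owl:Restriction` triples.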
## Data Flow
### Complete Export Flow
```
1. Model Definition
   @ro_crate_schema(...)
   class Person(BaseModel): ...
2. Registration
   TypeTemplate → SchemaRegistry
3. Facade Creation
   facade = SchemaFacade()
   facade.add_all_registered_models()
4. Add Instances
   person = Person(...)
   facade.add_model_instance(person, "alice")
5. RDF Generation
   graph = facade.to_graph()
6. JSON-LD Conversion
   crate = ROCrate()
   final_crate = add_schema_to_crate(facade, crate)
7. Export
   final_crate.write(path)
```
### Complete Import Flow
```
1. Read RO-Crate
   path = "crate_directory/"
2. Parse JSON-LD
   ROCrate.read(path)
3. Import to Facade
   facade = SchemaFacade.from_ro_crate(path)
4. Extract Schema
   facade.types → Type objects
   facade.type_properties → TypeProperty objects
5. Extract Metadata
   facade.metadata_entries → MetadataEntry objects
6. Query/Modify
   Access entities via facade
```
## Key Design Patterns
### 1. Decorator Pattern
- Non-intrusive model annotation
- Preserves Pydantic functionality
- Automatic registration
### 2. Facade Pattern
- `SchemaFacade` provides simplified interface
- Hides internal complexity of RDF/OWL generation
- Single entry point for operations
### 3. Registry Pattern
- `SchemaRegistry` maintains global type catalog
- Singleton ensures consistency
- Enables decoupled type lookup
### 4. Builder Pattern
- Incremental construction of RO-Crates
- Add types, properties, metadata step-by-step
- Flexible composition
## Extension Points
### Adding Custom Types
```python
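from pydantic import BaseModel, Field
from lib_ro_crate_schema.crate.decorators import ro_crate_schema  # path per the package structure

@ro_crate_schema(ontology="https://example.org/CustomType")
class CustomType(BaseModel):
    custom_field: str = Field(json_schema_extra={"ontology": "https://example.org/customField"})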
```
### Custom Property Types
Extend `PropertyType` for specialized property handling.
### Custom Restrictions
Extend `Restriction` class for additional OWL constraints.
## Performance Considerations
- **Registry Lookups**: Average-case O(1) via hash-based lookup by model name
- **RDF Generation**: Linear in number of entities + properties
- **Memory**: Stores full graph in memory (consider streaming for large crates)
## Dependencies
- **pydantic**: Model definition and validation
- **rdflib**: RDF graph manipulation
- **rocrate**: RO-Crate standard implementation
- **pyld**: JSON-LD processing
- **pyshacl**: SHACL validation
## Testing Strategy
- **Unit Tests**: Test individual components (Type, TypeProperty, etc.)
- **Integration Tests**: Test full export/import cycles
- **Round-trip Tests**: Ensure export → import → export produces identical results
- **Published Package Tests**: Verify installability from PyPI
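A round-trip check might be structured as in the sketch below. `export_sketch` and `import_sketch` are placeholders for the real export and import paths, and comparing a canonical form (sorted keys, `@graph` sorted by `@id`) instead of raw output is an assumption of this sketch, made because node order in a JSON-LD graph carries no meaning.

```python
import json


def canonical(document: dict) -> str:
    """Serialize with sorted keys and @graph sorted by @id for stable comparison."""
    graph = sorted(document.get("@graph", []), key=lambda node: node.get("@id", ""))
    return json.dumps({**document, "@graph": graph}, sort_keys=True)


def export_sketch(entities: list) -> dict:
    # Placeholder for the real export path.
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": entities}


def import_sketch(document: dict) -> list:
    # Placeholder for the real import path; reverses order to mimic reordering.
    return list(reversed(document["@graph"]))


first = export_sketch([
    {"@id": "#alice", "name": "Alice"},
    {"@id": "#bob", "name": "Bob"},
])
second = export_sketch(import_sketch(first))

# The raw documents differ in node order, but their canonical forms match.
round_trip_ok = canonical(first) == canonical(second)
```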
## Common Pitfalls
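1. **Forgetting to call `add_all_registered_models()`**: Decorated models are registered but never added to the facade, so they won't appear in the schema
2. **Circular references**: When models refer to each other, declare at least one side as a string forward reference (e.g., `creator: "Person"`) so that `ForwardRefResolver` can resolve it
3. **ID conflicts**: Pass a unique `id` to every `add_model_instance()` call
4. **Context mixing**: Be explicit about whether a term comes from the RO-Crate context or from a custom context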
## Debugging Tips
- Use `facade.to_graph()` to inspect RDF triples
- Check `SchemaRegistry.get_all_type_templates()` to see registered types
- Validate JSON-LD output with online validators
- Use `pyshacl` for SHACL validation
## Future Enhancements
- Streaming support for large datasets
- SHACL shape generation from Pydantic models
- Query API for metadata
- Incremental updates to existing crates
- Better circular reference handling