Entity Relationships Model - Updated Structure
Overview
The entity relationship model now uses a direct parent-child structure: child entities carry a belongs_to_entity_id field that points at their parent entity. Entity relations additionally record a role (used for GDPR anonymization decisions) and the source document in which the entity was found.
Key Improvements
- BelongsTo Field: Direct parent reference on child entities (phone/email/address → person/organization)
- Role Field: Track entity context for GDPR anonymization decisions (public_figure, employee, private_individual, etc.)
- Source Tracking: Record the original source document (file_id, object_id, email_id) for a complete audit trail
What Was Changed
1. Entity Table: Added belongs_to_entity_id
Purpose: Direct parent-child relationships for contact information ownership.
Schema Changes:
ALTER TABLE oc_openregister_entities
ADD COLUMN belongs_to_entity_id BIGINT,
ADD INDEX idx_belongs_to (belongs_to_entity_id),
ADD FOREIGN KEY (belongs_to_entity_id) REFERENCES oc_openregister_entities(id) ON DELETE SET NULL;
Replaces: EntityLink table for 'belongs_to' relationships (many-to-many → many-to-one)
2. EntityRelation Table: Added role and source tracking
Purpose: Track entity context and original source documents for GDPR compliance.
Schema Changes:
ALTER TABLE oc_openregister_entity_relations
ADD COLUMN role VARCHAR(50),
ADD COLUMN file_id BIGINT,
ADD COLUMN object_id BIGINT,
ADD COLUMN email_id BIGINT,
ADD INDEX idx_role (role),
ADD INDEX idx_file (file_id),
ADD INDEX idx_object (object_id),
ADD INDEX idx_email (email_id);
New Fields:
- role: Context of entity ('public_figure', 'employee', 'private_individual', 'customer', 'contractor', 'author', 'recipient', 'mentioned')
- file_id: Original file containing this entity
- object_id: Original object containing this entity
- email_id: Original email containing this entity
3. Relationship Pattern: Parent-Child
BelongsTo creates direct parent-child relationships:
- Phone → belongs to → Person (phone.belongs_to_entity_id = person.id)
- Email → belongs to → Person (email.belongs_to_entity_id = person.id)
- Address → belongs to → Organization (address.belongs_to_entity_id = organization.id)
- Phone → belongs to → Organization (phone.belongs_to_entity_id = organization.id)
Query Pattern:
-- Get all contact info for a person
SELECT * FROM oc_openregister_entities
WHERE belongs_to_entity_id = {person_id};
-- Get parent entity for a phone
SELECT parent.* FROM oc_openregister_entities child
JOIN oc_openregister_entities parent ON child.belongs_to_entity_id = parent.id
WHERE child.id = {phone_id};
Note: Person-to-person (family) and person-to-organization (employment) relationships are NOT tracked. belongs_to_entity_id only links attributes and contact information to the entity that owns them.
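The two lookups above can also be sketched in plain PHP against in-memory rows, which may help when reasoning about the pattern without a database. This is a minimal sketch: the function names childrenOf and parentOf, the array row shape, and the sample data (including the organization name) are assumptions for illustration, not part of the mapper API.

```php
<?php
declare(strict_types=1);

/**
 * Return the rows whose belongs_to_entity_id points at $parentId.
 * Each row is a simplified stand-in for an oc_openregister_entities record.
 */
function childrenOf(array $entities, int $parentId): array
{
    return array_values(array_filter(
        $entities,
        static fn (array $e): bool => ($e['belongs_to_entity_id'] ?? null) === $parentId
    ));
}

/** Return the parent row of $childId, or null when it has no (known) parent. */
function parentOf(array $entities, int $childId): ?array
{
    // Index the rows by id so the parent can be looked up directly.
    $byId = array_column($entities, null, 'id');
    $parentId = $byId[$childId]['belongs_to_entity_id'] ?? null;

    return $parentId === null ? null : ($byId[$parentId] ?? null);
}

// Sample data, invented for illustration.
$entities = [
    ['id' => 1, 'type' => 'person', 'value' => 'John Doe', 'belongs_to_entity_id' => null],
    ['id' => 2, 'type' => 'phone', 'value' => '+31612345678', 'belongs_to_entity_id' => 1],
    ['id' => 3, 'type' => 'email', 'value' => 'john.doe@example.com', 'belongs_to_entity_id' => 1],
    ['id' => 4, 'type' => 'organization', 'value' => 'Example B.V.', 'belongs_to_entity_id' => null],
    ['id' => 5, 'type' => 'address', 'value' => '123 Main St, Amsterdam', 'belongs_to_entity_id' => 4],
];

$contactInfo = childrenOf($entities, 1); // phone and email rows of John Doe
$owner = parentOf($entities, 5);         // organization row that owns the address
```

Because each child holds exactly one parent id, both directions are a single filter or lookup, with no link table involved.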
4. Role-Based GDPR Compliance
Role Types:
- public_figure: May not require anonymization (e.g., CEO in press release)
- employee: In official capacity, may not require anonymization
- private_individual: Always requires anonymization
- customer: Context-dependent anonymization
- contractor: Context-dependent anonymization
- author: Document creator, context-dependent
- recipient: Document recipient, context-dependent
- mentioned: Mentioned in passing, context-dependent
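Anonymization Logic (roles without an explicit rule, including the context-dependent ones, fall back to the safe default and are anonymized):
public function requiresAnonymization(): bool
{
    // Roles acting in a public or official capacity are exempt.
    $nonPrivateRoles = [
        self::ROLE_PUBLIC_FIGURE,
        self::ROLE_EMPLOYEE,
    ];
    if ($this->role !== null && in_array($this->role, $nonPrivateRoles, true)) {
        return false; // Public or official capacity: not flagged for anonymization
    }
    if ($this->role === self::ROLE_PRIVATE_INDIVIDUAL) {
        return true; // Always requires anonymization
    }
    // Context-dependent roles and a missing role: require anonymization for safety
    return true;
}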
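The effect of this logic on each role can be checked with a small standalone sketch. It assumes the role constants hold the lowercase strings listed under Role Types; the free function and the NON_PRIVATE_ROLES constant below are stand-ins for illustration, not the EntityRelation method itself.

```php
<?php
declare(strict_types=1);

// Assumption: these strings match ROLE_PUBLIC_FIGURE and ROLE_EMPLOYEE.
const NON_PRIVATE_ROLES = ['public_figure', 'employee'];

/** Stand-alone equivalent of the decision: only exempt roles skip anonymization. */
function requiresAnonymization(?string $role): bool
{
    return !in_array($role, NON_PRIVATE_ROLES, true);
}

$roles = [
    'public_figure', 'employee', 'private_individual', 'customer',
    'contractor', 'author', 'recipient', 'mentioned', null,
];

$decisions = [];
foreach ($roles as $role) {
    $decisions[$role ?? '(none)'] = requiresAnonymization($role);
}

foreach ($decisions as $role => $required) {
    printf("%-20s %s\n", $role, $required ? 'anonymize' : 'keep');
}
```

Only public_figure and employee come out as "keep"; every context-dependent role, and a relation with no role at all, is flagged for anonymization until more specific rules are added.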
5. Source Tracking Benefits
Why track file_id/object_id/email_id?
- Chunks may change: Re-chunking, content updates
- GDPR requests: Need original source documents
- Anonymization: Must trace back to original files
- Audit trails: Require source document references
Example:
Entity: John Doe
Found in:
- File #100 (contract.pdf) as role='employee'
- Object #500 (customer record) as role='mentioned'
- Email #300 (thread) as role='recipient'
- File #150 (personal letter) as role='private_individual' ← REQUIRES ANONYMIZATION
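A relation row can be turned into a source reference like the ones in this example with a small helper. The sketch below is one possible shape for what the getSourceType() and getSourceId() helpers compute; the function name resolveSource, the precedence order (file, then object, then email), and the sample rows are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Resolve the source reference of a relation row.
 * Assumption: normally only one of file_id/object_id/email_id is set;
 * if several are, the first match in the order below wins.
 *
 * @return array{type: string, id: int}|null
 */
function resolveSource(array $relation): ?array
{
    $columns = ['file' => 'file_id', 'object' => 'object_id', 'email' => 'email_id'];
    foreach ($columns as $type => $column) {
        if (isset($relation[$column])) {
            return ['type' => $type, 'id' => (int) $relation[$column]];
        }
    }

    return null; // No source recorded on this relation
}

// Rows mirroring the John Doe example above.
$occurrences = [
    ['role' => 'employee', 'file_id' => 100, 'object_id' => null, 'email_id' => null],
    ['role' => 'mentioned', 'file_id' => null, 'object_id' => 500, 'email_id' => null],
    ['role' => 'recipient', 'file_id' => null, 'object_id' => null, 'email_id' => 300],
    ['role' => 'private_individual', 'file_id' => 150, 'object_id' => null, 'email_id' => null],
];

$labels = array_map(
    static function (array $relation): string {
        $source = resolveSource($relation);

        return $source === null ? 'unknown' : ucfirst($source['type']) . ' #' . $source['id'];
    },
    $occurrences
);
```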
6. PHP Entity Classes Updated
GdprEntity class now includes:
- belongs_to_entity_id property
- getParent() method
- getChildren() method (via mapper)
- canHaveChildren() helper method
EntityRelation class now includes:
- role property with role constants
- file_id, object_id, email_id properties
- requiresAnonymization() method
- getSourceType() and getSourceId() helper methods
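How canHaveChildren() might guard the parent-child rule is sketched below. This is not the GdprEntity class: EntitySketch, attachTo(), and the 'organization' and 'phone' type strings are hypothetical, and the assumption that only persons and organizations may own contact info is taken from the relationship pattern in section 3.

```php
<?php
declare(strict_types=1);

final class EntitySketch
{
    public const TYPE_PERSON = 'person';
    public const TYPE_ORGANIZATION = 'organization'; // assumed value
    public const TYPE_PHONE = 'phone';               // assumed value

    private int $id;
    private string $type;
    private ?int $belongsToEntityId = null;

    public function __construct(int $id, string $type)
    {
        $this->id = $id;
        $this->type = $type;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getBelongsToEntityId(): ?int
    {
        return $this->belongsToEntityId;
    }

    /** Only persons and organizations may own contact information. */
    public function canHaveChildren(): bool
    {
        return in_array($this->type, [self::TYPE_PERSON, self::TYPE_ORGANIZATION], true);
    }

    /** Hypothetical helper: set the parent, refusing parents that cannot own children. */
    public function attachTo(EntitySketch $parent): void
    {
        if (!$parent->canHaveChildren()) {
            throw new InvalidArgumentException("Entity #{$parent->getId()} cannot own contact info");
        }
        $this->belongsToEntityId = $parent->getId();
    }
}

$person = new EntitySketch(1, EntitySketch::TYPE_PERSON);
$phone = new EntitySketch(2, EntitySketch::TYPE_PHONE);
$phone->attachTo($person); // allowed: phone belongs to person

$rejected = false;
try {
    $person->attachTo($phone); // refused: a phone cannot be a parent
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```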
7. Use Cases
Use Case 1: Complete GDPR Profile
// Find person
$person = $entityMapper->findByValue('John Doe', GdprEntity::TYPE_PERSON);
// Get all contact info (simple query with belongs_to_entity_id)
$contactInfo = $entityMapper->findByBelongsTo($person->getId());
// Get all occurrences with role and source information
$relations = $entityRelationMapper->findByEntityId($person->getId());
foreach ($relations as $relation) {
    echo "Role: {$relation->getRole()}\n";
    echo "Source: {$relation->getSourceType()} #{$relation->getSourceId()}\n";
    echo "Requires anonymization: " . ($relation->requiresAnonymization() ? 'Yes' : 'No') . "\n";
}
Output:
Contact Information:
- Phone: +31612345678
- Phone: +31687654321
- Email: john.doe@example.com
- Email: j.doe@company.com
- Address: 123 Main St, Amsterdam
Found In:
- File #100 (contract.pdf): role=employee, anonymization=No
- Object #500 (customer record): role=mentioned, anonymization=Yes
- Email #300 (email thread): role=recipient, anonymization=Yes
- File #150 (personal letter): role=private_individual, anonymization=Yes
Use Case 2: Role-Based Anonymization
// Find all private individual occurrences
$relations = $entityRelationMapper->findByRole(EntityRelation::ROLE_PRIVATE_INDIVIDUAL);
foreach ($relations as $relation) {
    if ($relation->requiresAnonymization()) {
        $entity = $entityMapper->find($relation->getEntityId());
        // Anonymize this occurrence
        $anonymizedValue = $this->anonymizeEntity($entity->getType(), $entity->getValue());
        $relation->setAnonymized(true);
        $relation->setAnonymizedValue($anonymizedValue);
        $entityRelationMapper->update($relation);
    }
}
Use Case 3: Source Document Retrieval
// GDPR request: All documents containing John Doe
$person = $entityMapper->findByValue('John Doe', GdprEntity::TYPE_PERSON);
$relations = $entityRelationMapper->findByEntityId($person->getId());
$sources = [
    'files' => [],
    'objects' => [],
    'emails' => [],
];
foreach ($relations as $relation) {
    $sourceType = $relation->getSourceType();
    $sourceId = $relation->getSourceId();
    if ($sourceType === 'file') {
        $sources['files'][] = $sourceId;
    } elseif ($sourceType === 'object') {
        $sources['objects'][] = $sourceId;
    } elseif ($sourceType === 'email') {
        $sources['emails'][] = $sourceId;
    }
}
// Retrieve actual documents
$files = $fileMapper->findByIds(array_unique($sources['files']));
$objects = $objectMapper->findByIds(array_unique($sources['objects']));
$emails = $emailMapper->findByIds(array_unique($sources['emails']));
Use Case 4: Entity Deduplication
// A phone entity has a single belongs_to_entity_id, so one row can never point at
// two persons. A shared number shows up as several phone rows with the same value.
// Assumption: findAllByValue() is a hypothetical mapper method that returns every
// entity row matching the value and type.
$phones = $entityMapper->findAllByValue('+31612345678', GdprEntity::TYPE_PHONE);
// Collect the distinct parents that own this number
$parentIds = [];
foreach ($phones as $phone) {
    if ($phone->getBelongsToEntityId() !== null) {
        $parentIds[$phone->getBelongsToEntityId()] = true;
    }
}
// If >1 parent, may need deduplication (data quality issue)
if (count($parentIds) > 1) {
    // Merge logic...
}
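The shared-number check can also be run over a whole set of rows at once. The sketch below groups contact rows by value and reports values owned by more than one parent; it assumes the same number may be stored as several entity rows with different belongs_to_entity_id values, and the function name findSharedValues and the sample rows are invented for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Map each contact value of the given type to the distinct parent ids owning it.
 * Only values owned by more than one parent are returned.
 *
 * @return array<string, list<int>>
 */
function findSharedValues(array $entities, string $type): array
{
    $owners = [];
    foreach ($entities as $entity) {
        if ($entity['type'] !== $type || $entity['belongs_to_entity_id'] === null) {
            continue; // Wrong type or orphaned row
        }
        // Use the parent id as key so duplicates collapse automatically.
        $owners[$entity['value']][$entity['belongs_to_entity_id']] = true;
    }

    $shared = [];
    foreach ($owners as $value => $parentIds) {
        if (count($parentIds) > 1) {
            $shared[$value] = array_keys($parentIds);
        }
    }

    return $shared;
}

// Sample rows, invented for illustration.
$entities = [
    ['id' => 10, 'type' => 'phone', 'value' => '+31612345678', 'belongs_to_entity_id' => 1],
    ['id' => 11, 'type' => 'phone', 'value' => '+31612345678', 'belongs_to_entity_id' => 2],
    ['id' => 12, 'type' => 'phone', 'value' => '+31687654321', 'belongs_to_entity_id' => 1],
    ['id' => 13, 'type' => 'email', 'value' => 'john.doe@example.com', 'belongs_to_entity_id' => 1],
];

$shared = findSharedValues($entities, 'phone');
```

Each entry in $shared is a candidate for review: the parents may be duplicates of one person, or two people who legitimately share a number.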
8. Query Patterns
Get all contact info for a person (Simple!):
SELECT * FROM oc_openregister_entities
WHERE belongs_to_entity_id = {person_id};
Get parent entity for contact info:
SELECT parent.* FROM oc_openregister_entities child
JOIN oc_openregister_entities parent ON child.belongs_to_entity_id = parent.id
WHERE child.id = {contact_id};
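Find all entities requiring anonymization (mirrors requiresAnonymization(): every role except public_figure and employee, including a missing role):
SELECT DISTINCT e.*
FROM oc_openregister_entities e
JOIN oc_openregister_entity_relations er ON e.id = er.entity_id
WHERE (er.role IS NULL OR er.role NOT IN ('public_figure', 'employee'))
AND er.anonymized = FALSE;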
Find all documents containing a specific entity:
SELECT
er.file_id,
er.object_id,
er.email_id,
er.role,
er.confidence
FROM oc_openregister_entities e
JOIN oc_openregister_entity_relations er ON e.id = er.entity_id
WHERE e.value = 'John Doe' AND e.type = 'person';
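To keep the SQL role filter in step with requiresAnonymization(), the filter can be generated from the list of exempt roles rather than typed by hand. The sketch below only builds a parameterised statement and its bindings; the function name buildAnonymizationQuery and the EXEMPT_ROLES constant are assumptions, and executing the statement through the application's database layer is left out.

```php
<?php
declare(strict_types=1);

// Assumption: the same roles that requiresAnonymization() treats as exempt.
const EXEMPT_ROLES = ['public_figure', 'employee'];

/**
 * Build a query for entities with at least one occurrence that is not yet
 * anonymized and whose role is missing or not exempt.
 *
 * @return array{0: string, 1: list<string>} SQL text and positional parameters
 */
function buildAnonymizationQuery(array $exemptRoles): array
{
    $sql = 'SELECT DISTINCT e.* '
        . 'FROM oc_openregister_entities e '
        . 'JOIN oc_openregister_entity_relations er ON e.id = er.entity_id '
        . 'WHERE er.anonymized = FALSE';

    if ($exemptRoles !== []) {
        // One placeholder per exempt role; NULL roles need an explicit check
        // because NOT IN never matches NULL.
        $placeholders = implode(', ', array_fill(0, count($exemptRoles), '?'));
        $sql .= " AND (er.role IS NULL OR er.role NOT IN ($placeholders))";
    }

    return [$sql, array_values($exemptRoles)];
}

[$sql, $params] = buildAnonymizationQuery(EXEMPT_ROLES);
```

With the roles passed as bound parameters, adding a new exempt role means changing one list instead of editing both the PHP method and the SQL text.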
9. API Endpoints
GET /api/entities/{id}/contact-info
- Get all contact information for a person/organization
GET /api/entities/{id}/parent
- Get parent entity (person/org) for contact info
GET /api/entities/{id}/occurrences
- Get all occurrences with role and source tracking
GET /api/gdpr/profile/{entityId}
- Complete GDPR profile with contact info and sources
GET /api/gdpr/documents/{entityId}
- All source documents containing this entity
GET /api/gdpr/anonymization-required
- List of entities requiring anonymization (by role)
Benefits
1. Simpler Data Model
- ✅ Direct foreign key instead of join table (belongs_to_entity_id)
- ✅ One query to get all contact info for a person
- ✅ Intuitive parent-child structure
- ✅ Better performance on common queries
2. GDPR Compliance
- ✅ Role-based anonymization decisions
- ✅ Context-aware entity handling (public figure vs private individual)
- ✅ Complete data subject profiles
- ✅ All contact information properly linked
3. Robust Source Tracking
- ✅ Always trace back to original document
- ✅ Survives re-chunking operations