Part 4 of the SOLID Principles for Scientific Programmers series
The Interface Bloat Problem
You’re building a data processing library for your research group. You create a DataSource interface that defines everything a data source might need to do: read data, write data, seek to positions, get metadata, validate checksums, compress, encrypt, and more.
Then someone wants to use your library with a simple CSV file reader. Now they have to implement all those methods, even though CSV files don’t support seeking, compression, or encryption. So they write dummy methods that throw “not supported” errors or just return None.
Later, someone writes code that assumes all DataSource objects support compression. It compiles fine, runs fine with some data sources, then crashes with the CSV reader because that feature isn’t really implemented.
Your well-intentioned interface has become a burden rather than a help.
This is the problem the Interface Segregation Principle (ISP) solves.
What Is the Interface Segregation Principle?
Robert C. Martin stated it as:
No client should be forced to depend on methods it does not use.
In practical terms for scientists: Break large interfaces into smaller, focused ones. Classes should only implement the interfaces they actually need, not be forced to implement dummy methods for features they don’t support.
Think of it like equipment capabilities: Not every instrument needs every feature. Your basic pH meter doesn’t need the same interface as your automated high-throughput system. They should implement different interfaces based on their actual capabilities.
A Real Example: The Problem
Let’s look at a data source abstraction that forces everything into one large interface:
from abc import ABC, abstractmethod
import numpy as np
from typing import Dict, Any, Optional
import json
class DataSource(ABC):
"""
Master interface for all data sources.
Every data source must implement ALL of these methods!
"""
@abstractmethod
def read(self) -> np.ndarray:
"""Read data from source."""
pass
@abstractmethod
def write(self, data: np.ndarray) -> None:
"""Write data to source."""
pass
@abstractmethod
def append(self, data: np.ndarray) -> None:
"""Append data to existing source."""
pass
@abstractmethod
def seek(self, position: int) -> None:
"""Seek to specific position in data stream."""
pass
@abstractmethod
def get_metadata(self) -> Dict[str, Any]:
"""Get metadata about the data source."""
pass
@abstractmethod
def set_metadata(self, metadata: Dict[str, Any]) -> None:
"""Set metadata for the data source."""
pass
@abstractmethod
def validate_checksum(self) -> bool:
"""Validate data integrity via checksum."""
pass
@abstractmethod
def compress(self) -> None:
"""Compress the data source."""
pass
@abstractmethod
def decompress(self) -> None:
"""Decompress the data source."""
pass
@abstractmethod
def encrypt(self, key: str) -> None:
"""Encrypt the data source."""
pass
@abstractmethod
def get_size(self) -> int:
"""Get size of data in bytes."""
pass
@abstractmethod
def close(self) -> None:
"""Close the data source."""
pass
# Simple CSV file - forced to implement everything!
class CSVDataSource(DataSource):
"""Read data from CSV file."""
def __init__(self, filename: str):
self.filename = filename
self.data = None
def read(self) -> np.ndarray:
"""Actually works - CSV files can be read."""
import pandas as pd
df = pd.read_csv(self.filename)
return df.values
def write(self, data: np.ndarray) -> None:
"""CSV files are read-only in our setup."""
raise NotImplementedError("CSV source is read-only")
def append(self, data: np.ndarray) -> None:
"""Can't append to CSV."""
raise NotImplementedError("CSV source doesn't support append")
def seek(self, position: int) -> None:
"""CSV files don't support seeking."""
raise NotImplementedError("CSV source doesn't support seeking")
def get_metadata(self) -> Dict[str, Any]:
"""CSV has no metadata format."""
return {} # Return empty dict
def set_metadata(self, metadata: Dict[str, Any]) -> None:
"""Can't store metadata in CSV."""
raise NotImplementedError("CSV doesn't support metadata")
def validate_checksum(self) -> bool:
"""CSV files have no checksum."""
return True # Just pretend it's valid
def compress(self) -> None:
"""CSV compression not supported."""
raise NotImplementedError("CSV compression not supported")
def decompress(self) -> None:
"""CSV decompression not supported."""
raise NotImplementedError("CSV decompression not supported")
def encrypt(self, key: str) -> None:
"""CSV encryption not supported."""
raise NotImplementedError("CSV encryption not supported")
def get_size(self) -> int:
"""Can implement this one."""
import os
return os.path.getsize(self.filename)
def close(self) -> None:
"""Nothing to close for CSV."""
pass # Do nothing
# Database source - also forced to implement everything!
class DatabaseDataSource(DataSource):
"""Read data from database."""
def __init__(self, connection_string: str, table: str):
self.connection_string = connection_string
self.table = table
def read(self) -> np.ndarray:
"""Can read from database."""
# Simulate database read
return np.random.rand(100, 10)
def write(self, data: np.ndarray) -> None:
"""Can write to database."""
# Implementation here
pass
def append(self, data: np.ndarray) -> None:
"""Can append to database."""
# Implementation here
pass
def seek(self, position: int) -> None:
"""Databases don't work like files - no seeking."""
raise NotImplementedError("Database doesn't support seeking")
def get_metadata(self) -> Dict[str, Any]:
"""Can get metadata from database."""
return {'table': self.table, 'rows': 100}
def set_metadata(self, metadata: Dict[str, Any]) -> None:
"""Databases don't store arbitrary metadata this way."""
raise NotImplementedError("Database metadata is schema-defined")
def validate_checksum(self) -> bool:
"""Databases have their own integrity checking."""
raise NotImplementedError("Use database DBCC instead")
def compress(self) -> None:
"""Database compression is at server level."""
raise NotImplementedError("Database compression not available")
def decompress(self) -> None:
"""Database decompression is at server level."""
raise NotImplementedError("Database decompression not available")
def encrypt(self, key: str) -> None:
"""Database encryption is at server level."""
raise NotImplementedError("Database encryption not available")
def get_size(self) -> int:
"""Can query size from database."""
# Implementation here
return 1024
def close(self) -> None:
"""Must close database connection."""
# Close connection
pass
# Now code that uses these sources has problems
def process_with_compression(source: DataSource):
"""Process data with compression for efficiency."""
data = source.read()
source.compress() # Might crash if not supported!
# Process data
return data
def backup_data(source: DataSource, backup_source: DataSource):
"""Backup data from one source to another."""
data = source.read()
backup_source.write(data) # Crashes if backup is read-only!
# Usage - lots of runtime errors!
csv = CSVDataSource("data.csv")
process_with_compression(csv) # Crashes - CSV doesn't compress!
csv2 = CSVDataSource("backup.csv")
backup_data(csv, csv2) # Crashes - CSV is read-only!
Problems with This Design
Problem 1: Forced to Implement Irrelevant Methods
The CSV reader has to implement 9 methods it doesn’t support, filling them with NotImplementedError or dummy returns.
Problem 2: Runtime Errors Instead of Compile-Time Safety
Code compiles fine even when using unsupported features. Errors only appear at runtime.
Problem 3: Misleading Interface
The interface promises features (compression, encryption) that many implementations don’t actually provide.
Problem 4: Difficult to Understand
Looking at CSVDataSource, which methods actually work? You have to read the implementation to find out.
Problem 5: Fragile Code
Code written against the interface can’t rely on anything working. Every method call is a potential runtime error.
The Solution: Interface Segregation Principle
Let’s break this monolithic interface into focused, cohesive interfaces:
from abc import ABC, abstractmethod
import numpy as np
from typing import Dict, Any, Protocol
import pandas as pd
# CORE INTERFACE: Minimum that all data sources provide
class Readable(ABC):
"""Data sources that can be read."""
@abstractmethod
def read(self) -> np.ndarray:
"""Read data from source."""
pass
class Writeable(ABC):
"""Data sources that can be written to."""
@abstractmethod
def write(self, data: np.ndarray) -> None:
"""Write data to source."""
pass
class Appendable(ABC):
"""Data sources that support appending."""
@abstractmethod
def append(self, data: np.ndarray) -> None:
"""Append data to existing source."""
pass
class Seekable(ABC):
"""Data sources that support seeking."""
@abstractmethod
def seek(self, position: int) -> None:
"""Seek to specific position."""
pass
@abstractmethod
def tell(self) -> int:
"""Return current position."""
pass
class HasMetadata(ABC):
"""Data sources with metadata support."""
@abstractmethod
def get_metadata(self) -> Dict[str, Any]:
"""Get metadata."""
pass
@abstractmethod
def set_metadata(self, metadata: Dict[str, Any]) -> None:
"""Set metadata."""
pass
class Compressible(ABC):
"""Data sources that support compression."""
@abstractmethod
def compress(self) -> None:
"""Compress the data."""
pass
@abstractmethod
def decompress(self) -> None:
"""Decompress the data."""
pass
@abstractmethod
def is_compressed(self) -> bool:
"""Check if data is compressed."""
pass
class Encryptable(ABC):
"""Data sources that support encryption."""
@abstractmethod
def encrypt(self, key: str) -> None:
"""Encrypt the data."""
pass
@abstractmethod
def decrypt(self, key: str) -> None:
"""Decrypt the data."""
pass
@abstractmethod
def is_encrypted(self) -> bool:
"""Check if data is encrypted."""
pass
class Validatable(ABC):
"""Data sources that support integrity checking."""
@abstractmethod
def validate_checksum(self) -> bool:
"""Validate data integrity."""
pass
@abstractmethod
def compute_checksum(self) -> str:
"""Compute checksum of data."""
pass
class Sizeable(ABC):
"""Data sources with queryable size."""
@abstractmethod
def get_size(self) -> int:
"""Get size in bytes."""
pass
class Closeable(ABC):
"""Resources that need cleanup."""
@abstractmethod
def close(self) -> None:
"""Close and cleanup resources."""
pass
# Now implementations only implement what they support!
class CSVDataSource(Readable, Sizeable):
"""
CSV file reader - only implements what it supports.
Readable: Yes, can read CSV files
Sizeable: Yes, can get file size
"""
def __init__(self, filename: str):
self.filename = filename
def read(self) -> np.ndarray:
"""Read CSV file."""
df = pd.read_csv(self.filename)
return df.values
def get_size(self) -> int:
"""Get file size."""
import os
return os.path.getsize(self.filename)
class DatabaseSource(Readable, Writable, Appendable, HasMetadata, Sizeable, Closeable):
"""
Database source - implements many interfaces.
Can read, write, append, has metadata, has size, needs closing.
"""
def __init__(self, connection_string: str, table: str):
self.connection_string = connection_string
self.table = table
self.connection = None
def read(self) -> np.ndarray:
"""Read from database."""
# Implementation
return np.random.rand(100, 10)
def write(self, data: np.ndarray) -> None:
"""Write to database."""
# Implementation
pass
def append(self, data: np.ndarray) -> None:
"""Append to database."""
# Implementation
pass
def get_metadata(self) -> Dict[str, Any]:
"""Get table metadata."""
return {'table': self.table, 'rows': 100}
def set_metadata(self, metadata: Dict[str, Any]) -> None:
"""Update table metadata."""
# Implementation
pass
def get_size(self) -> int:
"""Get table size."""
return 1024
def close(self) -> None:
"""Close database connection."""
if self.connection:
self.connection.close()
class BinaryFileSource(Readable, Writable, Seekable, Compressible, Sizeable, Closeable):
"""
Binary file with full features.
Supports reading, writing, seeking, compression.
"""
def __init__(self, filename: str):
self.filename = filename
self.file = None
self.compressed = False
def read(self) -> np.ndarray:
"""Read binary data."""
return np.load(self.filename)
def write(self, data: np.ndarray) -> None:
"""Write binary data."""
np.save(self.filename, data)
def seek(self, position: int) -> None:
"""Seek in file."""
if self.file:
self.file.seek(position)
def tell(self) -> int:
"""Get current position."""
return self.file.tell() if self.file else 0
def compress(self) -> None:
"""Compress file."""
import gzip
# Compression implementation
self.compressed = True
def decompress(self) -> None:
"""Decompress file."""
# Decompression implementation
self.compressed = False
def is_compressed(self) -> bool:
"""Check compression status."""
return self.compressed
def get_size(self) -> int:
"""Get file size."""
import os
return os.path.getsize(self.filename)
def close(self) -> None:
"""Close file."""
if self.file:
self.file.close()
class InMemorySource(Readable, Writable):
"""
Simple in-memory data source.
Just readable and writable - nothing fancy.
"""
def __init__(self):
self.data = None
def read(self) -> np.ndarray:
"""Read from memory."""
return self.data
def write(self, data: np.ndarray) -> None:
"""Write to memory."""
self.data = data.copy()
# Now functions specify exactly what they need!
def read_data(source: Readable) -> np.ndarray:
"""Read data from any readable source."""
return source.read()
def backup_data(source: Readable, destination: Writable) -> None:
"""Backup data from readable to writable source."""
data = source.read()
destination.write(data)
def compress_if_possible(source: Readable) -> None:
"""Compress source if it supports compression."""
if isinstance(source, Compressible):
source.compress()
print("Data compressed")
else:
print("Source doesn't support compression - skipping")
def process_with_metadata(source: Readable & HasMetadata) -> Dict[str, Any]:
"""Process data and return metadata (requires both interfaces)."""
data = source.read()
metadata = source.get_metadata()
# Process data
metadata['processed'] = True
return metadata
def safe_close(source: Any) -> None:
"""Close source if it needs closing."""
if isinstance(source, Closeable):
source.close()
# Usage - type-safe and clear!
# CSV reading works
csv = CSVDataSource("data.csv")
data = read_data(csv) # Works - CSV is Readable
# Compression is optional
compress_if_possible(csv) # Prints "doesn't support compression"
binary = BinaryFileSource("data.npy")
compress_if_possible(binary) # Actually compresses
# Backup works with any combination
db = DatabaseSource("connection", "table")
backup_data(csv, db) # CSV (Readable) → Database (Writable) ✓
backup_data(db, binary) # Database → Binary ✓
# Compile-time safety with type hints
def needs_writable(dest: Writable) -> None:
dest.write(np.array([1, 2, 3]))
# needs_writable(csv) # Type checker catches this! CSV isn't Writable
# Metadata only works with sources that have it
# metadata = process_with_metadata(csv) # Type error - CSV doesn't have HasMetadata
metadata = process_with_metadata(db) # Works - Database has HasMetadata
# Cleanup is explicit
safe_close(csv) # Does nothing - CSV doesn't need closing
safe_close(db) # Closes database connection
Why This Is Better
1. Implement Only What You Support
CSVDataSource implements 2 interfaces instead of 12 methods. No dummy implementations!
2. Type Safety
Functions declare exactly what they need:
def backup_data(source: Readable, dest: Writable)
Type checkers verify you’re passing compatible sources at compile time.
3. Clear Capabilities
Looking at class declaration tells you exactly what it supports:
class BinaryFileSource(Readable, Writable, Seekable, Compressible, ...)
4. Flexible Combinations
Different sources implement different combinations of interfaces based on their actual capabilities.
5. Graceful Feature Detection
if isinstance(source, Compressible):
source.compress()
Code can check for optional features without causing errors.
Python-Specific: Protocols vs ABCs
Python 3.8+ added Protocols for structural typing (duck typing with type hints):
from typing import Protocol
class Readable(Protocol):
"""Readable as a Protocol - no inheritance needed!"""
def read(self) -> np.ndarray:
"""Read data."""
...
# Any class with a read() method is automatically Readable!
class MyCustomSource:
def read(self) -> np.ndarray:
return np.array([1, 2, 3])
def process(source: Readable):
"""Works with anything that has read()."""
return source.read()
# No inheritance needed!
custom = MyCustomSource()
process(custom) # Type checker is happy
Use Protocols when:
- You want duck typing with type checking
- Third-party classes can’t inherit from your interfaces
- You want minimal coupling
Use ABCs when:
- You want to enforce implementation
- You want to provide shared method implementations
- You want explicit interface contracts
Real-World Example: Instrument Interface
Here’s how ISP applies to laboratory instruments:
# BAD: One giant instrument interface
class Instrument(ABC):
@abstractmethod
def initialize(self): pass
@abstractmethod
def measure(self): pass
@abstractmethod
def calibrate(self): pass
@abstractmethod
def get_temperature(self): pass # Not all instruments measure temperature!
@abstractmethod
def get_pressure(self): pass # Not all measure pressure!
@abstractmethod
def set_wavelength(self, wl): pass # Only spectroscopy!
@abstractmethod
def start_scan(self): pass # Only scanning instruments!
# GOOD: Segregated interfaces
class Instrument(ABC):
"""Base interface - what ALL instruments have."""
@abstractmethod
def initialize(self): pass
@abstractmethod
def shutdown(self): pass
class Measurable(ABC):
"""Instruments that take measurements."""
@abstractmethod
def measure(self) -> Measurement: pass
class Calibratable(ABC):
"""Instruments that support calibration."""
@abstractmethod
def calibrate(self, reference: Any): pass
class TemperatureSensor(ABC):
"""Instruments that measure temperature."""
@abstractmethod
def get_temperature(self) -> float: pass
class Spectrometer(ABC):
"""Spectroscopy-specific features."""
@abstractmethod
def set_wavelength(self, wavelength: float): pass
@abstractmethod
def get_spectrum(self) -> np.ndarray: pass
class Scanner(ABC):
"""Scanning instruments."""
@abstractmethod
def start_scan(self): pass
@abstractmethod
def stop_scan(self): pass
# Now implementations are clean
class UVVisSpectrometer(Instrument, Spectrometer, Measurable, Calibratable):
"""UV-Vis spectrometer with all its actual capabilities."""
pass
class SimpleThermocouple(Instrument, TemperatureSensor):
"""Simple thermocouple - just temperature, no calibration."""
pass
class ScanningTunnelMicroscope(Instrument, Scanner, Measurable):
"""STM - scanning and measuring, but not a spectrometer."""
pass
The Role Interface vs Interface Segregation
Sometimes you need role-based interfaces that group related functionality:
# ROLE: Data analysis pipeline component
class PipelineStage(Readable, Writable):
"""A stage in a pipeline must read input and write output."""
pass
# ROLE: Archivable data
class Archivable(Readable, Compressible, Validatable):
"""Data that can be archived needs these three capabilities."""
pass
# Specific implementation combines roles
class AnalysisResult(PipelineStage, Archivable, HasMetadata):
"""Analysis results play multiple roles."""
pass
This is fine! Role interfaces compose smaller interfaces. The key: don’t force implementation of methods that aren’t needed.
Testing with Segregated Interfaces
Small interfaces make testing easier:
import unittest
from unittest.mock import Mock
class TestDataProcessing(unittest.TestCase):
def test_read_data(self):
"""Test with minimal mock - only need read() method."""
mock_source = Mock(spec=Readable)
mock_source.read.return_value = np.array([1, 2, 3])
result = read_data(mock_source)
self.assertEqual(len(result), 3)
mock_source.read.assert_called_once()
def test_backup_data(self):
"""Test with two mocks - one readable, one writable."""
mock_source = Mock(spec=Readable)
mock_source.read.return_value = np.array([1, 2, 3])
mock_dest = Mock(spec=Writable)
backup_data(mock_source, mock_dest)
mock_source.read.assert_called_once()
mock_dest.write.assert_called_once()
# With fat interface, would need to mock 12 methods!
# With ISP, only mock what you need!
Common Mistakes
Mistake 1: Too Many Small Interfaces
Don’t create an interface for every single method:
# TOO GRANULAR
class CanGetX(ABC):
@abstractmethod
def get_x(self): pass
class CanGetY(ABC):
@abstractmethod
def get_y(self): pass
# BETTER: Group related operations
class Point2D(ABC):
@abstractmethod
def get_x(self): pass
@abstractmethod
def get_y(self): pass
Mistake 2: Interfaces That Are Still Too Large
# STILL TOO LARGE
class DataProcessing(ABC):
@abstractmethod
def load(self): pass
@abstractmethod
def clean(self): pass
@abstractmethod
def transform(self): pass
@abstractmethod
def analyze(self): pass
@abstractmethod
def visualize(self): pass
# BETTER: Split by client needs
class DataLoader(ABC):
@abstractmethod
def load(self): pass
class DataCleaner(ABC):
@abstractmethod
def clean(self, data): pass
class DataAnalyzer(ABC):
@abstractmethod
def analyze(self, data): pass
When to Segregate Interfaces
Segregate when:
- ✅ Different clients need different subsets of functionality
- ✅ Implementations often don’t support all features
- ✅ You find yourself writing many
raise NotImplementedError - ✅ You have optional features that not all implementations provide
Don’t segregate when:
- ✅ All implementations genuinely need all methods
- ✅ The methods are tightly coupled (can’t use one without the others)
- ✅ You’re creating an internal class, not a public interface
- ✅ Segregation would create more confusion than clarity
The Client Perspective
ISP is about clients (code that uses interfaces), not implementations:
# This function only needs reading
def analyze_data(source: Readable):
data = source.read()
return np.mean(data)
# Works with ANY readable source!
analyze_data(csv)
analyze_data(db)
analyze_data(binary)
analyze_data(memory)
# Doesn't care about compression, encryption, metadata, etc.
Summary
The Interface Segregation Principle says: No client should be forced to depend on methods it does not use.
Following ISP in scientific code:
- Reduces dummy implementations
- Improves type safety
- Makes capabilities explicit
- Enables flexible combinations
- Simplifies testing
- Creates more maintainable code
The key insight: Design interfaces from the client’s perspective, not the implementer’s. Ask “What does this client need?” not “What can this implementation do?”
Practical Guidelines
Before creating an interface, ask:
- Will all implementations support all methods?
- Do all clients need all methods?
- Are there natural groupings of methods?
- Can I split this into smaller, focused interfaces?
If you find yourself:
- Writing
raise NotImplementedErrorfrequently - Implementing dummy methods
- Checking
isinstancebefore calling methods - → Your interface needs segregation
Next Steps
- Find a large interface in your code
- List which implementations support which methods
- Group methods by which clients use them together
- Split into smaller interfaces
- Update implementations to only implement what they support
In the final post of this series, we’ll bring together all five SOLID principles and discuss when to apply them (and when not to) in real scientific programming scenarios.
Have you written raise NotImplementedError more times than you’d like? Share your interface horror stories in the comments!
Previous posts in this series:
- Single Responsibility Principle for Scientists
- Open/Closed Principle for Scientists
- Liskov Substitution Principle for Scientists
Next in this series:
- The Dependency Inversion Principle for Scientists - Coming next week
Related posts:
- Duck Typing vs. Explicit Interfaces
- Protocol Classes in Python 3.8+
- Designing Composable Interfaces for Laboratory Software