Scholar Module (stx.scholar)

Literature management: search papers, download PDFs, enrich BibTeX, and organize a local library across multiple projects.

Quick Reference

from scitex.scholar import Scholar

scholar = Scholar(project="my_research")

# Load and enrich BibTeX
papers = scholar.load_bibtex("references.bib")
enriched = scholar.enrich_papers(papers)
# Adds: DOIs, abstracts, citation counts, impact factors

# Save to library and export
scholar.save_papers_to_library(enriched)
scholar.save_papers_as_bibtex(enriched, "enriched.bib")

# Search your library
results = scholar.search_library("neural oscillations")

# Download PDFs
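dois = ["10.1038/nature12373"]  # list of DOI strings
output_dir = "./pdfs"           # example destination directory
scholar.download_pdfs(dois, output_dir)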

CLI Usage

# Full pipeline from BibTeX
scitex scholar bibtex refs.bib --project myresearch --num-workers 8

# Search papers
scitex scholar search "deep learning EEG"

# Download PDFs
scitex scholar download --doi 10.1038/nature12373

# Institutional authentication
scitex scholar auth --method openathens
scitex scholar auth --method shibboleth --institution "MIT"

Data Sources

Searches and enriches from:

  • CrossRef (167M+ papers) – DOI resolution, citation counts

  • Semantic Scholar – Abstracts, references, influence scores

  • PubMed – Biomedical literature

  • arXiv – Preprints

  • OpenAlex (284M+ works) – Open metadata

Key Classes

  • Scholar – Main entry point (search, enrich, download, organize)

  • Paper – Type-safe metadata container (Pydantic model)

  • Papers – Collection with filtering, sorting, and export

  • ScholarConfig – YAML-based configuration

  • ScholarLibrary – Local library storage and caching

Paper Metadata

Each Paper contains structured metadata sections:

paper.metadata.basic          # title, authors, year, abstract, keywords
paper.metadata.id             # DOI, arXiv, PMID, Semantic Scholar ID
paper.metadata.publication    # journal, impact factor, volume, issue
paper.metadata.citation_count # total + yearly breakdown (2015-2024)
paper.metadata.url            # DOI URL, publisher, arXiv, PDFs
paper.metadata.access         # open access status, license

Filtering and Sorting

# Criteria-based filtering
recent = papers.filter(year_min=2020, has_doi=True)
elite = papers.filter(min_impact_factor=10, min_citations=500)

# Lambda filtering
custom = papers.filter(lambda p: "EEG" in (p.metadata.basic.title or ""))

# Sorting (sort_by returns a new Papers collection, so keep the result)
by_year = papers.sort_by("year", reverse=True)
by_citations = papers.sort_by("citation_count", reverse=True)

# Chaining
top_recent = papers.filter(year_min=2020).sort_by("citation_count", reverse=True)

Project Organization

scholar = Scholar(project="review_paper")
scholar.list_projects()
papers = scholar.load_project()

# Export to multiple formats
scholar.save_papers_as_bibtex(papers, "output.bib")
papers.to_dataframe()  # pandas DataFrame

Storage Architecture

~/.scitex/scholar/library/
+-- MASTER/                     # Centralized master storage
|   +-- 8DIGIT01/              # Hash-based unique ID from DOI
|   |   +-- metadata.json
|   |   +-- paper.pdf
+-- project_name/               # Project-specific symlinks
    +-- Author-Year-Journal -> ../MASTER/8DIGIT01
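
The layout above can be illustrated with a minimal, self-contained sketch. The hashing scheme (first 8 hex characters of a SHA-256 digest, uppercased) and the helper names paper_id_from_doi and store_paper are assumptions made for illustration; the library's actual ID derivation and file handling may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def paper_id_from_doi(doi: str) -> str:
    """Hypothetical 8-character ID derived from a DOI (assumed scheme)."""
    return hashlib.sha256(doi.lower().encode("utf-8")).hexdigest()[:8].upper()


def store_paper(library: Path, project: str, doi: str, label: str) -> Path:
    """Write metadata under MASTER/ and link it into a project directory."""
    master_dir = library / "MASTER" / paper_id_from_doi(doi)
    master_dir.mkdir(parents=True, exist_ok=True)
    (master_dir / "metadata.json").write_text(json.dumps({"doi": doi}))

    project_dir = library / project
    project_dir.mkdir(parents=True, exist_ok=True)
    link = project_dir / label
    if not link.is_symlink():
        # Relative target, so the whole library directory can be moved.
        link.symlink_to(
            Path("..") / "MASTER" / master_dir.name, target_is_directory=True
        )
    return link


# Use a temporary directory instead of ~/.scitex/scholar/library/.
library = Path(tempfile.mkdtemp()) / "library"
link = store_paper(
    library, "project_name", "10.1038/nature12373", "Author-Year-Journal"
)
print(json.loads((link / "metadata.json").read_text()))
```

Because each project holds only symlinks, the same paper can appear in several projects while its PDF and metadata are stored once under MASTER/.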

API Reference

SciTeX Scholar – scientific paper search, enrichment, and management.

Quick Start:

from scitex_scholar import Scholar, Paper, Papers

scholar = Scholar()
papers = scholar.search("deep learning")
papers.save("results.bib")

Installation:

pip install scitex-scholar

This module uses PEP 562 lazy __getattr__ so import scitex_scholar stays under 500ms cold-start. Submodules are imported on first attribute access only.
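
The lazy-import mechanism can be sketched as follows. This is not the package's actual __init__; the module name lazy_demo and the attribute names json_tools and csv_tools are hypothetical, and standard-library modules stand in for the real submodules.

```python
import types

# Source of a tiny stand-in package __init__; the real one may differ.
LAZY_INIT = '''
import importlib

_LAZY = {"json_tools": "json", "csv_tools": "csv"}  # attribute -> module

def __getattr__(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        globals()[name] = module  # cache, so later lookups skip __getattr__
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

demo = types.ModuleType("lazy_demo")
exec(LAZY_INIT, demo.__dict__)

print("json_tools" in vars(demo))  # False: nothing imported yet
tools = demo.json_tools            # first access triggers the import
print("json_tools" in vars(demo))  # True: cached in the module namespace
```

Caching the imported module in globals() means the cost of the import is paid once, on first access, rather than at package import time.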

class scitex.scholar.Scholar(config=None, project=None, project_description=None, browser_mode=None)[source]

Bases: EnricherMixin, URLFindingMixin, PDFDownloadMixin, LoaderMixin, SearchMixin, SaverMixin, ProjectHandlerMixin, LibraryHandlerMixin, PipelineMixin, ServiceMixin

Main interface for SciTeX Scholar - scientific literature management made simple.

By default, papers are automatically enriched with:

  • Journal impact factors from impact_factor package (2024 JCR data)

  • Citation counts from Semantic Scholar (via DOI/title matching)

Examples

Basic search with automatic enrichment:

scholar = Scholar()
papers = scholar.search("deep learning neuroscience")
# Papers now have impact_factor and citation_count populated
papers.save("my_pac.bib")

Disable automatic enrichment if needed:

config = ScholarConfig(enable_auto_enrich=False)
scholar = Scholar(config=config)

Search a specific source:

papers = scholar.search("transformer models", sources='arxiv')

Advanced workflow:

papers = (
    scholar.search("transformer models", year_min=2020)
           .filter(min_citations=50)
           .sort_by("impact_factor")
)
# save() returns None, so call it separately to keep the collection
papers.save("transformers.bib")

Local library:

scholar._index_local_pdfs("./my_papers")
local_papers = scholar.search_local("attention mechanism")

property name

Class name for logging.

__init__(config=None, project=None, project_description=None, browser_mode=None)[source]

Initialize Scholar with configuration.

Parameters:
  • config (Union[ScholarConfig, str, Path, None]) –

    One of:

    • ScholarConfig instance

    • Path to YAML config file (str or Path)

    • None (uses ScholarConfig.load() to find config)

  • project (Optional[str]) – Default project name for operations.

  • project_description (Optional[str]) – Optional description for the project.

  • browser_mode (Optional[str]) – Browser mode ('stealth', 'interactive', 'manual').

class scitex.scholar.Paper(**data)[source]

Bases: BaseModel

Complete paper with metadata and container.

metadata: PaperMetadataStructure
container: ContainerMetadata
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_dump(**kwargs)[source]

Custom serialization to ensure all nested models use aliases.

Return type:

Dict[str, Any]

classmethod from_dict(data)[source]

Create from dictionary (for loading from JSON).

Uses Pydantic's model_validate, which handles:

  • Type validation

  • Type coercion (e.g., "2024" -> 2024)

  • Field aliases (e.g., "2025" -> y2025)

Return type:

Paper

to_dict()[source]

Convert to dictionary for JSON serialization.

Alias for model_dump() for backward compatibility.

Return type:

Dict[str, Any]

detect_open_access(use_unpaywall=False, update_metadata=True)[source]

Detect open access status for this paper.

Uses identifiers (DOI, arXiv ID, PMCID) and known OA sources to determine if the paper is freely available.

Parameters:
  • use_unpaywall (bool) – If True, query Unpaywall API for uncertain cases

  • update_metadata (bool) – If True, update self.metadata.access with results

Return type:

OAResult

Returns:

OAResult with detection results

property is_open_access: bool

Check if paper is open access (quick check without API calls).

class scitex.scholar.Papers(papers=None, project=None, config=None)[source]

Bases: object

A simple collection of Paper objects.

This is a minimal collection class. Most business logic (loading, saving, enrichment, etc.) is handled by Scholar.

Methods have been reduced from 39 to ~15 for simplicity. Complex operations should use Scholar or utility functions.

__init__(papers=None, project=None, config=None)[source]

Initialize Papers collection.

__len__()[source]

Number of papers in collection.

Return type:

int

__iter__()[source]

Iterate over papers.

Return type:

Iterator[Paper]

__getitem__(index)[source]

Get paper(s) by index or slice.

Parameters:

index (Union[int, slice]) – Integer index or slice

Return type:

Union[Paper, Papers]

Returns:

Single Paper if integer index, Papers collection if slice
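
The index-versus-slice behaviour described above can be sketched with a stand-in collection. PaperStub and PapersStub are hypothetical names used only here; they are not the library's classes.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Union


@dataclass
class PaperStub:
    """Hypothetical stand-in for Paper."""
    title: str


class PapersStub:
    """Minimal collection mirroring the documented __getitem__ contract."""

    def __init__(self, papers: Optional[List[PaperStub]] = None) -> None:
        self._papers: List[PaperStub] = list(papers or [])

    def __len__(self) -> int:
        return len(self._papers)

    def __iter__(self) -> Iterator[PaperStub]:
        return iter(self._papers)

    def __getitem__(
        self, index: Union[int, slice]
    ) -> Union[PaperStub, "PapersStub"]:
        if isinstance(index, slice):
            # A slice yields a new collection rather than a bare list.
            return PapersStub(self._papers[index])
        return self._papers[index]


coll = PapersStub([PaperStub("A"), PaperStub("B"), PaperStub("C")])
first = coll[0]      # single PaperStub
first_two = coll[:2]  # PapersStub with two items
print(type(first).__name__, type(first_two).__name__, len(first_two))
```

Returning a collection from a slice keeps collection methods available on the result, which is what makes chaining after slicing possible.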

__repr__()[source]

String representation.

Return type:

str

__str__()[source]

Human-readable string.

Return type:

str

__dir__()[source]

Custom dir for better discoverability.

Return type:

List[str]

property papers: List[Paper]

Get the underlying papers list.

append(paper)[source]

Add a paper to the collection.

Parameters:

paper (Paper) – Paper to add

Return type:

None

extend(papers)[source]

Add multiple papers to the collection.

Parameters:

papers (Union[List[Paper], Papers]) – List of papers or another Papers collection

Return type:

None

to_list()[source]

Get papers as a list.

Return type:

List[Paper]

Returns:

List of Paper objects

filter(condition=None, year_min=None, year_max=None, has_doi=None, has_abstract=None, has_pdf=None, min_citations=None, max_citations=None, min_impact_factor=None, max_impact_factor=None, journal=None, author=None, keyword=None, publisher=None, **kwargs)[source]

Filter papers by condition or criteria.

Parameters:
  • condition (Optional[Callable[[Paper], bool]]) – Function that takes a Paper and returns bool.

  • year_min (Optional[int]) – Minimum year.

  • year_max (Optional[int]) – Maximum year.

  • has_doi (Optional[bool]) – Filter papers with/without DOI.

  • has_abstract (Optional[bool]) – Filter papers with/without abstract.

  • has_pdf (Optional[bool]) – Filter papers with/without PDF URL.

  • min_citations (Optional[int]) – Minimum citation count.

  • max_citations (Optional[int]) – Maximum citation count.

  • min_impact_factor (Optional[float]) – Minimum journal impact factor.

  • max_impact_factor (Optional[float]) – Maximum journal impact factor.

  • journal (Optional[str]) – Journal name (partial match).

  • author (Optional[str]) – Author name (partial match).

  • keyword (Optional[str]) – Keyword (searches in keywords, title, abstract).

  • publisher (Optional[str]) – Publisher name (partial match).

  • **kwargs – Additional keyword arguments for backward compatibility.

Returns:

New Papers collection with filtered papers.

Return type:

Papers
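
How keyword criteria might combine can be sketched with plain Python. This is an illustration under assumptions, not the library's implementation: PaperStub is a hypothetical flat stand-in for Paper, only a few of the documented criteria are shown, criteria are assumed to combine with AND, and a criterion left as None is assumed to be ignored. The DOIs are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PaperStub:
    """Hypothetical flat stand-in for Paper, used only in this sketch."""
    title: str
    year: Optional[int] = None
    citation_count: Optional[int] = None
    doi: Optional[str] = None


def filter_papers(
    papers: List[PaperStub],
    condition: Optional[Callable[[PaperStub], bool]] = None,
    year_min: Optional[int] = None,
    year_max: Optional[int] = None,
    has_doi: Optional[bool] = None,
    min_citations: Optional[int] = None,
) -> List[PaperStub]:
    """Return a new list; every active criterion must hold (AND logic)."""

    def keep(p: PaperStub) -> bool:
        if condition is not None and not condition(p):
            return False
        if year_min is not None and (p.year is None or p.year < year_min):
            return False
        if year_max is not None and (p.year is None or p.year > year_max):
            return False
        if has_doi is not None and bool(p.doi) != has_doi:
            return False
        if min_citations is not None and (p.citation_count or 0) < min_citations:
            return False
        return True

    return [p for p in papers if keep(p)]


papers = [
    PaperStub("EEG decoding", 2021, 120, "10.1000/a"),
    PaperStub("Old classic", 1998, 900, "10.1000/b"),
    PaperStub("Preprint", 2023, None, None),
]
recent_with_doi = filter_papers(papers, year_min=2020, has_doi=True)
print([p.title for p in recent_with_doi])  # ['EEG decoding']
```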

Examples

Filter using a lambda condition:

high_impact = papers.filter(lambda p: p.journal_impact_factor and p.journal_impact_factor > 10)
highly_cited = papers.filter(lambda p: p.citation_count and p.citation_count > 500)
recent = papers.filter(lambda p: p.year and p.year >= 2020)

Filter using built-in parameters:

high_impact_v2 = papers.filter(min_impact_factor=10.0)
highly_cited_v2 = papers.filter(min_citations=500)
recent_v2 = papers.filter(year_min=2020)

Combine multiple parameters:

filtered = papers.filter(
    min_impact_factor=5.0,
    min_citations=100,
    year_min=2015,
    year_max=2023,
    journal="Nature",
    has_doi=True,
)

Chain filters for AND logic:

elite_recent = papers.filter(min_impact_factor=10).filter(year_min=2020)

sort_by(*criteria, reverse=False, **kwargs)[source]

Sort papers by criteria.

Parameters:
  • *criteria – Field names (as strings) or lambda functions to sort by.

  • reverse (bool) – Sort in descending order (default: False).

  • **kwargs – Additional options.

Returns:

New sorted Papers collection.

Return type:

Papers

Notes

Available Paper fields for sorting:

  • title – Paper title

  • year – Publication year

  • citation_count – Number of citations

  • journal_impact_factor – Journal impact factor

  • journal – Journal name

  • publisher – Publisher name

  • doi – Digital Object Identifier

  • created_at – When record was created

  • updated_at – When record was last updated
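
Sorting by several criteria when some fields are missing can be sketched as below. PaperStub and sort_papers are hypothetical names for illustration, and placing records with missing values last in ascending order is an assumption; the library may order them differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple, Union


@dataclass
class PaperStub:
    """Hypothetical flat stand-in for Paper."""
    title: str
    year: Optional[int] = None
    citation_count: Optional[int] = None


Criterion = Union[str, Callable[[PaperStub], Any]]


def sort_papers(
    papers: List[PaperStub], *criteria: Criterion, reverse: bool = False
) -> List[PaperStub]:
    """Return a new list sorted by field names and/or key functions."""

    def key(p: PaperStub) -> Tuple[Tuple[bool, Any], ...]:
        parts = []
        for c in criteria:
            value = c(p) if callable(c) else getattr(p, c, None)
            # (is_missing, value): None never gets compared with real values.
            parts.append((value is None, 0 if value is None else value))
        return tuple(parts)

    return sorted(papers, key=key, reverse=reverse)


papers = [
    PaperStub("B", 2020, 10),
    PaperStub("A", 2020, 50),
    PaperStub("C", None, 5),
    PaperStub("D", 2018, None),
]
ordered = sort_papers(papers, "year", "citation_count")
print([p.title for p in ordered])  # ['D', 'B', 'A', 'C']
```

With reverse=True the same key puts records with missing values first, so callers who want them last in both directions would need a separate key.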

Examples

Sort by a single field:

by_year = papers.sort_by('year')
by_citations_desc = papers.sort_by('citation_count', reverse=True)

Sort by multiple fields (primary, secondary, etc.):

by_year_then_citations = papers.sort_by('year', 'citation_count')

Sort using a lambda function:

by_citations = papers.sort_by(lambda p: p.citation_count or 0, reverse=True)
by_year_safe = papers.sort_by(lambda p: p.year if p.year else 9999)

Sort by a computed value:

by_citation_per_year = papers.sort_by(
    # max(..., 1) avoids division by zero for papers published in 2024
    lambda p: (p.citation_count or 0) / max(2024 - p.year, 1) if p.year else 0,
    reverse=True,
)

classmethod from_bibtex(bibtex_input)[source]

Load papers from BibTeX.

DEPRECATED: Use Scholar.from_bibtex() instead. This method is kept for backward compatibility.

Parameters:

bibtex_input (Union[str, Path]) – Path to BibTeX file or BibTeX string

Return type:

Papers

Returns:

Papers collection

save(output_path, format='auto', **kwargs)[source]

Save papers to file.

DEPRECATED: Use Scholar.save_papers() or Scholar.export_bibtex() instead. This method is kept for backward compatibility.

Parameters:
  • output_path (Union[str, Path]) – Path to save file

  • format (Optional[str]) – Output format (auto, bibtex, json, csv)

  • **kwargs – Additional options

Return type:

None

to_dict()[source]

Convert to dictionary.

DEPRECATED: Use papers_utils.papers_to_dict() for new code.

Return type:

List[Dict[str, Any]]

Returns:

Dictionary representation

to_dataframe()[source]

Convert to pandas DataFrame.

DEPRECATED: Use papers_utils.papers_to_dataframe() for new code.

Return type:

Any

Returns:

DataFrame with papers data

summary()[source]

Get summary statistics.

DEPRECATED: Use papers_utils.papers_statistics() for new code.

Return type:

Dict[str, Any]

Returns:

Dictionary with statistics

class scitex.scholar.ScholarConfig(config_path=None, scholar_dir=None)[source]

Bases: object

__init__(config_path=None, scholar_dir=None)[source]

Initialize ScholarConfig.

Parameters:
  • config_path (Union[str, Path, None]) – Path to custom config YAML file

  • scholar_dir (Union[str, Path, None]) – Direct path to the scholar directory (e.g., /data/users/alice/.scitex). This bypasses the SCITEX_DIR environment variable for thread-safe multi-user usage; use it in Django or other multi-user environments to avoid race conditions.

__getattr__(name)[source]

Delegate all get_* methods to path_manager.

__dir__()[source]

Include path_manager’s get_* methods in dir() output.

resolve(key, direct_val=None, default=None, type=<class 'str'>, mask=None)[source]

Resolve configuration value with precedence: direct → config → env → default
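
The precedence order can be sketched as a small standalone function. This is an illustration, not ScholarConfig.resolve itself: the environment-variable prefix SCITEX_SCHOLAR_, the key num_workers, and the parameter name cast (standing in for type) are assumptions.

```python
import os
from typing import Any, Callable, Mapping, Optional


def resolve(
    key: str,
    direct_val: Any = None,
    config: Optional[Mapping[str, Any]] = None,
    default: Any = None,
    cast: Callable[[Any], Any] = str,
    env_prefix: str = "SCITEX_SCHOLAR_",  # assumed prefix, illustration only
) -> Any:
    """Return the first value found in: direct, config, environment, default."""
    candidates = (
        direct_val,
        (config or {}).get(key),
        os.environ.get(env_prefix + key.upper()),
    )
    for value in candidates:
        if value is not None:
            return cast(value)
    return default


os.environ["SCITEX_SCHOLAR_NUM_WORKERS"] = "4"
config = {"num_workers": 8}

print(resolve("num_workers", direct_val=16, config=config, cast=int))  # 16
print(resolve("num_workers", config=config, cast=int))                 # 8
print(resolve("num_workers", cast=int))                                # 4
print(resolve("missing_key", default="fallback"))                      # fallback
```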

get(key)[source]

Get value from config dict only

print()[source]

Print how each config was resolved

clear_log()[source]

Clear resolution log

load_yaml(path)[source]

Return type:

dict

classmethod load(path=None)[source]

property paths

Access to path manager for organized directory structure

class scitex.scholar.ScholarAuthManager(email_openathens=None, email_ezproxy=None, email_shibboleth=None, config=None)[source]

Bases: object

Manages multiple authentication providers.

This class coordinates between different authentication methods (OpenAthens, Lean Library, etc.) and provides a unified interface.

__init__(email_openathens=None, email_ezproxy=None, email_shibboleth=None, config=None)[source]

Initialize the authentication manager.

Parameters:
  • email_openathens (Optional[str]) – User’s institutional email for OpenAthens authentication

  • email_ezproxy (Optional[str]) – User’s institutional email for EZProxy authentication

  • email_shibboleth (Optional[str]) – User’s institutional email for Shibboleth authentication

  • config (Optional[ScholarConfig]) – ScholarConfig instance (creates new if None)

async ensure_authenticate_async(provider_name=None, verify_live=True, **kwargs)[source]

Return type:

bool

async is_authenticate_async(verify_live=True)[source]

Check whether the user is authenticated with any provider.

Return type:

bool

async authenticate_async(provider_name=None, **kwargs)[source]

Authenticate with specified or active provider.

Return type:

dict

async get_auth_headers_async()[source]

Get authentication headers from active provider.

Return type:

Dict[str, str]

async get_auth_options()[source]

Return type:

dict

async get_auth_cookies_async(essential_only=True)[source]

Get authentication cookies from active provider.

Return type:

List[Dict[str, Any]]

set_active_provider(name)[source]

Set the active authentication provider.

Return type:

None

get_active_provider()[source]

Get the currently active provider.

Return type:

Optional[BaseAuthenticator]

async logout_async()[source]

Log out from all providers.

Return type:

None

list_providers()[source]

List all registered providers.

Return type:

List[str]

class scitex.scholar.ScholarBrowserManager(browser_mode=None, auth_manager=None, chrome_profile_name=None, config=None)[source]

Bases: BrowserMixin

Manages a local browser instance with stealth enhancements and invisible mode.

__init__(browser_mode=None, auth_manager=None, chrome_profile_name=None, config=None)[source]

Initialize ScholarBrowserManager with invisible browser capabilities.

Parameters:
  • auth_manager – Authentication manager instance

  • config (ScholarConfig) – Scholar configuration instance

async get_authenticated_browser_and_context_async(**context_options)[source]

Get browser context with authentication cookies and extensions loaded.

Return type:

tuple[Browser, BrowserContext]

async take_screenshot_async(page, path, timeout_sec=30.0, timeout_after_sec=30.0, full_page=False)[source]

Take screenshot without viewport changes.

async start_periodic_screenshots_async(page, output_dir, prefix='periodic', interval_seconds=1, duration_seconds=10, verbose=False)[source]

Start taking periodic screenshots in the background.

Parameters:
  • page – The page to screenshot

  • prefix (str) – Prefix for screenshot filenames

  • interval_seconds (int) – Seconds between screenshots

  • duration_seconds (int) – Total duration to take screenshots (0 = infinite)

  • verbose (bool) – Whether to log each screenshot

Returns:

asyncio.Task that can be cancelled to stop screenshots
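
The start-then-cancel pattern implied by the returned asyncio.Task can be sketched without a browser. The coroutine periodic and the tick-recording action below are hypothetical stand-ins for the screenshot loop; only the asyncio calls are real.

```python
import asyncio
from typing import Callable, List


async def periodic(
    action: Callable[[], None],
    interval_seconds: float,
    duration_seconds: float = 0,
) -> None:
    """Run action repeatedly; duration_seconds == 0 means run until cancelled."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    while duration_seconds == 0 or loop.time() - start < duration_seconds:
        action()
        await asyncio.sleep(interval_seconds)


async def main() -> List[int]:
    ticks: List[int] = []
    # Start the background task; the caller keeps the handle to stop it later.
    task = asyncio.create_task(
        periodic(lambda: ticks.append(len(ticks)), interval_seconds=0.01)
    )
    await asyncio.sleep(0.05)  # do other work while the task runs
    task.cancel()              # request cancellation
    try:
        await task             # wait until the task has actually stopped
    except asyncio.CancelledError:
        pass
    return ticks


ticks = asyncio.run(main())
print(f"action ran {len(ticks)} times before cancellation")
```

Awaiting the task after cancel() ensures the loop has finished unwinding before any follow-up cleanup runs.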

async stop_periodic_screenshots_async(task)[source]

Stop periodic screenshots task.

async close()[source]

Close browser while preserving authentication and extension data.

class scitex.scholar.ScholarURLFinder(context, config=None)[source]

Bases: object

Find PDF URLs from web pages.

Simple, focused responsibility:

  • Input: Page or URL string

  • Output: List of PDF URLs

Authentication/DOI resolution should be handled BEFORE calling this.

PAGE_LOAD_TIMEOUT = 30000

async find_pdf_urls(page_or_url, base_url=None)[source]

Find PDF URLs from page or URL string.

Parameters:
  • page_or_url (Union[Page, str]) – Playwright Page object or URL string

  • base_url (Optional[str]) – Optional base URL for the page

Returns:

List of PDF URL dicts, e.g. [{"url": "…", "source": "zotero_translator"}]

Return type:

list of dict

class scitex.scholar.CitationGraphBuilder(db_path=None, api_url=None)[source]

Bases: object

Build citation network graphs for academic papers.

Auto-detects backend via crossref_local.Config (DB → HTTP).

Example (auto-detect):

>>> builder = CitationGraphBuilder()
>>> graph = builder.build("10.1038/s41586-020-2008-3", top_n=20)

Example (explicit SQLite):

>>> builder = CitationGraphBuilder(db_path="/path/to/crossref.db")

Example (explicit HTTP):

>>> builder = CitationGraphBuilder(api_url="http://localhost:31291")

__init__(db_path=None, api_url=None)[source]

Initialize builder with database path, HTTP API URL, or auto-detect.

When no arguments are given, delegates to crossref_local.Config for auto-detection, checking in order:

  1. CROSSREF_LOCAL_MODE env var (explicit "db" or "http")

  2. CROSSREF_LOCAL_API_URL env var → HTTP mode

  3. Local DB file existence → DB mode

  4. Fallback to HTTP mode

Parameters:
  • db_path (str) – Path to CrossRef SQLite database (local mode)

  • api_url (str) – URL of crossref-local HTTP API (HTTP mode)
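
The auto-detection order can be sketched as a pure function. This is an illustration of the documented order only, not the code in crossref_local.Config; the function name detect_mode and passing the environment and database path as arguments are assumptions made to keep the sketch testable.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def detect_mode(env: Mapping[str, str], db_file: Optional[Path]) -> str:
    """Return "db" or "http" following the documented detection order."""
    # 1. Explicit mode wins.
    mode = env.get("CROSSREF_LOCAL_MODE", "").strip().lower()
    if mode in ("db", "http"):
        return mode
    # 2. A configured API URL selects HTTP mode.
    if env.get("CROSSREF_LOCAL_API_URL"):
        return "http"
    # 3. An existing local database file selects DB mode.
    if db_file is not None and db_file.exists():
        return "db"
    # 4. Otherwise fall back to HTTP mode.
    return "http"


with tempfile.NamedTemporaryFile(suffix=".db") as handle:
    existing_db = Path(handle.name)
    print(detect_mode({}, existing_db))                               # db
    print(detect_mode({"CROSSREF_LOCAL_MODE": "http"}, existing_db))  # http

print(detect_mode({"CROSSREF_LOCAL_API_URL": "http://localhost:31291"}, None))  # http
print(detect_mode({}, None))                                                    # http
```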

build(seed_doi, top_n=20, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)[source]

Build citation network around a seed paper.

Parameters:
  • seed_doi (str) – DOI of the seed paper

  • top_n (int) – Number of most similar papers to include

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph object with nodes and edges
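
One plausible reading of the three weights is a weighted sum of standard citation-similarity counts. The sketch below assumes that reading; the function name similarity, the toy reference data, and the exact way the builder combines the terms are not taken from the source.

```python
from typing import Dict, Set


def similarity(
    seed: str,
    other: str,
    refs: Dict[str, Set[str]],
    weight_coupling: float = 2.0,
    weight_cocitation: float = 2.0,
    weight_direct: float = 1.0,
) -> float:
    """Weighted sum of coupling, co-citation, and direct-citation counts."""
    seed_refs = refs.get(seed, set())
    other_refs = refs.get(other, set())
    # Bibliographic coupling: references the two papers share.
    coupling = len(seed_refs & other_refs)
    # Co-citation: papers whose reference lists contain both.
    cocitation = sum(
        1 for cited in refs.values() if seed in cited and other in cited
    )
    # Direct citation: either paper cites the other.
    direct = int(other in seed_refs) + int(seed in other_refs)
    return (
        weight_coupling * coupling
        + weight_cocitation * cocitation
        + weight_direct * direct
    )


# Toy data: paper -> set of papers it cites.
refs = {
    "seed": {"a", "b"},
    "x": {"a", "b", "seed"},
    "y": {"c"},
    "z": {"seed", "x"},
}
scores = {p: similarity("seed", p, refs) for p in ("x", "y")}
top = sorted(scores, key=scores.get, reverse=True)[:1]  # analogous to top_n=1
print(scores, top)  # {'x': 7.0, 'y': 0.0} ['x']
```

Here x shares two references with the seed (2 × 2.0), is co-cited once by z (1 × 2.0), and cites the seed directly (1 × 1.0), giving 7.0.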

build_from_dois(dois, num_related_per_doi=20, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)[source]

Build citation network from multiple seed DOIs.

Combines similarity scores from all seeds to find papers related to the entire set, producing a richer connected graph.

Parameters:
  • dois (List[str]) – List of seed DOIs

  • num_related_per_doi (int) – Number of related papers to discover per DOI

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph with all seeds + related papers + edges

build_from_query(query, num_related_per_doi=20, search_limit=10, weight_coupling=2.0, weight_cocitation=2.0, weight_direct=1.0)[source]

Build citation network from a text query.

Searches local databases, extracts DOIs from results, then delegates to build_from_dois().

Parameters:
  • query (str) – Search query (e.g. “hippocampal sharp wave ripples”)

  • num_related_per_doi (int) – Related papers per seed DOI

  • search_limit (int) – Max papers to fetch from search

  • weight_coupling (float) – Weight for bibliographic coupling

  • weight_cocitation (float) – Weight for co-citation

  • weight_direct (float) – Weight for direct citations

Return type:

CitationGraph

Returns:

CitationGraph with search-discovered seeds + related papers

export_json(graph, output_path)[source]

Export graph to JSON file for visualization.

Parameters:
  • graph (CitationGraph) – CitationGraph to export

  • output_path (str) – Path to output JSON file

get_paper_summary(doi)[source]

Get summary information for a paper.

Parameters:

doi (str) – DOI of the paper

Return type:

Optional[dict]

Returns:

Dictionary with paper summary

scitex.scholar.plot_citation_graph(graph, backend='auto', output=None, **kwargs)[source]

Visualize a citation graph with pluggable backends.

Parameters:
  • graph (CitationGraph or networkx.DiGraph) – Citation network to visualize. CitationGraph is auto-converted via to_networkx().

  • backend (str) – Rendering backend: ‘auto’, ‘figrecipe’, ‘scitex.plt’, ‘matplotlib’, or ‘pyvis’. Default ‘auto’ picks the best available.

  • output (str, optional) – Output file path. Required for ‘pyvis’ backend (HTML). For static backends, saves the figure to this path.

  • **kwargs – Backend-specific keyword arguments (layout, seed, figsize, etc.).

Returns:

Backend-specific result. Static backends return {'fig', 'ax', 'pos', 'backend'}. Pyvis returns {'output', 'backend'}.

Return type:

dict

scitex.scholar.to_bibtex(paper)[source]

Format a standard paper dict as a BibTeX entry.

Return type:

str

scitex.scholar.to_ris(paper)[source]

Format a standard paper dict as a RIS entry.

Return type:

str

scitex.scholar.to_endnote(paper)[source]

Format a standard paper dict as an EndNote entry.

Return type:

str

scitex.scholar.to_text_citation(paper, style='apa', doc_type='article')[source]

Format a paper dict as a text citation in the given style.

Parameters:
  • paper (dict) – Standard paper dict.

  • style (str) – One of apa, mla, chicago, vancouver.

  • doc_type (str) – One of article, dataset.

Returns:

Formatted citation string.

Return type:

str

scitex.scholar.papers_to_format(papers, fmt)[source]

Format a list of paper dicts to the given format string.

Return type:

str

scitex.scholar.generate_cite_key(paper)[source]

Generate a BibTeX citation key from a paper dict.

Return type:

str

scitex.scholar.make_citation_key(last_name, year=None)[source]

Generate a citation key from author last name and year.

Parameters:
  • last_name (str) – Author last name (special chars stripped).

  • year – Publication year (optional).

Return type:

str

Returns:

Citation key string, e.g. smith2024.
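
A key of this shape can be produced with a few lines of standard-library code. The function below is a hypothetical re-implementation for illustration; folding accented letters to ASCII and lowercasing are assumptions beyond the documented smith2024 example.

```python
import re
import unicodedata
from typing import Optional, Union


def sketch_citation_key(
    last_name: str, year: Optional[Union[int, str]] = None
) -> str:
    """Build a key such as smith2024 from a last name and optional year."""
    # Fold accented letters to their ASCII base letters (assumed behaviour).
    ascii_name = (
        unicodedata.normalize("NFKD", last_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Strip everything that is not a letter, then lowercase.
    stem = re.sub(r"[^A-Za-z]", "", ascii_name).lower()
    return f"{stem}{year}" if year is not None else stem


print(sketch_citation_key("Smith", 2024))   # smith2024
print(sketch_citation_key("O'Brien"))       # obrien
print(sketch_citation_key("Müller", 2020))  # muller2020
```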

scitex.scholar.from_connected_papers(paper_id, *, cp_api_key=None, s2_api_key=None, output_format='citation_graph', dry_run=False)[source]

Import a Connected Papers graph into scitex.

Parameters:
  • paper_id (str) – Semantic Scholar paper ID (40-char SHA) for the seed paper.

  • cp_api_key (str, optional) – Connected Papers API key.

  • s2_api_key (str, optional) – Semantic Scholar API key for DOI resolution.

  • output_format (str) – “citation_graph” returns CitationGraph, “papers” returns Papers.

  • dry_run (bool) – If True, fetch and report stats without creating objects.

Returns:

{success: True, graph/papers, stats, warnings} or {success: False, error: str}.

Return type:

dict

scitex.scholar.to_connected_papers(graph, *, output=None)[source]

Export a CitationGraph as BibTeX/JSON for Connected Papers.

Parameters:
  • graph (CitationGraph) – Citation graph to export.

  • output (str or Path, optional) – Output directory. Defaults to current directory.

Returns:

{success, bibtex_path, json_path, paper_count} or {success: False, error}.

Return type:

dict

scitex.scholar.apply_filters(papers, filters=None, parsed_operators=None)[source]

Filter a list of paper dicts by various criteria.

Parameters:
  • papers (List[Dict[str, Any]]) – List of paper dicts. Each dict should contain the keys described in the module docstring; missing keys are treated as empty / zero values.

  • filters (Optional[Dict[str, Any]]) –

    Dict of filter criteria extracted from a search form or URL parameters. Supported keys:

    • year_from, year_to – year range (int)

    • min_citations, max_citations – citation range (int)

    • min_impact_factor – minimum IF (float)

    • max_impact_factor – maximum IF (float)

    • authors – list of author name strings (legacy)

    • journal – journal name substring (legacy, str)

    • open_access – bool

    • doc_type – "review" | "preprint" | other

    • language – language string ("english" passes)

  • parsed_operators (Optional[Dict[str, Any]]) –

    Dict produced by SearchQueryParser.from_shell_syntax() or the equivalent parse_query_operators() function from scitex-cloud. Supported keys:

    • title_includes, title_excludes – list[str]

    • author_includes, author_excludes – list[str]

    • journal_includes, journal_excludes – list[str]

    • year_min, year_max – int

    • citations_min, citations_max – int

    • impact_factor_min, impact_factor_max – float

Returns:

Filtered list of paper dicts (same objects, not copies).

Return type:

list of dict
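
The "same objects, not copies" behaviour and the treatment of missing keys can be sketched as follows. This covers only three of the documented filter keys, and the paper-dict keys year and citations are assumptions, since the module docstring that defines them is not reproduced here.

```python
from typing import Any, Dict, List, Optional


def sketch_apply_filters(
    papers: List[Dict[str, Any]],
    filters: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Keep papers matching year_from, year_to, and min_citations."""
    filters = filters or {}
    kept = []
    for paper in papers:
        # Missing or empty values are treated as zero.
        year = paper.get("year") or 0
        citations = paper.get("citations") or 0
        if "year_from" in filters and year < filters["year_from"]:
            continue
        if "year_to" in filters and year > filters["year_to"]:
            continue
        if "min_citations" in filters and citations < filters["min_citations"]:
            continue
        kept.append(paper)  # the original dict, not a copy
    return kept


papers = [
    {"title": "A", "year": 2021, "citations": 40},
    {"title": "B", "year": 2012, "citations": 300},
    {"title": "C"},  # no year or citations recorded
]
result = sketch_apply_filters(papers, {"year_from": 2020, "min_citations": 10})
print([p["title"] for p in result])  # ['A']
print(result[0] is papers[0])        # True: same object
```

Because the returned dicts are the originals, mutating a filtered result also changes the entry in the input list; callers who need isolation should copy first.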