aoptk.literature.databases.pmc
==============================

.. py:module:: aoptk.literature.databases.pmc


Classes
-------

.. autoapisummary::

   aoptk.literature.databases.pmc.PMC


Module Contents
---------------

.. py:class:: PMC(query: str, storage: str, figure_storage: str)

   Bases: :py:obj:`aoptk.literature.get_publication.GetPublication`, :py:obj:`aoptk.literature.get_pdf.GetPDF`, :py:obj:`aoptk.literature.get_id.GetID`


   Class for retrieving and parsing open access PMC publications.


   .. py:attribute:: aws_region
      :value: 'us-east-1'


   .. py:attribute:: s3


   .. py:attribute:: bucket
      :value: 'pmc-oa-opendata'


   .. py:attribute:: paginator


   .. py:attribute:: max_pmc_results
      :value: 9998


   .. py:attribute:: max_concurrency
      :value: 2


   .. py:attribute:: max_requests_per_second
      :value: 2.0


   .. py:attribute:: minimal_year_publication
      :value: 1800


   .. py:attribute:: semaphore


   .. py:attribute:: limiter


   .. py:attribute:: retries
      :value: 5


   .. py:attribute:: image_extensions
      :value: ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff')


   .. py:attribute:: _query


   .. py:attribute:: id_list


   .. py:attribute:: storage


   .. py:attribute:: figure_storage


   .. py:method:: get_pdfs() -> list[aoptk.literature.pdf.PDF]

      Retrieve PDFs based on the query.

      :returns: A list of PDF objects corresponding to the publications matching the query.
      :rtype: list[PDF]



   .. py:method:: get_publications() -> list[aoptk.literature.publication.Publication]

      Get a list of publications.

      :returns: A list of Publication objects.
      :rtype: list[Publication]



   .. py:method:: get_ids() -> list[aoptk.literature.id.ID]
      :async:


      Retrieve a list of publication IDs based on the query.



   .. py:method:: _get_publication(publication_id: str) -> aoptk.literature.publication.Publication

      Parse a single PDF and return a Publication object.

      :param publication_id: The publication ID to retrieve and parse.
      :type publication_id: str



   .. py:method:: _get_full_text(publication_id: str) -> str | None

      Retrieve the full text for a given publication ID.

      :param publication_id: The publication ID to retrieve the full text for.
      :type publication_id: str



   .. py:method:: _get_file(publication_id: str, file_format: str) -> aoptk.literature.pdf.PDF | str | None

      Retrieve the file for a given publication ID and format.

      :param publication_id: The publication ID to retrieve the file for.
      :type publication_id: str
      :param file_format: The format of the file to retrieve (pdf, xml, json, or txt).
                          The txt, xml, and pdf formats contain the full text, while json contains metadata.
      :type file_format: str



   .. py:method:: _get_figures(publication_id: str) -> list[str]

      Retrieve the figure files for a given publication ID.

      :param publication_id: The publication ID to retrieve the figure files for.
      :type publication_id: str



   .. py:method:: _extract_figures_from_supplements(publication_id: str, supplementary_files: list[str]) -> list[str]

      Extract figure files from the supplementary files.

      :param publication_id: The publication ID to retrieve the figure files for.
      :type publication_id: str
      :param supplementary_files: A list of supplementary file URLs to extract figures from.
      :type supplementary_files: list[str]



   .. py:method:: _get_json(publication_id: str) -> str | None

      Retrieve the json for a given publication ID.

      :param publication_id: The publication ID to retrieve the json for.
      :type publication_id: str



   .. py:method:: _get_pdf(publication_id: str) -> aoptk.literature.pdf.PDF | None

      Retrieve the PDF for a given publication ID.

      :param publication_id: The publication ID to retrieve the PDF for.
      :type publication_id: str



   .. py:method:: _get_publication_count_and_ids(mindate: str | None = None, maxdate: str | None = None) -> tuple[int, list[str]]


   .. py:method:: _async_get_publication_count_and_ids(mindate: str | None = None, maxdate: str | None = None) -> tuple[int, list[str]] | None
      :async:



   .. py:method:: _collect_ids_for_year(year: int) -> list[str]
      :async:



   .. py:method:: _collect_ids_split_by_months_days(year: int) -> list[str]
      :async:

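
The ID-collection helpers above are undocumented. Their names, together with ``max_pmc_results`` and the ``mindate``/``maxdate`` parameters, suggest that a search is narrowed to smaller date ranges when one query would return more results than the cap. The following is a minimal sketch of that idea under this assumption, not the module's actual implementation. ``windows_for_year`` and the ``count`` callable are hypothetical names introduced only for illustration.

.. code-block:: python

   import calendar
   from collections.abc import Callable
   from datetime import date

   Window = tuple[date, date]


   def windows_for_year(
       year: int,
       count: Callable[[date, date], int],
       max_results: int = 9998,  # mirrors the documented PMC.max_pmc_results value
   ) -> list[Window]:
       """Split a year into date windows that each stay within max_results.

       ``count`` is a hypothetical callable returning the number of search
       hits between two dates (inclusive). A real client would query the
       search service here.
       """
       start, end = date(year, 1, 1), date(year, 12, 31)
       if count(start, end) <= max_results:
           # The whole year fits in a single query.
           return [(start, end)]

       windows: list[Window] = []
       for month in range(1, 13):
           last_day = calendar.monthrange(year, month)[1]
           m_start = date(year, month, 1)
           m_end = date(year, month, last_day)
           if count(m_start, m_end) <= max_results:
               # The month fits in a single query.
               windows.append((m_start, m_end))
           else:
               # Fall back to one window per day for a crowded month.
               windows.extend(
                   (date(year, month, day), date(year, month, day))
                   for day in range(1, last_day + 1)
               )
       return windows

Each returned window could then be passed as a ``mindate``/``maxdate`` pair to a search call. A single day that still exceeds the cap is not handled in this sketch.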