aoptk.literature.databases.pmc
==============================

.. py:module:: aoptk.literature.databases.pmc


Classes
-------

.. autoapisummary::

   aoptk.literature.databases.pmc.PMC


Module Contents
---------------

.. py:class:: PMC(query: str, storage: str, figure_storage: str)

   Bases: :py:obj:`aoptk.literature.get_publication.GetPublication`, :py:obj:`aoptk.literature.get_pdf.GetPDF`, :py:obj:`aoptk.literature.get_id.GetID`


   Class for retrieving and parsing open-access PubMed Central (PMC) publications.


   .. py:attribute:: aws_region
      :value: 'us-east-1'


   .. py:attribute:: s3


   .. py:attribute:: bucket
      :value: 'pmc-oa-opendata'


   .. py:attribute:: paginator


   .. py:attribute:: max_pmc_results
      :value: 9998


   .. py:attribute:: max_concurrency
      :value: 2


   .. py:attribute:: max_requests_per_second
      :value: 2.0


   .. py:attribute:: minimal_year_publication
      :value: 1800


   .. py:attribute:: semaphore


   .. py:attribute:: limiter
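
      The source does not document how ``semaphore`` and ``limiter`` are built. The
      following is a minimal sketch, assuming the semaphore caps in-flight requests at
      ``max_concurrency`` and the limiter spaces request starts according to
      ``max_requests_per_second``. ``IntervalLimiter``, ``fetch``, and ``main`` are
      hypothetical names for illustration and are not part of ``aoptk``.

      .. code-block:: python

         import asyncio
         import time

         MAX_CONCURRENCY = 2            # mirrors max_concurrency
         MAX_REQUESTS_PER_SECOND = 2.0  # mirrors max_requests_per_second


         class IntervalLimiter:
             """Hypothetical limiter: request starts are at least 1/rate seconds apart."""

             def __init__(self, rate: float) -> None:
                 self._interval = 1.0 / rate
                 self._lock = asyncio.Lock()
                 self._next_slot = 0.0

             async def wait(self) -> None:
                 # Reserve the next free time slot under the lock, then sleep outside it.
                 async with self._lock:
                     now = time.monotonic()
                     delay = max(0.0, self._next_slot - now)
                     self._next_slot = max(now, self._next_slot) + self._interval
                 if delay:
                     await asyncio.sleep(delay)


         async def fetch(i, semaphore, limiter, started):
             async with semaphore:          # at most MAX_CONCURRENCY requests in flight
                 await limiter.wait()       # at most MAX_REQUESTS_PER_SECOND starts
                 started.append(time.monotonic())
                 await asyncio.sleep(0.01)  # stand-in for a network request
                 return i


         async def main(n_requests: int = 3):
             semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
             limiter = IntervalLimiter(MAX_REQUESTS_PER_SECOND)
             started = []
             tasks = (fetch(i, semaphore, limiter, started) for i in range(n_requests))
             results = await asyncio.gather(*tasks)
             return results, started


         if __name__ == "__main__":
             print(asyncio.run(main()))

      The semaphore and the limiter address different limits: the first bounds how many
      requests overlap, the second bounds how quickly new ones begin.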


   .. py:attribute:: retries
      :value: 5
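
      How ``retries`` is applied is not documented here. One common reading is a bounded
      retry loop around each request, sketched below under that assumption.
      ``call_with_retries``, the ``ConnectionError`` error type, and the backoff delay
      are hypothetical choices for illustration.

      .. code-block:: python

         import time

         RETRIES = 5  # mirrors the retries attribute


         def call_with_retries(func, retries=RETRIES, base_delay=0.5):
             """Call func until it succeeds or `retries` attempts have failed."""
             last_error = None
             for attempt in range(retries):
                 try:
                     return func()
                 except ConnectionError as error:  # assumed transient error type
                     last_error = error
                     # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                     time.sleep(base_delay * 2 ** attempt)
             raise last_error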


   .. py:attribute:: image_extensions
      :value: ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff')


   .. py:attribute:: _query


   .. py:attribute:: id_list


   .. py:attribute:: storage


   .. py:attribute:: figure_storage


   .. py:method:: get_pdfs() -> list[aoptk.literature.pdf.PDF]

      Retrieve PDFs based on the query.

      :returns: A list of PDF objects corresponding to the publications matching the query.
      :rtype: list[PDF]


   .. py:method:: get_publications() -> list[aoptk.literature.publication.Publication]

      Retrieve the publications matching the query.

      :returns: A list of Publication objects.
      :rtype: list[Publication]


   .. py:method:: get_ids() -> list[aoptk.literature.id.ID]
      :async:


      Retrieve a list of publication IDs based on the query.


   .. py:method:: _get_publication(publication_id: str) -> aoptk.literature.publication.Publication

      Parse a single PDF and return a Publication object.

      :param publication_id: The publication ID to retrieve and parse.
      :type publication_id: str


   .. py:method:: _get_full_text(publication_id: str) -> str | None

      Retrieve the full text for a given publication ID.

      :param publication_id: The publication ID to retrieve the full text for.
      :type publication_id: str


   .. py:method:: _get_file(publication_id: str, file_format: str) -> aoptk.literature.pdf.PDF | str | None

      Retrieve the file for a given publication ID and format.

      :param publication_id: The publication ID to retrieve the file for.
      :type publication_id: str
      :param file_format: The format of the file to retrieve (``pdf``, ``xml``, ``json``, or ``txt``).
                          The ``txt``, ``xml``, and ``pdf`` formats contain the full text, while ``json`` contains metadata.
      :type file_format: str


   .. py:method:: _get_figures(publication_id: str) -> list[str]

      Retrieve the figure files for a given publication ID.

      :param publication_id: The publication ID to retrieve the figure files for.
      :type publication_id: str


   .. py:method:: _extract_figures_from_supplements(publication_id: str, supplementary_files: list[str]) -> list[str]

      Extract figure files from the supplementary files.

      :param publication_id: The publication ID to retrieve the figure files for.
      :type publication_id: str
      :param supplementary_files: A list of supplementary file URLs to extract figures from.
      :type supplementary_files: list[str]
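
      A minimal sketch of the filtering step, assuming figures are recognized by matching
      each file's extension against ``image_extensions``. ``select_figures`` is a
      hypothetical helper and not part of ``aoptk``; the actual method may also download
      or store the files.

      .. code-block:: python

         from pathlib import PurePosixPath
         from urllib.parse import urlparse

         # Mirrors the image_extensions attribute.
         IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff')


         def select_figures(supplementary_files: list[str]) -> list[str]:
             """Keep only the URLs whose path ends in a known image extension."""
             figures = []
             for url in supplementary_files:
                 # Use the URL path only, so query strings do not hide the extension.
                 suffix = PurePosixPath(urlparse(url).path).suffix.lower()
                 if suffix in IMAGE_EXTENSIONS:
                     figures.append(url)
             return figures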


   .. py:method:: _get_json(publication_id: str) -> str | None

      Retrieve the JSON for a given publication ID.

      :param publication_id: The publication ID to retrieve the JSON for.
      :type publication_id: str


   .. py:method:: _get_pdf(publication_id: str) -> aoptk.literature.pdf.PDF | None

      Retrieve the PDF for a given publication ID.

      :param publication_id: The publication ID to retrieve the PDF for.
      :type publication_id: str


   .. py:method:: _get_publication_count_and_ids(mindate: str | None = None, maxdate: str | None = None) -> tuple[int, list[str]]


   .. py:method:: _async_get_publication_count_and_ids(mindate: str | None = None, maxdate: str | None = None) -> tuple[int, list[str]] | None
      :async:


   .. py:method:: _collect_ids_for_year(year: int) -> list[str]
      :async:


   .. py:method:: _collect_ids_split_by_months_days(year: int) -> list[str]
      :async:
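
      This method has no docstring. Its name, together with ``max_pmc_results``, suggests
      that a year whose result count exceeds the cap is queried again in smaller date
      windows. The sketch below illustrates that idea under this assumption only;
      ``collect_ids_for_year`` and the ``search`` callable (returning the total count and
      the possibly truncated ID list for a date window) are hypothetical, and the sketch
      is synchronous for brevity.

      .. code-block:: python

         import calendar
         from datetime import date

         MAX_PMC_RESULTS = 9998  # mirrors max_pmc_results


         def collect_ids_for_year(search, year: int, cap: int = MAX_PMC_RESULTS) -> list[str]:
             """Collect IDs for one year, narrowing the window while the count exceeds cap."""
             count, ids = search(date(year, 1, 1), date(year, 12, 31))
             if count <= cap:
                 return ids

             collected = []
             for month in range(1, 13):
                 last_day = calendar.monthrange(year, month)[1]
                 count, ids = search(date(year, month, 1), date(year, month, last_day))
                 if count <= cap:
                     collected.extend(ids)
                     continue
                 # The month is still too large: fall back to one query per day.
                 for day in range(1, last_day + 1):
                     _, ids = search(date(year, month, day), date(year, month, day))
                     # A single day above the cap would still be truncated here.
                     collected.extend(ids)
             return collected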