aoptk.literature.databases.europepmc
====================================

.. py:module:: aoptk.literature.databases.europepmc


Classes
-------

.. autoapisummary::

   aoptk.literature.databases.europepmc.EuropePMC


Functions
---------

.. autoapisummary::

   aoptk.literature.databases.europepmc._get_publication_id


Module Contents
---------------

.. py:class:: EuropePMC(query: str, storage: str, figure_storage: str)

   Bases: :py:obj:`aoptk.literature.get_abstract.GetAbstract`, :py:obj:`aoptk.literature.get_pdf.GetPDF`, :py:obj:`aoptk.literature.get_id.GetID`, :py:obj:`aoptk.literature.get_publication.GetPublication`, :py:obj:`aoptk.literature.get_publication_metadata.GetPublicationMetadata`


   Retrieve PDFs, abstracts, publication IDs, publications, and publication metadata from Europe PMC for a search query.


   .. py:attribute:: page_size
      :value: 1000


   .. py:attribute:: timeout
      :value: 10


   .. py:attribute:: headers
      :type:  ClassVar


   .. py:attribute:: image_extensions
      :value: ('.jpg', '.jpeg', '.png', '.bmp', '.tiff')


   .. py:attribute:: _query


   .. py:attribute:: storage


   .. py:attribute:: figure_storage


   .. py:attribute:: _session


   .. py:attribute:: id_list
      :value: []


   .. py:method:: get_pdfs() -> list[aoptk.literature.pdf.PDF]

      Retrieve PDFs based on the query.


   .. py:method:: get_abstracts() -> list[aoptk.literature.abstract.Abstract]

      Retrieve Abstracts based on the query.


   .. py:method:: get_publications() -> list[aoptk.literature.publication.Publication]

      Retrieve Publications based on the query.


   .. py:method:: get_publications_metadata() -> list[aoptk.literature.publication_metadata.PublicationMetadata]

      Retrieve Publication metadata based on the query.


   .. py:method:: get_ids() -> list[aoptk.literature.id.ID]

      Get a list of publication IDs from EuropePMC based on the query.
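
      A minimal sketch of cursor-based pagination, assuming Europe PMC's documented search behaviour: the first request passes ``cursorMark=*``, each JSON response carries ``nextCursorMark`` and ``resultList.result``, and paging ends when a page comes back empty or the cursor stops advancing. ``collect_results`` and ``fetch_page`` are hypothetical names (``fetch_page`` stands in for a call such as ``_call_api``); this is not necessarily how ``get_ids`` is implemented.

      .. code-block:: python

         from typing import Callable, Dict, List


         def collect_results(fetch_page: Callable[[str], dict]) -> List[dict]:
             """Follow cursor marks until no further pages are returned (hypothetical helper)."""
             results: List[dict] = []
             cursor = "*"  # Europe PMC uses "*" to request the first page
             while True:
                 page = fetch_page(cursor)
                 batch = page.get("resultList", {}).get("result", [])
                 results.extend(batch)
                 next_cursor = page.get("nextCursorMark")
                 # Stop on an empty page or when the cursor no longer advances.
                 if not batch or not next_cursor or next_cursor == cursor:
                     break
                 cursor = next_cursor
             return results


         # Two fake pages standing in for real API responses.
         fake_pages: Dict[str, dict] = {
             "*": {"nextCursorMark": "AoE1", "resultList": {"result": [{"id": "1"}, {"id": "2"}]}},
             "AoE1": {"nextCursorMark": "AoE1", "resultList": {"result": [{"id": "3"}]}},
         }
         ids = [r["id"] for r in collect_results(fake_pages.__getitem__)]

      Checking for an unchanged cursor guards against looping forever on the final page.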


   .. py:method:: remove_reviews() -> EuropePMC

      Modify the query to exclude review articles.


   .. py:method:: abstracts_only() -> EuropePMC

      Modify the query to search in the text of abstracts only.
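
      A hedged sketch of the kind of query rewriting that ``remove_reviews`` and ``abstracts_only`` describe, assuming Europe PMC's search syntax with its ``PUB_TYPE`` and ``ABSTRACT`` fields. The helper names are hypothetical, and the exact strings aoptk builds may differ.

      .. code-block:: python

         def exclude_reviews(query: str) -> str:
             """Drop records whose publication type is review (assumed syntax)."""
             return f'({query}) NOT PUB_TYPE:"review"'


         def restrict_to_abstracts(query: str) -> str:
             """Limit matching to the abstract field (assumed syntax)."""
             return f"ABSTRACT:({query})"


         combined = exclude_reviews(restrict_to_abstracts("liver fibrosis"))

      Order matters in this sketch: restricting to abstracts first keeps the ``NOT PUB_TYPE`` clause outside the ``ABSTRACT`` field, so it applies to the whole record.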


   .. py:method:: _get_pdf(publication_id: str) -> aoptk.literature.pdf.PDF | None

      Retrieve the PDF for a given publication ID.

      :param publication_id: The ID of the publication for which to retrieve the PDF.
      :type publication_id: str

      :returns: The PDF object if successful, None otherwise.
      :rtype: PDF | None


   .. py:method:: _write_pdf(publication_id: str, response: requests.Response) -> aoptk.literature.pdf.PDF

      Write the PDF content to a file and return a PDF object.

      :param publication_id: The ID of the publication for which the PDF is being written.
      :type publication_id: str
      :param response: The HTTP response containing the PDF content.
      :type response: requests.Response
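
      A minimal sketch of the write step, assuming the PDF bytes come from ``response.content`` and are saved as ``<storage>/<publication_id>.pdf``; the file layout and the helper name ``write_pdf_bytes`` are hypothetical. It also checks the ``%PDF-`` header, since a server can answer with an HTML page instead of a PDF.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def write_pdf_bytes(storage: str, publication_id: str, content: bytes) -> Path:
             """Save raw PDF bytes under ``storage`` (hypothetical file layout)."""
             # A well-formed PDF begins with the "%PDF-" header; reject anything else.
             if not content.startswith(b"%PDF-"):
                 raise ValueError(f"response for {publication_id} is not a PDF")
             directory = Path(storage)
             directory.mkdir(parents=True, exist_ok=True)
             path = directory / f"{publication_id}.pdf"
             path.write_bytes(content)
             return path


         with tempfile.TemporaryDirectory() as tmp:
             written = write_pdf_bytes(tmp, "PMC1234567", b"%PDF-1.7 demo")
             written_name = written.name
             written_ok = written.read_bytes() == b"%PDF-1.7 demo"
             try:
                 write_pdf_bytes(tmp, "PMC7654321", b"<html>not found</html>")
                 rejected = False
             except ValueError:
                 rejected = True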


   .. py:method:: _get_abstract(publication_id: str) -> aoptk.literature.abstract.Abstract

      Return abstract from Europe PMC for a given publication ID.

      :param publication_id: The ID of the publication for which to retrieve the abstract.
      :type publication_id: str

      :returns: The abstract object if successful, None otherwise.
      :rtype: Abstract


   .. py:method:: _call_api(cursor_mark: str, result_type: str, query: str) -> dict

      Call the Europe PMC REST API search endpoint for the given query.

      :param cursor_mark: Cursor mark for cursor-based pagination.
      :type cursor_mark: str
      :param result_type: The Europe PMC result type to request, either ``idlist`` or ``core``.
      :type result_type: str
      :param query: The query to run, typically ``self._query``.
      :type query: str

      :returns: The decoded JSON response.
      :rtype: dict
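
      A sketch of the request parameters, based on the public Europe PMC REST search endpoint (``query``, ``resultType``, ``cursorMark``, ``pageSize``, and ``format``). The default page size mirrors the ``page_size`` attribute above; ``build_search_url`` is a hypothetical helper, not part of aoptk. Europe PMC also accepts ``lite`` as a result type.

      .. code-block:: python

         from urllib.parse import urlencode

         SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


         def build_search_url(query: str, result_type: str, cursor_mark: str = "*", page_size: int = 1000) -> str:
             """Build a Europe PMC search URL (hypothetical helper)."""
             if result_type not in {"idlist", "lite", "core"}:
                 raise ValueError(f"unsupported resultType: {result_type!r}")
             params = {
                 "query": query,
                 "resultType": result_type,
                 "cursorMark": cursor_mark,
                 "pageSize": page_size,
                 "format": "json",
             }
             # urlencode percent-encodes quotes, spaces and the "*" cursor mark.
             return f"{SEARCH_URL}?{urlencode(params)}"


         url = build_search_url('"adverse outcome pathway"', "idlist")

      With ``requests``, the same parameters could instead be passed through ``params=`` to ``requests.get`` together with ``timeout=10``, matching the ``timeout`` attribute.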


   .. py:method:: _get_publication_metadata(publication_id: str) -> aoptk.literature.publication_metadata.PublicationMetadata | None

      Return publication metadata from Europe PMC for a given publication ID.

      :param publication_id: The ID of the publication to retrieve metadata for.
      :type publication_id: str


   .. py:method:: _get_publication(publication_id: str) -> aoptk.literature.publication.Publication | None

      Return a Publication object for a given publication ID.

      :param publication_id: The ID of the publication to retrieve.
      :type publication_id: str


   .. py:method:: _parse_xml_abstract(root: xml.etree.ElementTree.Element) -> str

      Return the full text content of the first ``<abstract>`` element as a single string.

      :param root: The root element of the XML tree.
      :type root: ET.Element
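
      A minimal sketch using ``xml.etree.ElementTree``, assuming JATS-style XML without a default namespace, as Europe PMC serves it. Returning an empty string when no ``<abstract>`` exists is an assumption, and ``first_abstract_text`` is a hypothetical name.

      .. code-block:: python

         import xml.etree.ElementTree as ET


         def first_abstract_text(root: ET.Element) -> str:
             """Return the whitespace-normalised text of the first <abstract> below root."""
             abstract = root.find(".//abstract")
             if abstract is None:
                 return ""  # assumption: no abstract yields an empty string
             # itertext() walks nested <p>, <italic>, <sec>, ... in document order.
             return " ".join(" ".join(abstract.itertext()).split())


         doc = ET.fromstring(
             "<article><front><abstract><p>Liver <italic>fibrosis</italic> is studied.</p>"
             "<p>Second paragraph.</p></abstract></front></article>"
         )
         text = first_abstract_text(doc)

      Joining text fragments with a space keeps paragraphs apart, at the cost of splitting inline markup such as ``H<sub>2</sub>O`` into ``H 2 O``.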


   .. py:method:: _parse_xml_full_text(root: xml.etree.ElementTree.Element) -> str

      Parse the XML content to extract the full text.

      :param root: The root element of the XML tree.
      :type root: ET.Element
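
      One possible approach, sketched under the assumption that the full text is the ``<title>`` and ``<p>`` content under ``<body>``, with ``<fig>`` and ``<table-wrap>`` skipped because separate methods parse figures and tables. The tag sets and helper names are hypothetical.

      .. code-block:: python

         import xml.etree.ElementTree as ET
         from typing import List

         BLOCK_TAGS = {"title", "p"}        # emitted as separate paragraphs
         SKIP_TAGS = {"fig", "table-wrap"}  # left to the figure and table parsers


         def _collect_blocks(elem: ET.Element, out: List[str]) -> None:
             for child in elem:
                 if child.tag in SKIP_TAGS:
                     continue
                 if child.tag in BLOCK_TAGS:
                     # Concatenate inline fragments directly, then normalise whitespace.
                     text = " ".join("".join(child.itertext()).split())
                     if text:
                         out.append(text)
                 else:
                     _collect_blocks(child, out)


         def body_text(root: ET.Element) -> str:
             body = root.find(".//body")
             if body is None:
                 return ""
             blocks: List[str] = []
             _collect_blocks(body, blocks)
             return "\n\n".join(blocks)


         doc = ET.fromstring(
             "<article><body><sec><title>Intro</title><p>First paragraph.</p>"
             "<fig><caption><p>A caption.</p></caption></fig></sec>"
             "<sec><title>Methods</title><p>Second <bold>one</bold>.</p></sec></body></article>"
         )
         full_text = body_text(doc)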


   .. py:method:: _parse_xml_figure_descriptions(root: xml.etree.ElementTree.Element) -> str

      Parse the XML content to extract the figure descriptions.

      :param root: The root element of the XML tree.
      :type root: ET.Element
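
      A minimal sketch assuming JATS ``<fig>`` elements with an optional ``<label>`` and a ``<caption>``; the ``label: caption`` line format and the helper name are hypothetical.

      .. code-block:: python

         import xml.etree.ElementTree as ET


         def figure_descriptions(root: ET.Element) -> str:
             """Return one 'label: caption' line per <fig>, in document order."""
             lines = []
             for fig in root.iter("fig"):
                 label = (fig.findtext("label") or "").strip()
                 caption = fig.find("caption")
                 text = " ".join(" ".join(caption.itertext()).split()) if caption is not None else ""
                 if label and text:
                     lines.append(f"{label}: {text}")
                 elif label or text:
                     lines.append(label or text)
             return "\n".join(lines)


         doc = ET.fromstring(
             "<article><body><fig id='f1'><label>Figure 1</label><caption>"
             "<title>Liver histology.</title><p>Stained sections.</p></caption></fig>"
             "<fig id='f2'><caption><p>Unlabelled figure.</p></caption></fig></body></article>"
         )
         descriptions = figure_descriptions(doc)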


   .. py:method:: _parse_xml_tables(root: xml.etree.ElementTree.Element) -> list[pandas.DataFrame]

      Parse the XML content to extract tables as a list of DataFrames, preserving order.

      :param root: The root element of the XML tree.
      :type root: ET.Element


   .. py:method:: _extract_rows(table_elem: xml.etree.ElementTree.Element) -> list[list[str]]

      Extract rows from a table element, preserving order.

      :param table_elem: The XML element representing the table.
      :type table_elem: ET.Element
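
      A minimal sketch, assuming JATS/XHTML-style tables where each ``<tr>`` holds ``<th>`` or ``<td>`` cells; ``colspan`` and ``rowspan`` are ignored, and ``extract_rows`` is a hypothetical name.

      .. code-block:: python

         import xml.etree.ElementTree as ET
         from typing import List


         def extract_rows(table_elem: ET.Element) -> List[List[str]]:
             """Return each row's cell texts; iter() keeps thead/tbody document order."""
             rows: List[List[str]] = []
             for tr in table_elem.iter("tr"):
                 cells = [
                     " ".join("".join(cell.itertext()).split())
                     for cell in tr
                     if cell.tag in {"th", "td"}
                 ]
                 if cells:
                     rows.append(cells)
             return rows


         table = ET.fromstring(
             "<table><thead><tr><th>Dose</th><th>Effect</th></tr></thead>"
             "<tbody><tr><td>10 mg</td><td><italic>none</italic></td></tr></tbody></table>"
         )
         rows = extract_rows(table)

      If the first row is a header, ``pandas.DataFrame(rows[1:], columns=rows[0])`` is one way to turn the result into a DataFrame.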


   .. py:method:: _get_xml(publication_id: str) -> str | None

      Retrieve the XML content for a given publication ID.

      :param publication_id: The ID of the publication to retrieve XML for.
      :type publication_id: str
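
      A sketch of the URL such a method might request, assuming Europe PMC's ``fullTextXML`` REST endpoint, which is keyed by PMCID and does not serve every article. The helper name and the early ``None`` for non-PMCID inputs are hypothetical.

      .. code-block:: python

         from typing import Optional

         REST_BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest"


         def full_text_xml_url(publication_id: str) -> Optional[str]:
             """Return the fullTextXML URL for a PMCID, or None for other ID types."""
             pid = publication_id.strip().upper()
             if not pid.startswith("PMC"):
                 return None  # hypothetical guard: PMIDs have no fullTextXML URL here
             return f"{REST_BASE}/{pid}/fullTextXML"


         pmc_url = full_text_xml_url("PMC1234567")
         pmid_url = full_text_xml_url("12345678")

      A failed request (for example, when no full text is available) would then map naturally to the ``None`` in the return type.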


   .. py:method:: _get_figures(publication_id: str) -> list[str]

      Retrieve the figure file paths for a given publication ID.

      :param publication_id: The ID of the publication to retrieve figures for.
      :type publication_id: str


   .. py:method:: _get_supplementary_zip_path(publication_id: str) -> str | None

      Download the supplementary files ZIP for a given publication ID and return the path to the ZIP file.

      :param publication_id: The ID of the publication to retrieve supplementary files for.
      :type publication_id: str
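
      A hedged sketch of how images could be pulled from such a ZIP into ``figure_storage``, filtering on the extensions listed in ``image_extensions``. Europe PMC exposes a ``supplementaryFiles`` REST endpoint that returns a ZIP, but whether aoptk extracts figures this way is an assumption; ``extract_images`` is hypothetical.

      .. code-block:: python

         import tempfile
         import zipfile
         from pathlib import Path, PurePosixPath
         from typing import List

         IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tiff")


         def extract_images(zip_path: str, figure_storage: str) -> List[str]:
             """Copy image members of a ZIP into figure_storage and return their paths."""
             out_dir = Path(figure_storage)
             out_dir.mkdir(parents=True, exist_ok=True)
             paths: List[str] = []
             with zipfile.ZipFile(zip_path) as archive:
                 for member in archive.namelist():
                     name = PurePosixPath(member).name
                     if not name.lower().endswith(IMAGE_EXTENSIONS):
                         continue
                     # Keeping only the base name drops subdirectories and "../" parts;
                     # members with the same name overwrite each other.
                     target = out_dir / name
                     target.write_bytes(archive.read(member))
                     paths.append(str(target))
             return paths


         with tempfile.TemporaryDirectory() as tmp:
             zip_path = Path(tmp) / "supplementary.zip"
             with zipfile.ZipFile(zip_path, "w") as archive:
                 archive.writestr("figs/fig1.PNG", b"png demo")
                 archive.writestr("notes.txt", "not an image")
                 archive.writestr("fig2.jpeg", b"jpeg demo")
             extracted = sorted(
                 Path(p).name for p in extract_images(str(zip_path), str(Path(tmp) / "figures"))
             )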


.. py:function:: _get_publication_id(result: dict) -> str | None

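   Extract the publication ID from an API result, checking the ``pmcid``, ``pmid``, and ``id`` keys in that order.

   :param result: The API result containing publication information.
   :type result: dict

   :returns: The first ID found, or None if none of the keys is present.
   :rtype: str | None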
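
   A minimal sketch of the documented lookup order; treating empty strings as missing and converting values with ``str`` are assumptions, and ``get_publication_id`` is a hypothetical stand-in for the private function.

   .. code-block:: python

      from typing import Optional


      def get_publication_id(result: dict) -> Optional[str]:
          """Return the first non-empty value of 'pmcid', 'pmid', 'id', or None."""
          for key in ("pmcid", "pmid", "id"):
              value = result.get(key)
              if value:
                  return str(value)
          return None


      examples = [
          get_publication_id({"pmcid": "PMC1234567", "pmid": "12345678", "id": "12345678"}),
          get_publication_id({"pmid": "12345678", "id": "12345678"}),
          get_publication_id({"pmcid": "", "id": "PPR123"}),
          get_publication_id({}),
      ]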