aoptk.literature.databases.europepmc
====================================

.. py:module:: aoptk.literature.databases.europepmc


Classes
-------

.. autoapisummary::

   aoptk.literature.databases.europepmc.EuropePMC


Functions
---------

.. autoapisummary::

   aoptk.literature.databases.europepmc._get_publication_id


Module Contents
---------------

.. py:class:: EuropePMC(query: str, storage: str, figure_storage: str)

   Bases: :py:obj:`aoptk.literature.get_abstract.GetAbstract`, :py:obj:`aoptk.literature.get_pdf.GetPDF`, :py:obj:`aoptk.literature.get_id.GetID`, :py:obj:`aoptk.literature.get_publication.GetPublication`, :py:obj:`aoptk.literature.get_publication_metadata.GetPublicationMetadata`

   Class to get PDFs, abstracts, publications, and publication metadata from Europe PMC based on a query.


   .. py:attribute:: page_size
      :value: 1000


   .. py:attribute:: timeout
      :value: 10


   .. py:attribute:: headers
      :type: ClassVar


   .. py:attribute:: image_extensions
      :value: ('.jpg', '.jpeg', '.png', '.bmp', '.tiff')


   .. py:attribute:: _query


   .. py:attribute:: storage


   .. py:attribute:: figure_storage


   .. py:attribute:: _session


   .. py:attribute:: id_list
      :value: []


   .. py:method:: get_pdfs() -> list[aoptk.literature.pdf.PDF]

      Retrieve PDFs based on the query.


   .. py:method:: get_abstracts() -> list[aoptk.literature.abstract.Abstract]

      Retrieve abstracts based on the query.


   .. py:method:: get_publications() -> list[aoptk.literature.publication.Publication]

      Retrieve publications based on the query.


   .. py:method:: get_publications_metadata() -> list[aoptk.literature.publication_metadata.PublicationMetadata]

      Retrieve publication metadata based on the query.


   .. py:method:: get_ids() -> list[aoptk.literature.id.ID]

      Get a list of publication IDs from Europe PMC based on the query.


   .. py:method:: remove_reviews() -> EuropePMC

      Modify the query to exclude review articles.


   .. py:method:: abstracts_only() -> EuropePMC

      Modify the query to search in the text of abstracts only.


   .. py:method:: _get_pdf(publication_id: str) -> aoptk.literature.pdf.PDF | None

      Retrieve the PDF for a given publication ID.

      :param publication_id: The ID of the publication for which to retrieve the PDF.
      :type publication_id: str

      :returns: The PDF object if successful, None otherwise.
      :rtype: PDF | None


   .. py:method:: _write_pdf(publication_id: str, response: requests.Response) -> aoptk.literature.pdf.PDF

      Write the PDF content to a file and return a PDF object.

      :param publication_id: The ID of the publication for which the PDF is being written.
      :type publication_id: str
      :param response: The HTTP response containing the PDF content.
      :type response: requests.Response


   .. py:method:: _get_abstract(publication_id: str) -> aoptk.literature.abstract.Abstract

      Return the abstract from Europe PMC for a given publication ID.

      :param publication_id: The ID of the publication for which to retrieve the abstract.
      :type publication_id: str

      :returns: The abstract object if successful, None otherwise.
      :rtype: Abstract


   .. py:method:: _call_api(cursor_mark: str, result_type: str, query: str) -> dict

      Call the Europe PMC web API to run the search query.

      :param cursor_mark: Cursor position used for pagination.
      :type cursor_mark: str
      :param result_type: Result type to request: ID lists or core records.
      :type result_type: str
      :param query: Main query to carry out; defaults to ``self._query``.
      :type query: str

      :returns: The JSON response.
      :rtype: dict


   .. py:method:: _get_publication_metadata(publication_id: str) -> aoptk.literature.publication_metadata.PublicationMetadata | None

      Return publication metadata from Europe PMC for a given publication ID.

      :param publication_id: The ID of the publication to retrieve metadata for.
      :type publication_id: str


   .. py:method:: _get_publication(publication_id: str) -> aoptk.literature.publication.Publication | None

      Return a Publication object for a given publication ID.

      :param publication_id: The ID of the publication to retrieve.
      :type publication_id: str


   .. py:method:: _parse_xml_abstract(root: xml.etree.ElementTree.Element) -> str

      Return the full text content of the first element as a single string.

      :param root: The root element of the XML tree.
      :type root: ET.Element


   .. py:method:: _parse_xml_full_text(root: xml.etree.ElementTree.Element) -> str

      Parse the XML content to extract the full text.

      :param root: The root element of the XML tree.
      :type root: ET.Element


   .. py:method:: _parse_xml_figure_descriptions(root: xml.etree.ElementTree.Element) -> str

      Parse the XML content to extract the figure descriptions.

      :param root: The root element of the XML tree.
      :type root: ET.Element


   .. py:method:: _parse_xml_tables(root: xml.etree.ElementTree.Element) -> list[pandas.DataFrame]

      Parse the XML content to extract tables as a list of DataFrames, preserving order.

      :param root: The root element of the XML tree.
      :type root: ET.Element


   .. py:method:: _extract_rows(table_elem: xml.etree.ElementTree.Element) -> list[list[str]]

      Extract rows from a table element, preserving order.

      :param table_elem: The XML element representing the table.
      :type table_elem: ET.Element


   .. py:method:: _get_xml(publication_id: str) -> str | None

      Retrieve the XML content for a given publication ID.

      :param publication_id: The ID of the publication to retrieve XML for.
      :type publication_id: str


   .. py:method:: _get_figures(publication_id: str) -> list[str]

      Retrieve the figure file paths for a given publication ID.

      :param publication_id: The ID of the publication to retrieve figures for.
      :type publication_id: str


   .. py:method:: _get_supplementary_zip_path(publication_id: str) -> str | None

      Download the supplementary files ZIP for a given publication ID and return the path to the ZIP file.

      :param publication_id: The ID of the publication to retrieve supplementary files for.
      :type publication_id: str


.. py:function:: _get_publication_id(result: dict) -> str | None

   Extract the publication ID from the API result, checking for 'pmcid', 'pmid', and 'id' in order.

   :param result: The API result containing publication information.
   :type result: dict
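
The key-precedence lookup described for ``_get_publication_id`` can be sketched roughly as follows. This is an illustration only, not the module's actual implementation: the function name ``get_publication_id`` (without the leading underscore) is a hypothetical stand-in, the sample identifiers are made up, and treating empty values as missing is an assumption.

```python
from __future__ import annotations


def get_publication_id(result: dict) -> str | None:
    """Return the first identifier found, preferring 'pmcid', then 'pmid', then 'id'.

    Hypothetical stand-in for the documented helper; missing or empty
    values are skipped (an assumption, not confirmed by the source).
    """
    for key in ("pmcid", "pmid", "id"):
        value = result.get(key)
        if value:
            return str(value)
    # None of the three keys carried a usable value.
    return None


# Made-up records, shaped like entries of an API result list.
with_pmcid = {"pmcid": "PMC0000001", "pmid": "10000001", "id": "10000001"}
without_pmcid = {"pmid": "10000002", "id": "10000002"}

print(get_publication_id(with_pmcid))     # PMC0000001
print(get_publication_id(without_pmcid))  # 10000002
print(get_publication_id({}))             # None
```

Checking the keys in a fixed order means a record that carries several identifiers always resolves to the same one, and callers can handle records without any identifier by checking for ``None``.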