aoptk.literature.databases.europepmc
Classes
EuropePMC | Class to get PDFs from EuropePMC based on a query.
Functions
Extract the publication ID from the API result, checking for 'pmcid', 'pmid', and 'id' in order.
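The lookup order described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `extract_publication_id` and the choice to return `None` when no key is present are assumptions, since the page does not show the actual function name or its fallback behaviour.

```python
from typing import Optional


def extract_publication_id(result: dict) -> Optional[str]:
    """Return the first non-empty ID among 'pmcid', 'pmid', and 'id'.

    Hypothetical helper: the name and the None fallback are assumptions.
    """
    for key in ("pmcid", "pmid", "id"):
        value = result.get(key)
        if value:  # skip missing keys and empty strings
            return str(value)
    return None
```

Checking 'pmcid' first means a record that has both a PMC ID and a PubMed ID resolves to the PMC ID.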
Module Contents
- class aoptk.literature.databases.europepmc.EuropePMC(query: str, storage: str, figure_storage: str)
Bases:
aoptk.literature.get_abstract.GetAbstract, aoptk.literature.get_pdf.GetPDF, aoptk.literature.get_id.GetID, aoptk.literature.get_publication.GetPublication, aoptk.literature.get_publication_metadata.GetPublicationMetadata
Class to get PDFs from EuropePMC based on a query.
- get_pdfs() → list[aoptk.literature.pdf.PDF]
Retrieve PDFs based on the query.
- get_abstracts() → list[aoptk.literature.abstract.Abstract]
Retrieve Abstracts based on the query.
- get_publications() → list[aoptk.literature.publication.Publication]
Retrieve Publications based on the query.
- get_publications_metadata() → list[aoptk.literature.publication_metadata.PublicationMetadata]
Retrieve Publication metadata based on the query.
- get_ids() → list[aoptk.literature.id.ID]
Get a list of publication IDs from EuropePMC based on the query.
- _get_pdf(publication_id: str) → aoptk.literature.pdf.PDF | None
Retrieve the PDF for a given publication ID.
- _write_pdf(publication_id: str, response: requests.Response) → aoptk.literature.pdf.PDF
Write the PDF content to a file and return a PDF object.
- Parameters:
publication_id (str) – The ID of the publication for which the PDF is being written.
response (requests.Response) – The HTTP response containing the PDF content.
- _get_abstract(publication_id: str) → aoptk.literature.abstract.Abstract
Return abstract from Europe PMC for a given publication ID.
- _call_api(cursor_mark: str, result_type: str, query: str) → dict
Call the Europe PMC web API to run the search query.
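The `cursor_mark` parameter suggests cursor-based paging. A minimal sketch of such a loop is shown below, assuming the response follows the Europe PMC REST search format, with hits under `resultList.result` and the next cursor under `nextCursorMark`. The helper name `iter_results` and the injected `call_api` callable are hypothetical, and how the class itself drives `_call_api` is not shown on this page.

```python
from typing import Callable, Iterator


def iter_results(call_api: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every search hit by following the cursor until it stops advancing.

    `call_api` is a hypothetical callable that takes a cursor mark and
    returns one decoded JSON page.
    """
    cursor_mark = "*"  # Europe PMC's documented starting cursor
    while True:
        page = call_api(cursor_mark)
        yield from page.get("resultList", {}).get("result", [])
        next_mark = page.get("nextCursorMark")
        # Stop when no further cursor is offered or it did not change.
        if not next_mark or next_mark == cursor_mark:
            break
        cursor_mark = next_mark
```

Passing the page fetcher in as a callable keeps the paging logic testable without network access.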
- _get_publication_metadata(publication_id: str) → aoptk.literature.publication_metadata.PublicationMetadata | None
Return publication metadata from Europe PMC for a given publication ID.
- Parameters:
publication_id (str) – The ID of the publication to retrieve metadata for.
- _get_publication(publication_id: str) → aoptk.literature.publication.Publication | None
Return a Publication object for a given publication ID.
- Parameters:
publication_id (str) – The ID of the publication to retrieve.
- _parse_xml_abstract(root: xml.etree.ElementTree.Element) → str
Return the full text content of the first <abstract> element as a single string.
- Parameters:
root (ET.Element) – The root element of the XML tree.
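One way to collect the text of the first <abstract> element with the standard library is sketched below. The function name `parse_abstract`, the whitespace normalisation, and the empty-string fallback are assumptions for illustration; the actual method may differ.

```python
import xml.etree.ElementTree as ET


def parse_abstract(root: ET.Element) -> str:
    """Return the text of the first <abstract> descendant as one string.

    Hypothetical sketch: returns "" when no <abstract> element exists.
    """
    abstract = root.find(".//abstract")
    if abstract is None:
        return ""
    # itertext() walks nested elements such as <p> and <italic> in order;
    # split/join collapses runs of whitespace and newlines.
    return " ".join("".join(abstract.itertext()).split())
```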
- _parse_xml_full_text(root: xml.etree.ElementTree.Element) → str
Parse the XML content to extract the full text.
- Parameters:
root (ET.Element) – The root element of the XML tree.
- _parse_xml_figure_descriptions(root: xml.etree.ElementTree.Element) → str
Parse the XML content to extract the figure descriptions.
- Parameters:
root (ET.Element) – The root element of the XML tree.
- _parse_xml_tables(root: xml.etree.ElementTree.Element) → list[pandas.DataFrame]
Parse the XML content to extract tables as a list of DataFrames, preserving order.
- Parameters:
root (ET.Element) – The root element of the XML tree.
- _extract_rows(table_elem: xml.etree.ElementTree.Element) → list[list[str]]
Extract rows from a table element, preserving order.
- Parameters:
table_elem (ET.Element) – The XML element representing the table.
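Order-preserving row extraction could look roughly like this. It is a sketch under assumptions: the table is taken to use `tr`, `th`, and `td` tags, as in JATS-style XML, and the name `extract_rows` and the cell-text normalisation are illustrative rather than taken from the module.

```python
import xml.etree.ElementTree as ET


def extract_rows(table_elem: ET.Element) -> list[list[str]]:
    """Return each <tr> as a list of cell strings, in document order.

    Hypothetical sketch: header (<th>) and data (<td>) cells are treated alike.
    """
    rows = []
    for tr in table_elem.iter("tr"):  # iter() yields elements in document order
        cells = [
            " ".join("".join(cell.itertext()).split())
            for cell in tr
            if cell.tag in ("th", "td")
        ]
        rows.append(cells)
    return rows
```

A list of rows in this shape can be passed to `pandas.DataFrame`, for example with the first row used as column names.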
- _get_xml(publication_id: str) → str | None
Retrieve the XML content for a given publication ID.
- Parameters:
publication_id (str) – The ID of the publication to retrieve XML for.