Introduction
Welcome to the Scholarcy APIs. We offer two core API services:
- Metadata extraction API at https://api.scholarcy.com. This is a developer API comprising a number of endpoints for extracting machine-readable knowledge as JSON data from documents in a wide range of formats. The service is optimised for research papers and articles, but should provide useful results for any document.
- Synopsis API at https://summarizer.scholarcy.com/. This includes a web front end for testing and a developer endpoint at /summarize.
We provide examples in Shell, Ruby, and Python. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.
Authentication
Authentication headers must be sent with every request:
# 1. Metadata API:
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb')
})
response = request.execute
puts(response.body)
# 2. Synopsis API:
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb')
})
response = request.execute
puts(response.body)
# 1. Metadata API:
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      timeout=timeout)
print(r.json())
# 2. Synopsis API:
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      timeout=timeout)
print(r.json())
# 1. Metadata API:
# With shell, you can just pass the correct header with each request
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf"
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3"
# 2. Synopsis API:
# With shell, you can just pass the correct header with each request
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf"
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3"
Make sure to replace abcdefg with your API key.
Each API service (metadata and synopsis) requires a separate key.
The Scholarcy APIs expect the API key to be included in all API requests to the server in a header that looks like the following:
Authorization: Bearer abcdefg
Generate a Poster
The API endpoints at https://api.scholarcy.com/api/posters/generate
will extract the information needed to populate your own poster-creation services, and will also generate a basic PowerPoint template for you to use as a starting point for editing.
POST a local file to generate a poster
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:type => 'headline',
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'type': 'headline', 'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "type=headline" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [
"1. Smith, J. (2001) A study on self-citation. J. Chem. Biol., 123, 456-789.",
"2. Jones, R. (2015) He didn't write this one. Science, 101, 101010."
],
"emails": ["smith.j@uni.ac.uk"],
"figure_captions": [
{
"id": "1",
"caption": "Figure 1 caption"
},
{
"id": "2",
"caption": "Figure 2 caption"
}
],
"figure_urls": [
"https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-000.png",
"https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-002.png"
],
"poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"highlights": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
}
This endpoint generates a poster from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. start_page=1.
HTTP Request
POST https://api.scholarcy.com/api/posters/generate
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
type | full | The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
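For example, here is a minimal Python sketch (the API key, file path and page range are placeholders) that combines the type, start_page and end_page parameters above and then downloads the generated PowerPoint template from the poster_url field of the response; whether the poster_url download needs the Authorization header is an assumption here:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {'type': 'headline', 'start_page': 24, 'end_page': 37}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
data = r.json()
# Download the generated .pptx template; sending the Authorization header here is an assumption
poster = requests.get(data['poster_url'], headers=headers, timeout=30)
with open('poster.pptx', 'wb') as out:
    out.write(poster.content)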
GET a poster from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:type => 'full',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'type': 'full',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=params,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "type=full" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["smith.j@uni.ac.uk"],
"figure_captions": [
{
"id": "1",
"caption": "Figure 1 caption"
},
{
"id": "2",
"caption": "Figure 2 caption"
}
],
"figure_urls": [],
"poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
"keywords": [],
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"highlights": [],
"summary": {
"Introduction": [],
"Methods": [],
"Results": [],
"Conclusion": []
}
}
}
This endpoint generates a poster from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/posters/generate
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
type | full | The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
Extract Highlights
The API endpoints at https://api.scholarcy.com/api/highlights/extract
will pull out the key findings/highlights of an article and also provide a longer, extractive summary.
POST a local file to extract highlights
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'wiki_links': True, 'reference_links': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/highlights/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
}
]
},
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"keyword_relevance": {
"atomic force microscopy": 0.345678,
"dna nanostructure": 0.23456,
"drug release": 0.12345,
"single-stranded DNA": 0.034567,
"double-stranded DNA": 0.02345,
"dox molecule": 0.01234
},
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"highlights": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"findings": [
"A statistically significant difference was noted between the four groups on the combined dependent variables",
"We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
],
"summary": [],
"structured_summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
This endpoint extracts highlights from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. wiki_links=true.
HTTP Request
POST https://api.scholarcy.com/api/highlights/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
reference_links | false | If true, parse and link each reference to its full text location |
replace_pronouns | false | If true, replace first-person pronouns with third-person mentions (the author(s)?, they). |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
add_background_info | false | If true, generate an introductory sentence. Useful for generating an abstract from an article. |
add_concluding_info | false | If true, generate a concluding sentence. Useful for generating an abstract from an article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
highlights_algorithm | weighted | weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally. |
headline_from | highlights | highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline. |
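As a sketch of how several of these parameters combine (the API key, file path and focus terms are placeholders), the following Python call asks for seven key points focused on particular terms, with first-person pronouns replaced:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
    'key_points': 7,
    'focus_terms': 'drug release;DNA origami',  # illustrative focus terms
    'replace_pronouns': True,
    'structured_summary': True
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
print(r.json().get('highlights'))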
GET highlights from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/highlights/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "..."
}
]
},
"keywords": [],
"keyword_relevance": {},
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"highlights": [],
"findings": [],
"summary": [],
"structured_summary": {
"Introduction": [],
"Methods": [],
"Results": [],
"Conclusion": []
}
}
This endpoint extracts highlights from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/highlights/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
reference_links | false | If true, parse and link each reference to its full text location |
replace_pronouns | false | If true, replace first-person pronouns with third-person mentions (the author(s)?, they). |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
add_background_info | false | If true, generate an introductory sentence. Useful for generating an abstract from an article. |
add_concluding_info | false | If true, generate a concluding sentence. Useful for generating an abstract from an article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
highlights_algorithm | weighted | weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally. |
headline_from | highlights | highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline. |
Extract Structured Content
The API endpoints at https://api.scholarcy.com/api/metadata/extract
and /api/metadata/basic
will convert a document into structured, machine-readable data in JSON format.
POST a local file to extract content
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/metadata/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["author@email.com"],
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
}
],
"table_captions": [
{
"id": "1",
"caption": "Sample demographics and characteristics"
},
{
"id": "2",
"caption": "Construct measurements"
}
],
"figure_captions": []
},
"sections": {
"introduction": ["Introduction section contents"],
"methodology": ["Methods section contents"],
"findings": ["Main results contents"],
"conclusion": ["Concluding remarks"],
"limitations": [
"There are also several limitations to this research. Small sample size was an issue."
],
"acknowledgements": [
"We'd like to thank our supervisors for support, tea and biscuits."
],
"funding": [
"The authors acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
],
"future_work": [
"More research is needed to better understand what is going on."
],
"objectives": [
"The aim of this research is to provide insights into the inner workings of cellular processes."
]
},
"structured_content": [
{
"heading": "ABSTRACT",
"content": ["This is a very exciting paper. Please read it."]
},
{
"heading": "INTRODUCTION",
"content": ["Introduction paragraph 1", "Introduction paragraph 2"]
},
{
"heading": "RESEARCH METHODOLOGY",
"content": ["Methods paragraph 1", "Methods paragraph 2"]
},
{
"heading": "FINDINGS AND DISCUSSION",
"content": ["Results paragraph 1", "Results paragraph 2"]
},
{
"heading": "CONCLUSION",
"content": ["Conclusion paragraph 1", "Conclusion paragraph 2"]
}
],
"participants": [
{
"participant": "Patients",
"number": 15,
"context": "Fifteen patients participated in the study."
}
],
"statistics": [
{
"tests": {
"context": "We performed exploratory factor analysis using SPSS 20",
"tests": [
{
"test": "exploratory factor analysis"
}
]
}
},
{
"tests": {
"context": "We performed confirmatory factor analyses with AMOS 20 using the maximum likelihood estimation method",
"tests": [
{
"test": "confirmatory factor analyses"
},
{
"test": "maximum likelihood estimation method"
}
]
}
},
{
"p_value": "P < 0.001",
"context": "We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)</mark>",
"tests": {
"tests": [
{
"test": "analysis of variance",
"value": "P < 0.001"
}
]
}
}
],
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"keyword_relevance": {
"atomic force microscopy": 0.345678,
"dna nanostructure": 0.23456,
"drug release": 0.12345,
"single-stranded DNA": 0.034567,
"double-stranded DNA": 0.02345,
"dox molecule": 0.01234
},
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"top_statements": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"findings": [
"A statistically significant difference was noted between the four groups on the combined dependent variables",
"We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
],
"facts": [],
"claims": [],
"summary": [],
"structured_summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
This endpoint extracts structured content from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. engine=v1.
HTTP Request
POST https://api.scholarcy.com/api/metadata/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
parse_references | false | If true, parse into BibTeX and link each reference to its full text location. |
reference_style | ensemble | Referencing style used by the document, if known, or use the default. Available values: acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver. |
reference_format | text | Output references in plain text or bibtex format. |
generate_summary | true | Create an extractive summary of the article. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
replace_pronouns | false | If true, replace first-person pronouns in summary with third-person mentions (the author(s)?, they). |
strip_dialogue | false | If true, remove dialog and quoted text from input prior to summarising. |
summary_size | 400 | Length of summary in words. |
summary_percent | 0 | Length of summary as a % of the original article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
keyword_method | sgrank+acr | Available values: sgrank, sgrank+np, sgrank+acr, textrank, np, regex. |
keyword_sample | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
keyword_limit | 25 | Target number of key terms to extract. |
abbreviation_method | schwartz | Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble. |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages. |
extract_facts | true | Extract SVO-style factual statements from the article. |
extract_claims | true | Extract specific claims made by the article. |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
citation_contexts | false | If true, extract the inline citation contexts (preceding and current sentences). |
inline_citation_links | false | If true, link inline citations to their identifiers in the references. |
extract_pico | true | Extract population, intervention, control, outcome data. |
extract_tables | false | If true, extract tabular data as CSV/Excel files. |
extract_figures | false | If true, extract figures and images as PNG files. |
require_captions | true | Requires an accompanying caption to trigger figure/table extraction. |
extract_sections | true | Extracts section headers and paragraphs. |
include_bodytext | true | If extracting sections, includes the main body text content for each section. |
unstructured_content | false | If true, include a raw, unstructured text dump of the file. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
engine | v1 | PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters. |
image_engine | v1 | Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values: v1, v2, v1+v2. |
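For example, a minimal Python sketch (placeholder key and file path) that enables reference parsing in BibTeX format, CrossRef metadata lookup and table extraction using the parameters above:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
    'parse_references': True,
    'reference_format': 'bibtex',
    'external_metadata': True,
    'extract_tables': True
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
data = r.json()
print(data['metadata'].get('references'))  # parsed references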
Output fields
Field | Description |
---|---|
filename | The filename of the uploaded document, or input URL slug |
content_type | The file or URL MIME type |
metadata | Structured article metadata |
message | Any error or status messages |
title | Article title |
author | List of authors |
pages | Number of pages in the document |
date | Article year |
full_date | Article date string in ISO 8601 format (where available) |
affiliations | Author affiliations |
journal | Journal title (from CrossRef) |
abbreviated_journal | Abbreviated journal title (where available) |
volume | Journal volume (from CrossRef) |
page | Journal page range (from CrossRef) |
cited_by | Citation count (from CrossRef) |
identifiers | Any identifier extracted from the document, such as DOI, ISBN, arXiv ID, or other identifier. If an open-access version of the paper is available, the URL to that version will be displayed here. |
abstract | The author-written abstract, if available, or a proxy for the abstract, such as background, introduction, preface etc. |
keywords | Author-supplied keywords |
references | The plain reference strings extracted from the end of the article, or from the footnotes |
emails | Email addresses of the authors |
type | Article type: journal-article, book-chapter, preprint, web-page, review-article, case-study, report |
references_ris | RIS parse of the references |
links | Any URLs identified in the document |
author_conclusions | Author-stated conclusions/takeaways |
funding | Funding statement structured as follows: "award-group": [{"funding-source": "National Institutes of Health", "award-id": ["R43HL137469"] }] |
table_captions | Table captions |
figure_captions | Figure captions |
tables_url | Link to download the tables as Excel |
figure_urls | List of links to download extracted images as PNG files |
word_count | A range representing maximum and minimum estimated word count. The maximum includes appendices and supplementary information. The minimum includes the core article body text. Both exclude references and footnotes. |
is_oa | Boolean flag if the document is open access or not. This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497 |
oa_status | Open access status: closed, bronze, green, or gold. This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497 |
sections | Snippets from each main section in the article |
introduction, methods, results, conclusion | If section headings can be mapped to standard names such as Introduction, Methods, Results, Conclusions, these snippets are shown here |
funding | Any funding statements |
disclosures | Any disclosures of conflicts of interest |
ethical_compliance | Any information about consent and ethical regulations |
data_availability | Any information about data and code availability related to this study |
limitations | Any discussion of study limitations |
future_work | Any information about further research needed and future work |
registrations | Any study registration identifiers |
structured_content | The section headings as they appear in the source document, along with their full section content. |
participants | Quantifiable information about the study subjects |
statistics | Information about statistical tests and analysis performed in the study |
populations | Quantifiable information about the population background |
keywords | A combination of the author-supplied keywords, plus new keywords or key terms extracted from the document |
keyword_relevance | keywords ranked by their relevance scores |
species | Any Latin species names detected |
summary | An extractive summary of the main points of the entire article. |
structured_summary | An extractive summary structured according to the main sections of the article. |
reference_links | Shown if reference parsing has been enabled. This contains links to the full text for each of the references in the paper |
facts | Subject-predicate-object statements expressed in the article |
claims | Claims made by the authors of the study |
findings | Any important, quantitative findings extracted from the document, such as statistically significant results |
key_statements | A longer set of important sentences, from which the top_statements are selected. |
top_statements | The top 3-7 key points in the document. Typically, these highlights will include introductory and concluding information, as well as the main claims and findings of the article |
headline | A short, one line summary of the entire article. This headline attempts to express the main finding or main result of the paper. |
abbreviations | Abbreviations and their fully spelt out names, extracted from the document |
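To show how a few of the fields above appear in practice, here is a short Python sketch (placeholder key and file path) that reads them from the JSON response:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data={'start_page': 1},
                      timeout=30)
data = r.json()
meta = data.get('metadata', {})
print(meta.get('title'))           # Article title
print(meta.get('identifiers'))     # DOI, ISBN, arXiv ID, etc.
print(data.get('headline'))        # One-line summary of the main finding
print(data.get('top_statements'))  # The top 3-7 key points
print(data.get('findings'))        # Quantitative findings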
GET structured content from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/metadata/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["author@email.com"],
"funding": [],
"table_captions": [],
"figure_captions": []
},
"sections": {
"introduction": [],
"methodology": [],
"findings": [],
"conclusion": [],
"limitations": [],
"acknowledgements": [],
"funding": [],
"future_work": [],
"objectives": []
},
"structured_content": [],
"participants": [],
"statistics": [],
"keywords": [],
"keyword_relevance": {},
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"top_statements": [],
"findings": [],
"facts": [],
"claims": [],
"summary": [],
"structured_summary": {}
}
This endpoint extracts structured content from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/metadata/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
parse_references | false | If true, parse into BibTeX and link each reference to its full text location. |
reference_style | ensemble | Referencing style used by the document, if known, or use the default. Available values: acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver. |
reference_format | text | Output references in plain text or bibtex format. |
generate_summary | true | Create an extractive summary of the article. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
replace_pronouns | false | If true, replace first-person pronouns in summary with third-person mentions (the author(s)?, they). |
strip_dialogue | false | If true, remove dialog and quoted text from input prior to summarising. |
summary_size | 400 | Length of summary in words. |
summary_percent | 0 | Length of summary as a % of the original article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
keyword_method | sgrank+acr | Available values: sgrank, sgrank+np, sgrank+acr, textrank, np, regex. |
keyword_sample | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
keyword_limit | 25 | Target number of key terms to extract. |
abbreviation_method | schwartz | Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble. |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages. |
extract_facts | true | Extract SVO-style factual statements from the article. |
extract_claims | true | Extract specific claims made by the article. |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
citation_contexts | false | If true, extract the inline citation contexts (preceding and current sentences). |
inline_citation_links | false | If true, link inline citations to their identifiers in the references. |
extract_pico | true | Extract population, intervention, control, outcome data. |
extract_tables | false | If true, extract tabular data as CSV/Excel files. |
extract_figures | false | If true, extract figures and images as PNG files. |
require_captions | true | Requires an accompanying caption to trigger figure/table extraction. |
extract_sections | true | Extracts section headers and paragraphs. |
include_bodytext | true | If extracting sections, includes the main body text content for each section. |
unstructured_content | false | If true, include a raw, unstructured text dump of the file. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
engine | v1 | PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters. |
image_engine | v1 | Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values: v1, v2, v1+v2. |
Extract Key Terms
The API endpoints at https://api.scholarcy.com/api/keywords/extract
will pull out the key terms from an article.
POST a local file to extract key terms
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/keywords/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "article2016.pdf",
"abbreviations": {
"U&G": "uses and gratification",
"SNS": "social networking sites",
"COBRA": "Consumer Online Brand-Related Activity",
"AVE": "average variance extracted",
"CLF": "common latent factor"
},
"keywords": [
{
"term": "Facebook",
"url": "https://en.wikipedia.org/wiki/Facebook"
},
{
"term": "typology",
"url": "https://en.wikipedia.org/wiki/typology"
},
{
"term": "social media",
"url": "https://en.wikipedia.org/wiki/social_media"
},
{
"term": "consumer behavior",
"url": "https://en.wikipedia.org/wiki/consumer_behavior"
},
{
"term": "average variance extracted",
"url": "https://en.wikipedia.org/wiki/average_variance_extracted"
},
{
"term": "online branding",
"url": "https://en.wikipedia.org/wiki/online_branding"
},
{
"term": "cluster analysis",
"url": "https://en.wikipedia.org/wiki/cluster_analysis"
},
{
"term": "social networking sites",
"url": "https://en.wikipedia.org/wiki/social_networking_sites"
},
{
"term": "brand manager",
"url": "https://en.wikipedia.org/wiki/brand_manager"
}
],
"keyword_relevance": {
"Facebook": 0.3834808259587021,
"social media": 0.16224188790560473,
"social networking sites": 0.08259587020648967,
"brand interaction": 0.07964601769911504,
"typology": 0.038348082595870206,
"brand manager": 0.038348082595870206,
"uses and gratification": 0.035398230088495575,
"consumer interaction": 0.032448377581120944,
"consumer behavior": 0.02359882005899705,
"brand communication": 0.02064896755162242,
"cluster analysis": 0.017699115044247787,
"main motivation": 0.017699115044247787,
"Consumer Online Brand-Related Activity": 0.014749262536873156,
"average variance extracted": 0.011799410029498525,
"common latent factor": 0.008849557522123894,
"online branding": 0.008849557522123894
}
}
The above command can also return CSV structured like this:
"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"
This endpoint extracts key terms from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. output_format=csv.
HTTP Request
POST https://api.scholarcy.com/api/keywords/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
output_format | json | Output format: json or csv. |
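For example, a short Python sketch (placeholder key and file path) that requests CSV output with Wikipedia links:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {'wiki_links': True, 'output_format': 'csv'}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
print(r.text)  # CSV rows of the form "filename","key term","wikipedia_link"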
GET key terms from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/keywords/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "article2016.pdf",
"abbreviations": {
"U&G": "uses and gratification",
"SNS": "social networking sites",
"COBRA": "Consumer Online Brand-Related Activity",
"AVE": "average variance extracted",
"CLF": "common latent factor"
},
"keywords": [
{
"term": "Facebook",
"url": "https://en.wikipedia.org/wiki/Facebook"
},
{
"term": "typology",
"url": "https://en.wikipedia.org/wiki/typology"
},
{
"term": "social media",
"url": "https://en.wikipedia.org/wiki/social_media"
},
{
"term": "consumer behavior",
"url": "https://en.wikipedia.org/wiki/consumer_behavior"
},
{
"term": "average variance extracted",
"url": "https://en.wikipedia.org/wiki/average_variance_extracted"
},
{
"term": "online branding",
"url": "https://en.wikipedia.org/wiki/online_branding"
},
{
"term": "cluster analysis",
"url": "https://en.wikipedia.org/wiki/cluster_analysis"
},
{
"term": "social networking sites",
"url": "https://en.wikipedia.org/wiki/social_networking_sites"
},
{
"term": "brand manager",
"url": "https://en.wikipedia.org/wiki/brand_manager"
}
],
"keyword_relevance": {
"Facebook": 0.3834808259587021,
"social media": 0.16224188790560473,
"social networking sites": 0.08259587020648967,
"brand interaction": 0.07964601769911504,
"typology": 0.038348082595870206,
"brand manager": 0.038348082595870206,
"uses and gratification": 0.035398230088495575,
"consumer interaction": 0.032448377581120944,
"consumer behavior": 0.02359882005899705,
"brand communication": 0.02064896755162242,
"cluster analysis": 0.017699115044247787,
"main motivation": 0.017699115044247787,
"Consumer Online Brand-Related Activity": 0.014749262536873156,
"average variance extracted": 0.011799410029498525,
"common latent factor": 0.008849557522123894,
"online branding": 0.008849557522123894
}
}
The above command can also return CSV structured like this:
"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"
This endpoint extracts key terms from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/keywords/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
output_format | json | Output format: json or csv. |
Generate a Synopsis
The API endpoints at https://summarizer.scholarcy.com/summarize
will generate a short, abstractive synopsis (70-100 words) or a mini-review (around 150-300 words), depending on the parameters chosen.
By default, output is in JSON format.
Alternatively, you can receive output in HTML format if you pass an Accept: text/html
header with your request.
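For example, a minimal Python sketch (placeholder key and file path) that requests HTML output by sending the Accept header alongside the usual Authorization header:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://summarizer.scholarcy.com/summarize'
headers = {
    'Authorization': 'Bearer ' + AUTH_TOKEN,
    'Accept': 'text/html'  # ask for HTML instead of the default JSON
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data={'format_summary': True},
                      timeout=30)
print(r.text)  # HTML-formatted synopsis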
POST a local file to generate a synopsis
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:wiki_links => true,
:format_summary => true
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'wiki_links': True, 'format_summary': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "wiki_links=true" \
-F "format_summary=true"
The above command returns JSON structured like this:
{
"response": {
"abbreviations": {
"EPOSA": "European Project on OSteoArthritis",
"GPS": "Global Positioning System",
"ISD": "Integrated Surface Database",
"OR": "odds ratio"
},
"headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
"keywords": [
{
"term": "physical activity",
"url": "https://en.wikipedia.org/wiki/physical_activity"
},
{
"term": "osteoarthritis",
"url": "https://en.wikipedia.org/wiki/osteoarthritis"
},
{
"term": "atmospheric pressure",
"url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
},
{
"term": "rheumatoid arthritis",
"url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
},
{
"term": "Global Positioning System",
"url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
},
{
"term": "fibromyalgia",
"url": "https://en.wikipedia.org/wiki/fibromyalgia"
},
{
"term": "arthritis",
"url": "https://en.wikipedia.org/wiki/arthritis"
},
{
"term": "smartphone app",
"url": "https://en.wikipedia.org/wiki/smartphone_app"
},
{
"term": "relative humidity",
"url": "https://en.wikipedia.org/wiki/relative_humidity"
},
{
"term": "chronic pain",
"url": "https://en.wikipedia.org/wiki/chronic_pain"
},
{
"term": "odds ratio",
"url": "https://en.wikipedia.org/wiki/odds_ratio"
},
{
"term": "Parkinson disease",
"url": "https://en.wikipedia.org/wiki/Parkinson_disease"
},
{
"term": "joint pain",
"url": "https://en.wikipedia.org/wiki/joint_pain"
},
{
"term": "wind speed",
"url": "https://en.wikipedia.org/wiki/wind_speed"
},
{
"term": "cohort study",
"url": "https://en.wikipedia.org/wiki/cohort_study"
}
],
"message": "",
"metadata": {
"citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
"citation_affiliation": "",
"citation_author": "William G. Dixon et al.",
"citation_date": 2019,
"citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
"citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
},
"readership_level": "technical-readership-accurate",
"summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
"title": "How the weather affects the pain of citizen scientists using a smartphone app"
}
}
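The summary field in this response carries light HTML markup (an inline citation link and <br/> breaks), and the other fields are plain JSON. Below is a minimal Python sketch that uploads a file as in the earlier examples and then reads the main fields out of the decoded response; the tag-stripping regular expression is illustrative only, not part of the API:
# Example: reading fields from the synopsis response
import re
import requests

AUTH_TOKEN = 'abcdefg' # Your API key
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post('https://summarizer.scholarcy.com/summarize',
                      headers=headers,
                      files={'file': file_data},
                      timeout=30)

data = r.json()['response']              # unwrap the top-level "response" object
headline = data['headline']
terms = [kw['term'] for kw in data['keywords']]
citation = data['metadata']['citation']

# The summary contains an inline citation link and <br/> breaks;
# strip the tags if only plain text is needed.
plain_summary = re.sub(r'<[^>]+>', ' ', data['summary'])

print(headline)
print(', '.join(terms))
print(citation)
print(plain_summary)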
This endpoint generates a synopsis from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- PowerPoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML (.xml)
- HTML (.html or .htm)
- Plain Text (.txt)
- LaTeX (.tex)
HTTP Request
POST https://summarizer.scholarcy.com/summarize
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497 |
input_text | null | You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL. |
structured_summary | false | Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion. |
summary_type | combined | Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary. |
focus_level | 4 | This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus. |
readership_level | technical-readership-accurate | This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy. However, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text. However, it is 2x faster than lay-readership-accurate. |
wiki_links | false | Map extracted key terms to Wikipedia entries. |
format_summary | false | Format the summary so it can be more easily used as part of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated. |
headline_type | verbatim | Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified. |
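These parameters are sent alongside the file upload. For example, a structured, lay-readership synopsis with Wikipedia links can be requested as follows; this is a minimal Python sketch and the parameter values are illustrative choices, not defaults:
# Example: POST a file with additional query parameters
import requests

timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
# Illustrative choices; see the table above for all options.
params = {
    'structured_summary': True,
    'summary_type': 'combined',
    'readership_level': 'lay-readership-accurate',
    'wiki_links': True
}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      params=params,
                      files={'file': file_data},
                      timeout=timeout)
print(r.json())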
GET a synopsis from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
# For a GET request, rest-client sends the parameters as a query string
# when they are supplied via the special :params key.
params = {
  :url => 'https://www.nature.com/articles/s41746-019-0180-3',
  :wiki_links => true,
  :format_summary => true
}
request = RestClient::Request.new(
  :method => :get,
  :url => GET_ENDPOINT,
  :headers => headers.merge(:params => params))
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'wiki_links': True,
'format_summary': True
}
r = requests.get(GET_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "wiki_links=true" \
-d "format_summary=true"
The above command returns JSON structured as for the POST endpoint:
{
"response": {
"abbreviations": {
"EPOSA": "European Project on OSteoArthritis",
"GPS": "Global Positioning System",
"ISD": "Integrated Surface Database",
"OR": "odds ratio"
},
"headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
"keywords": [
{
"term": "physical activity",
"url": "https://en.wikipedia.org/wiki/physical_activity"
},
{
"term": "osteoarthritis",
"url": "https://en.wikipedia.org/wiki/osteoarthritis"
},
{
"term": "atmospheric pressure",
"url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
},
{
"term": "rheumatoid arthritis",
"url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
},
{
"term": "Global Positioning System",
"url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
},
{
"term": "fibromyalgia",
"url": "https://en.wikipedia.org/wiki/fibromyalgia"
},
{
"term": "arthritis",
"url": "https://en.wikipedia.org/wiki/arthritis"
},
{
"term": "smartphone app",
"url": "https://en.wikipedia.org/wiki/smartphone_app"
},
{
"term": "relative humidity",
"url": "https://en.wikipedia.org/wiki/relative_humidity"
},
{
"term": "chronic pain",
"url": "https://en.wikipedia.org/wiki/chronic_pain"
},
{
"term": "odds ratio",
"url": "https://en.wikipedia.org/wiki/odds_ratio"
},
{
"term": "Parkinson disease",
"url": "https://en.wikipedia.org/wiki/Parkinson_disease"
},
{
"term": "joint pain",
"url": "https://en.wikipedia.org/wiki/joint_pain"
},
{
"term": "wind speed",
"url": "https://en.wikipedia.org/wiki/wind_speed"
},
{
"term": "cohort study",
"url": "https://en.wikipedia.org/wiki/cohort_study"
}
],
"message": "",
"metadata": {
"citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
"citation_affiliation": "",
"citation_author": "William G. Dixon et al.",
"citation_date": 2019,
"citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
"citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
},
"readership_level": "technical-readership-accurate",
"summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
"title": "How the weather affects the pain of citizen scientists using a smartphone app"
}
}
This endpoint generates a synopsis from a remote URL. The URL can resolve to a document in any of the formats listed for the POST endpoint.
HTTP Request
GET https://summarizer.scholarcy.com/summarize
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497 |
input_text | null | You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL. |
structured_summary | false | Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion. |
summary_type | combined | Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary. |
focus_level | 4 | This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus. |
readership_level | technical-readership-accurate | This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy. However, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text. However, it is 2x faster than lay-readership-accurate. |
wiki_links | false | Map extracted key terms to Wikipedia entries. |
format_summary | false | Format the summary into a 'mini review' so it can be more easily used as the basis of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated. |
headline_type | verbatim | Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified. |
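The input_text parameter accepts the document body as a plain string instead of a file or url. Because the text can be long, the sketch below sends it as form data to the POST endpoint rather than in the query string; this choice is an assumption based on the form-encoded curl examples above, and the text itself is only a placeholder:
# Example: summarizing a text string with input_text
import requests

timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
    'input_text': 'Weather has long been thought to affect symptoms in chronic disease. '
                  'This study used a smartphone app to collect daily pain reports...',
    'summary_type': 'overview'
}
r = requests.post(POST_ENDPOINT,
                  headers=headers,
                  data=payload,
                  timeout=timeout)
print(r.json())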
Errors
The Scholarcy API uses the following error codes:
Error Code | Meaning |
---|---|
400 | Bad Request -- Your request is invalid. |
401 | Unauthorized -- Your API key is wrong. |
403 | Forbidden -- The API endpoint requested is restricted to administrators only. |
404 | Not Found -- The API endpoint could not be found. |
405 | Method Not Allowed -- You tried to call the API with an invalid method. |
406 | Not Acceptable -- You requested a format that isn't JSON. |
429 | Too Many Requests -- You're making too many API requests. |
500 | Internal Server Error -- We had a problem with our server. Try again later. |
503 | Service Unavailable -- We're temporarily offline for maintenance. Please try again later. |
504 | Gateway Timeout -- Serving your request took longer than expected. Please try again. |
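When calling either API from client code, it is worth checking these status codes before parsing the response body. The sketch below is one possible Python pattern, not part of the API itself: it retries on 429 and 503 with a short backoff and raises for any other error.
# Example: simple status-code handling with retries
import time
import requests

def get_synopsis(endpoint, headers, params, retries=3, timeout=30):
    # Retry only on rate limiting (429) and maintenance (503); raise otherwise.
    for attempt in range(retries):
        r = requests.get(endpoint, headers=headers, params=params, timeout=timeout)
        if r.status_code in (429, 503):
            time.sleep(2 ** attempt)   # simple exponential backoff
            continue
        r.raise_for_status()           # raises on any other 4xx/5xx
        return r.json()
    r.raise_for_status()               # still rate-limited/offline after the last retry

# Usage:
# headers = {'Authorization': 'Bearer abcdefg'}
# params = {'url': 'https://www.nature.com/articles/s41746-019-0180-3'}
# print(get_synopsis('https://summarizer.scholarcy.com/summarize', headers, params))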