
Introduction

Welcome to the Scholarcy APIs. We have two core API services:

  1. Metadata extraction API at https://api.scholarcy.com. This is a developer API comprising a number of endpoints for extracting machine-readable knowledge as JSON data from documents in many formats. The service is optimised to work with research papers and articles, but should provide useful results for any document in any format.
  2. Synopsis API at https://summarizer.scholarcy.com/. This includes a web front end for testing and a developer endpoint at /summarize.

We provide examples in Shell, Ruby, and Python. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.

Authentication

Authentication headers must be sent with every request:


# 1. Metadata API:

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb')
          })
response = request.execute
puts(response.body)

# 2. Synopsis API:

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb')
          })
response = request.execute
puts(response.body)

# 1. Metadata API:

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          timeout=timeout)
    print(r.json())

# 2. Synopsis API:

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          timeout=timeout)
    print(r.json())

# 1. Metadata API:

# With shell, you can just pass the correct header with each request
curl "https://api.scholarcy.com/api/posters/generate" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf"

curl "https://api.scholarcy.com/api/posters/generate" \
  -H "Authorization: Bearer abcdefg" \
  -d "url=https://www.nature.com/articles/s41746-019-0180-3"

# 2. Synopsis API:

# With shell, you can just pass the correct header with each request
curl "https://summarizer.scholarcy.com/summarize" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf"

curl "https://summarizer.scholarcy.com/summarize" \
  -H "Authorization: Bearer abcdefg" \
  -d "url=https://www.nature.com/articles/s41746-019-0180-3"

Make sure to replace abcdefg with your API key.

Each API service (metadata and synopsis) requires a separate key.

The Scholarcy APIs expect the API key to be included in all API requests to the server in a header that looks like the following:

Authorization: Bearer abcdefg
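As a sketch, the header can be built once and reused across calls; the helper name below is illustrative, not part of the API:

```python
def auth_headers(api_key):
    """Build the Authorization header used by both Scholarcy services."""
    if not api_key:
        raise ValueError('An API key is required')
    return {'Authorization': 'Bearer ' + api_key}

# Pass the same dict to every request you make:
print(auth_headers('abcdefg'))  # → {'Authorization': 'Bearer abcdefg'}
```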

Generate a Poster

The API endpoints at https://api.scholarcy.com/api/posters/generate will extract the information needed to populate your own poster-creation services, and will also generate a basic PowerPoint template for you to use as a starting point for editing.

POST a local file to generate a poster

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb'),
            :type => 'headline',
            :start_page => 24,
            :end_page => 37
          })
response = request.execute
puts(response.body)
import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

params = {'type': 'headline', 'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          data=params,
          timeout=timeout)
    print(r.json())

curl "https://api.scholarcy.com/api/posters/generate" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf" \
  -F "type=headline" \
  -F "start_page=24" \
  -F "end_page=37"

The above command returns JSON structured like this:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [
      "Department of Silly Walks, University of Life, London, UK"
    ],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "references": [
      "1. Smith, J. (2001) A study on self-citation. J. Chem. Biol., 123, 456-789.",
      "2. Jones, R. (2015) He didn't write this one. Science, 101, 101010."
    ],
    "emails": ["smith.j@uni.ac.uk"],
    "figure_captions": [
      {
        "id": "1",
        "caption": "Figure 1 caption"
      },
      {
        "id": "2",
        "caption": "Figure 2 caption"
      }
    ],
    "figure_urls": [
      "https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-000.png",
      "https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-002.png"
    ],
    "poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
    "keywords": [
      "atomic force microscopy",
      "dna nanostructure",
      "drug release",
      "single-stranded DNA",
      "double-stranded DNA",
      "dox molecule"
    ],
    "abbreviations": {
      "DONs": "DNA origami nanostructures",
      "ROS": "reactive oxygen species",
      "DNase I": "deoxyribonuclease I",
      "PEG": "polyethylene glycol"
    },
    "headline": "We prove some important facts in this paper",
    "highlights": [
      "Facts are very important.",
      "The force is strong in this one.",
      "We ran some tests and this is what we found"
    ],
    "summary": {
      "Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
      "Methods": [
        "We mixed some chemicals.",
        "We heated them up.",
        "We distilled the mixture."
      ],
      "Results": [
        "There was a big explosion",
        "But the crystals were pure",
        "We identified a new compound"
      ],
      "Conclusion": [
        "We proved some important things and we summarise them here.",
        "Further work is necessary"
      ]
    }
  }
}

This endpoint generates a poster from a local file. File formats supported are:

Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. start_page=1.
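The response includes a poster_url field inside metadata (see the sample JSON above), which points at the generated PPTX; you can then download it with a second authenticated GET. A minimal sketch, where pick_poster_url is an illustrative helper name:

```python
def pick_poster_url(response_json):
    """Return the generated poster link from a /api/posters/generate
    response, or None if no poster was produced."""
    return response_json.get('metadata', {}).get('poster_url')

sample = {'metadata': {'poster_url': 'https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx'}}
print(pick_poster_url(sample))
```

The file itself can then be fetched with an authenticated GET on that URL and written to disk in binary mode.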

HTTP Request

POST https://api.scholarcy.com/api/posters/generate

Query Parameters

Parameter Default Description
file null A file object.
url null URL of public, open-access document.
type full The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image.
start_page 1 Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.

GET a poster from a URL

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}

request = RestClient::Request.new(
          :method => :get,
          :url => POST_ENDPOINT,
          :headers => headers.merge(params: {
            :url => 'https://www.nature.com/articles/s41746-019-0180-3',
            :type => 'full',
            :start_page => 1
          }))
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

params = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'type': 'full',
    'start_page': 1
}
r = requests.get(GET_ENDPOINT,
      headers=headers,
      params=params,
      timeout=timeout)
print(r.json())

curl -G "https://api.scholarcy.com/api/posters/generate" \
  -H "Authorization: Bearer abcdefg" \
  -d "url=https://www.nature.com/articles/s41746-019-0180-3" \
  -d "type=full" \
  -d "start_page=1"

The above command returns JSON structured as for the POST endpoint:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "references": [],
    "emails": ["smith.j@uni.ac.uk"],
    "figure_captions": [
      {
        "id": "1",
        "caption": "Figure 1 caption"
      },
      {
        "id": "2",
        "caption": "Figure 2 caption"
      }
    ],
    "figure_urls": [],
    "poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
    "keywords": [],
    "abbreviations": {},
    "headline": "We prove some important facts in this paper",
    "highlights": [],
    "summary": {
      "Introduction": [],
      "Methods": [],
      "Results": [],
      "Conclusion": []
    }
  }
}

This endpoint generates a poster from a remote URL. The remote URL can resolve to a document in any of the formats listed for the POST endpoint.

HTTP Request

GET https://api.scholarcy.com/api/posters/generate

Query Parameters

Parameter Default Description
url null URL of public, open-access document.
type full The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image.
start_page 1 Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.

Extract Highlights

The API endpoints at https://api.scholarcy.com/api/highlights/extract will pull out the key findings/highlights of an article and also provide a longer, extractive summary.

POST a local file to extract highlights

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb'),
            :start_page => 24,
            :end_page => 37
          })
response = request.execute
puts(response.body)
import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

params = {'wiki_links': True, 'reference_links': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          data=params,
          timeout=timeout)
    print(r.json())

curl "https://api.scholarcy.com/api/highlights/extract" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf" \
  -F "start_page=24" \
  -F "end_page=37"

The above command returns JSON structured like this:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [
      "Department of Silly Walks, University of Life, London, UK"
    ],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "funding": [
      {
        "award-group": [
          {
            "funding-source": "FEDER/COMPETE",
            "award-id": ["AAA/BBB/04007/2019"]
          }
        ],
        "funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
      }
    ]
  },
  "keywords": [
    "atomic force microscopy",
    "dna nanostructure",
    "drug release",
    "single-stranded DNA",
    "double-stranded DNA",
    "dox molecule"
  ],
  "keyword_relevance": {
    "atomic force microscopy": 0.345678,
    "dna nanostructure": 0.23456,
    "drug release": 0.12345,
    "single-stranded DNA": 0.034567,
    "double-stranded DNA": 0.02345,
    "dox molecule": 0.01234
  },
  "abbreviations": {
    "DONs": "DNA origami nanostructures",
    "ROS": "reactive oxygen species",
    "DNase I": "deoxyribonuclease I",
    "PEG": "polyethylene glycol"
  },
  "headline": "We prove some important facts in this paper",
  "highlights": [
    "Facts are very important.",
    "The force is strong in this one.",
    "We ran some tests and this is what we found"
  ],
  "findings": [
    "A statistically significant difference was noted between the four groups on the combined dependent variables",
    "We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
  ],
  "summary": [],
  "structured_summary": {
    "Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
    "Methods": [
      "We mixed some chemicals.",
      "We heated them up.",
      "We distilled the mixture."
    ],
    "Results": [
      "There was a big explosion",
      "But the crystals were pure",
      "We identified a new compound"
    ],
    "Conclusion": [
      "We proved some important things and we summarise them here.",
      "Further work is necessary"
    ]
  }
}

This endpoint extracts highlights from a local file. File formats supported are:

Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. wiki_links=true.
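When structured_summary=true, the response's structured_summary object maps section names to lists of sentences, as in the sample JSON above. A small sketch of turning that back into display text; flatten_summary is an illustrative name:

```python
def flatten_summary(structured_summary):
    """Join a structured_summary dict (section -> sentence list)
    into one plain-text block, preserving the section order."""
    return '\n'.join(
        section + ': ' + ' '.join(sentences)
        for section, sentences in structured_summary.items()
    )

sample = {
    'Methods': ['We mixed some chemicals.', 'We heated them up.'],
    'Conclusion': ['Further work is necessary'],
}
print(flatten_summary(sample))
```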

HTTP Request

POST https://api.scholarcy.com/api/highlights/extract

Query Parameters

Parameter Default Description
file null A file object.
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
wiki_links false If true, map extracted key terms to their Wikipedia pages.
reference_links false If true, parse and link each reference to its full text location.
replace_pronouns false If true, replace first-person pronouns with third-person mentions (the author(s)?, they).
key_points 5 The number of key points/key takeaway items to extract.
focus_terms null Semicolon separated list of terms around which the extracted highlights will focus.
sampling representative For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content.
extract_snippets true If true, sample snippets from each section, otherwise, sample the full text.
add_background_info false If true, generate an introductory sentence. Useful for generating an abstract from an article.
add_concluding_info false If true, generate a concluding sentence. Useful for generating an abstract from an article.
structured_summary false If true, summarise each of the main sections separately, and then provide a summary structured according to those sections.
summary_engine v1 v1: best for articles. v2: best for book chapters.
highlights_algorithm weighted weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally.
headline_from highlights highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline.

GET highlights from a URL

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}

request = RestClient::Request.new(
          :method => :get,
          :url => POST_ENDPOINT,
          :headers => headers.merge(params: {
            :url => 'https://www.nature.com/articles/s41746-019-0180-3',
            :start_page => 1
          }))
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

payload = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'start_page': 1
}
r = requests.get(POST_ENDPOINT,
      headers=headers,
      params=payload,
      timeout=timeout)
print(r.json())

curl -G "https://api.scholarcy.com/api/highlights/extract" \
  -H "Authorization: Bearer abcdefg" \
  -d "url=https://www.nature.com/articles/s41746-019-0180-3" \
  -d "start_page=1"

The above command returns JSON structured as for the POST endpoint:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [
      "Department of Silly Walks, University of Life, London, UK"
    ],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "funding": [
      {
        "award-group": [
          {
            "funding-source": "FEDER/COMPETE",
            "award-id": ["AAA/BBB/04007/2019"]
          }
        ],
        "funding-statement": "..."
      }
    ]
  },
  "keywords": [],
  "keyword_relevance": {},
  "abbreviations": {},
  "headline": "We prove some important facts in this paper",
  "highlights": [],
  "findings": [],
  "summary": [],
  "structured_summary": {
    "Introduction": [],
    "Methods": [],
    "Results": [],
    "Conclusion": []
  }
}

This endpoint extracts highlights from a remote URL. The remote URL can resolve to a document in any of the formats listed for the POST endpoint.

HTTP Request

GET https://api.scholarcy.com/api/highlights/extract

Query Parameters

Parameter Default Description
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
wiki_links false If true, map extracted key terms to their Wikipedia pages.
reference_links false If true, parse and link each reference to its full text location.
replace_pronouns false If true, replace first-person pronouns with third-person mentions (the author(s)?, they).
key_points 5 The number of key points/key takeaway items to extract.
focus_terms null Semicolon separated list of terms around which the extracted highlights will focus.
sampling representative For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content.
extract_snippets true If true, sample snippets from each section, otherwise, sample the full text.
add_background_info false If true, generate an introductory sentence. Useful for generating an abstract from an article.
add_concluding_info false If true, generate a concluding sentence. Useful for generating an abstract from an article.
structured_summary false If true, summarise each of the main sections separately, and then provide a summary structured according to those sections.
summary_engine v1 v1: best for articles. v2: best for book chapters.
highlights_algorithm weighted weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally.
headline_from highlights highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline.

Extract Structured Content

The API endpoints at https://api.scholarcy.com/api/metadata/extract and /api/metadata/basic will convert a document into structured, machine-readable data in JSON format.

POST a local file to extract content

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb'),
            :start_page => 24,
            :end_page => 37
          })
response = request.execute
puts(response.body)
import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          data=params,
          timeout=timeout)
    print(r.json())

curl "https://api.scholarcy.com/api/metadata/extract" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf" \
  -F "start_page=24" \
  -F "end_page=37"

The above command returns JSON structured like this:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [
      "Department of Silly Walks, University of Life, London, UK"
    ],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "references": [],
    "emails": ["author@email.com"],
    "funding": [
      {
        "award-group": [
          {
            "funding-source": "FEDER/COMPETE",
            "award-id": ["AAA/BBB/04007/2019"]
          }
        ],
        "funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
      }
    ],
    "table_captions": [
      {
        "id": "1",
        "caption": "Sample demographics and characteristics"
      },
      {
        "id": "2",
        "caption": "Construct measurements"
      }
    ],
    "figure_captions": []
  },
  "sections": {
    "introduction": ["Introduction section contents"],
    "methodology": ["Methods section contents"],
    "findings": ["Main results contents"],
    "conclusion": ["Concluding remarks"],
    "limitations": [
      "There are also several limitations to this research. Small sample size was an issue."
    ],
    "acknowledgements": [
      "We'd like to thank our supervisors for support, tea and biscuits."
    ],
    "funding": [
      "The authors acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
    ],
    "future_work": [
      "More research is needed to better understand what is going on."
    ],
    "objectives": [
      "The aim of this research is to provide insights into the inner workings of cellular processes."
    ]
  },
  "structured_content": [
    {
      "heading": "ABSTRACT",
      "content": ["This is a very exciting paper. Please read it."]
    },
    {
      "heading": "INTRODUCTION",
      "content": ["Introduction paragraph 1", "Introduction paragraph 2"]
    },
    {
      "heading": "RESEARCH METHODOLOGY",
      "content": ["Methods paragraph 1", "Methods paragraph 2"]
    },
    {
      "heading": "FINDINGS AND DISCUSSION",
      "content": ["Results paragraph 1", "Results paragraph 2"]
    },
    {
      "heading": "CONCLUSION",
      "content": ["Conclusion paragraph 1", "Conclusion paragraph 2"]
    }
  ],
  "participants": [
    {
      "participant": "Patients",
      "number": 15,
      "context": "Fifteen patients participated in the study."
    }
  ],
  "statistics": [
    {
      "tests": {
        "context": "We performed exploratory factor analysis using SPSS 20",
        "tests": [
          {
            "test": "exploratory factor analysis"
          }
        ]
      }
    },
    {
      "tests": {
        "context": "We performed confirmatory factor analyses with AMOS 20 using the maximum likelihood estimation method",
        "tests": [
          {
            "test": "confirmatory factor analyses"
          },
          {
            "test": "maximum likelihood estimation method"
          }
        ]
      }
    },
    {
      "p_value": "P < 0.001",
      "context": "We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)",
      "tests": {
        "tests": [
          {
            "test": "analysis of variance",
            "value": "P < 0.001"
          }
        ]
      }
    }
  ],
  "keywords": [
    "atomic force microscopy",
    "dna nanostructure",
    "drug release",
    "single-stranded DNA",
    "double-stranded DNA",
    "dox molecule"
  ],
  "keyword_relevance": {
    "atomic force microscopy": 0.345678,
    "dna nanostructure": 0.23456,
    "drug release": 0.12345,
    "single-stranded DNA": 0.034567,
    "double-stranded DNA": 0.02345,
    "dox molecule": 0.01234
  },
  "abbreviations": {
    "DONs": "DNA origami nanostructures",
    "ROS": "reactive oxygen species",
    "DNase I": "deoxyribonuclease I",
    "PEG": "polyethylene glycol"
  },
  "headline": "We prove some important facts in this paper",
  "top_statements": [
    "Facts are very important.",
    "The force is strong in this one.",
    "We ran some tests and this is what we found"
  ],
  "findings": [
    "A statistically significant difference was noted between the four groups on the combined dependent variables",
    "We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
  ],
  "facts": [],
  "claims": [],
  "summary": [],
  "structured_summary": {
    "Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
    "Methods": [
      "We mixed some chemicals.",
      "We heated them up.",
      "We distilled the mixture."
    ],
    "Results": [
      "There was a big explosion",
      "But the crystals were pure",
      "We identified a new compound"
    ],
    "Conclusion": [
      "We proved some important things and we summarise them here.",
      "Further work is necessary"
    ]
  }
}

This endpoint extracts structured content from a local file. File formats supported are:

Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. engine=v1.
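The keyword_relevance map in the response above scores each extracted key term; sorting it gives a quick relevance ranking. A sketch, where top_keywords is an illustrative helper name:

```python
def top_keywords(keyword_relevance, n=5):
    """Rank the keyword_relevance map (term -> score) from
    /api/metadata/extract, highest-scoring terms first."""
    ranked = sorted(keyword_relevance.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _score in ranked[:n]]

sample = {'atomic force microscopy': 0.345678,
          'dna nanostructure': 0.23456,
          'drug release': 0.12345}
print(top_keywords(sample, n=2))  # → ['atomic force microscopy', 'dna nanostructure']
```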

HTTP Request

POST https://api.scholarcy.com/api/metadata/extract

Query Parameters

Parameter Default Description
file null A file object.
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
parse_references false If true, parse into BibTeX and link each reference to its full text location.
reference_style ensemble Referencing style used by the document, if known, or use the default. Available values: acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver.
reference_format text Output references in plain text or bibtex format.
generate_summary true Create an extractive summary of the article.
summary_engine v1 v1: best for articles. v2: best for book chapters.
replace_pronouns false If true, replace first-person pronouns in summary with third-person mentions (the author(s)?, they).
strip_dialogue false If true, remove dialog and quoted text from input prior to summarising.
summary_size 400 Length of summary in words.
summary_percent 0 Length of summary as a % of the original article.
structured_summary false If true, summarise each of the main sections separately, and then provide a summary structured according to those sections.
keyword_method sgrank+acr Available values: sgrank, sgrank+np, sgrank+acr, textrank, np, regex.
keyword_sample representative For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content.
keyword_limit 25 Target number of key terms to extract.
abbreviation_method schwartz Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble.
wiki_links false If true, map extracted key terms to their Wikipedia pages.
extract_facts true Extract SVO-style factual statements from the article.
extract_claims true Extract specific claims made by the article.
key_points 5 The number of key points/key takeaway items to extract.
focus_terms null Semicolon separated list of terms around which the extracted highlights will focus.
citation_contexts false If true, extract the inline citation contexts (preceding and current sentences).
inline_citation_links false If true, link inline citations to their identifiers in the references.
extract_pico true Extract population, intervention, control, outcome data.
extract_tables false If true, extract tabular data as CSV/Excel files.
extract_figures false If true, extract figures and images as PNG files.
require_captions true Requires an accompanying caption to trigger figure/table extraction.
extract_sections true Extracts section headers and paragraphs.
include_bodytext true If extracting sections, includes the main body text content for each section.
unstructured_content false If true, include a raw, unstructured text dump of the file.
extract_snippets true If true, sample snippets from each section; otherwise, sample the full text.
engine v1 PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters.
image_engine v1 Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values : v1, v2, v1+v2.
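Several of the parameters above can be combined in a single request. As a sketch (placeholder token), preparing the request locally shows the form body that would be sent:

```python
import requests

AUTH_TOKEN = 'abcdefg'  # placeholder API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

# Tune the summary length and enable table extraction for a public URL
data = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'generate_summary': 'true',
    'summary_size': 200,
    'extract_tables': 'true',
}

# prepare() builds the request locally without sending it
req = requests.Request('POST', POST_ENDPOINT, headers=headers,
                       data=data).prepare()
print(req.body)
```

To actually call the API, replace the `Request(...).prepare()` step with `requests.post(POST_ENDPOINT, headers=headers, data=data)`.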

Output fields

Field Description
filename The filename of the uploaded document, or input URL slug
content_type The file or URL MIME type
metadata Structured article metadata
message Any error or status messages
title Article title
author List of authors
pages Number of pages in the document
date Article year
full_date Article date string in ISO 8601 format (where available)
affiliations Author affiliations
journal Journal title (from CrossRef)
abbreviated_journal Abbreviated journal title (where available)
volume Journal volume (from CrossRef)
page Journal page range (from CrossRef)
cited_by Citation count (from CrossRef)
identifiers Any identifiers extracted from the document, such as a DOI, ISBN, or arXiv ID.
If an open-access version of the paper is available, the URL to that version will be displayed here.
abstract The author-written abstract, if available, or a proxy for the abstract, such as background, introduction, preface etc.
keywords Author-supplied keywords
references The plain reference strings extracted from the end of the article, or from the footnotes
emails Email addresses of the authors
type Article type: journal-article, book-chapter, preprint, web-page, review-article, case-study, report
references_ris RIS parse of the references
links Any URLs identified in the document
author_conclusions Author-stated conclusions/takeaways
funding Funding statement structured as follows: "award-group": [{"funding-source": "National Institutes of Health", "award-id": ["R43HL137469"] }]
table_captions Table captions
figure_captions Figure captions
tables_url Link to download the tables as Excel
figure_urls List of links to download extracted images as PNG files
word_count A range representing maximum and minimum estimated word count.
The maximum includes appendices and supplementary information.
The minimum includes the core article body text.
Both exclude references and footnotes.
is_oa Boolean flag indicating whether the document is open access.
This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497
oa_status Open access status: closed, bronze, green, or gold.
This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497
sections Snippets from each main section in the article
introduction, methods, results, conclusion If section headings can be mapped to standard names such as Introduction, Methods, Results, Conclusions, these snippets are shown here
funding Any funding statements
disclosures Any disclosures of conflicts of interest
ethical_compliance Any information about consent and ethical regulations
data_availability Any information about data and code availability related to this study
limitations Any discussion of study limitations
future_work Any information about further research needed and future work
registrations Any study registration identifiers
structured_content The section headings as they appear in the source document, along with their full section content.
participants Quantifiable information about the study subjects
statistics Information about statistical tests and analysis performed in the study
populations Quantifiable information about the population background
keywords A combination of the author-supplied keywords, plus new keywords or key terms extracted from the document
keyword_relevance Keywords ranked by their relevance scores
species Any Latin species names detected
summary An extractive summary of the main points of the entire article.
structured_summary An extractive summary structured according to the main sections of the article.
reference_links Shown if reference parsing has been enabled.
This contains links to the full text for each of the references in the paper
facts Subject-predicate-object statements expressed in the article
claims Claims made by the authors of the study
findings Any important, quantitative findings extracted from the document, such as statistically significant results
key_statements A longer set of important sentences, from which the top_statements are selected.
top_statements The top 3-7 key points in the document.
Typically, these highlights will include introductory and concluding information, as well as the main claims and findings of the article
headline A short, one line summary of the entire article.
This headline attempts to express the main finding or main result of the paper.
abbreviations Abbreviations and their fully spelt out names, extracted from the document
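To illustrate the shape of these fields, here is a minimal sketch of reading a parsed response; `data` stands in for `response.json()` from a real call, populated with placeholder values:

```python
# `data` mimics the documented response shape with placeholder values
data = {
    "filename": "article.pdf",
    "metadata": {
        "title": "Article title",
        "author": ["Smith, J."],
        "identifiers": {"doi": "10.1010/101010.10.10.1101010"},
    },
    "keywords": [{"term": "Facebook"}],
    "summary": [],
}

title = data["metadata"]["title"]
doi = data["metadata"]["identifiers"].get("doi")  # may be null for some inputs
terms = [k["term"] for k in data.get("keywords", [])]
print(title, doi, terms)
```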

GET structured content from a URL

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}

request = RestClient::Request.new(
          :method => :get,
          :url => POST_ENDPOINT,
          :headers => headers.merge(
            :params => {
              :url => 'https://www.nature.com/articles/s41746-019-0180-3',
              :start_page => 1
            }
          ))
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

payload = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'start_page': 1
}
r = requests.get(POST_ENDPOINT,
      headers=headers,
      params=payload,
      timeout=timeout)
print(r.json())

curl -G "https://api.scholarcy.com/api/metadata/extract" \
  -H "Authorization: Bearer abcdefg" \
  --data-urlencode "url=https://www.nature.com/articles/s41746-019-0180-3" \
  -d "start_page=1"

The above command returns JSON structured as for the POST endpoint:

{
  "filename": "filename.pdf",
  "content_type": "application/pdf",
  "file_size": 123456,
  "metadata": {
    "title": "Article title",
    "author": "Smith, J.",
    "pages": "15",
    "date": 2021,
    "affiliations": [
      "Department of Silly Walks, University of Life, London, UK"
    ],
    "identifiers": {
      "arxiv": null,
      "doi": "10.1010/101010.10.10.1101010",
      "isbn": null,
      "doc_id": null
    },
    "abstract": "This is a very exciting paper. Please read it.",
    "references": [],
    "emails": ["author@email.com"],
    "funding": [],
    "table_captions": [],
    "figure_captions": []
  },
  "sections": {
    "introduction": [],
    "methodology": [],
    "findings": [],
    "conclusion": [],
    "limitations": [],
    "acknowledgements": [],
    "funding": [],
    "future_work": [],
    "objectives": []
  },
  "structured_content": [],
  "participants": [],
  "statistics": [],
  "keywords": [],
  "keyword_relevance": {},
  "abbreviations": {},
  "headline": "We prove some important facts in this paper",
  "top_statements": [],
  "findings": [],
  "facts": [],
  "claims": [],
  "summary": [],
  "structured_summary": {}
}

This endpoint extracts structured content from a remote URL. The remote URL can resolve to a document in any of the formats listed for the POST endpoint.

HTTP Request

GET https://api.scholarcy.com/api/metadata/extract

Query Parameters

Parameter Default Description
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document at this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
parse_references false If true, parse into BibTeX and link each reference to its full text location.
reference_style ensemble Referencing style used by the document, if known, or use the default. Available values : acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver.
reference_format text Output references in plain text or bibtex format.
generate_summary true Create an extractive summary of the article.
summary_engine v1 v1: Best for articles. v2: best for book chapters.
replace_pronouns false If true, replace first-person pronouns in the summary with third-person mentions (e.g. 'the author(s)', 'they').
strip_dialogue false If true, remove dialog and quoted text from input prior to summarising.
summary_size 400 Length of summary in words.
summary_percent 0 Length of summary as a % of the original article.
structured_summary false If true, summarise each of the main sections separately, and then provide a summary structured according to those sections.
keyword_method sgrank+acr Available values : sgrank, sgrank+np, sgrank+acr, textrank, np, regex.
keyword_sample representative For large documents, extract key terms from either a representative sample of the content or the full text.
keyword_limit 25 Target number of key terms to extract.
abbreviation_method schwartz Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble.
wiki_links false If true, map extracted key terms to their Wikipedia pages.
extract_facts true Extract SVO-style factual statements from the article.
extract_claims true Extract specific claims made by the article.
key_points 5 The number of key points/key takeaway items to extract.
focus_terms null Semicolon separated list of terms around which the extracted highlights will focus.
citation_contexts false If true, extract the inline citation contexts (preceding and current sentences).
inline_citation_links false If true, link inline citations to their identifiers in the references.
extract_pico true Extract population, intervention, control, outcome data.
extract_tables false If true, extract tabular data as CSV/Excel files.
extract_figures false If true, extract figures and images as PNG files.
require_captions true Requires an accompanying caption to trigger figure/table extraction.
extract_sections true Extracts section headers and paragraphs.
include_bodytext true If extracting sections, includes the main body text content for each section.
unstructured_content false If true, include a raw, unstructured text dump of the file.
extract_snippets true If true, sample snippets from each section; otherwise, sample the full text.
engine v1 PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters.
image_engine v1 Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values : v1, v2, v1+v2.

Extract Key Terms

The API endpoints at https://api.scholarcy.com/api/keywords/extract will pull out the key terms from an article.

POST a local file to extract key terms

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb'),
            :start_page => 24,
            :end_page => 37
          })
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          data=params,
          timeout=timeout)
    print(r.json())

curl "https://api.scholarcy.com/api/keywords/extract" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf" \
  -F "start_page=24" \
  -F "end_page=37"

The above command returns JSON structured like this:

{
  "filename": "article2016.pdf",
  "abbreviations": {
    "U&G": "uses and gratification",
    "SNS": "social networking sites",
    "COBRA": "Consumer Online Brand-Related Activity",
    "AVE": "average variance extracted",
    "CLF": "common latent factor"
  },
  "keywords": [
    {
      "term": "Facebook",
      "url": "https://en.wikipedia.org/wiki/Facebook"
    },
    {
      "term": "typology",
      "url": "https://en.wikipedia.org/wiki/typology"
    },
    {
      "term": "social media",
      "url": "https://en.wikipedia.org/wiki/social_media"
    },
    {
      "term": "consumer behavior",
      "url": "https://en.wikipedia.org/wiki/consumer_behavior"
    },
    {
      "term": "average variance extracted",
      "url": "https://en.wikipedia.org/wiki/average_variance_extracted"
    },
    {
      "term": "online branding",
      "url": "https://en.wikipedia.org/wiki/online_branding"
    },
    {
      "term": "cluster analysis",
      "url": "https://en.wikipedia.org/wiki/cluster_analysis"
    },
    {
      "term": "social networking sites",
      "url": "https://en.wikipedia.org/wiki/social_networking_sites"
    },
    {
      "term": "brand manager",
      "url": "https://en.wikipedia.org/wiki/brand_manager"
    }
  ],
  "keyword_relevance": {
    "Facebook": 0.3834808259587021,
    "social media": 0.16224188790560473,
    "social networking sites": 0.08259587020648967,
    "brand interaction": 0.07964601769911504,
    "typology": 0.038348082595870206,
    "brand manager": 0.038348082595870206,
    "uses and gratification": 0.035398230088495575,
    "consumer interaction": 0.032448377581120944,
    "consumer behavior": 0.02359882005899705,
    "brand communication": 0.02064896755162242,
    "cluster analysis": 0.017699115044247787,
    "main motivation": 0.017699115044247787,
    "Consumer Online Brand-Related Activity": 0.014749262536873156,
    "average variance extracted": 0.011799410029498525,
    "common latent factor": 0.008849557522123894,
    "online branding": 0.008849557522123894
  }
}

With output_format=csv, the above command returns CSV structured like this:

"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"

This endpoint extracts key terms from a local file, in the same file formats as the metadata extraction endpoint.

Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. output_format=csv.

HTTP Request

POST https://api.scholarcy.com/api/keywords/extract

Query Parameters

Parameter Default Description
file null A file object.
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document at this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
wiki_links false If true, map extracted key terms to their Wikipedia pages
sampling representative For large documents, extract key terms from either a representative sample of the content or the full text.
extract_snippets true If true, sample snippets from each section; otherwise, sample the full text.
output_format json Output format: json or csv.
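For example, to request CSV output rather than the default JSON, pass output_format=csv. The sketch below (placeholder token) prepares the request locally so the resulting query string can be inspected without sending it:

```python
import requests

AUTH_TOKEN = 'abcdefg'  # placeholder API key
ENDPOINT = 'https://api.scholarcy.com/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

# Ask for CSV instead of the default JSON
params = {'url': 'https://www.nature.com/articles/s41746-019-0180-3',
          'output_format': 'csv'}

# prepare() builds the request locally without sending it
req = requests.Request('GET', ENDPOINT, headers=headers,
                       params=params).prepare()
print(req.url)
```

A live call would use `requests.get(ENDPOINT, headers=headers, params=params)` and read `r.text` rather than `r.json()` when CSV is requested.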

GET key terms from a URL

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}

request = RestClient::Request.new(
          :method => :get,
          :url => POST_ENDPOINT,
          :headers => headers.merge(
            :params => {
              :url => 'https://www.nature.com/articles/s41746-019-0180-3',
              :start_page => 1
            }
          ))
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

payload = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'start_page': 1
}
r = requests.get(POST_ENDPOINT,
      headers=headers,
      params=payload,
      timeout=timeout)
print(r.json())

curl -G "https://api.scholarcy.com/api/keywords/extract" \
  -H "Authorization: Bearer abcdefg" \
  --data-urlencode "url=https://www.nature.com/articles/s41746-019-0180-3" \
  -d "start_page=1"

The above command returns JSON structured as for the POST endpoint:

{
  "filename": "article2016.pdf",
  "abbreviations": {
    "U&G": "uses and gratification",
    "SNS": "social networking sites",
    "COBRA": "Consumer Online Brand-Related Activity",
    "AVE": "average variance extracted",
    "CLF": "common latent factor"
  },
  "keywords": [
    {
      "term": "Facebook",
      "url": "https://en.wikipedia.org/wiki/Facebook"
    },
    {
      "term": "typology",
      "url": "https://en.wikipedia.org/wiki/typology"
    },
    {
      "term": "social media",
      "url": "https://en.wikipedia.org/wiki/social_media"
    },
    {
      "term": "consumer behavior",
      "url": "https://en.wikipedia.org/wiki/consumer_behavior"
    },
    {
      "term": "average variance extracted",
      "url": "https://en.wikipedia.org/wiki/average_variance_extracted"
    },
    {
      "term": "online branding",
      "url": "https://en.wikipedia.org/wiki/online_branding"
    },
    {
      "term": "cluster analysis",
      "url": "https://en.wikipedia.org/wiki/cluster_analysis"
    },
    {
      "term": "social networking sites",
      "url": "https://en.wikipedia.org/wiki/social_networking_sites"
    },
    {
      "term": "brand manager",
      "url": "https://en.wikipedia.org/wiki/brand_manager"
    }
  ],
  "keyword_relevance": {
    "Facebook": 0.3834808259587021,
    "social media": 0.16224188790560473,
    "social networking sites": 0.08259587020648967,
    "brand interaction": 0.07964601769911504,
    "typology": 0.038348082595870206,
    "brand manager": 0.038348082595870206,
    "uses and gratification": 0.035398230088495575,
    "consumer interaction": 0.032448377581120944,
    "consumer behavior": 0.02359882005899705,
    "brand communication": 0.02064896755162242,
    "cluster analysis": 0.017699115044247787,
    "main motivation": 0.017699115044247787,
    "Consumer Online Brand-Related Activity": 0.014749262536873156,
    "average variance extracted": 0.011799410029498525,
    "common latent factor": 0.008849557522123894,
    "online branding": 0.008849557522123894
  }
}

With output_format=csv, the above command returns CSV structured like this:

"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"

This endpoint extracts key terms from a remote URL. The remote URL can resolve to a document in any of the formats listed for the POST endpoint.

HTTP Request

GET https://api.scholarcy.com/api/keywords/extract

Query Parameters

Parameter Default Description
url null URL of public, open-access document.
text null Plain text content to be processed.
start_page 1 Start reading the document from this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
end_page null Stop reading the document at this page (PDF URLs only). Useful for processing a single article/chapter within a larger file.
external_metadata false If true, fetch article metadata from the relevant remote repository (e.g. CrossRef).
wiki_links false If true, map extracted key terms to their Wikipedia pages
sampling representative For large documents, extract key terms from either a representative sample of the content or the full text.
extract_snippets true If true, sample snippets from each section; otherwise, sample the full text.
output_format json Output format: json or csv.

Generate a Synopsis

The API endpoints at https://summarizer.scholarcy.com/summarize will generate a short, abstractive synopsis (70-100 words) or a mini-review (around 150-300 words), depending on the parameters chosen.

By default, output is in JSON format.

Alternatively, you can receive output in HTML format if you pass an Accept: text/html header with your request.
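As a sketch (placeholder token), the HTML option just adds an Accept header alongside the auth header; preparing the request locally shows the headers that would be sent:

```python
import requests

AUTH_TOKEN = 'abcdefg'  # placeholder API key
ENDPOINT = 'https://summarizer.scholarcy.com/summarize'

# Add Accept: text/html alongside the auth header to receive HTML output
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN,
           'Accept': 'text/html'}
params = {'url': 'https://www.nature.com/articles/s41746-019-0180-3'}

# prepare() builds the request locally without sending it
req = requests.Request('GET', ENDPOINT, headers=headers,
                       params=params).prepare()
print(req.headers['Accept'])
```

A live call would use `requests.get(ENDPOINT, headers=headers, params=params)` and read `r.text`, since the response body is HTML rather than JSON.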

POST a local file to generate a synopsis

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'

request = RestClient::Request.new(
          :method => :post,
          :url => POST_ENDPOINT,
          :headers => headers,
          :payload => {
            :multipart => true,
            :file => File.new(file_path, 'rb'),
            :wiki_links => true,
            :format_summary => true
          })
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

file_path = '/path/to/local/file.pdf'

params = {'wiki_links': True, 'format_summary': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
          headers=headers,
          files=file_payload,
          data=params,
          timeout=timeout)
    print(r.json())

curl "https://summarizer.scholarcy.com/summarize" \
  -H "Authorization: Bearer abcdefg" \
  -F "file=@/path/to/local/file.pdf" \
  -F "wiki_links=true" \
  -F "format_summary=true"

The above command returns JSON structured like this:

{
  "response": {
    "abbreviations": {
      "EPOSA": "European Project on OSteoArthritis",
      "GPS": "Global Positioning System",
      "ISD": "Integrated Surface Database",
      "OR": "odds ratio"
    },
    "headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
    "keywords": [
      {
        "term": "physical activity",
        "url": "https://en.wikipedia.org/wiki/physical_activity"
      },
      {
        "term": "osteoarthritis",
        "url": "https://en.wikipedia.org/wiki/osteoarthritis"
      },
      {
        "term": "atmospheric pressure",
        "url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
      },
      {
        "term": "rheumatoid arthritis",
        "url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
      },
      {
        "term": "Global Positioning System",
        "url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
      },
      {
        "term": "fibromyalgia",
        "url": "https://en.wikipedia.org/wiki/fibromyalgia"
      },
      {
        "term": "arthritis",
        "url": "https://en.wikipedia.org/wiki/arthritis"
      },
      {
        "term": "smartphone app",
        "url": "https://en.wikipedia.org/wiki/smartphone_app"
      },
      {
        "term": "relative humidity",
        "url": "https://en.wikipedia.org/wiki/relative_humidity"
      },
      {
        "term": "chronic pain",
        "url": "https://en.wikipedia.org/wiki/chronic_pain"
      },
      {
        "term": "odds ratio",
        "url": "https://en.wikipedia.org/wiki/odds_ratio"
      },
      {
        "term": "Parkinson disease",
        "url": "https://en.wikipedia.org/wiki/Parkinson_disease"
      },
      {
        "term": "joint pain",
        "url": "https://en.wikipedia.org/wiki/joint_pain"
      },
      {
        "term": "wind speed",
        "url": "https://en.wikipedia.org/wiki/wind_speed"
      },
      {
        "term": "cohort study",
        "url": "https://en.wikipedia.org/wiki/cohort_study"
      }
    ],
    "message": "",
    "metadata": {
      "citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
      "citation_affiliation": "",
      "citation_author": "William G. Dixon et al.",
      "citation_date": 2019,
      "citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
      "citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
    },
    "readership_level": "technical-readership-accurate",
    "summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
    "title": "How the weather affects the pain of citizen scientists using a smartphone app"
  }
}

This endpoint generates a synopsis from a local file, in the same file formats as the metadata extraction endpoint.

HTTP Request

POST https://summarizer.scholarcy.com/summarize

Query Parameters

Parameter Default Description
file null A file object.
url null URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497
input_text null You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL.
structured_summary false Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion.
summary_type combined Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary.
focus_level 4 This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus.
readership_level technical-readership-accurate This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy; however, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text; however, it is 2x faster than lay-readership-accurate.
wiki_links false Map extracted key terms to Wikipedia entries.
format_summary false Format the summary so it can be more easily used as part of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated.
headline_type verbatim Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified.
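Combining these parameters, the sketch below (placeholder token) requests a short, lay-readership overview with a formatted, citable summary; the request is prepared locally so the resulting query string can be inspected without sending it:

```python
import requests

AUTH_TOKEN = 'abcdefg'  # placeholder API key
ENDPOINT = 'https://summarizer.scholarcy.com/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

# A short lay-readership overview with a formatted, citable summary
params = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'summary_type': 'overview',
    'readership_level': 'lay-readership-accurate',
    'format_summary': 'true',
}

# prepare() builds the request locally without sending it
req = requests.Request('GET', ENDPOINT, headers=headers,
                       params=params).prepare()
print(req.url)
```

A live call would use `requests.get(ENDPOINT, headers=headers, params=params)` and read the synopsis from `r.json()['response']['summary']`.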

GET a synopsis from a URL

require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/summarize'
# For GET requests, rest-client takes query parameters via the
# special :params key in the headers hash, not via :payload.
headers = {
  "Authorization": "Bearer " + AUTH_TOKEN,
  :params => {
    :url => 'https://www.nature.com/articles/s41746-019-0180-3',
    :wiki_links => true,
    :format_summary => true
  }
}

request = RestClient::Request.new(
          :method => :get,
          :url => GET_ENDPOINT,
          :headers => headers)
response = request.execute
puts(response.body)

import requests
timeout = 30

AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

payload = {
    'url': 'https://www.nature.com/articles/s41746-019-0180-3',
    'wiki_links': True,
    'format_summary': True
}
r = requests.get(POST_ENDPOINT,
      headers=headers,
      params=payload,
      timeout=timeout)
print(r.json())

# -G sends the -d/--data-urlencode fields as a query string (GET),
# and --data-urlencode percent-encodes the article URL.
curl -G "https://summarizer.scholarcy.com/summarize" \
  -H "Authorization: Bearer abcdefg" \
  --data-urlencode "url=https://www.nature.com/articles/s41746-019-0180-3" \
  --data-urlencode "wiki_links=true" \
  --data-urlencode "format_summary=true"

The above command returns JSON structured as for the POST endpoint:

{
  "response": {
    "abbreviations": {
      "EPOSA": "European Project on OSteoArthritis",
      "GPS": "Global Positioning System",
      "ISD": "Integrated Surface Database",
      "OR": "odds ratio"
    },
    "headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
    "keywords": [
      {
        "term": "physical activity",
        "url": "https://en.wikipedia.org/wiki/physical_activity"
      },
      {
        "term": "osteoarthritis",
        "url": "https://en.wikipedia.org/wiki/osteoarthritis"
      },
      {
        "term": "atmospheric pressure",
        "url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
      },
      {
        "term": "rheumatoid arthritis",
        "url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
      },
      {
        "term": "Global Positioning System",
        "url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
      },
      {
        "term": "fibromyalgia",
        "url": "https://en.wikipedia.org/wiki/fibromyalgia"
      },
      {
        "term": "arthritis",
        "url": "https://en.wikipedia.org/wiki/arthritis"
      },
      {
        "term": "smartphone app",
        "url": "https://en.wikipedia.org/wiki/smartphone_app"
      },
      {
        "term": "relative humidity",
        "url": "https://en.wikipedia.org/wiki/relative_humidity"
      },
      {
        "term": "chronic pain",
        "url": "https://en.wikipedia.org/wiki/chronic_pain"
      },
      {
        "term": "odds ratio",
        "url": "https://en.wikipedia.org/wiki/odds_ratio"
      },
      {
        "term": "Parkinson disease",
        "url": "https://en.wikipedia.org/wiki/Parkinson_disease"
      },
      {
        "term": "joint pain",
        "url": "https://en.wikipedia.org/wiki/joint_pain"
      },
      {
        "term": "wind speed",
        "url": "https://en.wikipedia.org/wiki/wind_speed"
      },
      {
        "term": "cohort study",
        "url": "https://en.wikipedia.org/wiki/cohort_study"
      }
    ],
    "message": "",
    "metadata": {
      "citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
      "citation_affiliation": "",
      "citation_author": "William G. Dixon et al.",
      "citation_date": 2019,
      "citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
      "citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
    },
    "readership_level": "technical-readership-accurate",
    "summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
    "title": "How the weather affects the pain of citizen scientists using a smartphone app"
  }
}

This endpoint generates a synopsis from a remote URL. The URL can resolve to a document in any of the formats listed for the POST endpoint.

HTTP Request

GET https://summarizer.scholarcy.com/summarize

Query Parameters

Parameter Default Description
url null URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497
input_text null You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL.
structured_summary false Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion.
summary_type combined Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary.
focus_level 4 This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus
readership_level technical-readership-accurate This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy. However, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text. However, it is 2x faster than lay-readership-accurate.
wiki_links false Map extracted key terms to Wikipedia entries.
format_summary false Format the summary into a 'mini review' so it can be more easily used as the basis of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated.
headline_type verbatim Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified.
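Besides url, the input_text parameter accepts a raw text string, so short passages can be summarized without uploading a file or hosting a document. A minimal Python sketch; the sample text and parameter choices are illustrative:

```python
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key (placeholder)
GET_ENDPOINT = 'https://summarizer.scholarcy.com/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}

payload = {
    # Illustrative text; any plain-text passage works here.
    'input_text': 'Weather has been thought to affect symptoms in '
                  'patients with chronic disease since the time of '
                  'Hippocrates over 2000 years ago.',
    'summary_type': 'overview',
    'readership_level': 'lay-readership-fast'
}

def summarize_text():
    """GET a synopsis of the raw text in payload['input_text']."""
    r = requests.get(GET_ENDPOINT, headers=headers,
                     params=payload, timeout=30)
    r.raise_for_status()
    return r.json()

# summarize_text()
```

Note that query strings have practical length limits, so for long passages the POST endpoint is likely the safer route.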

Errors

The Scholarcy API uses the following error codes:

Error Code Meaning
400 Bad Request -- Your request is invalid.
401 Unauthorized -- Your API key is wrong.
403 Forbidden -- The API endpoint requested is hidden for administrators only.
404 Not Found -- The API endpoint could not be found.
405 Method Not Allowed -- You tried to call the API with an invalid method.
406 Not Acceptable -- You requested a format that isn't JSON.
429 Too Many Requests -- You're making too many API requests.
500 Internal Server Error -- We had a problem with our server. Try again later.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
504 Gateway Timeout -- Serving your request took longer than expected. Please try again.
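Of the codes above, 429, 500, 503, and 504 all invite a retry ("try again later"). A simple backoff helper can handle them uniformly; this is our own suggestion rather than part of the API, and the function name and retry policy are illustrative:

```python
import time

def retry_on_status(send, max_attempts=4,
                    retryable=(429, 500, 503, 504),
                    sleep=time.sleep):
    """Call send() until it returns a non-retryable status code,
    backing off exponentially (1s, 2s, 4s, ...) between attempts.
    Returns the last response either way."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in retryable:
            return resp
        sleep(2 ** attempt)
    return resp

# Usage with the Synopsis API (placeholder key), e.g. with requests:
# resp = retry_on_status(lambda: requests.get(
#     'https://summarizer.scholarcy.com/summarize',
#     headers={'Authorization': 'Bearer abcdefg'},
#     params={'url': 'https://doi.org/10.1177/0846537120913497'},
#     timeout=30))
```

Passing the request as a callable keeps the helper independent of any particular HTTP library.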