Introduction
This Playbook is informed by personal experience learning the ASpace API and Python + other skills that help in these types of projects. If you already have some of this experience, there may still be something here to guide your journey forward.
Get Access to an API for Testing
You have a few options. Read them all before deciding.
Option 1: Use a Copy of your Production Data
Ask your IT department or hosting provider the following:
- Do/can we have a sandbox? i.e., a copy of your Production data in a separate database.
- Is the API enabled?
- What’s the URL? This is probably going to look something like the address you already use to log into ArchivesSpace, but different.
Pro-tip: Once you have the URL, you can test it by navigating to it in your browser! If you see something like the following, your API URL is ready to go:
Option 2: Download and Run AS Locally on Your Machine
Please note that the following instructions may become outdated; you should consider the official ArchivesSpace instructions the canonical text on this.
Windows
- Read the instructions first; the below assumes you have read them.
- Download the ASpace zip file from the Releases page (your choice of ASpace version) to your default Downloads folder. Don’t save it in a crazy location because you will need to navigate there via the command line; so the easier the location, the fewer navigation commands.
- Download it directly to your Windows Downloads folder.
- Unzip it directly to your Windows Downloads folder.
- After unzipping, open the Windows Command Prompt:
- Read this section all the way through before proceeding and refer to the screenshot below.
- Per the instructions, the first thing to do is determine whether you have Java 1.8 installed. This version of Java is referred to as Java 8 despite the fact that its version number is 1.8. Please note that 1.8 is not the current version of Java, and if you're using an older version you will have to update to 1.8; if you're using a later version, you will have to revert back to 1.8. Google instructions on how to revert to 1.8 if necessary.
You only have to do this the first time; once you have Java 1.8 installed, you’re good to go any time after.
java -version
- After determining your version of Java (or pausing to install it), now your goal is to get to your Downloads folder > move into the version of ArchivesSpace you just unzipped > then move yet further into a directory called archivesspace. I did that by typing these three commands in order. You can see all three commands in the screenshot that follows:
cd Downloads
cd archivesspace-v2.8.1
cd archivesspace
- Start ArchivesSpace!
archivesspace.bat
- Wait a few minutes and then open a browser and go to http://localhost:8080 and you should see the familiar staff interface. The username/password is admin/admin by default. Log in! The first time doing this you will need to create a new repository.
The API URL for this local instance is http://localhost:8089
- To shut ASpace down, go back to the command prompt and hit ctrl + c. Type Y to terminate the batch job, or just close the command prompt, and AS should shut down automatically.
- To start ASpace up again, repeat Steps 6.2 and 6.3 (navigate to the archivesspace directory, then run archivesspace.bat); you don’t need to check for Java every time unless you’re doing this on a new computer.
Linux and OSX
- Read the instructions first; the below assumes you have read them.
- Download the ASpace zip file from the Releases page (your choice of ASpace version) to your default Downloads folder. Don’t save it in a crazy location because you will need to navigate there via the command line; the easier the location, the fewer navigation commands.
- Download it directly to your Downloads folder.
- Unzip it directly to your Downloads folder.
- After unzipping, open the Terminal.
- Read this section all the way through before proceeding and refer to the screenshot below. The screenshot below was done in Ubuntu, but the commands are the same in OSX.
- Per the instructions, the first thing to do is determine whether you have Java 1.8 installed. This version of Java is referred to as Java 8 despite the fact that its version number is 1.8. Please note that 1.8 is not the current version of Java, and if you're using an older version you will have to update to 1.8; if you're using a later version, you will have to revert back to 1.8. Google instructions on how to revert to 1.8 if necessary.
You only have to do this the first time; once you have Java 1.8 installed, you’re good to go any time after.
java -version
- After determining your version of Java (or pausing to install it), now your goal is to get to your Downloads folder > move into the version of ArchivesSpace you just unzipped > then move yet further into a directory called archivesspace. I did that by typing these three commands in order. You can see all three commands in the screenshot that follows:
cd Downloads
cd archivesspace-v2.8.1
cd archivesspace
- Start ArchivesSpace!
./archivesspace.sh
- Wait a few minutes and then open a browser and go to http://localhost:8080 and you should see the familiar staff interface. The username/password is admin/admin by default. Log in! The first time doing this you will need to create a new repository. The API URL for this local instance is http://localhost:8089
- To shut ASpace down, just close the terminal and AS should shut down automatically.
- To start ASpace up again, repeat Steps 6.2 and 6.3 (navigate to the archivesspace directory, then run ./archivesspace.sh); you don’t need to check for Java every time unless you’re doing this on another computer.
Option 3: Use the ASpace Sandbox
The ASpace sandbox has the API on by default! You can see it here: http://sandbox.archivesspace.org/staff/staff/api/
Use the authentication slides PDF (at the very bottom of the article) to experiment with the AS Sandbox. Just remember that the ASpace sandbox clears data out periodically, so you will have to continue to create records to experiment with. But the stakes are super low! You cannot mess anything up. Go crazy.
Get an API Client and Practice your Endpoints
Download Postman, a Free API Client
There are plenty of API clients out there; this just happens to be the one I use and can make slides for: https://www.postman.com/downloads/
Experiment. A lot.
Use the slides attached to the bottom of this article to authenticate and GET your first record.
Then try the following:
- GET all the major record types
- Resources
- Accessions
- Archival Objects
- Agents
- Locations
- Read the JSON you get back. Look for familiar things.
- Explore arrays
- Compare what you see in the interface to what you see in the JSON
- Show your colleagues what you’re learning
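Once you have Python installed, the same "read the JSON, explore arrays" exercise can be sketched in a few lines. This is only an illustration: the record below is a trimmed, hypothetical stand-in for what a GET on a resource might return, and real records carry many more fields.

```python
import json

# A trimmed, hypothetical resource record (an assumption for illustration);
# paste in the JSON you got back from Postman to explore a real one.
raw = '''
{
  "title": "Jane Doe papers",
  "publish": true,
  "dates": [
    {"label": "creation", "begin": "1950", "end": "1965"}
  ],
  "extents": [
    {"number": "12", "extent_type": "linear_feet", "portion": "whole"}
  ]
}
'''

record = json.loads(raw)  # JSON text becomes a Python dictionary

# Look for familiar things: the top-level field names
top_level_fields = sorted(record.keys())
print(top_level_fields)

# Explore arrays: in Python they are lists, usually lists of dictionaries
array_fields = [key for key, value in record.items() if isinstance(value, list)]
for field in array_fields:
    for position, item in enumerate(record[field]):
        print(field, position, item)

# Reach into an array by position, then into the dictionary by key
first_date = record["dates"][0]
date_span = first_date["begin"] + "-" + first_date["end"]
print(date_span)
```

Comparing the printed field names with the labels in the staff interface is a quick way to see how the two line up.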
Practice Using and Reading Endpoints
Move on to endpoints you don’t recognize.
- It’s okay if you can’t figure some out.
- Even people who use the API have no idea sometimes.
- Remember that the API documentation reflects the most recent version of ASpace, which might not be the version you’re testing with.
Begin Your Scripting Journey
Acknowledge that it’s a journey. Pack snacks. Give yourself time.
Big Picture Classes and/or Tutorials
Python is not your only option by far, but since it does appear to be the scripting language most commonly used in the AS community, it is the best place to start if you don't already know a scripting language.
No matter the language, you'll have to learn scripting more broadly before you can narrow down to using the AS API.
There are literally dozens of Python 3 courses/tutorials out there. Please note that you should focus on Python 3: Python 2 is deprecated; moreover, ArchivesSnake requires Python 3.4 or higher, and it is my humble suggestion that we should all (myself included) be aiming for ASnake. Once you start searching for advice online, it will matter which version you’re working in, or else you’ll be following advice for the wrong version.
Any quality Python tutorial is going to have to walk you through the following as prerequisites to setting up a "development environment", and these are essential to know no matter what you plan to do with Python. Treat these like a checklist that you need no matter how you get there. You must:
- Install Python 3 on your operating system of choice
- Pick and install an IDE (Integrated Development Environment). An IDE is to code what Oxygen XML is to XML/EAD: an intelligent text editor that knows what you’re working with and can aid you. I use Visual Studio Code.
- Write and run your first scripts. This means any script, like the basic ones you may learn in a beginners course, or the Python scripts available at the end of this article.
- And finally, you should learn how to install a package manager, libraries, and modules. Here is what you might care about related to the AS API in those categories: libraries (Requests, ArchivesSnake) | modules (JSON, RegEx)
Disclaimer: The following is simply a statement and not a paid or endorsed recommendation. I took Modules 1-4 of the Python for Everybody course by Dr. Charles Severance (Dr. Chuck), School of Information at the University of Michigan and found it useful and enjoyable. I took a paid version through Coursera that featured graded projects, but Dr. Chuck also makes it available for free: this is a link to his 13-hour YouTube video through freeCodeCamp.org. He walks you through Python 3 installation for both Mac and Windows.
You do not need all 13 hours:
This is not the only course out there, but I don’t want to fill this page with recommendations that will overwhelm you. Try that one first (either free or paid) and if you don’t like it, search for “Python 3 for beginners” tutorials. Just make sure that whatever you choose walks you through the four steps above.
Troubleshooting
One of my biggest struggles in learning scripting was that I was always looking for a formal way to troubleshoot things. In my mind, I should know something from the ground-up and that understanding should be how I solved things. I was looking to find “The Class” or “The Tutorial” that would solve all my problems. What I want to tell you is that you will have to troubleshoot your experiences with agility: there isn’t one video, there isn’t one course, there isn’t one help article. You will need to learn to cobble together troubleshooting sources. How do you do that? The secret to most coding is that developers search the internet for their problem and then go read how others solved it. Thus, how to form that keyword search is more important than memorizing books on Python.
One key to searching more efficiently is knowing your variables: your Python version, your operating system, your IDE, and the libraries or modules you are using.
For the rest of your time with Python, these are your variables. For example:
- You could be running Python 2 on Mac and using PyCharm as your IDE.
- You could be running Python 3 on Windows and using Atom as your IDE.
- You could be like me, running Python 3 on Ubuntu and using VS Code as your IDE.
- You could be trying to use the Requests library or the JSON module.
Note that, as you start to troubleshoot what you’re doing, as you watch videos, as you search online, and as you communicate with other people, these variables matter. Learn to add search terms to your searches:
- Instead of searching for “how to install python 3” you should add “on Windows”.
- “Add-ons for reading JSON” might be “add-ons for reading json in Atom”.
- “Troubleshooting python versions” might be “troubleshooting python versions in Linux”.
- “parsing JSON with requests in python 3”.
Share it. Cultivate Buy-in
Are there others in your organization with whom you can learn?
- Take classes together.
- Have meetups, virtual or otherwise.
- Inevitably you will teach each other.
Even if you’re solo, share it.
- Demonstrate using the API to your manager.
- Present about it at your next staff meeting.
- Make sure others know that you’re learning.
- Buy-in is important.
Managers:
- Consider funding and time for Python classes for staff, and know that this is going to take a while.
- Advocate for new relationships inside your organization, e.g., having an IT/archivist working group where there was none before.
- Advocate that staff members have admin privileges on their machines, or advocate that IT installs what they need.
- Advocate and/or fund an AS Sandbox server.
Focus your skills
Once you have familiarized yourself with Python more broadly, then you can start to narrow your focus to APIs, the AS API specifically, and ultimately, the end-goal specialization of using ASnake.
Here is where I can recommend more specific avenues:
More videos!!!
- How to install Python packages.
- Here is what you might care about related to the AS API in those categories: libraries (Requests, ArchivesSnake) | modules (JSON, RegEx)
- Python 3 and APIs/the Requests library. Try this one. Keep watching videos with these search terms.
- Python 3 and JSON.
- Once you start working with JSON, you’ll want to set up your IDE for JSON. Just Google/YouTube “how to set up [my IDE name] for JSON”.
- Fun fact: there’s a Python library for MARC21! SURPRISE!! Intro video 1 | video 2.
Archivists + GitHub = Ideas
When I was learning Python, an archivist friend of mine always used to say, “No one opens a blank text editor and just starts writing code.” You probably will actually do that, but her point was: you are now a member of a community where the norm is to share. As such, there is already stuff out there, and I don’t mean “Python stuff,” I mean information professionals working with archival data (just like you) writing in Python 3 (just like you) using the AS API (just like you).
- Search GitHub for the word “archivesspace” and limit to Python: check it out.
- Go look at their code! Really! Most GitHub pages have a license, and most of the time it's Apache-2.0. Check the license, use this site to figure out what it means and reassure yourself, then copy their code or fork their repositories (if you know what that means). That’s why they put it there. I promise you, it’s okay, just give credit back to them. Find them on Twitter, email them, ask, “Hey can I use this?” “What does this line do?”.
- Aim for ArchivesSnake, developed and championed by people inside the AS community (for archivists, by archivists):
- Start with the main page ReadMe.
- Then Getting Started Guide on the Wiki.
- Greg Wiedeman’s GitHub pages for his Intro to ASnake Workshop.
- If you’re really hardcore: the detailed API Docs.
Build Your Non-Python Toolbox
The API and Python alone probably won’t do everything you need. Consider these other seriously useful skills.
SQL
The API isn’t the only way to make bulk changes or see data all at once. Though it brings more risk, you can also read data from, or make changes directly to, the database tables behind AS.
I used to work exclusively with the API; now I do about 50% of my cleanup work in SQL. Why use one or the other?
You probably want the API | You can probably use SQL in the db
Recommended reading/watching (these two sources opened my eyes to these possibilities):
- Mucking around in ArchivesSpace Locally by Maureen Callahan - Blog
- includes a sample request to your IT/hosting provider
- ArchivesSpace Reporting and MySQL by Alicia Detelich - Slides | Recording
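To give a feel for what "seeing data all at once" looks like in SQL, here is a minimal sketch run from Python. It makes several assumptions: it uses an in-memory SQLite database as a stand-in (AS itself typically sits on MySQL), and the table and column names are simplified, hypothetical versions rather than the real AS schema. It only reads data with a SELECT, which is the lower-risk way to start.

```python
import sqlite3

# In-memory stand-in database; nothing here touches a real AS database.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table (not the real AS schema)
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, repo_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO resource (repo_id, title) VALUES (?, ?)",
    [
        (2, "Jane Doe papers"),
        (2, "Acme Company records"),
        (2, "John Roe Papers"),
        (3, "Smith family papers"),
    ],
)

# One query answers "which resources in repository 2 have 'papers' in the title?"
# The ? placeholders keep the values separate from the SQL text.
rows = conn.execute(
    "SELECT id, title FROM resource "
    "WHERE repo_id = ? AND title LIKE ? ORDER BY id",
    (2, "%papers%"),
).fetchall()

for row_id, title in rows:
    print(row_id, title)

conn.close()
```

Getting the same answer through the API would mean requesting every record and filtering in your script, which is one reason SQL can be attractive for reporting.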
Regular Expressions
Need to detect patterns in order to make decisions? Trust me, you want to learn Regular Expressions even if you don’t use Python. Watch tutorials, and I highly recommend https://regex101.com/ once you’re ready.
- Use the re module in Python to deploy RegEx in your API work.
- You can use RegEx in SQL, too.
- Use Oxygen XML to use RegEx for EAD.
- There’s a learning curve, it's a steep one, it’s worth it.
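As a small taste of pattern detection with Python's built-in re module, the sketch below pulls box and folder numbers out of free text. The sample strings, the pattern, and the function name are all made up for illustration; your own data will need its own pattern, which is where regex101 helps.

```python
import re

# Pattern: the word "box", a number, a comma, the word "folder", a number.
# \s+ means "one or more spaces", \d+ means "one or more digits",
# and the parentheses capture the numbers so we can read them back.
BOX_FOLDER = re.compile(r"box\s+(\d+)\s*,\s*folder\s+(\d+)", re.IGNORECASE)


def parse_container(text):
    """Return the box and folder numbers found in text, or None if no match."""
    match = BOX_FOLDER.search(text)
    if match is None:
        return None
    return {"box": int(match.group(1)), "folder": int(match.group(2))}


# Hypothetical sample values
samples = ["Box 12, Folder 3", "box 4 , folder 10", "Oversize drawer 2"]
parsed = [parse_container(sample) for sample in samples]

for sample, result in zip(samples, parsed):
    print(sample, "->", result)
```

The None result matters as much as the matches: it is how a script decides which values it should leave alone.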
OpenRefine
Need to undertake massive data cleanup and you’d rather see it on your screen? If you haven’t already heard of OpenRefine, know that you can totally use it for JSON. See the OpenRefine Tutorial from Library Carpentry and yet more YouTube videos.
- You can even use OpenRefine to reconcile data with other APIs, like loc.gov reconciliation for agents and subjects.
Git
Git is a topic unto itself. You have likely heard of Git or been directed to a GitHub page. If you plan to get serious about scripting, you need to manage version control and Git is a standard way to do that. One day you will care about that, but probably not right now. When you do care, there are dozens of tutorials that you can follow. If Git doesn’t make sense to you at first, that is okay; it is a confusing topic.
Your First Script
The following script was demonstrated in the API Workshop. As written, it will grab all records of a certain type in a single repository and save them to a single JSON file. If you change the endpoint, you can get all of any record type, such as accessions, top containers, agents, etc.
This script uses the Requests library and does not use ArchivesSnake.
By following the advice above you should learn how to run and edit a simple Python script. Once you are ready to get started with the API, try this as your first script.
# Save this script as name_of_file.py in order to run it as Python3 in your local environment.
# Disclaimer: This script is being provided as an example and may need to be modified for local use.
# Modifying and using this script is the responsibility of the individual using it.
# Do not test against Production!
# Authentication based on a script by ehanson8 (https://github.com/MITLibraries/archivesspace-api-python-scripts)
# Significant aspects:
# - How to set variables
# - How to get all ids of a single record type
# - How to check your status code
# - How to save an output to a local file
#Import the following libraries
import csv
import json
import requests
#Set your authentication info, baseurl, and repository info (if relevant)
baseURL = 'http://localhost:8089' #<-- Enter your real API URL between the ''
user = 'admin' #<-- Enter your real username between the ''
password = 'secure_password' #<-- Enter your real password between the ''
repository = '101'
#Authorize and store your session key in your header
auth = requests.post(baseURL + '/users/' + user + '/login', params={'password': password}).json() #Passing the password via params URL-encodes any special characters
session = auth['session']
headers = {'X-ArchivesSpace-Session': session, 'Content-Type': 'application/json'}
print('Your session key is: ' + session)
#Optional way to create your endpoint with variable below
record_type = 'resources'
endpoint = '/repositories/' + repository + '/' + record_type + '?all_ids=true'
#Note that the above endpoint includes repository, hence, it will not work for non-repo endpoints
#Note also that this endpoint is only gathering the record IDs, not the records themselves
#Those ids are later stored using the 'ids' variable below
print('This endpoint will gather every id for ' + record_type + ': ' + endpoint)
test_endpoint = requests.get(baseURL + endpoint, headers=headers) #Here we begin to test your endpoint, this is good to know
if test_endpoint.status_code != 200: #If the status code is NOT 200, your GET above did not work and the script stops
    print('That did not work. Do you have the correct endpoint? --> ' + endpoint)
    quit()
else: #If the status code IS 200, your GET above did work and the script continues
    ids = test_endpoint.json() #Reuse the response you already have instead of sending the same GET twice
    print('There are ' + str(len(ids)) + ' records of the requested type in this repo')
    print('Their ids are ' + str(ids))
records = [] #Open an empty list
for record_id in ids: #For each id in the id list....
    endpoint = '/repositories/' + repository + '/' + record_type + '/' + str(record_id) #Create an endpoint per id...
    output = requests.get(baseURL + endpoint, headers=headers).json() #GET that record...
    records.append(output) #Append that record to the list
filename = record_type + '.json' #Set the filename
with open(filename, 'w') as f: #Create a new file; it is closed automatically when this block ends
    json.dump(records, f) #Encode the records list as JSON and save to that file
print('The JSON for all ' + str(len(ids)) + ' records has been written to a file named ' + filename)
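A natural next step is to open the saved file and look at what you gathered. The sketch below is a hedged illustration rather than part of the workshop script: so that it runs on its own, it first writes a tiny stand-in file with two made-up records to a temporary folder. With real output you would skip that part and point path at your own resources.json.

```python
import json
import os
import tempfile

# Stand-in for the script's output: two made-up, heavily trimmed records.
sample_records = [
    {"uri": "/repositories/101/resources/1", "title": "Jane Doe papers"},
    {"uri": "/repositories/101/resources/2", "title": "Acme Company records"},
]
path = os.path.join(tempfile.mkdtemp(), "resources.json")
with open(path, "w") as f:
    json.dump(sample_records, f)

# Read the file back: json.load turns the saved JSON into a Python list
with open(path) as f:
    records = json.load(f)

# Build a lookup of uri -> title; .get() avoids an error if a title is missing
titles = {record["uri"]: record.get("title", "(no title)") for record in records}

print("Loaded " + str(len(records)) + " records")
for uri, title in titles.items():
    print(uri + " -> " + title)
```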
Your First API Project
You may already have a good idea of what you want to do with the API, or you have no idea. Either way, I have some general advice for your first API project. It mostly comes down to taking small familiar steps before your big leap into scripting.
- Pick a record type you wish you could change or create. Even if you have to change 1,000 records, just pick one. This will be a template for your project and a proof of concept for your approach.
- If you are changing a record type, use Postman to GET an existing record out and save it somewhere. Then make your changes through the staff interface and GET it again, now with the changes. These are your templates.
- If you are creating something from scratch, create it in the staff interface first, then GET it out using Postman. This is your template.
- Depending on what you're doing, attempt to POST a new or altered record back through Postman to test your template. Be careful about proceeding before you have confirmed that what you want to do is possible. This is also a good time to experiment with what fields are strictly required. If you're working with existing data, POST everything back or risk losing some fields; if you're working with new data, start with the record requirements and aim for the fewest fields possible.
- Focus intensely on the exact steps you took to create or alter your data. Even down to "I clicked this field" or "I used my human brain to conform this field to what I wanted it to look like."
- Go step-by-step through what you did and design your script to match those discrete steps. Don’t go “whole picture” on yourself; break down the point of your project into these steps, and focus on each step one at a time.
- For example, for cleanup:
- “I navigated the interface until I saw a record that needed to be cleaned up.” = How did you know which record needed to be cleaned up? How can your script make the same decision?
- "I clicked Edit" = GET the record out through the API.
- "I made the ten changes" = break that down further. What were they, exactly? Does the order matter? Can your script make each change to each record instead of ten changes to one record? If you need to make one change 100 times, make it 100 times, then go back and make the second change 100 more times. That’s not how a human would do it, but it is how to script.
- "Okay, I specifically changed the date form. That was my first change." = Great, how did you change it? How did you know?
- “Well my brain saw a hyphen and I knew that meant a date range. My brain knew that the value on the left side of the hyphen was the Begin date and the right side was an End date.” = How do you teach your script to see that same pattern? (Pro-tip: learn Regular expressions).
- Repeat. Focus on doing one thing at a time, not everything at once. Everything is a victory.
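The cleanup walkthrough above can be sketched as three small functions, one per step. Treat this as an illustration under assumptions, not a finished tool: the function names are hypothetical, the sample records are made up and heavily trimmed, and the pattern only recognizes a four-digit year, a hyphen, and another four-digit year. Sending the cleaned records back through the API is deliberately left out.

```python
import copy
import re

# "My brain saw a hyphen": four digits, a hyphen, four digits, nothing else
DATE_RANGE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def needs_cleanup(date):
    """Step 1, the decision: a range-like expression with no begin/end yet."""
    expression = date.get("expression") or ""
    has_both = bool(date.get("begin")) and bool(date.get("end"))
    return DATE_RANGE.match(expression) is not None and not has_both


def fix_date(date):
    """Step 2, one change only: left of the hyphen is begin, right is end."""
    begin, end = DATE_RANGE.match(date["expression"]).groups()
    fixed = dict(date)  # keep every other field on the date as it was
    fixed["begin"] = begin
    fixed["end"] = end
    return fixed


def clean_record(record):
    """Step 3: apply the change to a copy, leaving all other fields intact."""
    cleaned = copy.deepcopy(record)
    if "dates" in cleaned:
        cleaned["dates"] = [
            fix_date(date) if needs_cleanup(date) else date
            for date in cleaned["dates"]
        ]
    return cleaned


# Made-up, trimmed records standing in for what you would GET from the API
records = [
    {"title": "Jane Doe papers",
     "dates": [{"label": "creation", "expression": "1950-1965"}]},
    {"title": "Acme Company records",
     "dates": [{"label": "creation", "expression": "1970-1980",
                "begin": "1970", "end": "1980"}]},
    {"title": "John Roe letters",
     "dates": [{"label": "creation", "expression": "undated"}]},
]

cleaned_records = [clean_record(record) for record in records]
for record in cleaned_records:
    print(record["title"], record["dates"])
```

Working on a copy means the original record stays available for comparison, which is handy when you are checking a change before anything is POSTed back.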