Welcome - Troy A. Griffitts
...or, Inter-institutional Collaboration
A standard which enables a user to say anything… is not a collaboration solution.
I've recently heard more than once that the solution to cooperation between organizations who work on overlapping datasets is to use RDF, or more generally that the "Semantic Web" and "linked data" are the solutions to all standardization of data. Don't get me wrong. I don't hate RDF or the ideas of the Semantic Web and linked data. Having said that, they don't solve the problems of inter-institutional collaboration.
RDF essentially prescribes a Subject, Predicate, Direct Object syntax. Basically, that’s the grammar of a two-year-old. I don’t say that to minimize RDF; I say that to maximize it. RDF has done nothing to standardize anything, any more than XML has done to standardize data representation. Consider this: If I have to learn all the RDF Predicates a data repository might use (note: RDF Predicates are not really English language predicates because RDF Predicates exclude the direct object and often imply a simple ‘is-a’ verb) and figure out what exactly an author means by those RDF Predicates, then how is that any different from the need to read and understand a set of XML Schema definitions for objects I might find in an XML repository? Let me use a different adjective: how is that any better?
<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#uncle-ben> <http://www.perceive.net/schemas/relationship/guardianOf> <http://example.org/#spiderman> .
<character id="spiderman">
  <enemyOf>green-goblin</enemyOf>
  <guardian>uncle-ben</guardian>
</character>
Sure, you’ve atomized data into bite-sized chunks and can now build very simple data browsing tools based on these granular chunks of data. I can look up Spiderman and see all known predicates assigned to Spiderman and if those predicates mostly make human sense to me without forcing me to go read their definition, then I can click on one of their direct objects and learn all the predicates related to that object, ad infinitum. Cool right? Sure. Is it more useful for programming a valuable user experience than an XML Schema? Probably not so much.
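A minimal sketch of the kind of "very simple data browsing tool" described above, using only the two triples from the example. The function names (`build_index`, `describe`) are made up for illustration and are not part of any RDF library. Note what the tool does not do: it can list the predicate URIs attached to a subject, but it knows nothing about what `enemyOf` actually means to the institution that published it.

```python
from collections import defaultdict

# The two triples from the example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.org/#spiderman",
     "http://www.perceive.net/schemas/relationship/enemyOf",
     "http://example.org/#green-goblin"),
    ("http://example.org/#uncle-ben",
     "http://www.perceive.net/schemas/relationship/guardianOf",
     "http://example.org/#spiderman"),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def describe(index, subject):
    """Return everything asserted about a subject; empty list if unknown."""
    return index.get(subject, [])


index = build_index(TRIPLES)
for predicate, obj in describe(index, "http://example.org/#spiderman"):
    # "Clicking" on obj would simply mean calling describe(index, obj) next.
    print(predicate, "->", obj)
```

Browsing from Spiderman to the Green Goblin returns nothing here, because no triple uses the Green Goblin as a subject. The browsing is generic, but any useful behavior still depends on knowing the predicates in advance.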
Don’t miss the problem. The hard part of data interoperability is political agreement. It’s not a technical problem; the technical side has already been solved in many ways. Telling people, hey, here’s a standard which allows you to create noun/predicate/direct object syntax and here’s a browser and search tool for that syntax, solves no real problem until you get people to agree on exactly what RDF Predicates to use and what exactly they mean, and get them to adopt a unique object identification system and agree on exactly what each of those object identifiers specifies. Calling that an interoperability standard without the agreement is worse than TEI. I mean, I love TEI, but no one uses TEI in any standards-compliant way which allows real useful interoperability because TEI allows a user the freedom to build their own schema from many different modules of tags, and-- I can guess because I’ve been there before-- the reason the tags are not more strictly defined is that participating members of the standards team had heartfelt and different usages in mind for how they would apply those tags. The end result is that basically there is no interoperability. There is familiarity, but that’s not the same thing. The hard work EpiDoc has done to wrangle political agreement on a subset of the TEI and strict usage definitions shared between organizations, that is work toward inter-institutional collaboration. Back to RDF and the Semantic Web, sure, work has been done to define very large "vocabularies" and "ontologies" (if you find an authoritative source with a clear definition of the difference between "ontology" and "vocabulary", send me an email) for specialized domains... This is parallel to the work of specializing XML into the TEI and is a step toward interoperability, but are we in the same place as TEI, with different organizations opting to use different subsets of these ontologies? Are the common elements which institutions might share used in the same way?
Do the same noun instances common in these organizations use the same unique key? In short, do we have a large body of data available in one of these ontologies (besides the ontologies themselves)? Do we have TWO large bodies of data from TWO different institutions available in the same ontology, both using all the terms in exactly the same way (parallel: TEI -> EpiDoc) and identifying proper nouns with exactly the same keys? Do we have any specialized (= useful) software systems developed based on this ontology which work with BOTH datasets? These are the hard parts-- the time consuming parts-- of inter-institutional collaboration, and they are not strictly technical in nature.
Yeah, so what exactly am I saying? I am saying that once you adopt a unique naming scheme for objects and have multiple institutions agree on that naming scheme and what exactly those objects mean; and once you specifically define predicates which can be used with those object types (e.g., adjectives and relationships) and get more than one institution to agree to actually adopt and implement that schema internally, and finally convince them to make those resources available to other institutions, then you’ve developed a useful standard for interoperability. And then that standard can be described in RDF or XML Schema or a number of other ways. Saying that you’ve adopted RDF is like saying you’ve adopted XML or JSON. Are RDF, XML, and JSON all standards? Sure. Does simply adopting RDF, XML, or JSON mean that you are interoperable? It doesn’t even mean that you’ve begun.
Let me preface this rant with a declaration:
Web services are great.
... but, I won’t beat around the bush. I hate REST. REST is the most asinine attempt to ignorantly hand-jam arrogant assumptions about my desires as a programmer into the most unsuited mechanism possible: a transport protocol.
Let’s start with HTTP. No matter what the authors’ intent, today HTTP is simply not much more than a high level transport protocol (I use “authors’ intent” because no matter what anyone tells you, HTTP as we know it today wasn’t conceived and formulated into specification by Tim Berners-Lee. His team’s initial specification was sane and only included one verb: GET). HTTP effectively is a client/server request/response mechanism perfectly suited for a browser to request a resource from a server. PERIOD. And to all those who would say, “but no, it’s so much more.” I would say, you’re full of crap. It’s not. Stop. No, really... Stop. 99.9999% of all HTTP traffic (I’d tack 10 more ‘9’s to the end of that number to be more accurate, but you get my point) is simply browser requests for a resource / server responses with the resource. And that’s OK. So just stop. HTTP does a good job assuring a persistent URI reference to, and means to retrieve, a web resource; it allows me to assume a resource will remain at a specific location such that I can reference that resource from another resource, and now we have... The World Wide Web! I love HTTP. I would even say I love both GET and POST. What I don’t love is REST trying to cram my entire programming model into 9 verbs. Now don’t get me wrong, I think adding other verbs (besides GET) to maintain resources on the web is fine-- static HTML pages, images, XSD specs, even iCal through WebDAV is almost coherent. But trying to expose a public RPC interface for an entire complex software system architecture via 9 verbs is the closest thing to insane I can imagine. It’s like trying to hack a programming paradigm out of the TCP packet header control bits! If URG is high then my calendar appointment is very important. If ACK is low, then I’m denying your friend request. It’s INSANE!
Here’s the deal. We’re all complex programmers. We dream up a lot of crazy crap. Don’t try to shove our creativity into your world of CRUD (Create, Read, Update, Delete). We’re not simply storing stuff on the web; no, we are doing stuff.
I mentioned before that GET and POST have their merits. GET provides me with a nice, human-readable URL which can be used to repeat a simple request:
Great. I know I’m referencing arbitrarySubcontext on server someserver.com and sending 2 parameters. Wonderful.
What if the information in the parameters is big, and useless for human viewing-- like a uuencoded image, or the entire body of an email message? Would I want that as part of the address in human sight? Would I likely want to repeat that exact same request and receive the exact same response? No, probably not; hence, POST.
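The GET/POST distinction above can be sketched with Python's standard `urllib`. The base URL is assembled from the server and subcontext named in the text, and the parameter names (`param1`, `param2`, `message`) are hypothetical. Nothing is sent over the network; the snippet only builds the two request objects so the difference in where the parameters end up is visible.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed layout: the subcontext named in the text, directly under the host.
BASE = "http://someserver.com/arbitrarySubcontext"


def as_get(params):
    """Parameters travel in the URL: visible, bookmarkable, repeatable."""
    return Request(BASE + "?" + urlencode(params), method="GET")


def as_post(params):
    """Parameters travel in the request body, out of the address bar."""
    body = urlencode(params).encode("ascii")
    return Request(BASE, data=body, method="POST")


small = {"param1": "a", "param2": "b"}        # hypothetical names
get_req = as_get(small)
post_req = as_post({"message": "x" * 5000})   # stand-in for a large payload

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, len(post_req.data), "bytes in body")
```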
But what about the other 7 HTTP verbs?
HEAD, PUT, DELETE, TRACE, OPTIONS, CONNECT, PATCH
If you can accurately describe more than 2 of the above HTTP verbs, then you probably endorse REST. Enough said.
Sure, we all get DELETE. Some of us get PUT, but I’d bet hard cash you won’t get 2 identical answers when you ask us the difference between PUT and POST. HEAD, TRACE, OPTIONS, CONNECT? Transport verbs for special cases or querying metadata! PATCH is a nice nerdy Computer Science addition to U in CRUD.
So, basically we’re left with GET, PUT, POST, and DELETE to map my entire complex system software architecture into. Back to my earlier rant: we don’t simply STORE THINGS... we DO THINGS! My nouns don’t just exist! They breathe and join and search and jump and fart! They DO THINGS!
The world of Computer Science has seen many leaps in design. We went from Procedural Programming, to Object Oriented Programming, to Functional Programming (we kind of skipped Aspect Oriented Programming) and I’m sure in between I’m missing some favorites which you’ll all tell me about. But hear me clearly. RPC has been around and seen umpteen iterations, and with the exception of SOAP, REST is the absolute worst incarnation! Summarily, REST is:
RPC + Object Oriented Programming stripped down to only allow classes 4 methods (GET, PUT, POST, DELETE).
Imagine if I asked you to design a complex system using an Object Oriented design pattern restricting your objects to 4 methods.
Now, here’s the sad thing. REST advocates think REST is cool!
They think they’ve discovered some hidden knowledge the rest of us (no pun intended) haven’t figured out! And they go around jacking with Apache config files and playing in their little console windows running curl, hand crafting HTTP headers and they think it’s cool! I hear things like: The Web was originally built to handle all of this, but no one ever uses it! No frickin’ duh. No one ever uses it because it sucks!
Look, web services in moderation (that’s another rant for a later time) are cool. Noun/Verb URL semantics are a useful convention. But please, please, just please stop trying to derive some closest-to-CRUD meaning from each method in my complex software system and pigeon hole that into the closest representation of GET, PUT, POST, or DELETE! It adds nothing to the usefulness of the implementation and only takes away functionality.
“How does it take away functionality?” you might ask. Well, let me tell ya 2 ways. Just about all dynamic web content programming systems easily support POST and GET and most support retrieving provided parameters in the same way for either:
PHP - $_REQUEST['param1']
JSP - request.getParameter("param1");
ASP - OK, so ASP is stupid: Request.Form("param1"), or Request.QueryString("param1")
.Form for POST? Really? What if my <form method="get">? Stupid ASP.
All browsers support GET. duh.
What does this mean? Well, it means that I can easily program all my web services to accept both GET and POST (normally without changing any of my server side code, to seamlessly accept either), and can test it simply with my browser… if I use a sane noun/verb URL convention like:
http://someserver.com/api/widget/get?widgetID=1&format=json
http://someserver.com/api/widget/get?category=jigs&detail=headers&format=xml
http://someserver.com/api/widget/recall?widgetID=1&urgency=high
http://someserver.com/api/widget/sendpromotion?widgetID=1&minPurchaseDate=20151231&promoWidgetID=2
http://someserver.com/api/widget/delete?widgetID=1
I can easily handle these without any web server tweaks, using my server-side environment du jour with, e.g.,
webapp/ROOT/api/widget/get/index.jsp
webapp/ROOT/api/widget/recall/index.jsp
webapp/ROOT/api/widget/sendpromotion/index.jsp
webapp/ROOT/api/widget/delete/index.jsp
I can add any number of verbs (methods) to my nouns (classes) because they are simply the last segment of my URL path, by convention-- the penultimate segment being the noun-- all providing a relatively straightforward class/method approach to exposing most software systems as web services. It's easy to understand. It's easy for a potential user to discover, browse, and poke around by simply turning on folder listings. Adding "usage" output for endpoints when no parameters are passed can self-document the API. I can test these all out with my web browser. I can demo these to would-be users of my API in a browser and give them examples for a method call simply by giving them example URLs. Easy peasy-- except I hate peas. Mushy peas are the worst. They remind me of baby food. Silly Brits. You know they try to eat non-mushy peas balanced on the backs of their forks! That’s almost as insane as REST! Almost.
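The noun/verb convention can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Widget` class, its methods, and the `dispatch` helper are hypothetical, and a real deployment would sit behind a web server rather than parse URLs by hand. It shows the three points made above: the last two path segments select class and method, parameters are read the same way whether they arrive in the query string (GET) or the form body (POST), and a call with no parameters returns a usage line.

```python
import inspect
from urllib.parse import parse_qs, urlsplit


class Widget:
    """Hypothetical noun; each public method is a verb exposed in the URL."""

    def get(self, widgetID, format="json"):
        return f"widget {widgetID} as {format}"

    def recall(self, widgetID, urgency="low"):
        return f"recalling widget {widgetID}, urgency {urgency}"


NOUNS = {"widget": Widget()}


def dispatch(url, body=""):
    """Route .../<noun>/<verb>, taking parameters from query string or body."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    noun, verb = segments[-2], segments[-1]
    if verb.startswith("_"):
        raise ValueError("private attributes are not verbs")
    handler = getattr(NOUNS[noun], verb)

    # GET parameters and POST form parameters end up in the same dict.
    raw = parse_qs(parts.query)
    raw.update(parse_qs(body))
    params = {key: values[0] for key, values in raw.items()}

    if not params:
        # Self-documenting "usage" output when no parameters are passed.
        return f"usage: {verb}{inspect.signature(handler)}"
    return handler(**params)


print(dispatch("http://someserver.com/api/widget/get?widgetID=1&format=json"))
print(dispatch("http://someserver.com/api/widget/recall", body="widgetID=1&urgency=high"))
print(dispatch("http://someserver.com/api/widget/get"))
```

Adding a new verb in this sketch means adding one method to the class; no routing table or server configuration changes.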
Now, imagine the above example as a REST implementation:
http://someserver.com/api/widgets/widget/1
HTTP Type: GET; Header: Accept: application/json
http://someserver.com/api/widgets/
HTTP Type: GET; Header: Accept: text/xml; Content-Type: application/x-www-form-urlencoded
Body: category=jigs&detail=header
recall ?
send promotion ?
http://someserver.com/api/widgets/widget/1
HTTP Type: DELETE
The first URL is not obviously a search call and is probably ambiguous if I wanted to simply list all widgets-- hack. Clients wishing to designate the response format must construct an HTTP Header. Delete requires the complexity of handling an HTTP type other than the commonly supported GET or POST. The recall and sendpromotion methods do something other than simple CRUD on my object. Since REST is built around the assumption that I wish to merely perform persistence (Create, Read, Update, Delete) and that actions on my nouns will do nothing more, if I do have special server-side code to do something more, the client certainly wouldn’t expect it! Every one of these methods requires jacking around with my web server to handle URL rewriting, so I can basically convert the URL part ‘widget/1’ into (what it should be) a parameter, ‘widgetID=1’, and then choose a different service handler based on the HTTP method. None of this can be tested or demoed from a browser and all require you to drop to a shell and construct by hand HTTP Headers and Bodies and to become intimate with cryptic curl arguments.
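The two extra steps described above (rewrite the path segment into a parameter, then pick a handler by HTTP method) might look roughly like this. The regular expression, handler names, and status strings are all assumptions made for illustration; real servers do this in their rewrite or routing configuration.

```python
import re

# Hypothetical rewrite rule: pull the trailing ID out of the path so that
# 'widget/1' becomes the parameter widgetID=1.
ITEM_PATH = re.compile(r"^/api/widgets/widget/(?P<widgetID>\d+)$")


def handle_get(params):
    return f"read widget {params['widgetID']}"


def handle_delete(params):
    return f"delete widget {params['widgetID']}"


# Second lookup: the HTTP method, not the URL, picks the handler.
HANDLERS = {"GET": handle_get, "DELETE": handle_delete}


def route(method, path):
    match = ITEM_PATH.match(path)
    if match is None:
        return "404 Not Found"
    handler = HANDLERS.get(method)
    if handler is None:
        return "405 Method Not Allowed"
    return handler(match.groupdict())


print(route("GET", "/api/widgets/widget/1"))
print(route("DELETE", "/api/widgets/widget/1"))
print(route("POST", "/api/widgets/widget/1"))
```

In this sketch the handler table is keyed by HTTP method alone, so verbs such as recall and sendpromotion have no obvious key to live under.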
Why? Well, because the Web was made to handle all this! What a bunch of hog crap.
Look, it takes a sane, unenlightened human being about 2 minutes to come up with a useful OO-to-web-services mapping:
Sure, it doesn’t handle stateful objects, but neither does REST “because the web should be stateless”-- yeah, go tell that to jsessionid, aspsessionid, and phpsessid... and I bet a few bright minds and a few more minutes could come up with some sort of UUID-per-object-instance-reference-stored-in-a-session-map convention to handle stateful in-session objects. It’s not rocket science. Stop trying to make it rocket science!
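One possible reading of the "UUID-per-object-instance-reference-stored-in-a-session-map" idea, as a rough sketch. Everything here is hypothetical: the `Counter` class stands in for any stateful object, and the in-memory `SESSIONS` dict stands in for whatever the server keys off jsessionid, phpsessid, and friends.

```python
import uuid


class Counter:
    """Hypothetical stateful object."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


# session id -> {object handle -> live object}; stand-in for a session store
SESSIONS = {}


def create(session_id, cls):
    """Instantiate an object in this session and return its UUID handle."""
    handle = str(uuid.uuid4())
    SESSIONS.setdefault(session_id, {})[handle] = cls()
    return handle


def call(session_id, handle, verb):
    """Invoke a verb on a previously created in-session object."""
    obj = SESSIONS[session_id][handle]
    return getattr(obj, verb)()


handle = create("session-1", Counter)
print(call("session-1", handle, "increment"))
print(call("session-1", handle, "increment"))
```

A client would pass the handle back as one more URL parameter, keeping the noun/verb convention intact. Expiry and cleanup of abandoned handles are left out of the sketch.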
Web services are great, but please, let REST RIP.
New November Release
Lots of new goodies in this bundle refresh. Some highlights:
- support for per document versification system selection allowing documents with varying versifications to live within the same VMR CRE instance,
- improved editing experience in the Online Transcription Editor gadget,
- new example versifications: Homer and Ladder of Divine Ascent,
- transcription copyright text pushed to system configuration file,
- helpful backup and mirror scripts included as examples for your system,
- better drag and drop experience in image management,
- updated to latest jquery-ui,
- added ability to see history of published transcriptions,
- transcription import tool greatly improved,
- and, of course, the set of usual bug fixes.
All three OS bundles have been refreshed with this latest version and you can find them on the Download Page.
2016-09-30 VMR CRE
Multiverse collation, a new CSV alignment table download option, and a ton of menu cleanups are available from the Edition Display page. We've also added a nice hover-over option for the verse numbers in the transcription display which will show a marker exactly where in the line that verse begins. You'll find a new "Digital Critical Edition Project" site template to help you get new projects up and running quickly. The tour on the front page has been expanded to walk you through more features, if you're new to the VMR CRE platform. All three OS bundles have been refreshed with this latest version and you can find them on the Download Page.