Generative AI and assessment design

Preliminary guidance for turning principles into practice in higher education

PRODUCED BY

Dr Carmen-Elena Dorobat

Andrew Larner

Jack Sutherst

Professor Sarah Underwood


FUNDED BY





Centre for Learning Enhancement and Educational Development

The Higher Education sector's current guidelines on AI emphasise the importance of staff and students developing AI literacy while upholding academic integrity. Educators must be able to distinguish between genuine student work and AI-generated content. Practical guidance on assessment design is lacking, especially concerning ethical education about AI use, and alignment between sector standards, institutional policies, and student expectations is crucial.

This guidance outlines a structured strategy for turning AI principles into assessment practice, promoting collaboration with AI tools and preventing collusion. Current challenges include detecting AI collusion and balancing academic integrity with real-world demands. The goal is to foster AI collaboration in assessments that prepares students for future workplaces while upholding academic credibility and keeping pace with AI advancements.

From No AI to All AI, assessments fall into three zones: AI Exclusion, AI Collaboration (the zone of desired AI use), and AI Collusion.

AI Exclusion

The deliberate structuring of assessments that renders the use of Generative AI impractical or impossible. For example, unseen exams or professional discussions, where the immediate application of AI tools for generating or accessing information is restricted or irrelevant.

AI Collaboration (the zone of desired AI use)

A partnership between students and Generative AI tools to enhance the creative and cognitive processes involved in academic work. AI may be used to aid in idea generation, research, and refinement of projects, while students retain responsibility for critical decision-making and intellectual contributions. AI is a supplementary resource within the bounds of academic integrity, and its use can be confidently detected by staff.

AI Collusion

The surreptitious use of Generative AI tools to produce academic work without transparently acknowledging the AI's involvement; AI is employed as a substitute for genuine learning efforts. This constitutes a form of academic dishonesty and undermines the principles of integrity, fairness, and originality in scholarly pursuits.

ASSESSMENTS WITH AI IN 3 EASY STEPS

AI USER LEVELS

How easily can your students collude with AI?

ASSESSMENT PARAMETERS

What makes assessments AI-vulnerable or AI-resilient?

MULTIDIMENSIONAL ASSESSMENT

Combine parameters to foster AI collaboration

STEP 1: AI USER LEVELS

Different assessment designs have a direct impact on how students utilise Generative AI tools. However, this may not necessarily correlate with the complexity of the assessment or the level of individual reflection required in the submission. To gain a comprehensive understanding of how assessments can be vulnerable to AI collusion, a more nuanced perspective is necessary.


As a first step in this guidance, we introduce a three-tiered classification system for categorising assessments based on the ease with which students can use AI tools to generate the required assessment outputs.


This classification aligns with current HE guidelines promoting AI literacy and academic integrity while adding depth to what constitutes ethical and appropriate AI usage. The categorisation emerged from an 'AI stress test' carried out on assessments within a business management curriculum (for more details, please refer to the FAQs).

USER LEVEL 1: Text generator

No or negligible editing, little or no learning necessary

USER LEVEL 2: Text editor

Significant editing, including some peripheral learning necessary 

USER LEVEL 3: Concept curator

Finding sources and explaining concepts; some text generation possible but not usable without significant user curation (core learning required)

User levels 1 and 2 are considered AI collusion; user level 3 is considered AI collaboration.

We view assessments prone to collusion as tasks that can be input directly into generative AI, allowing students at user level 1 to create submissions with minimal editing or, at user level 2, with slight editing and peripheral learning. Assessments prone to collaboration are structured so that students can use AI only as a 'concept curator', emphasising the need for fundamental learning to occur.

STEP 2: ASSESSMENT PARAMETERS

The development of this guidance began by categorising assessments along nine parameters. These parameters are not only common in business management assessments but also form the foundation of higher education assessments across disciplines (refer to Appendix 1 for more information).

In our analysis of the correlation between these parameters and the AI user levels achievable in the stress test, we observed that format and foundation play a crucial role in raising AI user levels, closely followed by size. Practical assessments (those that apply knowledge to specific problems), oral assessments, and artefacts promote AI collaboration at Level 3, even though they do not entirely eliminate AI use. These types of assessment also tend to be larger in size, i.e. requiring greater student effort.

Context, criticality and reflectivity, continuity, and groupwork moderately influence AI user levels. Assessments embedded in specific contexts, reflective assessments, groupwork, and assessments that build on a series of modules with prerequisites still leave students able to use AI at levels 1 or 2 when a practical or oral component is not required.


Assessment structure and turnaround time have minimal impact on raising AI user levels. This means students can still collude with AI, even in assessments lasting only a few days or hours, if those assessments lack practicality, reflectivity, or contextual embedding, for example. Lastly, a detailed assessment structure might inadvertently facilitate collusion if it unintentionally serves as a comprehensive list of prompts for a generative AI tool. (A toy illustration of these relative weightings follows the parameter list below.)


Relative significance of the nine parameters, from most to least influential:

FORMAT

(written – artefact – oral)

FOUNDATION

(theoretical – mixed – practical)

SIZE

(small – medium – large)

CONTEXT

(broad – focused – embedded)

CRITICALITY & REFLECTIVITY

(analytical – evaluative – reflective)

CONTINUITY

(yes or no)

GROUPWORK

(yes or no)

STRUCTURE

(open – scaffolded – closed)

TURNAROUND

(hours – days – months)
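To make these relative weightings concrete, here is a minimal sketch in Python of how the nine parameters could be combined into a single vulnerability indicator. The numeric weights, the ordering of options, and the function name ai_resilience_score are our own illustrative assumptions, chosen only to mirror the qualitative findings above; they are not the scoring used by the AI score calculator in the toolkit below.

```python
# A toy model of the nine assessment parameters. All weights are illustrative
# assumptions encoding the qualitative ordering described above: format and
# foundation matter most, then size; context, criticality & reflectivity,
# continuity, and groupwork matter moderately; structure and turnaround least.
# For each parameter, options are listed from most AI-vulnerable to most
# AI-resilient (a judgement call for structure and turnaround).
PARAMETERS = {
    "format":      (["written", "artefact", "oral"],            3.0),
    "foundation":  (["theoretical", "mixed", "practical"],      3.0),
    "size":        (["small", "medium", "large"],               2.0),
    "context":     (["broad", "focused", "embedded"],           1.5),
    "criticality": (["analytical", "evaluative", "reflective"], 1.5),
    "continuity":  (["no", "yes"],                              1.5),
    "groupwork":   (["no", "yes"],                              1.5),
    "structure":   (["closed", "scaffolded", "open"],           0.5),
    "turnaround":  (["months", "days", "hours"],                0.5),
}

def ai_resilience_score(assessment: dict[str, str]) -> float:
    """Weighted resilience score in [0, 1] for an assessment described as
    {parameter: chosen option}; higher means more resistant to AI collusion."""
    total = maximum = 0.0
    for name, (options, weight) in PARAMETERS.items():
        rank = options.index(assessment[name])       # 0 = most AI-vulnerable
        total += weight * rank / (len(options) - 1)  # normalise rank to [0, 1]
        maximum += weight
    return total / maximum
```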

STEP 3: MULTIDIMENSIONAL ASSESSMENT

While format and foundation are essential, the key lies in the combined impact of individual parameters: merely transforming a standard essay into a reflective one will not raise AI user levels or foster authentic learning.


This approach involves crafting assessments as multifaceted challenges, blending various cognitive processes and skills. Each assessment component should require students to demonstrate expertise across multiple knowledge dimensions and applications. By employing these combinations, educators can determine the extent of genuine learning and allocate suitable roles for AI tools, enhancing AI literacy while upholding academic integrity.


Multidimensional assessments involve creating tasks that engage a range of distinct skills, address integrated learning objectives, and reflect real-world scenarios. Below we provide two examples of how to (re)design assessments following these findings. You can also watch a more detailed explanation from our AI partner Thomas from Synthesia, and refer to Appendix 2 for additional suggestions.



To eliminate and detect collusion, and to promote collaboration, it is the combination and leveraging of three or more assessment parameters that matters most.


How to redesign assessments for AI collaboration

Have a look at our examples

EXAMPLE 1: A marketing and sales report

Original Assessment

A report will explain how to use marketing and sales principles and strategies to target your selected target market. Analyse opportunities in relation to segmentation, branding and communications, and identify ways to market considering sales strategy, distribution, implementation, measurement and control using relevant market research tools.

New Assessment

Use a product or service your peer has designed in class. In pairs, you will present a 10-minute briefing of your product or service, articulating the principles and strategies to target your selected target market. Using your partner's briefing, produce a 1,500-word report. Your report should contain a 500-word action plan using SMART objectives for the business.

Using your partner's action plan, write a 500-word reflection on its content. You should reflect on which objectives you think will be valuable to your business and why, and which objectives you would change and why.


EXAMPLE 1 EXPLAINED

Changes to the Assessment → Impact

  • Change: Transition from working on a generic product or service to refining a product or service designed by a peer in class.
    Impact: Adds collaboration (groupwork) and contextual relevance (context).

  • Change: Move from an individual report to a group presentation and an action plan.
    Impact: Diversifies the assessment format and foundation, thereby increasing student effort (size) and embedding the assessment deeper into context.

  • Change: Include a reflective component on the earlier content generated by a peer.
    Impact: Further embeds the assessment in an experience (context), elevates criticality and reflection, and adds further application to the expected output (foundation).

Overall, the assessment moves from 2 parameters in the AI collaboration zone to 6. It becomes multidimensional, evaluating a blend of skills and knowledge within a contextually-embedded framework.


EXAMPLE 2: A strategic plan

Original Assessment

Create a strategic plan of 2,500 words based on a company of choice or a case study. This plan is designed for your line manager and should include an explanation and analysis of the underlying factors and theories. Your tutor will provide a detailed structure for the plan to assist in writing.

New Assessment

Write your own case study of a public company in the UK's North West. Use data from a company database such as Fame or Mintel. Create a strategic plan for this enterprise, then justify the plan to your line manager, get their feedback, and revise the plan. Include the case study (500 words), both plans (1,500 words), the plan justification (500 words), and a reflection on how you implemented the manager's feedback (500 words).

EXAMPLE 2 EXPLAINED

Changes to the Assessment → Impact

  • Change: Move from a generic case study or company to students developing their own case study.
    Impact: Deepens the context of the assessment and compels criticality in content curation.

  • Change: Focus on a specific company within a narrower geographical area.
    Impact: Enhances the assessment's context further.

  • Change: Require the use of data from databases behind paywalls, accessible only through the institutional library.
    Impact: Promotes research skills independent of AI tools and deepens the foundation through greater application of theory.

  • Change: Make the output evolve iteratively, with students asking for feedback and then revising the strategic plan.
    Impact: Shifts the format and foundation of the assessment from a theoretical written output to an applied artefact with a formative oral component.

  • Change: Incorporate a plan justification and a reflective component.
    Impact: Adds a critical and reflective layer to the assessment.

Overall, the assessment moves from 1 parameter in the AI collaboration zone to 4. It encourages students to apply creativity and personal insights, and to focus on authentic, workplace-relevant skills that will set them apart, enhancing their learning experience.


TOOLKIT FOR ACADEMIC EXPLORATION

This toolkit comprises three activities that academics may undertake to further their understanding of AI use in assessments and to test their own assessments' AI vulnerabilities. The activities are ideal for a workshop setting, where these ideas can be explored and discussed with colleagues.


#1

SPOTTING AI USE

This task highlights the challenge of distinguishing between AI-generated content and authentic student work, especially when requiring educators to provide evidence for academic misconduct investigations.


Task summary:


  • In groups, carefully assess four submitted assignments along with the attached Turnitin reports. You are provided with the task as well as the marking scheme.
  • You can find these here.
  • Focus on identifying instances where students have colluded with artificial intelligence (AI) tools, such as text generators or text editors.
  • If you can form an opinion based on the evidence, please complete an academic misconduct form to document your findings (also available in the folder above).
  • Remember to approach this task objectively and adhere to academic integrity guidelines to maintain the rigour and fairness of the evaluation process.

#2

YOUR AI SCORE

This is a valuable resource for educators to improve the assessment design process.


Task summary:


  • We have devised an AI score calculator based on our user level nomenclature and our assessment parameters.
  • You can take the online quiz here or scan the QR code. You will map your assessment across the nine parameters and obtain an AI Score, which reflects the vulnerability level of your assessment. (A worked illustration of such a mapping follows below.)
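For a sense of what such a mapping might look like, the self-contained snippet below scores a hypothetical reading of the original (pre-redesign) report from Example 1 above, using the same illustrative weights as the Step 2 sketch. Both the mapping and the weights are our own assumptions; the online quiz's questions and scoring may differ.

```python
# Hypothetical mapping of the original Example 1 report onto the nine
# parameters (our own reading of that brief, not the project's data),
# scored with the same illustrative weights as the Step 2 sketch.
weights = {"format": 3.0, "foundation": 3.0, "size": 2.0, "context": 1.5,
           "criticality": 1.5, "continuity": 1.5, "groupwork": 1.5,
           "structure": 0.5, "turnaround": 0.5}
# Normalised rank of the chosen option (0 = most AI-vulnerable, 1 = most resilient).
ranks = {"format": 0.0,       # written
         "foundation": 0.0,   # theoretical
         "size": 0.5,         # medium (assumed)
         "context": 0.0,      # broad: student-selected target market
         "criticality": 0.0,  # analytical
         "continuity": 0.0,   # no
         "groupwork": 0.0,    # no
         "structure": 0.5,    # scaffolded (assumed)
         "turnaround": 0.0}   # months
score = sum(weights[p] * ranks[p] for p in weights) / sum(weights.values())
print(f"AI resilience score: {score:.2f}")  # 0.08 -> highly collusion-prone
```

On this toy scale, the redesigned version of the same assessment, which adds groupwork, an oral briefing, embedded context, and a reflective component, would score markedly higher.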

#3

USING AI TOOLS

This task will familiarise you with a generative AI tool and demonstrate how versatile and powerful it can be, how it can be used, and its potential drawbacks.


Task summary:


  • Access Bing Copilot
  • We recommend you sign in using your institutional credentials. If your institution is a subscriber, this will grant you full access to the platform.
  • Use the three assessment tasks provided below. These are generic tasks used across undergraduate business degrees.
  • Begin by copying and pasting the tasks into Copilot.
  • Use its assistance to formulate your answers.
  • Then prompt Copilot for further insights.

You can create a fourth task to vary the difficulty level.


01

You work as an entry-level strategist in a large corporation. Write a 300-word blog about the challenges and opportunities that arise from such a job.

02

You have undertaken a project with a local small company as a consultant for their new product launch. Write a 300-word reflective essay discussing the challenges and opportunities you encountered along the way.

03

You have your own business selling goat’s milk cosmetic products in the Greater Manchester area. Write a 300-word pitch to Dragon’s Den to help you get some investment or some mentoring from them and reflect on how the pitch went.

AI STRESS TEST

The research team devised an AI stress test for assessments from theoretical and experiential modules across our business school curriculum.


A Digital Education Specialist had two hours allocated to separate the assessment task into prompts (inputs) and generate the required assessment submission (output), e.g. essay, report, presentation, reflection, video recording, poster, artefact, professional dialogue, blog, online seen exam, etc.



See below for more detail on how we conducted our stress test.


FAQs

  • Why is AI collusion problematic?

In such assessments, the remaining student contribution is insufficient to meet the learning outcomes of higher education, which makes them not only a form of academic malpractice but also academically undesirable. What we teach, practise, and assess in our modules should go beyond these lower levels of engagement, both with learning and with AI.


  • Why is AI exclusion problematic?

While sound from an academic integrity point of view, such assessments are operationally difficult or time-intensive to implement and are not reflective of real-world expectations in many disciplines. Assessments that preclude the use of AI also stifle the development of AI literacy, and thus of the critical skills our students will need in the workplace.


  • Why is AI collaboration desired?

These assessments are designed both to encourage AI use and, more crucially, to allow staff to detect whether genuine core learning has taken place alongside the use of AI as a tool. They represent the authentic assessments of the future, in which assessment tasks preserve the integrity of university degrees whilst simultaneously preparing students for the future workplace.


  • What did the stress test show?

The stress tests showed that around 45% of the analysed assessments encouraged AI collaboration at AI user level 3, while the remainder were collusion-prone: 20% at AI user level 1 and 35% at AI user level 2. None of the tested assessments, which included oral presentations and reflections, excluded AI use.


  • Why multidimensional assessments?

Balancing academic integrity with proficiency in AI can set students apart in a rapidly advancing technological landscape. To achieve this, it is essential to create assessment tasks that engage a range of distinct skills and address integrated learning objectives. Adjusting specific criteria methodically allows for the incorporation of AI in assessments while maintaining academic integrity and assessment credibility.




LEARN MORE ABOUT MULTIDIMENSIONAL ASSESSMENT


Multifaceted or multi-dimensional assessments are assessments in which students are expected to utilise various aspects of their learning to demonstrate their comprehensive understanding, accumulated knowledge, and ability to integrate and apply what they have learned.


For example, a project-based assessment might require students to research a topic, develop a solution to a problem, and then present their findings to the class. A group presentation followed by individual reflection papers allows students to demonstrate their collaborative skills as well as their individual understanding of the topic. Alternatively, a case study could serve as the basis for the assessment, with students working in teams to analyse a real-world scenario, develop recommendations, and present their findings to a panel of experts.


Some further ideas:


  • Case-based group projects with individual reflections: Students work in groups to analyse and solve a complex case study, applying concepts and theories. After completing the group project, each student submits an individual reflection discussing their contributions to the project, lessons learned, and personal insights gained.


  • Online quizzes with follow-up discussion forums: Educators administer online quizzes to assess students' understanding of module material. After completing the quizzes, students participate in follow-up discussion forums where they can ask questions, discuss challenging concepts, and engage in peer-to-peer learning.


  • Simulation-based assessments with written reflections: Students participate in a simulation exercise, such as a business simulation or a virtual laboratory experiment, to apply theoretical concepts in a simulated real-world scenario. Following the simulation, students write reflective essays.




  • Peer review of creative projects with educator feedback: Students create creative projects, such as multimedia presentations, artwork, or digital portfolios, to demonstrate their understanding of module concepts. After submitting their projects, students engage in peer review activities where they provide feedback on their peers' work. Additionally, educators provide feedback and assessment based on established criteria.


  • Performance-based assessments with self-assessment and reiteration: Students participate in performance-based assessments, such as oral presentations, debates, or live demonstrations. Following their performances, students engage in self-assessment activities where they evaluate their own performance, identify areas for improvement, and reiterate the activity either in writing or orally.


  • Inquiry-based research projects with poster presentations: Students conduct inquiry-based research projects on topics of interest within the module topics. After completing their research, students create poster presentations summarising their findings and present their posters at a module symposium or exhibition, where they engage with peers and educators to discuss their research.



LEARN MORE ABOUT ASSESSMENT PARAMETERS


1. Size (small – medium – large).

Size refers to the student effort hours an assessment requires (applicable across all assessment types). Examples:

  • Small assignments: 15–20 hours (equivalent to 1,000–2,000 words)
  • Medium assignments: 30–40 hours (equivalent to 2,500–3,500 words)
  • Large assignments: 60–70 hours (equivalent to 4,000–5,000 words)


2. Turnaround (hours – days – months).

Turnaround refers to the time available to students between receiving the assessment brief and the submission deadline, during which students have access either to generative AI directly or to materials prepared with generative AI. Examples:

  • Hours: quizzes, MCQs, open-book exams, pre-seen online exams, ad-hoc presentations, practical tasks, where students have anywhere between 3 and 48 hours.
  • Days or weeks: live client briefings, pre-seen online exams, practical assessments, blogs, concept maps, field reports, mini-practicals, learning logs, diaries, simulations (up to 4 weeks).
  • Months: essays, reports, portfolios, presentations, research projects, group projects, journal articles, annotated bibliographies, blog series, dissertations, reflective essays, practical reports, live-client consultancy reports.


3. Structure (open – scaffolded – closed).

Structure refers to whether the assessment task allows students to structure their submission freely, provides a scaffold with some flexibility, or imposes a closed structure that must be followed precisely. Examples:

  • Open: assignment tasks where tutors only specify the broad topic/area of investigation and allow students to create their own structure (e.g. no structural instructions, specific headings, or templates are provided).
  • Scaffolded: assignment tasks where tutors recommend a structure (e.g. ‘a good assessment would/must include the following headings’, ‘your assignment should/must contain the following elements’, ‘consider the following questions’) but the content remains flexible.
  • Closed: assignment tasks where tutors stipulate and expect a structure (e.g. ‘your assignment should include the following headings’, ‘you must use the enclosed template/pro-forma’) and the expected content is underpinned by specific requirements (specific data or business information, specific questions that must be answered).



4. Context (broad – focused – embedded).

Context refers to whether the assessment task is tied to a specific context, such as an actual company, a student's workplace or placement, a specific situation or training opportunity they were asked to participate in, a specific live client brief, etc. Examples:

  • Broad: the assignment stipulates no context or allows students to choose the context (e.g. ‘discuss a theory or apply a model with examples’, ‘compare these viewpoints using examples’).
  • Focused: the assignment stipulates a case study, company, or specific example to focus on, but the information is accessible online (e.g. companies such as Google or Apple; industries such as banking or pharmaceuticals; country analyses such as ‘doing business in Peru’).
  • Embedded: the assignment is based on a local small company (with little online information available), a case study from specific sources (e.g. previous consultancy projects), the company where the student has completed an internship or placement, or a live-client brief designed specifically for the unit, with a task addressing a real client's needs.


5. Format (written – artefact – oral).

Format refers to the type of communication in which the assessment is required to take place: in spoken form, in writing, or as an artefact such as a video presentation, infographic, or performance. Examples:

  • Written: a test or piece of work which involves writing rather than doing something practical or giving spoken answers
  • Artefact: an object, poster, or piece of work which involves creating, building, or doing something practical rather than writing or giving spoken answers
  • Oral: an assessment which involves giving spoken answers rather than writing or doing something practical


6. Foundation (theoretical – mixed – practical).

Foundation is the basis, the premise, and the groundwork required for the assessment. It correlates with the type of unit (academic or experiential), and programmes should usually have a mix of foundations across their assessment strategy. Examples:

  • Theoretical: the assessment requires reading, studying, and academic research (journal articles, literature reviews, argumentative essays, theoretical critiques)
  • Mixed: a combination of academic research and application (contextualisation) to real-world examples and cases (live client briefs, case study methods, portfolios of evidence, market research)
  • Practical: the assessment requires hands-on experience and experimentation (empirical testing, laboratory testing, performance)

7. Criticality and reflectivity (analytical – evaluative – reflective).

This parameter looks at whether the assessment requires interpretation of the evidence and source material, how students have used that information to demonstrate their understanding, and their subsequent position on the topic. Examples:

  • Analytical: the assessment requires engagement with academic debates and research in the subject area, and a comparison of viewpoints
  • Evaluative: the assessment requires a synthesis of academic debates and research, and a nuanced exposition of the relative importance of viewpoints and their implications
  • Reflective: the assessment requires students to make connections between experience and prior learning, reflect on how and why their understanding may have changed, and refer to relevant theoretical frameworks to support their arguments


8. Continuity (YES or NO).

Continuity looks at the connection between an assessment and the line of development/assessment strategy in the programme. Examples:

  • YES: the assessment is connected to assessments in previous modules in the programme, requiring previous knowledge, or the module has prerequisites on which the current assessment can build
  • NO: the assessment is not related to previous modules in the programme and does not require previous knowledge; the module is standalone, with no prerequisites on which the current assessment can build



9. Groupwork (YES or NO).

Groupwork looks at whether the assessment requires students to prepare the submission and be assessed individually or in groups. Examples:

  • YES: the students must collaborate and produce the assessment as a group, and the work is assessed as a group and/or with some individual component
  • NO: the students can collaborate, but the work must be produced individually and is assessed individually


WHAT EDUCATORS SAY


Students are very positive about the redesigned multidimensional assessment and it definitely has increased their engagement. It has allowed them to focus more on learning the concepts and how to apply them in their work. The assessment prompted them to use different skills across the different dimensions, and so they had ‘second chances’ to prove themselves. Tutors could also see clearer highlights: what a particular student is good at, what they need improvement on, and we could really lean into their thinking process and assess that rather than just the output. Adding these layers to the assessment, like using peer feedback, voice presentation, reflection, meant we could clearly see the individualised learning articulation points, and in between those learning points students could use AI confidently in a way that we could monitor and encourage as well.

APPRENTICESHIPS MODULE LEADER

TRIALLED THE FRAMEWORK IN JULY 2024


Your session totally supercharges our plans for the next academic year. So, can't smile wide enough and many congratulations on such amazing work!

WORKSHOP participant

March 2024


I really like that it is proactive rather than reactive approach, [as this guidance] considers 'collaboration' with AI as a valuable skill, and does not have the language that is 'blaming' or 'threatening' but encouraging with caution. [...] When used in a 'collaborative' way, it will actually allow students to engage with it with their knowledge and prompt in a smarter manner.

WORKSHOP participant

March 2024


I feel reassured. It makes complete sense to not expel or work against AI, but rather embrace what it can do and tailor its usages into transferable skills for our students to use when they reach industry.


DEPARTMENTAL EDUCATION DIRECTOR

APRIL 2024


This was the first session on AI where I’ve actually learned something I can put in practice tomorrow. Looking forward to get started!

LEED Conference participant, June 2024

CONTACT US

CARMEN DOROBAT

Associate Professor

c.dorobat@mmu.ac.uk

ANDREW LARNER

Digital Education Specialist

a.larner@mmu.ac.uk

JACK SUTHERST

Digital Education Specialist

j.sutherst@mmu.ac.uk

SARAH UNDERWOOD

Professor

s.underwood@mmu.ac.uk