Your tasks will involve writing adversarial prompts to identify weaknesses in various cutting-edge AI models, including Large Language Models (LLMs), Text-to-Image, Text-to-Video, Multi-Modal models, AI Agents and beyond. You’ll also manage and analyze datasets to ensure the generation of high-quality outputs and actionable insights that contribute to AI safety research.
Key Responsibilities
• Design adversarial prompts to test AI systems across multiple modalities.
• Identify, categorize, and document model weaknesses or unsafe outputs.
• Support data annotation, curation, and quality control processes.
• Summarize findings into structured reports or data templates.
Requirements
• Proven experience with Generative AI models is essential, though direct technical experience is not a prerequisite.
• Understanding of risk taxonomies (e.g., harm categories, policy tiers).
• Command of English at a near-native level.
• Strong attention to detail and organizational skills.
• Ability to manage multiple tasks simultaneously and meet deadlines.
Additional Wants
• Familiarity with various model types (Text-to-Text, Text-to-Image) is desirable.
• Experience with prompt injection, jailbreaks, and red-teaming techniques.
• Prior work in model evaluation, prompt engineering, or safety analysis.
• Regional expertise or cultural fluency in specific geopolitical areas.
About ActiveFence